Unresponsive system and 100% hypervisor CPU usage due to misaligned PMD CPU pinning with OVS DPDK in Red Hat OpenStack Platform 10
Issue
The host OS is unresponsive, and the CPUs that should be reserved for the hypervisor show 100% CPU usage in user space.
Inspecting the CPU affinity of the Open vSwitch PMD threads shows that it overlaps with the cores that should be reserved for the host OS:
[root@overcloud-compute-2 ~]# grep Aff /etc/systemd/system.conf
#CPUAffinity=1 2
CPUAffinity=0 2 4 6 56 58 60 62
[root@overcloud-compute-2 ~]# ps -Tp `pidof ovs-vswitchd` | grep pmd | awk '{print $2}' | xargs -I {} taskset -c -p {}
pid 7421's current affinity list: 0,2,4,6,56,58,60,62
pid 7422's current affinity list: 0,2,4,6,56,58,60,62
pid 7423's current affinity list: 0,2,4,6,56,58,60,62
pid 7424's current affinity list: 0,2,4,6,56,58,60,62
pid 7425's current affinity list: 0,2,4,6,56,58,60,62
pid 7426's current affinity list: 0,2,4,6,56,58,60,62
pid 7427's current affinity list: 0,2,4,6,56,58,60,62
pid 7428's current affinity list: 0,2,4,6,56,58,60,62
pid 7429's current affinity list: 0,2,4,6,56,58,60,62
pid 7430's current affinity list: 0,2,4,6,56,58,60,62
pid 7431's current affinity list: 0,2,4,6,56,58,60,62
pid 7432's current affinity list: 0,2,4,6,56,58,60,62
pid 7433's current affinity list: 0,2,4,6,56,58,60,62
pid 7434's current affinity list: 0,2,4,6,56,58,60,62
pid 7435's current affinity list: 0,2,4,6,56,58,60,62
pid 7436's current affinity list: 0,2,4,6,56,58,60,62
pid 7437's current affinity list: 0,2,4,6,56,58,60,62
pid 7438's current affinity list: 0,2,4,6,56,58,60,62
pid 7439's current affinity list: 0,2,4,6,56,58,60,62
pid 7440's current affinity list: 0,2,4,6,56,58,60,62
pid 7441's current affinity list: 0,2,4,6,56,58,60,62
pid 7442's current affinity list: 0,2,4,6,56,58,60,62
pid 7443's current affinity list: 0,2,4,6,56,58,60,62
pid 7444's current affinity list: 0,2,4,6,56,58,60,62
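The comparison above can be automated. The following is a minimal sketch, not part of the original article: the function names `expand_cpus`, `cpu_overlap`, and `report_pmd_overlap` are hypothetical helpers, and the sketch assumes that `seq`, `awk`, `taskset`, and `pidof` are available on the compute node and that host CPUs are set through an uncommented `CPUAffinity=` line in /etc/systemd/system.conf.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original article).

# Expand a CPU list such as "0,2,4-6" or "0 2 4 6" into one CPU id per line.
expand_cpus() {
    for item in $(printf '%s' "$1" | tr ',' ' '); do
        case $item in
            *-*) seq "${item%-*}" "${item#*-}" ;;  # range, e.g. 4-6
            *)   echo "$item" ;;                   # single CPU id
        esac
    done
}

# Print the CPUs that appear in both lists, comma-separated (empty if none).
# Each list is de-duplicated first, so any id seen twice is in both lists.
cpu_overlap() {
    { expand_cpus "$1" | sort -u; expand_cpus "$2" | sort -u; } \
        | sort -n | uniq -d | paste -sd, -
}

# Compare the systemd CPUAffinity setting with the affinity of every PMD thread.
# Defined only; call it manually on the affected compute node.
report_pmd_overlap() {
    host_cpus=$(awk -F= '/^CPUAffinity=/ {print $2}' /etc/systemd/system.conf)
    for tid in $(ps -Tp "$(pidof ovs-vswitchd)" | awk '/pmd/ {print $2}'); do
        pmd_cpus=$(taskset -c -p "$tid" | awk -F': ' '{print $2}')
        echo "PMD thread $tid shares CPUs with the host: $(cpu_overlap "$host_cpus" "$pmd_cpus")"
    done
}
```

Running `report_pmd_overlap` as root on the compute node prints one line per PMD thread. A non-empty CPU list at the end of a line means that the thread can be scheduled on cores intended for the host OS, which matches the situation shown in the output above.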
Symptoms may include unresponsive commands such as lsof or ps. In a customer environment, the neutron-openvswitch-agent logged the following error message:
2018-08-03 22:49:22.073 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11919', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:23.728 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:150
2018-08-03 22:49:23.729 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11921', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:25.239 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11919', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:25.385 11877 CRITICAL neutron [-] Timeout: 5 seconds
2018-08-03 22:49:25.385 11877 ERROR neutron Traceback (most recent call last):
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/bin/neutron-openvswitch-agent", line 10, in
2018-08-03 22:49:25.385 11877 ERROR neutron sys.exit(main())
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
2018-08-03 22:49:25.385 11877 ERROR neutron agent_main.main()
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 51, in main
2018-08-03 22:49:25.385 11877 ERROR neutron mod.main()
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
2018-08-03 22:49:25.385 11877 ERROR neutron 'neutron.plugins.ml2.drivers.openvswitch.agent.'
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 375, in run_apps
2018-08-03 22:49:25.385 11877 ERROR neutron hub.joinall(services)
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 97, in joinall
2018-08-03 22:49:25.385 11877 ERROR neutron t.wait()
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
2018-08-03 22:49:25.385 11877 ERROR neutron return self._exit_event.wait()
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 125, in wait
2018-08-03 22:49:25.385 11877 ERROR neutron current.throw(*self._exc)
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
2018-08-03 22:49:25.385 11877 ERROR neutron result = function(*args, **kwargs)
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 59, in _launch
2018-08-03 22:49:25.385 11877 ERROR neutron raise e
2018-08-03 22:49:25.385 11877 ERROR neutron Timeout: 5 seconds
2018-08-03 22:49:25.385 11877 ERROR neutron
2018-08-03 22:49:25.502 11877 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=11906
The neutron agents also flapped in the output of neutron agent-list:
| <UUID> | Open vSwitch agent | overcloud-compute-0.localdomain | | xxx | True | neutron-openvswitch-agent |
| <UUID> | Open vSwitch agent | overcloud-compute-0.localdomain | | :-) | True | neutron-openvswitch-agent |
Environment
Red Hat OpenStack Platform 10