Unresponsive system and 100% hypervisor CPU usage due to misaligned PMD CPU pinning with OVS DPDK in Red Hat OpenStack Platform 10

Solution In Progress - Updated -

Issue

The host OS is unresponsive, and the CPUs that should be reserved for the hypervisor show 100% CPU usage in user space.

Inspecting the CPU affinity of the Open vSwitch PMD threads shows that it overlaps with the cores that should be reserved for the host OS:

[root@overcloud-compute-2 ~]# grep Aff /etc/systemd/system.conf
#CPUAffinity=1 2
CPUAffinity=0 2 4 6 56 58 60 62
[root@overcloud-compute-2 ~]# ps -Tp `pidof ovs-vswitchd` | grep pmd | awk '{print $2}' | xargs -I {} taskset -c -p {}
pid 7421's current affinity list: 0,2,4,6,56,58,60,62
pid 7422's current affinity list: 0,2,4,6,56,58,60,62
pid 7423's current affinity list: 0,2,4,6,56,58,60,62
pid 7424's current affinity list: 0,2,4,6,56,58,60,62
pid 7425's current affinity list: 0,2,4,6,56,58,60,62
pid 7426's current affinity list: 0,2,4,6,56,58,60,62
pid 7427's current affinity list: 0,2,4,6,56,58,60,62
pid 7428's current affinity list: 0,2,4,6,56,58,60,62
pid 7429's current affinity list: 0,2,4,6,56,58,60,62
pid 7430's current affinity list: 0,2,4,6,56,58,60,62
pid 7431's current affinity list: 0,2,4,6,56,58,60,62
pid 7432's current affinity list: 0,2,4,6,56,58,60,62
pid 7433's current affinity list: 0,2,4,6,56,58,60,62
pid 7434's current affinity list: 0,2,4,6,56,58,60,62
pid 7435's current affinity list: 0,2,4,6,56,58,60,62
pid 7436's current affinity list: 0,2,4,6,56,58,60,62
pid 7437's current affinity list: 0,2,4,6,56,58,60,62
pid 7438's current affinity list: 0,2,4,6,56,58,60,62
pid 7439's current affinity list: 0,2,4,6,56,58,60,62
pid 7440's current affinity list: 0,2,4,6,56,58,60,62
pid 7441's current affinity list: 0,2,4,6,56,58,60,62
pid 7442's current affinity list: 0,2,4,6,56,58,60,62
pid 7443's current affinity list: 0,2,4,6,56,58,60,62
pid 7444's current affinity list: 0,2,4,6,56,58,60,62
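
The comparison above can also be done mechanically. The following is a minimal sketch, not part of the original diagnosis. It assumes that CPUAffinity in /etc/systemd/system.conf is a space-separated list of CPU numbers, as shown above, and that taskset prints a comma-separated list that may contain ranges such as 4-8. The helper names expand_cpus and cpu_overlap are hypothetical.

```shell
# Expand a CPU list such as "0,2,4-6" or "0 2 4 6" into one CPU number per line.
expand_cpus() {
    for item in $(echo "$1" | tr ',' ' '); do
        case $item in
            *-*) seq "${item%-*}" "${item#*-}" ;;   # range, e.g. 4-6
            *)   echo "$item" ;;                     # single CPU
        esac
    done
}

# Print the CPUs that appear in both lists, space-separated on one line.
# Each list is de-duplicated first, so a CPU seen twice is in both lists.
cpu_overlap() {
    { expand_cpus "$1" | sort -un; expand_cpus "$2" | sort -un; } \
        | sort -n | uniq -d | xargs
}

# Example against a live host (requires root and a running ovs-vswitchd):
#   host_cpus=$(awk -F= '/^CPUAffinity=/ {print $2}' /etc/systemd/system.conf)
#   for tid in $(ps -Tp "$(pidof ovs-vswitchd)" | awk '/pmd/ {print $2}'); do
#       pmd_cpus=$(taskset -c -p "$tid" | awk -F': ' '{print $2}')
#       echo "thread $tid overlaps host CPUs: $(cpu_overlap "$host_cpus" "$pmd_cpus")"
#   done
```

Any non-empty overlap for a PMD thread means that thread can run on a core intended for the host OS, which matches the situation shown in the output above.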

Symptoms can include commands such as lsof or ps hanging or timing out. In one customer environment, the neutron-openvswitch-agent logged the following error:

2018-08-03 22:49:22.073 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11919', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:23.728 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:150
2018-08-03 22:49:23.729 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11921', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:25.239 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11919', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:25.385 11877 CRITICAL neutron [-] Timeout: 5 seconds
2018-08-03 22:49:25.385 11877 ERROR neutron Traceback (most recent call last):
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/bin/neutron-openvswitch-agent", line 10, in <module>
2018-08-03 22:49:25.385 11877 ERROR neutron     sys.exit(main())
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
2018-08-03 22:49:25.385 11877 ERROR neutron     agent_main.main()
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 51, in main
2018-08-03 22:49:25.385 11877 ERROR neutron     mod.main()
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
2018-08-03 22:49:25.385 11877 ERROR neutron     'neutron.plugins.ml2.drivers.openvswitch.agent.'
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 375, in run_apps
2018-08-03 22:49:25.385 11877 ERROR neutron     hub.joinall(services)
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 97, in joinall
2018-08-03 22:49:25.385 11877 ERROR neutron     t.wait()
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
2018-08-03 22:49:25.385 11877 ERROR neutron     return self._exit_event.wait()
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 125, in wait
2018-08-03 22:49:25.385 11877 ERROR neutron     current.throw(*self._exc)
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
2018-08-03 22:49:25.385 11877 ERROR neutron     result = function(*args, **kwargs)
2018-08-03 22:49:25.385 11877 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 59, in _launch
2018-08-03 22:49:25.385 11877 ERROR neutron     raise e
2018-08-03 22:49:25.385 11877 ERROR neutron Timeout: 5 seconds
2018-08-03 22:49:25.385 11877 ERROR neutron
2018-08-03 22:49:25.502 11877 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=11906

The Open vSwitch agent also flapped between dead (xxx) and alive (:-)) in the output of neutron agent-list:

| <UUID> | Open vSwitch agent | overcloud-compute-0.localdomain |                   | xxx   | True           | neutron-openvswitch-agent |
| <UUID> | Open vSwitch agent | overcloud-compute-0.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
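
To watch for this flapping over time, the dead rows can be filtered out of the table. This is a rough sketch under stated assumptions: the column order (id, agent type, host, availability zone, alive, admin state, binary) is inferred from the sample rows above, and the helper name dead_agents is hypothetical.

```shell
# Read `neutron agent-list` table rows on stdin and print "host binary"
# for every agent whose alive column shows xxx.
dead_agents() {
    awk -F'|' '
        # Trim the padding around every cell before comparing.
        { for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i) }
        # Field 6 is the alive column, field 4 the host, field 8 the binary.
        $6 == "xxx" { print $4, $8 }
    '
}

# Example (requires sourced overcloud credentials):
#   neutron agent-list | dead_agents
```

Running the example repeatedly, for instance every few seconds, shows whether an agent alternates between the two states rather than staying down.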

Environment

Red Hat OpenStack Platform 10
