In Red Hat OpenStack Platform 10, a mismatch between OVS DPDK and PMD CPU pinning makes the system unresponsive and drives hypervisor CPU usage to 100%
Issue
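The host OS is unresponsive, and the CPUs that should be reserved for the hypervisor show 100% CPU usage in user space.
Checking the CPU affinity of the Open vSwitch PMD threads shows that their affinity overlaps with the cores that should be reserved for the host OS.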
[root@overcloud-compute-2 ~]# grep Aff /etc/systemd/system.conf
#CPUAffinity=1 2
CPUAffinity=0 2 4 6 56 58 60 62
[root@overcloud-compute-2 ~]# ps -Tp `pidof ovs-vswitchd` | grep pmd | awk '{print $2}' | xargs -I {} taskset -c -p {}
pid 7421's current affinity list: 0,2,4,6,56,58,60,62
pid 7422's current affinity list: 0,2,4,6,56,58,60,62
pid 7423's current affinity list: 0,2,4,6,56,58,60,62
pid 7424's current affinity list: 0,2,4,6,56,58,60,62
pid 7425's current affinity list: 0,2,4,6,56,58,60,62
pid 7426's current affinity list: 0,2,4,6,56,58,60,62
pid 7427's current affinity list: 0,2,4,6,56,58,60,62
pid 7428's current affinity list: 0,2,4,6,56,58,60,62
pid 7429's current affinity list: 0,2,4,6,56,58,60,62
pid 7430's current affinity list: 0,2,4,6,56,58,60,62
pid 7431's current affinity list: 0,2,4,6,56,58,60,62
pid 7432's current affinity list: 0,2,4,6,56,58,60,62
pid 7433's current affinity list: 0,2,4,6,56,58,60,62
pid 7434's current affinity list: 0,2,4,6,56,58,60,62
pid 7435's current affinity list: 0,2,4,6,56,58,60,62
pid 7436's current affinity list: 0,2,4,6,56,58,60,62
pid 7437's current affinity list: 0,2,4,6,56,58,60,62
pid 7438's current affinity list: 0,2,4,6,56,58,60,62
pid 7439's current affinity list: 0,2,4,6,56,58,60,62
pid 7440's current affinity list: 0,2,4,6,56,58,60,62
pid 7441's current affinity list: 0,2,4,6,56,58,60,62
pid 7442's current affinity list: 0,2,4,6,56,58,60,62
pid 7443's current affinity list: 0,2,4,6,56,58,60,62
pid 7444's current affinity list: 0,2,4,6,56,58,60,62
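The comparison above can also be automated. The following is a minimal Python sketch, not part of the original article, that assumes the two pieces of text shown above are available as strings: the contents of /etc/systemd/system.conf and the taskset output for the PMD threads. The function names (parse_cpu_list, host_reserved_cpus, pmd_overlaps) and the embedded sample data are illustrative assumptions, with the sample values copied from the output above.

```python
import re


def parse_cpu_list(text):
    """Expand a CPU list such as '0,2,4-6' or '0 2 4 6' into a set of ints."""
    cpus = set()
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        if "-" in token:
            start, end = token.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(token))
    return cpus


def host_reserved_cpus(system_conf_text):
    """Return the CPUs from the first uncommented CPUAffinity= line."""
    for line in system_conf_text.splitlines():
        line = line.strip()
        if line.startswith("CPUAffinity="):
            return parse_cpu_list(line.split("=", 1)[1])
    return set()


def pmd_overlaps(taskset_output, reserved):
    """Map each thread ID to the host-reserved CPUs it is allowed to run on."""
    pattern = re.compile(r"pid (\d+)'s current affinity list: (\S+)")
    overlaps = {}
    for match in pattern.finditer(taskset_output):
        shared = parse_cpu_list(match.group(2)) & reserved
        if shared:
            overlaps[int(match.group(1))] = sorted(shared)
    return overlaps


# Sample input copied from the output shown above (first two threads only).
SYSTEM_CONF = """#CPUAffinity=1 2
CPUAffinity=0 2 4 6 56 58 60 62
"""
TASKSET_OUTPUT = """pid 7421's current affinity list: 0,2,4,6,56,58,60,62
pid 7422's current affinity list: 0,2,4,6,56,58,60,62
"""

if __name__ == "__main__":
    reserved = host_reserved_cpus(SYSTEM_CONF)
    for tid, cpus in sorted(pmd_overlaps(TASKSET_OUTPUT, reserved).items()):
        print("PMD thread %d shares host CPUs: %s" % (tid, cpus))
```

Any thread reported by this sketch is allowed to run on cores that are set aside for the host OS, which is the overlap described in this article. An empty result would mean the PMD threads are confined to cores outside the CPUAffinity set.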
As a symptom, commands such as lsof and ps stop responding. In the customer's environment, neutron-openvswitch-agent logs the following error messages.
2018-08-03 22:49:22.073 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11919', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:23.728 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:150
2018-08-03 22:49:23.729 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11921', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:25.239 11877 DEBUG neutron.agent.linux.utils [req-7c49d003-275b-4361-97be-e4a2d7200c29 - - - - -] Running command: ['ps', '--ppid', '11919', '-o', 'pid='] create_process /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2018-08-03 22:49:25.385 11877 CRITICAL neutron [-] Timeout: 5 seconds
2018-08-03 22:49:25.385 11877 ERROR neutron Traceback (most recent call last):
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/bin/neutron-openvswitch-agent", line 10, in
2018-08-03 22:49:25.385 11877 ERROR neutron sys.exit(main())
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
2018-08-03 22:49:25.385 11877 ERROR neutron agent_main.main()
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 51, in main
2018-08-03 22:49:25.385 11877 ERROR neutron mod.main()
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
2018-08-03 22:49:25.385 11877 ERROR neutron 'neutron.plugins.ml2.drivers.openvswitch.agent.'
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 375, in run_apps
2018-08-03 22:49:25.385 11877 ERROR neutron hub.joinall(services)
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 97, in joinall
2018-08-03 22:49:25.385 11877 ERROR neutron t.wait()
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
2018-08-03 22:49:25.385 11877 ERROR neutron return self._exit_event.wait()
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 125, in wait
2018-08-03 22:49:25.385 11877 ERROR neutron current.throw(*self._exc)
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
2018-08-03 22:49:25.385 11877 ERROR neutron result = function(*args, **kwargs)
2018-08-03 22:49:25.385 11877 ERROR neutron File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 59, in _launch
2018-08-03 22:49:25.385 11877 ERROR neutron raise e
2018-08-03 22:49:25.385 11877 ERROR neutron Timeout: 5 seconds
2018-08-03 22:49:25.385 11877 ERROR neutron
2018-08-03 22:49:25.502 11877 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=11906
In the output of neutron agent-list, the state of the neutron agent is unstable.
| <UUID> | Open vSwitch agent | overcloud-compute-0.localdomain | | xxx | True | neutron-openvswitch-agent |
| <UUID> | Open vSwitch agent | overcloud-compute-0.localdomain | | :-) | True | neutron-openvswitch-agent |
Environment
Red Hat OpenStack Platform 10