After a fresh deployment of RHOSP 17.1.2 with ovs-dpdk, VNF VMs crash during load testing and cause openvswitch to restart on the hypervisor
Issue
- After a fresh deployment of RHOSP 17.1.2 with ovs-dpdk, VNF VMs under load testing (pushing traffic to the PMD threads) crash and cause openvswitch to restart on the ovs-dpdk hypervisor.
- ovs-vswitchd dumps core, as recorded in /var/log/messages:
Apr 4 11:33:08 overcloud-computedpdk-0 systemd-coredump[612204]: Process 4227 (ovs-vswitchd) of user 989 dumped core.
Stack trace of thread 4779:
#0 0x00007fdb80aa154c __pthread_kill_implementation (libc.so.6 + 0xa154c)
#1 0x00007fdb80a54d46 raise (libc.so.6 + 0x54d46)
#2 0x00007fdb80a287f3 abort (libc.so.6 + 0x287f3)
#3 0x00007fdb80a29130 __libc_message.cold (libc.so.6 + 0x29130)
#4 0x00007fdb80aab617 malloc_printerr (libc.so.6 + 0xab617)
#5 0x00007fdb80aacf23 _int_free (libc.so.6 + 0xacf23)
#6 0x00007fdb80aaf955 free (libc.so.6 + 0xaf955)
#7 0x00005574291ef211 dp_netdev_del_pmd.lto_priv.0 (/usr/sbin/ovs-vswitchd (deleted) + 0x9bd211)
Stack trace of thread 4228:
#0 0x00007fdb80b4e83e epoll_wait (libc.so.6 + 0x14e83e)
#1 0x0000557429149a84 eal_memalloc_free_seg_bulk (/usr/sbin/ovs-vswitchd (deleted) + 0x917a84)
#2 0x0000557400000000 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Apr 4 11:33:09 overcloud-computedpdk-0 systemd[1]: systemd-coredump@194-612202-0.service: Deactivated successfully.
Apr 4 11:33:09 overcloud-computedpdk-0 systemd[1]: systemd-coredump@194-612202-0.service: Consumed 5.615s CPU time.
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=dumped, status=6/ABRT
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'core-dump'.
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 1y 5month 3w 2d 10h 14min 46.265s CPU time.
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 1.
--
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=dumped, status=6/ABRT
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'core-dump'.
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 1y 5month 3w 2d 10h 14min 46.265s CPU time.
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 1.
Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr 4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT
Apr 4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr 4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 1d 15h 52min 35.831s CPU time.
Apr 4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 2.
Apr 4 14:02:18 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr 4 14:02:55 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT
Apr 4 14:02:55 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr 4 14:02:55 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 6min 13.820s CPU time.
Apr 4 14:02:55 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 3.
Apr 4 14:02:55 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr 4 15:09:03 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT
Apr 4 15:09:03 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr 4 15:09:03 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 17h 38min 26.786s CPU time.
Apr 4 15:09:03 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 4.
Apr 4 15:09:03 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr 4 15:10:07 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT
Apr 4 15:10:07 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr 4 15:10:07 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 13min 36.214s CPU time.
Apr 4 15:10:08 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 5.
Apr 4 15:10:08 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr 4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=11/SEGV
Apr 4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr 4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 6h 21min 4.379s CPU time.
Apr 4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 6.
Apr 4 15:34:05 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
- The stack trace above is not valid: after the openvswitch3.1-test package was installed to obtain ovs-tcpdump, the running ovs-vswitchd process was still an older version (note the "(deleted)" binary path in the trace). This is the real stack trace:
#0 0x00007fdb80aa154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007fdb80a54d46 in raise () from /lib64/libc.so.6
#2 0x00007fdb80a287f3 in abort () from /lib64/libc.so.6
#3 0x00007fdb80a29130 in __libc_message.cold () from /lib64/libc.so.6
#4 0x00007fdb80aab617 in malloc_printerr () from /lib64/libc.so.6
#5 0x00007fdb80aacf23 in _int_free () from /lib64/libc.so.6
#6 0x00007fdb80aaf955 in free () from /lib64/libc.so.6
#7 0x00005574291ef211 in dp_packet_uninit (b=0x7fdb0b005df0) at ../lib/dp-packet.h:581
#8 dp_packet_delete (b=0x7fdb0b005df0) at ../lib/dp-packet.h:259
#9 dp_packet_delete (b=0x7fdb0b005df0) at ../lib/dp-packet.h:246
#10 dp_packet_delete_batch (should_steal=<optimized out>, batch=<optimized out>) at ../lib/dp-packet.h:874
#11 dp_packet_delete_batch (should_steal=<optimized out>, batch=<optimized out>) at ../lib/dp-packet.h:868
#12 dp_execute_cb (aux_=aux_@entry=0x7fdb34e0ba60, packets_=packets_@entry=0x7fdb34e0b390, a=a@entry=0x7fdb0b137a14, should_steal=should_steal@entry=true) at ../lib/dpif-netdev.c:9214
#13 0x0000557429232e87 in odp_execute_actions (dp=<optimized out>, batch=0x7fdb34e0b390, steal=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, dp_execute_action=<optimized out>) at ../lib/odp-execute.c:997
#14 0x00005574291ed401 in dp_netdev_execute_actions (actions_len=<optimized out>, actions=<optimized out>, flow=<optimized out>, should_steal=true, packets=0x7fdb34e0b390, pmd=0x7fdb34e10010) at ../lib/dpif-netdev.c:9225
#15 packet_batch_per_flow_execute (pmd=0x7fdb34e10010, batch=0x7fdb34e0b380) at ../lib/dpif-netdev.c:7999
#16 dp_netdev_input__ (pmd=pmd@entry=0x7fdb34e10010, packets=packets@entry=0x7fdb34e0ca30, md_is_valid=md_is_valid@entry=true, port_no=port_no@entry=0) at ../lib/dpif-netdev.c:8627
#17 0x00005574291efdf0 in dp_netdev_recirculate (packets=0x7fdb34e0ca30, pmd=0x7fdb34e10010) at ../lib/dpif-netdev.c:8644
#18 dp_execute_cb (aux_=aux_@entry=0x7fdb34e0d5a0, packets_=<optimized out>, packets_@entry=0x7fdb34e0ca30, a=a@entry=0x7fdb0a5637e0, should_steal=should_steal@entry=true) at ../lib/dpif-netdev.c:9039
#19 0x0000557429232e87 in odp_execute_actions (dp=<optimized out>, batch=0x7fdb34e0ca30, steal=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, dp_execute_action=<optimized out>) at ../lib/odp-execute.c:997
#20 0x00005574291ed401 in dp_netdev_execute_actions (actions_len=<optimized out>, actions=<optimized out>, flow=<optimized out>, should_steal=true, packets=0x7fdb34e0ca30, pmd=0x7fdb34e10010) at ../lib/dpif-netdev.c:9225
#21 packet_batch_per_flow_execute (pmd=0x7fdb34e10010, batch=0x7fdb34e0ca20) at ../lib/dpif-netdev.c:7999
#22 dp_netdev_input__ (pmd=<optimized out>, packets=<optimized out>, md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at ../lib/dpif-netdev.c:8627
#23 0x00005574291eed41 in dp_netdev_input (pmd=<optimized out>, packets=<optimized out>, port_no=<optimized out>) at ../lib/dpif-netdev.c:8636
#24 0x00005574291ede43 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fdb34e10010, rxq=0x557432e61500, port_no=29) at ../lib/dpif-netdev.c:5419
#25 0x00005574291ee371 in pmd_thread_main (f_=<optimized out>) at ../lib/dpif-netdev.c:7053
#26 0x000055742929cd33 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:423
#27 0x00007fdb80a9f802 in start_thread () from /lib64/libc.so.6
#28 0x00007fdb80a3f450 in clone3 () from /lib64/libc.so.6
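The /var/log/messages excerpt in this section shows ovs-vswitchd exiting several times with different signals. As a rough aid for triage, the sketch below counts those exits per result code and signal. It is a minimal illustration, not a supported tool: the function name `summarize_exits` and the `SAMPLE` list are hypothetical, the pattern assumes the systemd message format shown in the excerpt, and identical lines are counted once on the assumption that an excerpt assembled from several searches can repeat a line.

```python
import re
from collections import Counter

# Matches systemd lines such as:
#   "... ovs-vswitchd.service: Main process exited, code=dumped, status=6/ABRT"
EXIT_RE = re.compile(
    r"ovs-vswitchd\.service: Main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\d+)/(?P<signal>\w+)"
)

# Hypothetical sample input, copied from the log excerpt in this article.
SAMPLE = [
    "Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=dumped, status=6/ABRT",
    "Apr 4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=dumped, status=6/ABRT",
    "Apr 4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT",
    "Apr 4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=11/SEGV",
    "Apr 4 15:34:05 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...",
]


def summarize_exits(lines):
    """Count ovs-vswitchd main-process exits per (code, signal) pair.

    Assumption: a line that is byte-for-byte identical to an earlier one
    is a repeat from overlapping searches, so it is counted only once.
    """
    seen = set()
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        match = EXIT_RE.search(line)
        if match is None or line in seen:
            continue
        seen.add(line)
        counts[(match.group("code"), match.group("signal"))] += 1
    return counts


if __name__ == "__main__":
    for (code, signal), n in sorted(summarize_exits(SAMPLE).items()):
        print(f"{code}/{signal}: {n}")
```

To run it against a real log, the lines of /var/log/messages could be passed in place of `SAMPLE`, for example `summarize_exits(open("/var/log/messages"))`. A mix of ABRT and SEGV exits, as in the excerpt, would then show up as separate entries.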
Environment
- Red Hat OpenStack Platform 17.1 (RHOSP)