After a fresh deployment of RHOSP 17.1.2 with OVS-DPDK, VNF VMs under load testing crash and cause Open vSwitch to restart on the hypervisor.

Solution In Progress - Updated -

Issue

  • After a fresh deployment of RHOSP 17.1.2 with OVS-DPDK, VNF VMs under load testing (pushing traffic to the PMD threads) crash and cause Open vSwitch to restart on the OVS-DPDK hypervisor.

  • ovs-vswitchd dumps core, as logged in /var/log/messages:

Apr  4 11:33:08 overcloud-computedpdk-0 systemd-coredump[612204]: Process 4227 (ovs-vswitchd) of user 989 dumped core.

Stack trace of thread 4779:
#0  0x00007fdb80aa154c __pthread_kill_implementation (libc.so.6 + 0xa154c)
#1  0x00007fdb80a54d46 raise (libc.so.6 + 0x54d46)
#2  0x00007fdb80a287f3 abort (libc.so.6 + 0x287f3)
#3  0x00007fdb80a29130 __libc_message.cold (libc.so.6 + 0x29130)
#4  0x00007fdb80aab617 malloc_printerr (libc.so.6 + 0xab617)
#5  0x00007fdb80aacf23 _int_free (libc.so.6 + 0xacf23)
#6  0x00007fdb80aaf955 free (libc.so.6 + 0xaf955)
#7  0x00005574291ef211 dp_netdev_del_pmd.lto_priv.0 (/usr/sbin/ovs-vswitchd (deleted) + 0x9bd211)

Stack trace of thread 4228:
#0  0x00007fdb80b4e83e epoll_wait (libc.so.6 + 0x14e83e)
#1  0x0000557429149a84 eal_memalloc_free_seg_bulk (/usr/sbin/ovs-vswitchd (deleted) + 0x917a84)
#2  0x0000557400000000 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Apr  4 11:33:09 overcloud-computedpdk-0 systemd[1]: systemd-coredump@194-612202-0.service: Deactivated successfully.
Apr  4 11:33:09 overcloud-computedpdk-0 systemd[1]: systemd-coredump@194-612202-0.service: Consumed 5.615s CPU time.
Apr  4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=dumped, status=6/ABRT
Apr  4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'core-dump'.
Apr  4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 1y 5month 3w 2d 10h 14min 46.265s CPU time.
Apr  4 11:33:10 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 1.
Apr  4 11:33:10 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr  4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT
Apr  4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr  4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 1d 15h 52min 35.831s CPU time.
Apr  4 14:02:18 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 2.
Apr  4 14:02:18 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr  4 14:02:55 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT
Apr  4 14:02:55 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr  4 14:02:55 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 6min 13.820s CPU time.
Apr  4 14:02:55 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 3.
Apr  4 14:02:55 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr  4 15:09:03 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT
Apr  4 15:09:03 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr  4 15:09:03 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 17h 38min 26.786s CPU time.
Apr  4 15:09:03 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 4.
Apr  4 15:09:03 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr  4 15:10:07 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=6/ABRT
Apr  4 15:10:07 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr  4 15:10:07 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 13min 36.214s CPU time.
Apr  4 15:10:08 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 5.
Apr  4 15:10:08 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...
--
Apr  4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Main process exited, code=killed, status=11/SEGV
Apr  4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Failed with result 'signal'.
Apr  4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Consumed 6h 21min 4.379s CPU time.
Apr  4 15:34:05 overcloud-computedpdk-0 systemd[1]: ovs-vswitchd.service: Scheduled restart job, restart counter is at 6.
Apr  4 15:34:05 overcloud-computedpdk-0 systemd[1]: Stopping Open vSwitch...

  • The stack trace above is invalid because ovs-vswitchd was still running an older version after openvswitch3.1-test was installed to provide ovs-tcpdump. The actual stack trace is:

#0  0x00007fdb80aa154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fdb80a54d46 in raise () from /lib64/libc.so.6
#2  0x00007fdb80a287f3 in abort () from /lib64/libc.so.6
#3  0x00007fdb80a29130 in __libc_message.cold () from /lib64/libc.so.6
#4  0x00007fdb80aab617 in malloc_printerr () from /lib64/libc.so.6
#5  0x00007fdb80aacf23 in _int_free () from /lib64/libc.so.6
#6  0x00007fdb80aaf955 in free () from /lib64/libc.so.6
#7  0x00005574291ef211 in dp_packet_uninit (b=0x7fdb0b005df0) at ../lib/dp-packet.h:581
#8  dp_packet_delete (b=0x7fdb0b005df0) at ../lib/dp-packet.h:259
#9  dp_packet_delete (b=0x7fdb0b005df0) at ../lib/dp-packet.h:246
#10 dp_packet_delete_batch (should_steal=<optimized out>, batch=<optimized out>) at ../lib/dp-packet.h:874
#11 dp_packet_delete_batch (should_steal=<optimized out>, batch=<optimized out>) at ../lib/dp-packet.h:868
#12 dp_execute_cb (aux_=aux_@entry=0x7fdb34e0ba60, packets_=packets_@entry=0x7fdb34e0b390, a=a@entry=0x7fdb0b137a14, should_steal=should_steal@entry=true) at ../lib/dpif-netdev.c:9214
#13 0x0000557429232e87 in odp_execute_actions (dp=<optimized out>, batch=0x7fdb34e0b390, steal=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, dp_execute_action=<optimized out>) at ../lib/odp-execute.c:997
#14 0x00005574291ed401 in dp_netdev_execute_actions (actions_len=<optimized out>, actions=<optimized out>, flow=<optimized out>, should_steal=true, packets=0x7fdb34e0b390, pmd=0x7fdb34e10010) at ../lib/dpif-netdev.c:9225
#15 packet_batch_per_flow_execute (pmd=0x7fdb34e10010, batch=0x7fdb34e0b380) at ../lib/dpif-netdev.c:7999
#16 dp_netdev_input__ (pmd=pmd@entry=0x7fdb34e10010, packets=packets@entry=0x7fdb34e0ca30, md_is_valid=md_is_valid@entry=true, port_no=port_no@entry=0) at ../lib/dpif-netdev.c:8627
#17 0x00005574291efdf0 in dp_netdev_recirculate (packets=0x7fdb34e0ca30, pmd=0x7fdb34e10010) at ../lib/dpif-netdev.c:8644
#18 dp_execute_cb (aux_=aux_@entry=0x7fdb34e0d5a0, packets_=<optimized out>, packets_@entry=0x7fdb34e0ca30, a=a@entry=0x7fdb0a5637e0, should_steal=should_steal@entry=true) at ../lib/dpif-netdev.c:9039
#19 0x0000557429232e87 in odp_execute_actions (dp=<optimized out>, batch=0x7fdb34e0ca30, steal=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, dp_execute_action=<optimized out>) at ../lib/odp-execute.c:997
#20 0x00005574291ed401 in dp_netdev_execute_actions (actions_len=<optimized out>, actions=<optimized out>, flow=<optimized out>, should_steal=true, packets=0x7fdb34e0ca30, pmd=0x7fdb34e10010) at ../lib/dpif-netdev.c:9225
#21 packet_batch_per_flow_execute (pmd=0x7fdb34e10010, batch=0x7fdb34e0ca20) at ../lib/dpif-netdev.c:7999
#22 dp_netdev_input__ (pmd=<optimized out>, packets=<optimized out>, md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at ../lib/dpif-netdev.c:8627
#23 0x00005574291eed41 in dp_netdev_input (pmd=<optimized out>, packets=<optimized out>, port_no=<optimized out>) at ../lib/dpif-netdev.c:8636
#24 0x00005574291ede43 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x7fdb34e10010, rxq=0x557432e61500, port_no=29) at ../lib/dpif-netdev.c:5419
#25 0x00005574291ee371 in pmd_thread_main (f_=<optimized out>) at ../lib/dpif-netdev.c:7053
#26 0x000055742929cd33 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:423
#27 0x00007fdb80a9f802 in start_thread () from /lib64/libc.so.6
#28 0x00007fdb80a3f450 in clone3 () from /lib64/libc.so.6
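
The first trace in this article marks the ovs-vswitchd frames with "(deleted)", which fits the explanation that the running binary no longer matched the file on disk. The Python sketch below flags such frames in systemd-coredump output so they can be re-checked against matching debug symbols. It is a heuristic under stated assumptions: the frame format is taken from the excerpt above, the function names are hypothetical, and treating "(deleted)" frames as suspect is an assumption rather than something this article states. The live-process check relies on the Linux behavior that `/proc/<pid>/exe` resolves to a path ending in " (deleted)" once the original file has been unlinked.

```python
import os
import re

# One frame as printed by systemd-coredump, for example:
# "#7  0x00005574291ef211 dp_netdev_del_pmd.lto_priv.0 (/usr/sbin/ovs-vswitchd (deleted) + 0x9bd211)"
FRAME_RE = re.compile(
    r"^#(?P<idx>\d+)\s+0x[0-9a-f]+\s+(?P<symbol>\S+)\s+"
    r"\((?P<module>.+?)\s+\+\s+0x[0-9a-f]+\)$"
)


def stale_frames(trace_lines):
    """Return (frame index, symbol) for frames whose module is marked "(deleted)"."""
    flagged = []
    for line in trace_lines:
        m = FRAME_RE.match(line.strip())
        if m and m["module"].endswith("(deleted)"):
            flagged.append((int(m["idx"]), m["symbol"]))
    return flagged


def running_binary_replaced(pid):
    """Linux only: True if the executable of `pid` was unlinked after it started.

    Reading /proc/<pid>/exe for another user's process usually requires root.
    """
    return os.readlink(f"/proc/{pid}/exe").endswith(" (deleted)")
```

If `stale_frames` returns anything, the symbol names in those frames should be confirmed with a debugger and debuginfo that match the binary that actually crashed.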

Environment

  • Red Hat OpenStack Platform 17.1 (RHOSP)
