ovn_controller down on 2 compute nodes and cannot be started again
Issue
- On our platform, 2 OVN Controller agents are in state UP, but their 'Alive' status is XXX:
| 06eca484-ee8b-4c63-b939-f9300394b87f | OVN Controller agent | overcloud-controller-2 | n/a | XXX | UP | ovn-controller |
| 4451bce4-76a5-4331-b4af-b0dcd282b828 | OVN Controller agent | overcloud-controller-2 | n/a | XXX | UP | ovn-controller |
- Running sudo podman ps -a | grep ovn_controller shows that the containers are "Exited" on those 2 compute hosts. Forcing a start with podman start ovn_controller fails within a second or less, and the container returns to the same state:
Exited (139) About a minute ago ovn_controller
- No errors are found in ovn-controller.log.
- A look at /var/log/messages shows a segfault:
May 14 11:20:10 cpt-hci-01 podman[342547]: 2020-05-14 11:20:10.727507433 +0200 CEST m=+0.159828810 container init 4a92a9553824fafe44f994cf19b00e5d4932d45903254ab7fb296856fd3b3ccf (image=undercloud:8787/osp16_containers-ovn-controller:16.0, name=ovn_controller)
May 14 11:20:10 cpt-hci-01 podman[342547]: 2020-05-14 11:20:10.741916628 +0200 CEST m=+0.174238019 container start 4a92a9553824fafe44f994cf19b00e5d4932d45903254ab7fb296856fd3b3ccf (image=undercloud:8787/osp16_containers-ovn-controller:16.0, name=ovn_controller)
May 14 11:20:11 cpt-hci-01 kernel: ovn-controller[342642]: segfault at 0 ip 00007f3506a741e2 sp 00007ffd056188f8 error 4 in libc-2.28.so[7f350691b000+1b9000]
May 14 11:20:11 cpt-hci-01 kernel: Code: 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 52 03 00 00 c5 fe 6f 0f <c5> f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9 74 7a f3 0f
May 14 11:20:11 cpt-hci-01 systemd[1]: Started Process Core Dump (PID 342655/UID 0).
May 14 11:20:11 cpt-hci-01 systemd-coredump[342656]: Process 342642 (ovn-controller) of user 0 dumped core.#012#012Stack trace of thread 7:#012#0 0x00007f3506a741e2 __strcmp_avx2 (libc.so.6)#012#1 0x0000564c53a3b655 n/a (/usr/bin/ovn-controller)
- We migrated some instances to check whether this was a memory problem; the containers still refused to start.
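
The exit status 139 reported by podman is consistent with the segfault in /var/log/messages: by shell convention, a status above 128 means the process was terminated by signal (status - 128), and signal 11 is SIGSEGV. The sketch below illustrates that check. The helper names decode_exit and inspect_ovn_controller are hypothetical and are not part of podman or OVN; the second function assumes podman and systemd-coredump are available on the host and that it is run as root.

```shell
#!/usr/bin/env bash
# Sketch only: both functions are hypothetical helpers for illustration.

# Translate an exit status into a signal name when it is above 128.
decode_exit() {
    local code="$1"
    if [ "$code" -gt 128 ]; then
        # Statuses above 128 conventionally mean "killed by signal (code - 128)".
        kill -l "$((code - 128))"
    else
        echo "exit status $code (not a signal)"
    fi
}

# Collect the same evidence as in the report above (assumption: run as root
# on an affected host, with the container named ovn_controller).
inspect_ovn_controller() {
    local status
    # Exit status recorded by podman for the stopped container.
    status="$(podman inspect --format '{{.State.ExitCode}}' ovn_controller)" || return 1
    echo "ovn_controller exit status: $status ($(decode_exit "$status"))"
    # Core dumps captured by systemd-coredump for the crashing binary.
    coredumpctl list ovn-controller
}

# Status taken from the "Exited (139)" line in the podman output.
decode_exit 139
```

On an affected host, calling inspect_ovn_controller as root would print the recorded exit status and list the captured core dumps; coredumpctl info ovn-controller can then show the stack trace of the most recent one.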
Environment
- Red Hat OpenStack Platform 16.0 (RHOSP)