update run command fails after one node crashed during execution
Issue
- We are running an RHOSP 16.1 minor update to 16.1.5. One node (overcloud-compute-0 in the logs) crashed during step 3 of the update run command ("Wait for containers to start for step 3 using paunch", §4.4 in the update documentation), and the run failed because the node was unreachable (fatal: [overcloud-compute-0]: UNREACHABLE!).
- The node was unreachable, then it restarted. After the restart, we connected to the node to check its containers. Here is the result:
[heat-admin@overcloud-compute-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE                                                               COMMAND      CREATED       STATUS             PORTS  NAMES
6bfa594b6854  undercloud:8787/rhosp_containers-ovn-controller:16.1.5              kolla_start  8 hours ago   Up 11 minutes ago         ovn_controller
2c6e459f2774  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_compute
233e72a92bea  undercloud:8787/rhosp_containers-neutron-metadata-agent-ovn:16.1.3  kolla_start  4 months ago  Up 11 minutes ago         ovn_metadata_agent
5e8bdec4be4e  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_migration_target
019507f23794  undercloud:8787/rhosp_containers-cron:16.1.3                        kolla_start  4 months ago  Up 11 minutes ago         logrotate_crond
85f22370a275  undercloud:8787/rhosp_containers-iscsid:16.1.3                      kolla_start  4 months ago  Up 11 minutes ago         iscsid
ea23db85d9dd  undercloud:8787/rhosp_containers-nova-libvirt:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_libvirt
13096a88b8ac  undercloud:8787/rhosp_containers-multipathd:16.1.3                  kolla_start  4 months ago  Up 11 minutes ago         multipathd
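The listing above shows a partially updated node: ovn_controller already runs a 16.1.5 image, while the other containers are still on 16.1.3. One way to spot such containers quickly is to compare each image tag against the target version. The following is a minimal sketch, not part of the original report: stale_containers is a hypothetical helper name, and it assumes every image reference ends in ":<tag>".

```shell
# Hypothetical helper: print "<name> <tag>" for every container whose
# image tag differs from the target version given as $1.
# Input: lines of "<name> <image>" on stdin.
# Assumption: each image reference ends in ":<tag>".
stale_containers() {
    target="$1"
    while read -r name image; do
        # Skip blank lines.
        if [ -z "$name" ]; then
            continue
        fi
        # Keep only the text after the last colon (the tag).
        tag="${image##*:}"
        if [ "$tag" != "$target" ]; then
            printf '%s %s\n' "$name" "$tag"
        fi
    done
    return 0
}
```

On the node, the helper could be fed from podman, for example: sudo podman ps -a --format '{{.Names}} {{.Image}}' | stale_containers 16.1.5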
- Since the node seemed fine, we re-ran the update run command. This time the error is different and happens earlier:
fatal: [overcloud-compute-0]: FAILED! =>
[...]
"Did not find container with \"['podman', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--format', '{{.Names}}']\"", "Error executing ['podman', 'create', '--name', 'iscsid'
[...]
"stdout: ", "stderr: f /var/lib/containers/storage/overlay/l/IDY6SYOC6UPILIDFBMFAX5YUEY: no such file or directory"
The same error occurs for the containers nova_virtlogd, multipathd, nova_libvirt, and iscsid.
- We checked the podman ps output again; no iscsid container is listed:
[heat-admin@overcloud-compute-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE                                                               COMMAND      CREATED       STATUS             PORTS  NAMES
6bfa594b6854  undercloud:8787/rhosp_containers-ovn-controller:16.1.5              kolla_start  8 hours ago   Up 27 minutes ago         ovn_controller
2c6e459f2774  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 27 minutes ago         nova_compute
233e72a92bea  undercloud:8787/rhosp_containers-neutron-metadata-agent-ovn:16.1.3  kolla_start  4 months ago  Up 27 minutes ago         ovn_metadata_agent
5e8bdec4be4e  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 27 minutes ago         nova_migration_target
019507f23794  undercloud:8787/rhosp_containers-cron:16.1.3                        kolla_start  4 months ago  Up 27 minutes ago         logrotate_crond
[heat-admin@overcloud-compute-0 ~]$ sudo podman ps -a | grep iscsi
- A third attempt ended with the same result.
- We cannot complete the update, which is needed to resolve other tickets and then scale up our platform.
Environment
- Red Hat OpenStack Platform 16.1 (RHOSP)