Update run command fails after one node crashed during execution

Solution In Progress - Updated -

Issue

  • We are running a minor update of RHOSP 16.1 to 16.1.5. One node (overcloud-compute-0 in the logs) crashed during step 3 (Wait for containers to start for step 3 using paunch) of the update run command (§4.4 in the update documentation), and the run failed because the node was unreachable (fatal: [overcloud-compute-0]: UNREACHABLE!).

  • The node was unreachable, then it restarted. After the restart, we connected to the node to check its containers. Here is the result:

[heat-admin@overcloud-compute-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE                                                               COMMAND      CREATED       STATUS             PORTS  NAMES
6bfa594b6854  undercloud:8787/rhosp_containers-ovn-controller:16.1.5              kolla_start  8 hours ago   Up 11 minutes ago         ovn_controller
2c6e459f2774  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_compute
233e72a92bea  undercloud:8787/rhosp_containers-neutron-metadata-agent-ovn:16.1.3  kolla_start  4 months ago  Up 11 minutes ago         ovn_metadata_agent
5e8bdec4be4e  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_migration_target
019507f23794  undercloud:8787/rhosp_containers-cron:16.1.3                        kolla_start  4 months ago  Up 11 minutes ago         logrotate_crond
85f22370a275  undercloud:8787/rhosp_containers-iscsid:16.1.3                      kolla_start  4 months ago  Up 11 minutes ago         iscsid
ea23db85d9dd  undercloud:8787/rhosp_containers-nova-libvirt:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_libvirt
13096a88b8ac  undercloud:8787/rhosp_containers-multipathd:16.1.3                  kolla_start  4 months ago  Up 11 minutes ago         multipathd
  • Since the node seemed fine, we re-ran the update run command. This time the error is different and happens earlier:
fatal: [overcloud-compute-0]: FAILED! =>
[...]
"Did not find container with \"['podman', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--format', '{{.Names}}']\"", "Error executing ['podman', 'create', '--name', 'iscsid'
[...]
"stdout: ", "stderr: f /var/lib/containers/storage/overlay/l/IDY6SYOC6UPILIDFBMFAX5YUEY: no such file or directory"
The same error occurs for the containers nova_virtlogd, multipathd, nova_libvirt, and iscsid.
  • We checked the podman ps output again, and iscsid is no longer listed:
[heat-admin@overcloud-compute-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE                                                               COMMAND      CREATED       STATUS             PORTS  NAMES
6bfa594b6854  undercloud:8787/rhosp_containers-ovn-controller:16.1.5              kolla_start  8 hours ago   Up 27 minutes ago         ovn_controller
2c6e459f2774  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 27 minutes ago         nova_compute
233e72a92bea  undercloud:8787/rhosp_containers-neutron-metadata-agent-ovn:16.1.3  kolla_start  4 months ago  Up 27 minutes ago         ovn_metadata_agent
5e8bdec4be4e  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 27 minutes ago         nova_migration_target
019507f23794  undercloud:8787/rhosp_containers-cron:16.1.3                        kolla_start  4 months ago  Up 27 minutes ago         logrotate_crond
[heat-admin@overcloud-compute-0 ~]$ sudo podman ps -a | grep iscsi
  • A third attempt ended with the same result.

  • We cannot complete the update, which is needed to resolve other tickets and then scale up our platform.
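
The two manual checks described above (confirming which containers are gone, and spotting containers still running an older image tag) can be sketched as small shell helpers. This is only an illustrative sketch and not part of the original report: the function names find_missing_containers and list_outdated_images are hypothetical, the label filter is copied from the paunch error message, and the sketch assumes podman is run as root on the affected node.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every given container name that podman
# does not know about. The label filter mirrors the one shown in the
# paunch error output above.
find_missing_containers() {
    local name found
    for name in "$@"; do
        found=$(podman ps -a --filter "label=container_name=${name}" --format '{{.Names}}')
        [ -n "$found" ] || echo "$name"
    done
}

# Hypothetical helper: read "NAME IMAGE" pairs on stdin and print the
# pairs whose image does not end with the target tag given as $1.
list_outdated_images() {
    local target=":$1"
    awk -v tag="$target" '
        substr($2, length($2) - length(tag) + 1) != tag { print $1, $2 }
    '
}

# Example usage (as root on the compute node):
#   find_missing_containers nova_virtlogd multipathd nova_libvirt iscsid
#   podman ps --format "{{.Names}} {{.Image}}" | list_outdated_images 16.1.5
```

Both helpers only read state and change nothing on the node. The second one reads its input from stdin so that it can also be applied to saved output.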

Environment

  • Red Hat OpenStack Platform 16.1 (RHOSP)
