Update run command fails after one node crashed during execution

Solution In Progress

Issue

  • We are running an RHOSP 16.1 minor update to 16.1.5. One node (overcloud-compute-0 in the logs) crashed during step 3 ("Wait for containers to start for step 3 using paunch") of the update run command (§4.4 of the update documentation), and the update failed because the node became unreachable (fatal: [overcloud-compute-0]: UNREACHABLE!).

  • The node was unreachable and then restarted. After the restart, we connected to the node to check its containers. Here is the result:

[heat-admin@overcloud-compute-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE                                                               COMMAND      CREATED       STATUS             PORTS  NAMES
6bfa594b6854  undercloud:8787/rhosp_containers-ovn-controller:16.1.5              kolla_start  8 hours ago   Up 11 minutes ago         ovn_controller
2c6e459f2774  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_compute
233e72a92bea  undercloud:8787/rhosp_containers-neutron-metadata-agent-ovn:16.1.3  kolla_start  4 months ago  Up 11 minutes ago         ovn_metadata_agent
5e8bdec4be4e  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_migration_target
019507f23794  undercloud:8787/rhosp_containers-cron:16.1.3                        kolla_start  4 months ago  Up 11 minutes ago         logrotate_crond
85f22370a275  undercloud:8787/rhosp_containers-iscsid:16.1.3                      kolla_start  4 months ago  Up 11 minutes ago         iscsid
ea23db85d9dd  undercloud:8787/rhosp_containers-nova-libvirt:16.1.3                kolla_start  4 months ago  Up 11 minutes ago         nova_libvirt
13096a88b8ac  undercloud:8787/rhosp_containers-multipathd:16.1.3                  kolla_start  4 months ago  Up 11 minutes ago         multipathd
  • Since the node seemed fine, we re-ran the update run command. This time the error is different and occurs earlier:
fatal: [overcloud-compute-0]: FAILED! =>
[...]
"Did not find container with \"['podman', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--format', '{{.Names}}']\"", "Error executing ['podman', 'create', '--name', 'iscsid'
[...]
"stdout: ", "stderr: f /var/lib/containers/storage/overlay/l/IDY6SYOC6UPILIDFBMFAX5YUEY: no such file or directory"
The same error occurs for the nova_virtlogd, multipathd, nova_libvirt, and iscsid containers.
  • We checked the podman ps output; no iscsid container is listed, even with podman ps -a:
[heat-admin@overcloud-compute-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE                                                               COMMAND      CREATED       STATUS             PORTS  NAMES
6bfa594b6854  undercloud:8787/rhosp_containers-ovn-controller:16.1.5              kolla_start  8 hours ago   Up 27 minutes ago         ovn_controller
2c6e459f2774  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 27 minutes ago         nova_compute
233e72a92bea  undercloud:8787/rhosp_containers-neutron-metadata-agent-ovn:16.1.3  kolla_start  4 months ago  Up 27 minutes ago         ovn_metadata_agent
5e8bdec4be4e  undercloud:8787/rhosp_containers-nova-compute:16.1.3                kolla_start  4 months ago  Up 27 minutes ago         nova_migration_target
019507f23794  undercloud:8787/rhosp_containers-cron:16.1.3                        kolla_start  4 months ago  Up 27 minutes ago         logrotate_crond
[heat-admin@overcloud-compute-0 ~]$ sudo podman ps -a | grep iscsi
  • A third attempt ended with the same result.

  • We cannot complete the update, which we need to finish in order to resolve other tickets and then scale up our platform.
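The lookup that fails on the second attempt is the podman command quoted in the error message. A minimal diagnostic sketch, assuming bash on the compute node, re-runs that same lookup for each container named in the error and reports which ones podman no longer knows about. The check_containers function and the PODMAN variable are hypothetical names introduced for illustration; the sketch only reads container state and does not attempt any repair.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic sketch (not an official procedure): re-run the
# container lookup quoted in the paunch error for each affected container.

# PODMAN is an illustrative, assumed variable; set PODMAN="sudo podman" on the node.
# It is expanded unquoted below on purpose so "sudo podman" splits into two words.
PODMAN="${PODMAN:-podman}"

check_containers() {
  local name found
  for name in "$@"; do
    # Same filter as the logged command: label container_name=<name>, all states (-a).
    found=$($PODMAN ps -a --filter "label=container_name=${name}" --format '{{.Names}}')
    if [ -n "$found" ]; then
      echo "${name}: present"
    else
      echo "${name}: missing"
    fi
  done
}
```

For example, running check_containers nova_virtlogd multipathd nova_libvirt iscsid with PODMAN="sudo podman" on overcloud-compute-0 prints one present/missing line per container from the error. Separately, ls -l on the overlay path quoted in the stderr shows whether that path exists in container storage.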

Environment

  • Red Hat OpenStack Platform 16.1 (RHOSP)
