rabbitmq-bundle will not start on one of the controllers after an OpenStack minor update controller run times out

Solution In Progress - Updated -

Issue

  • After the first partially successful controller update run (the first one to reach the expected 4-hour timeout failure), rabbitmq-bundle was stopped on the controller that the update had not yet touched. No Failed Resource Actions are reported. A resource cleanup reports success but has no effect, and a restart does not help either. Note the anomaly in the last two lines of this cleanup output:
[heat-admin@ocd01-controller-2 ~]$ sudo pcs resource cleanup rabbitmq-bundle
Cleaned up rabbitmq-bundle-docker-0 on ocd01-controller-2
Cleaned up rabbitmq-bundle-docker-0 on ocd01-controller-1
Cleaned up rabbitmq-bundle-docker-0 on ocd01-controller-0
Cleaned up rabbitmq-bundle-0 on ocd01-controller-2
Cleaned up rabbitmq-bundle-0 on ocd01-controller-1
Cleaned up rabbitmq-bundle-0 on ocd01-controller-0
Cleaned up rabbitmq-bundle-docker-1 on ocd01-controller-2
Cleaned up rabbitmq-bundle-docker-1 on ocd01-controller-1
Cleaned up rabbitmq-bundle-docker-1 on ocd01-controller-0
Cleaned up rabbitmq-bundle-1 on ocd01-controller-2
Cleaned up rabbitmq-bundle-1 on ocd01-controller-1
Cleaned up rabbitmq-bundle-1 on ocd01-controller-0
Cleaned up rabbitmq-bundle-docker-2 on ocd01-controller-2
Cleaned up rabbitmq-bundle-docker-2 on ocd01-controller-1
Cleaned up rabbitmq-bundle-docker-2 on ocd01-controller-0
Cleaned up rabbitmq-bundle-2 on ocd01-controller-2
Cleaned up rabbitmq:0 on rabbitmq-bundle-0
Cleaned up rabbitmq:1 on rabbitmq-bundle-1
  • This KB doesn't match the situation:

  • We see the following in /var/log/messages:

Jul 19 18:01:58 overcloud-controller-2 crmd[313631]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Jul 19 18:01:58 overcloud-controller-2 corosync[313564]: [TOTEM ] A new membership (10.224.8.20:7732) was formed. Members left: 1
Jul 19 18:01:58 overcloud-controller-2 corosync[313564]: [QUORUM] Members[2]: 2 3
Jul 19 18:01:58 overcloud-controller-2 corosync[313564]: [MAIN  ] Completed service synchronization, ready to provide service.
Jul 19 18:01:58 overcloud-controller-2 pacemakerd[313576]:  notice: Node overcloud-controller-0 state is now lost
Jul 19 18:01:59 overcloud-controller-2 dnsmasq[180147]: read /var/lib/neutron/dhcp/90292b43-3cd9-4c98-b008-013208e4d9e4/addn_hosts - 4 addresses
Jul 19 18:01:59 overcloud-controller-2 dnsmasq-dhcp[180147]: read /var/lib/neutron/dhcp/90292b43-3cd9-4c98-b008-013208e4d9e4/host
Jul 19 18:01:59 overcloud-controller-2 dnsmasq-dhcp[180147]: read /var/lib/neutron/dhcp/90292b43-3cd9-4c98-b008-013208e4d9e4/opts
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node galera-bundle-2 state is now member
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node redis-bundle-0 state is now lost
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]: warning: No reason to expect node redis-bundle-0 to be down
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Stonith/shutdown of redis-bundle-0 not matched
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node rabbitmq-bundle-1 state is now lost
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]: warning: No reason to expect node rabbitmq-bundle-1 to be down
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Stonith/shutdown of rabbitmq-bundle-1 not matched
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node galera-bundle-0 state is now lost
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]: warning: No reason to expect node galera-bundle-0 to be down
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Stonith/shutdown of galera-bundle-0 not matched
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node redis-bundle-2 state is now member
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node rabbitmq-bundle-0 state is now member
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Node overcloud-controller-0 state is now lost
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]: warning: No reason to expect node 1 to be down
Jul 19 18:02:00 overcloud-controller-2 crmd[313631]:  notice: Stonith/shutdown of overcloud-controller-0 not matched
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]: warning: Blind faith: not fencing unseen nodes
Jul 19 18:02:00 overcloud-controller-2 cib[313626]: warning: A-Sync reply to crmd failed: No message of desired type
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      rabbitmq-bundle-1     ( overcloud-controller-2 )   due to unrunnable rabbitmq-bundle-docker-1 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      rabbitmq:1            (  rabbitmq-bundle-1 )   due to unrunnable rabbitmq-bundle-docker-1 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      rabbitmq-bundle-2     ( overcloud-controller-2 )   due to unrunnable rabbitmq-bundle-docker-2 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      rabbitmq:2            (  rabbitmq-bundle-2 )   due to unrunnable rabbitmq-bundle-docker-2 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      galera-bundle-0       ( overcloud-controller-1 )   due to unrunnable galera-bundle-docker-0 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      galera:0              (    galera-bundle-0 )   due to unrunnable galera-bundle-docker-0 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      redis-bundle-0        ( overcloud-controller-2 )   due to unrunnable redis-bundle-docker-0 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice:  * Start      redis:0               (     redis-bundle-0 )   due to unrunnable redis-bundle-docker-0 start (blocked)
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection rabbitmq-bundle-1
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection rabbitmq-bundle-2
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection galera-bundle-0
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:   error: Could not determine address for bundle connection redis-bundle-0
Jul 19 18:02:00 overcloud-controller-2 pengine[313630]:  notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-2328.bz2
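
The two outputs above can also be checked mechanically instead of by eye. The following is a minimal Python sketch, not a supported tool: the function names (cleanup_targets, under_cleaned, blocked_starts, unresolved_connections) are hypothetical, and the regular expressions assume the exact line formats pasted in this article. It reports resources that were cleaned up on fewer nodes than their siblings, and lists the bundle starts that pengine marked as blocked together with the bundle connections whose address could not be determined.

```python
import re
from collections import defaultdict

# Assumed format, taken from the cleanup output in this article:
#   "Cleaned up rabbitmq-bundle-2 on ocd01-controller-2"
CLEANED = re.compile(r"^Cleaned up (\S+) on (\S+)$")
# Assumed format, taken from the pengine lines in this article:
#   "* Start  rabbitmq:1  ( rabbitmq-bundle-1 )  due to unrunnable X start (blocked)"
BLOCKED = re.compile(
    r"\* Start\s+(\S+)\s+\(\s*(\S+)\s*\)\s+due to unrunnable (\S+) start \(blocked\)"
)
NO_ADDR = re.compile(r"Could not determine address for bundle connection (\S+)")


def cleanup_targets(text):
    """Map each cleaned-up resource to the set of nodes it was cleaned on."""
    targets = defaultdict(set)
    for line in text.splitlines():
        match = CLEANED.match(line.strip())
        if match:
            targets[match.group(1)].add(match.group(2))
    return dict(targets)


def under_cleaned(targets, pattern):
    """Return resources matching `pattern` that were cleaned on fewer nodes
    than the best-covered resource matching the same pattern."""
    family = {r: n for r, n in targets.items() if re.fullmatch(pattern, r)}
    if not family:
        return {}
    most = max(len(nodes) for nodes in family.values())
    return {r: sorted(n) for r, n in family.items() if len(n) < most}


def blocked_starts(log_text):
    """Return (resource, node, unrunnable dependency) for each blocked start."""
    return [m.groups() for m in BLOCKED.finditer(log_text)]


def unresolved_connections(log_text):
    """Return bundle connections for which no address could be determined."""
    return NO_ADDR.findall(log_text)


if __name__ == "__main__":
    # Lines copied from the outputs in this article.
    cleanup = "\n".join([
        "Cleaned up rabbitmq-bundle-1 on ocd01-controller-2",
        "Cleaned up rabbitmq-bundle-1 on ocd01-controller-1",
        "Cleaned up rabbitmq-bundle-1 on ocd01-controller-0",
        "Cleaned up rabbitmq-bundle-2 on ocd01-controller-2",
    ])
    print(under_cleaned(cleanup_targets(cleanup), r"rabbitmq-bundle-\d+"))

    log = (
        "pengine[313630]:  notice:  * Start      rabbitmq-bundle-2     "
        "( overcloud-controller-2 )   due to unrunnable "
        "rabbitmq-bundle-docker-2 start (blocked)\n"
        "pengine[313630]:   error: Could not determine address for bundle "
        "connection rabbitmq-bundle-2\n"
    )
    print(blocked_starts(log))
    print(unresolved_connections(log))
```

Note that under_cleaned only compares resources that appear in the output. It flags rabbitmq-bundle-2 as cleaned on a single controller, but it cannot flag a resource that is absent altogether; the pasted cleanup output lists rabbitmq:0 and rabbitmq:1 only, so a third replica would have to be checked against the expected replica count separately.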

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
