One RabbitMQ server can't join the cluster.
Issue
This article is written from an OpenStack point of view, but the same problem can occur in other setups.
This happened after a minor update from 16.1 to 16.2.
The minor update finished, but RabbitMQ could not start and join the cluster.
In Pacemaker, one of the servers is in a stopped state, and the following appears at the bottom of the 'pcs status' output:
Failed Resource Actions:
* rabbitmq_start_0 on rabbitmq-bundle-1 'error' (1): call=13, status='Timed Out', exitreason='', last-rc-change='2022-08-17 11:27:29 +02:00', queued=0ms, exec=200067ms
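When checking several controllers, it can help to pull the fields of such an entry apart programmatically. The following is a minimal sketch, assuming the entry has the same layout as the line above; the names `FAILED_ACTION` and `parse_failed_action` are hypothetical helpers introduced for illustration and are not part of Pacemaker or pcs.

```python
import re

# Matches one entry under "Failed Resource Actions:" in the layout shown above.
# The layout is an assumption taken from this article's sample line only.
FAILED_ACTION = re.compile(
    r"\*\s+(?P<action>\S+) on (?P<node>\S+) '(?P<rc>[^']*)' "
    r"\((?P<code>\d+)\): (?P<fields>.*)"
)

# key=value pairs; a value is either single-quoted or runs up to a comma/space.
KEY_VALUE = re.compile(r"([\w-]+)=(?:'([^']*)'|([^,\s]+))")


def parse_failed_action(line):
    """Return a dict describing a failed resource action, or None if no match."""
    match = FAILED_ACTION.search(line)
    if match is None:
        return None
    info = match.groupdict()
    fields = info.pop("fields")
    for key, quoted, bare in KEY_VALUE.findall(fields):
        # Only one of the two value groups is non-empty (or both are empty
        # for an empty quoted value such as exitreason='').
        info[key] = quoted if quoted else bare
    return info


if __name__ == "__main__":
    sample = (
        "* rabbitmq_start_0 on rabbitmq-bundle-1 'error' (1): call=13, "
        "status='Timed Out', exitreason='', "
        "last-rc-change='2022-08-17 11:27:29 +02:00', queued=0ms, exec=200067ms"
    )
    for key, value in parse_failed_action(sample).items():
        print(f"{key}: {value}")
```

In the entry above, the `status` and `exec` fields show that the start operation timed out after roughly 200 seconds, which is consistent with the repeated join attempts in the system log.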
In /var/log/messages you see something like this:
Aug 17 11:28:09 controller03 rabbitmq-cluster(rabbitmq)[328]: WARNING: Re-detect available rabbitmq nodes and try to start again
Aug 17 11:28:10 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: RabbitMQ server could not get cluster status from mnesia
Aug 17 11:28:10 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: wiping data directory before joining
Aug 17 11:28:12 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Forgetting rabbit@controller03 via nodes [ rabbit@controller02 ].
Aug 17 11:28:13 controller03 rabbitmq-cluster(rabbitmq)[328]: ERROR: Failed to forget node rabbit@controller03 via rabbit@controller02.
Aug 17 11:28:13 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Joining existing cluster with [ rabbit@controller02 ] nodes.
Aug 17 11:28:13 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Waiting for server to start
Aug 17 11:28:24 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Attempting to join cluster with target node rabbit@controller02
Aug 17 11:28:25 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Join process incomplete, shutting down.
Aug 17 11:28:25 controller03 rabbitmq-cluster(rabbitmq)[328]: WARNING: Failed to join the RabbitMQ cluster from nodes rabbit@controller02. Stopping local unclustered rabbitmq
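To isolate the warnings and errors from the rest of the log, the resource agent messages can be filtered by severity. This is a minimal sketch, assuming the syslog line layout shown above; `AGENT_LINE` and `agent_problems` are hypothetical names used for illustration only.

```python
import re

# Layout assumed from the sample lines above:
# "<Mon> <day> <HH:MM:SS> <host> rabbitmq-cluster(<resource>)[<pid>]: <LEVEL>: <message>"
AGENT_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8}) (?P<host>\S+) "
    r"rabbitmq-cluster\([^)]*\)\[\d+\]: (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def agent_problems(lines, levels=("WARNING", "ERROR")):
    """Return (timestamp, host, level, message) for agent lines at the given levels."""
    found = []
    for line in lines:
        match = AGENT_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            found.append(
                (match.group("ts"), match.group("host"),
                 match.group("level"), match.group("msg"))
            )
    return found


if __name__ == "__main__":
    # The log path is the one named in this article; reading it usually needs root.
    with open("/var/log/messages", errors="replace") as log:
        for ts, host, level, msg in agent_problems(log):
            print(f"{ts} {host} {level}: {msg}")
```

Applied to the excerpt above, the filter keeps the "Re-detect available rabbitmq nodes" warning, the "Failed to forget node" error, and the final "Failed to join the RabbitMQ cluster" warning.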
On a healthy node, you will find the following in the RabbitMQ logs:
2022-08-17 11:28:59.101 [error] <0.1076.0> ** Connection attempt from node 'rabbitmqcli-99004-rabbit@controller03' rejected. Invalid challenge reply. **
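The rejected node name identifies which host is being refused. In Erlang distribution, an "Invalid challenge reply" rejection is generally logged when the connecting node fails the handshake authentication, which commonly points to a mismatched Erlang cookie between the nodes; whether that is the cause in this incident is an assumption and is not confirmed by the logs above. The sketch below extracts the rejected node and reason from such a line; `REJECTED` and `parse_rejection` are hypothetical names, and the line layout is assumed from the single sample above.

```python
import re

# Layout assumed from the sample log line in this article.
REJECTED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} [\d:.]+) \[error\] \S+ \*\* Connection attempt "
    r"from node '(?P<node>[^']+)' rejected\. (?P<reason>.*?)\.? \*\*\s*$"
)


def parse_rejection(line):
    """Return timestamp, node, host and reason for a rejected connection, else None."""
    match = REJECTED.match(line)
    if match is None:
        return None
    info = match.groupdict()
    # Erlang node names have the form name@host; keep the host part separately.
    info["host"] = info["node"].rsplit("@", 1)[-1]
    return info


if __name__ == "__main__":
    sample = (
        "2022-08-17 11:28:59.101 [error] <0.1076.0> ** Connection attempt from "
        "node 'rabbitmqcli-99004-rabbit@controller03' rejected. "
        "Invalid challenge reply. **"
    )
    print(parse_rejection(sample))
```

Here the rejected connection comes from a `rabbitmqcli-...` node on controller03, the same host that fails to join the cluster.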
Environment
- Red Hat OpenStack Platform 16