One RabbitMQ server can't join the cluster.

Solution Verified

Issue

This article is written from an OpenStack point of view, but the same problem can affect other setups.

This happened after a minor update from Red Hat OpenStack Platform 16.1 to 16.2.
The minor update finished, but RabbitMQ on one node could not start and join the cluster.

In Pacemaker, one of the servers is in the Stopped state, and the following appears at the bottom of the 'pcs status' output:

Failed Resource Actions:
* rabbitmq_start_0 on rabbitmq-bundle-1 'error' (1): call=13, status='Timed Out', exitreason='', last-rc-change='2022-08-17 11:27:29 +02:00', queued=0ms, exec=200067ms
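
The failed-action entry packs several fields into one line. As a minimal sketch, the Python below splits the sample entry above into named fields so the status and execution time are easier to read. `parse_failed_action` is a hypothetical helper written for this illustration, not part of pcs or Pacemaker, and the regular expressions assume only the layout shown in the sample.

```python
import re

# Sample entry copied from the 'pcs status' output above.
LINE = (
    "* rabbitmq_start_0 on rabbitmq-bundle-1 'error' (1): call=13, "
    "status='Timed Out', exitreason='', "
    "last-rc-change='2022-08-17 11:27:29 +02:00', queued=0ms, exec=200067ms"
)


def parse_failed_action(line):
    """Split one failed-action entry into a dict (hypothetical helper)."""
    head = re.match(r"\*\s+(\S+) on (\S+) '([^']*)' \((\d+)\):", line)
    if head is None:
        return None  # line does not look like a failed-action entry
    fields = {
        "action": head.group(1),
        "node": head.group(2),
        "result": head.group(3),
        "rc": int(head.group(4)),
    }
    # Remaining text is key=value pairs; values are quoted or end at a comma.
    pairs = re.findall(r"([\w-]+)=(?:'([^']*)'|([^,\s]+))", line[head.end():])
    for key, quoted, bare in pairs:
        fields[key] = quoted if quoted else bare
    return fields


if __name__ == "__main__":
    info = parse_failed_action(LINE)
    print(info["action"], info["node"], info["status"], info["exec"])
```

In this sample, exec=200067ms is roughly 200 seconds spent in the start operation before it was reported as 'Timed Out'.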

In /var/log/messages you see something like this:

Aug 17 11:28:09 controller03 rabbitmq-cluster(rabbitmq)[328]: WARNING: Re-detect available rabbitmq nodes and try to start again
Aug 17 11:28:10 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: RabbitMQ server could not get cluster status from mnesia
Aug 17 11:28:10 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: wiping data directory before joining
Aug 17 11:28:12 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Forgetting rabbit@controller03 via nodes [ rabbit@controller02  ].
Aug 17 11:28:13 controller03 rabbitmq-cluster(rabbitmq)[328]: ERROR: Failed to forget node rabbit@controller03 via rabbit@controller02.
Aug 17 11:28:13 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Joining existing cluster with [ rabbit@controller02  ] nodes.
Aug 17 11:28:13 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Waiting for server to start
Aug 17 11:28:24 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Attempting to join cluster with target node rabbit@controller02
Aug 17 11:28:25 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Join process incomplete, shutting down.
Aug 17 11:28:25 controller03 rabbitmq-cluster(rabbitmq)[328]: WARNING: Failed to join the RabbitMQ cluster from nodes rabbit@controller02. Stopping local unclustered rabbitmq
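
To pick the relevant entries out of a busy /var/log/messages, the resource agent lines can be filtered by severity. The following is a rough sketch that assumes the syslog layout shown above; `agent_events` and the regular expression are hypothetical names introduced for this example, and the sample lines are copied from the excerpt.

```python
import re

# Matches lines written by the rabbitmq-cluster resource agent, as shown above.
AGENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"rabbitmq-cluster\((?P<resource>[^)]*)\)\[\d+\]: "
    r"(?P<level>[A-Z]+): (?P<msg>.*)$"
)


def agent_events(lines, levels=("WARNING", "ERROR")):
    """Return (timestamp, level, message) for agent lines at the given levels."""
    events = []
    for line in lines:
        match = AGENT_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            events.append((match.group("ts"), match.group("level"), match.group("msg")))
    return events


SAMPLE = [
    "Aug 17 11:28:10 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: wiping data directory before joining",
    "Aug 17 11:28:13 controller03 rabbitmq-cluster(rabbitmq)[328]: ERROR: Failed to forget node rabbit@controller03 via rabbit@controller02.",
    "Aug 17 11:28:25 controller03 rabbitmq-cluster(rabbitmq)[328]: INFO: Join process incomplete, shutting down.",
]

if __name__ == "__main__":
    for ts, level, msg in agent_events(SAMPLE):
        print(ts, level, msg)
```

To scan the real log instead of the sample list, pass an open file object, for example `agent_events(open("/var/log/messages"))`, which typically requires root privileges.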

On a healthy node, the RabbitMQ logs contain the following:

2022-08-17 11:28:59.101 [error] <0.1076.0> ** Connection attempt from node 'rabbitmqcli-99004-rabbit@controller03' rejected. Invalid challenge reply. **
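
The rejection message names the node whose connection was refused, which helps confirm that the healthy node is rejecting the failing controller. As a sketch under the assumption that the log line has the shape shown above, the Python below extracts the rejected node, its host part, and the stated reason. `rejected_connection` is a hypothetical helper. In Erlang distribution generally, an "Invalid challenge reply" rejection is commonly seen when the connecting node presents a different Erlang cookie; treat that as background to verify, since the excerpt above does not state the cause.

```python
import re

# Sample line copied from the RabbitMQ log excerpt above.
LOG_LINE = (
    "2022-08-17 11:28:59.101 [error] <0.1076.0> ** Connection attempt from node "
    "'rabbitmqcli-99004-rabbit@controller03' rejected. Invalid challenge reply. **"
)

REJECT_RE = re.compile(
    r"Connection attempt from node '(?P<node>[^']+)' rejected\. "
    r"(?P<reason>[^*]+?)\s*\*\*"
)


def rejected_connection(line):
    """Return (node, host, reason) for a rejection line, or None if no match."""
    match = REJECT_RE.search(line)
    if match is None:
        return None
    node = match.group("node")
    # The host is the part after the last '@' in the Erlang node name.
    host = node.rsplit("@", 1)[1] if "@" in node else ""
    return node, host, match.group("reason")


if __name__ == "__main__":
    print(rejected_connection(LOG_LINE))
```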

Environment

  • Red Hat OpenStack Platform 16
