rgmanager, clvmd, cmirrord, or other DLM-using daemons in a RHEL 5 or 6 High Availability cluster become blocked after "dlm: connecting to X" or "dlm: got connection from X" is seen in the logs
Issue
- rgmanager is not responding to any requests to move a service, and doesn't seem to be running status checks any longer. The last thing we see in the logs is dlm reporting connections to other nodes:
Nov 3 12:20:35 node1 kernel: dlm: got connection from 6
Nov 3 12:20:35 node2 kernel: dlm: got connection from 4
Nov 3 12:20:35 node3 kernel: dlm: got connection from 5
Nov 3 12:20:35 node4 kernel: dlm: got connection from 3
- clustat is reporting "Service states unavailable" and not showing any services, and clusvcadm can't manage any services. The logs show "dlm: connecting to X" right before this happened.
- LVM commands in my cluster are not finishing and just hang forever. The logs report dlm connections between nodes that were already connected.
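To check whether a node logged these messages, the syslog lines can be filtered for the two kernel strings quoted above. The following is a minimal sketch, not a Red Hat tool: the function name `find_dlm_connections`, the regular expression, and the second and third sample lines are illustrative assumptions, and the pattern assumes the traditional syslog line layout shown in the excerpts above.

```python
import re

# Matches lines shaped like the excerpts in this article, e.g.
#   "Nov 3 12:20:35 node1 kernel: dlm: got connection from 6"
# Assumption: traditional syslog layout (month, day, time, host, "kernel:").
DLM_CONN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"dlm: (?P<kind>connecting to|got connection from) (?P<peer>\d+)\s*$"
)


def find_dlm_connections(lines):
    """Return (timestamp, host, kind, peer_nodeid) for each DLM connection message."""
    hits = []
    for line in lines:
        m = DLM_CONN.match(line)
        if m:
            hits.append((m["stamp"], m["host"], m["kind"], int(m["peer"])))
    return hits


# The first line is taken from the excerpt above; the other two are
# made-up lines used only to exercise the pattern.
sample = [
    "Nov 3 12:20:35 node1 kernel: dlm: got connection from 6",
    "Nov 3 12:20:36 node1 kernel: dlm: connecting to 4",
    "Nov 3 12:20:37 node1 example: unrelated line",
]

hits = find_dlm_connections(sample)
```

On a live node the same function could be fed an open log file instead of `sample`, for example the lines of /var/log/messages if that is where kernel messages are written on the system in question. Finding such lines at the time the daemons stopped responding, for nodes that were already cluster members, matches the symptom described here.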
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
- One or more daemons in use that utilize DLM for locking
  - rgmanager, clvmd, cmirrord, and GFS2 file systems all utilize DLM and may be affected