rgmanager, clvmd, cmirrord, or other DLM-using daemons in a RHEL 5 or 6 High Availability cluster become blocked after "dlm: connecting to X" or "dlm: got connection from X" is seen in the logs

Solution Unverified

Issue

  • rgmanager is not responding to any requests to move a service, and doesn't seem to be running status checks any longer. The last thing we see in the logs is dlm reporting connections to other nodes:
  Nov  3 12:20:35 node1 kernel: dlm: got connection from 6
  Nov  3 12:20:35 node2 kernel: dlm: got connection from 4
  Nov  3 12:20:35 node3 kernel: dlm: got connection from 5
  Nov  3 12:20:35 node4 kernel: dlm: got connection from 3
  • clustat reports "Service states unavailable" and does not show any services, and clusvcadm can't manage any services. The logs show "dlm: connecting to X" right before this happened.
  • LVM commands in my cluster never finish and hang indefinitely. The logs report dlm connections between nodes that were already connected.
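As a minimal sketch for spotting this symptom, the Python below scans log lines for the two kernel messages quoted above ("dlm: connecting to X" and "dlm: got connection from X") and reports any node that logs more than one connection to the same peer node ID. It assumes the default RHEL 5/6 syslog line layout shown in the excerpt (timestamp, hostname, "kernel:"). The function and type names are hypothetical, and a repeated connection is only a hint: a peer that reboots and rejoins also produces a new connection message.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Assumed syslog layout, based on lines such as:
#   "Nov  3 12:20:35 node1 kernel: dlm: got connection from 6"
DLM_CONN_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) kernel: dlm: "
    r"(?P<msg>connecting to|got connection from) (?P<nodeid>\d+)\s*$"
)


class DlmConnEvent(NamedTuple):
    timestamp: str
    host: str         # node that logged the message
    direction: str    # "outbound" for "connecting to", "inbound" for "got connection from"
    peer_nodeid: int  # the X in the DLM message


def find_dlm_connection_events(lines: Iterable[str]) -> List[DlmConnEvent]:
    """Return every DLM connection message found in the given log lines."""
    events = []
    for line in lines:
        m = DLM_CONN_RE.match(line.strip())
        if m is None:
            continue  # not a DLM connection message
        direction = "outbound" if m.group("msg") == "connecting to" else "inbound"
        events.append(
            DlmConnEvent(m.group("ts"), m.group("host"), direction, int(m.group("nodeid")))
        )
    return events


def repeated_connections(events: Iterable[DlmConnEvent]) -> Dict[Tuple[str, int], int]:
    """Map (host, peer node ID) to its connection count, keeping only pairs seen more than once.

    Inbound and outbound messages are counted together, since either one means
    a new DLM connection was set up between the two nodes.
    """
    counts = Counter((ev.host, ev.peer_nodeid) for ev in events)
    return {pair: n for pair, n in counts.items() if n > 1}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python dlm_conn_scan.py /var/log/messages
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            evs = find_dlm_connection_events(f)
        for (host, peer), n in sorted(repeated_connections(evs).items()):
            print(f"{path}: {host} logged {n} DLM connections with node {peer}")
```

Running it against the log from each node, and comparing the timestamps it reports with the time the daemons stopped responding, can help confirm whether the hang began right after a reconnection.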

Environment

  • Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
  • One or more daemons that use DLM for locking
    • rgmanager, clvmd, cmirrord, and GFS2 file systems all use DLM and may be affected
