clvmd or other services hang while node rejoins cluster, `cman_tool services` shows services in FAIL_ALL_STOPPED, FAIL_START_WAIT, or JOIN_START_WAIT throughout cluster

Solution Unverified - Updated -

Issue

  • After a node was removed from the cluster, attempted to rejoin, and was removed again shortly afterward, clvmd will not start or GFS/GFS2 file systems cannot be mounted. `cman_tool services` shows one or more services in bad states:
# cman_tool services
type             level name       id       state       
fence            0     default    0001000a FAIL_ALL_STOPPED
[1 2 3 4 5 6 7 8 9 10 11 12 13 14]
dlm              1     clvmd      00010003 none        
[1 2 3 4 5 6 7 8 9 10 11 12 13]
dlm              1     gfsdata    00020005 none        
[1 2 3 4 5 6 7 8 9 10 11 12 13]
dlm              1     rgmanager  00010002 none        
[1 2 3 4 5 6 7 8 9 10 11 12 13]
gfs              2     gfsdata    00010005 FAIL_START_WAIT
[1 2 3 4 5 6 7 8 9 10 11 12 13]
  • After a node is removed from the cluster, that node is unable to mount GFS file systems, and clvmd hangs when it is started.
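
To spot the affected groups quickly on a cluster with many services, the output above can be filtered for any group whose state is not `none`. The following is a minimal sketch, not a supported tool: the function name `find_stuck_groups` and the `SAMPLE` text are hypothetical, the parser assumes the five-column layout shown in this article, and it assumes that any state other than `none` deserves a closer look.

```python
# Abbreviated copy of the `cman_tool services` output shown in this article.
SAMPLE = """\
type             level name       id       state
fence            0     default    0001000a FAIL_ALL_STOPPED
[1 2 3 4 5 6 7 8 9 10 11 12 13 14]
dlm              1     clvmd      00010003 none
[1 2 3 4 5 6 7 8 9 10 11 12 13]
gfs              2     gfsdata    00010005 FAIL_START_WAIT
[1 2 3 4 5 6 7 8 9 10 11 12 13]
"""


def find_stuck_groups(output):
    """Return (type, name, state) for every group whose state is not 'none'.

    Assumption: each group line has exactly five whitespace-separated
    columns (type, level, name, id, state), as in the sample above.
    """
    stuck = []
    for line in output.splitlines():
        # Bracketed lines list member node IDs, not groups.
        if line.lstrip().startswith("["):
            continue
        fields = line.split()
        # Skip blank lines, the header row, and anything unexpected.
        if len(fields) != 5 or fields[0] == "type":
            continue
        group_type, _level, name, _group_id, state = fields
        if state != "none":
            stuck.append((group_type, name, state))
    return stuck


if __name__ == "__main__":
    for group_type, name, state in find_stuck_groups(SAMPLE):
        print(f"{group_type} {name}: {state}")
```

On a live node, the text passed to the function would be the captured output of `cman_tool services` rather than the sample string.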

Environment

  • Red Hat Enterprise Linux Server 5 (with the High Availability Add-On)
  • A node has been removed from the cluster, rejoined, and then removed again shortly after
