LVM commands hang and clvmd hangs on startup after a cluster node reboots during ifdown testing in a RHEL 6 High Availability Cluster


Issue

  • We discovered an issue in which clvmd hangs on both nodes as soon as the fenced node comes back online. Running "service clvmd status" or any lvm-related command (lvs, pvs, vgs, etc.) hangs on either node as soon as it is run. However, when both nodes are restarted cleanly at the same time, cman/clvmd/rgmanager start without issue.
  • When I run ifdown to simulate a network failure in my cluster, a node gets fenced, reboots, and rejoins. When it starts clvmd, startup times out and further lvm commands throughout the cluster hang. A reproduction sketch follows the log excerpt below.
  • A node is fenced following ifdown, and I see the remaining node report that it is closing the connection to itself. After the node rejoins, the connection to the node that left is closed, and DLM-based services cease functioning:
May 20 15:34:39 rhel6-node2 corosync[2986]:   [TOTEM ] A processor failed, forming new configuration.
May 20 15:34:39 rhel6-node2 corosync[2986]:   [TOTEM ] The network interface is down.
May 20 15:34:41 rhel6-node2 corosync[2986]:   [QUORUM] Members[1]: 1
May 20 15:34:41 rhel6-node2 corosync[2986]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 20 15:34:41 rhel6-node2 corosync[2986]:   [QUORUM] Members[2]: 1 2
May 20 15:34:41 rhel6-node2 corosync[2986]:   [QUORUM] Members[2]: 1 2
May 20 15:34:41 rhel6-node2 kernel: dlm: closing connection to node 2
May 20 15:34:41 rhel6-node2 corosync[2986]:   [CPG   ] chosen downlist: sender r(0) ip(127.0.0.1) ; members(old:2 left:1)
May 20 15:34:41 rhel6-node2 fenced[3219]: fencing node rhel6-node1.example.com
May 20 15:34:43 rhel6-node2 fenced[3219]: fence rhel6-node1.example.com success
May 20 15:34:53 rhel6-node2 qdiskd[3036]: Writing eviction notice for node 1
May 20 15:34:54 rhel6-node2 qdiskd[3036]: Node 1 evicted
May 20 15:35:27 rhel6-node2 corosync[2986]:   [TOTEM ] The network interface [192.168.143.62] is now up.
May 20 15:35:29 rhel6-node2 corosync[2986]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 20 15:35:29 rhel6-node2 corosync[2986]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.143.62) ; members(old:1 left:0)
May 20 15:35:29 rhel6-node2 corosync[2986]:   [MAIN  ] Completed service synchronization, ready to provide service.
May 20 15:36:22 rhel6-node2 corosync[2986]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 20 15:36:22 rhel6-node2 kernel: dlm: closing connection to node 1
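
The behavior above can be reproduced and checked with commands along the following lines. This is a minimal sketch only: the interface name (eth0), the 10-second timeout, and the use of timeout(1) and group_tool are illustrative assumptions, not part of the original report.

# On the node used for the test, simulate the network failure
# (interface name is an assumption):
ifdown eth0

# The peer node is fenced, reboots, rejoins the cluster, and starts
# cman/clvmd. Once it has rejoined, check for the hang from either node.
# Wrapping the command in timeout(1) makes a hung command return after
# 10 seconds instead of blocking the shell indefinitely:
timeout 10 vgs; echo "exit status: $?"   # 124 means vgs timed out (hung)

# Fence/DLM group state can be inspected with group_tool from the cman
# package, e.g.:
group_tool ls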

Environment

  • Red Hat Enterprise Linux Server 6 with the High Availability or Resilient Storage Add On
  • Red Hat High Availability cluster with 2 or more nodes
    • On clusters with more than 2 nodes, the issue is only present if a quorum device is in use
  • lvm2-cluster (clvmd) in use
    • locking_type = 3 in /etc/lvm/lvm.conf (a minimal configuration fragment is shown after this list)
  • Issue occurs after running ifdown on one node, which causes another node to be fenced; the fenced node then reboots, rejoins the cluster, and starts clvmd
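
For reference, clustered locking in lvm.conf looks like the fragment below; the grep and service commands are a quick, illustrative way to confirm the setting and that clvmd is running (default file location assumed).

# /etc/lvm/lvm.conf (fragment): locking_type 3 enables clustered
# locking through clvmd
global {
    locking_type = 3
}

# Quick checks (note that "service clvmd status" itself hangs while the
# issue is occurring):
grep -E '^[[:space:]]*locking_type' /etc/lvm/lvm.conf
service clvmd status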
