clvmd start timed out with dlm socket error

Solution Verified - Updated -

Issue

  • On RHEL 6, a node is able to rejoin the cluster briefly after fencing or qdisk eviction, but then it is immediately kicked out by another node.
Jun 14 02:39:33 node1 qdiskd[30861]: qdisk cycle took more than 5 seconds to complete (6.070000)
Jun 14 02:43:06 node1 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Jun 14 02:43:12 node1 corosync[4659]:   [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Jun 14 02:43:13 node1 corosync[4659]:   [QUORUM] Members[3]: 1 2 3
Jun 14 02:43:13 node1 corosync[4659]:   [CPG   ] chosen downlist: sender r(0) ip(10.238.11.147) ; members(old:1 left:0)
Jun 14 02:43:13 node1 corosync[4659]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 14 02:43:13 node1 corosync[4659]: cman killed by node 2 because we were killed by cman_tool or other application

Jun 14 02:42:46 node2 qdiskd[8000]: Writing eviction notice for node 1
Jun 14 02:42:51 node2 qdiskd[8000]: Node 1 evicted

Jun 14 02:43:10 node3 kernel: dlm: node 3: socket error sending to node 1 at 10.0.0.1, port 21064, sk_err=104/113
Jun 14 02:43:13 node3 corosync[11504]:   [QUORUM] Members[2]: 2 3
Jun 14 02:43:13 node3 corosync[11504]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 14 02:43:13 node3 rgmanager[16234]: State change: node1.example.com DOWN
Jun 14 02:43:13 node3 corosync[11504]:   [QUORUM] Members[3]: 1 2 3
Jun 14 02:43:13 node3 corosync[11504]:   [QUORUM] Members[3]: 1 2 3
Jun 14 02:43:13 node3 corosync[11504]:   [CPG   ] chosen downlist: sender r(0) ip(10.0.0.2) ; members(old:3 left:1)
Jun 14 02:43:13 node3 corosync[11504]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 14 02:43:13 node3 kernel: dlm: closing connection to node 1
  • On RHEL 7, after a node is fenced, clvmd startup times out after a dlm socket error.
Oct 26 08:07:16 node2 kernel: dlm: Using TCP for communications
Oct 26 08:07:16 node2 kernel: dlm: connecting to 1
Oct 26 08:07:16 node2 kernel: dlm: node 2: socket error sending to node 1, port 21064, sk_err=104/0
...
Oct 26 08:12:20 node2 lrmd[1209]: warning: clvmd_start_0 process (PID 1593) timed out
Oct 26 08:12:20 node2 lrmd[1209]: warning: clvmd_start_0:1593 - timed out after 300000ms
Oct 26 08:12:20 node2 crmd[1212]:   error: Result of start operation for clvmd on node2.example.com: Timed Out
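
To make the `sk_err` values in the dlm messages above easier to read, the following is a minimal sketch of a log-line parser. It assumes the two numbers in `sk_err=A/B` are the kernel socket's hard and soft error codes, and it decodes them with a small hand-written table of Linux errno values (104 is ECONNRESET, 113 is EHOSTUNREACH). The function name `parse_dlm_socket_error` and the returned dictionary keys are hypothetical and chosen for illustration only; they are not part of any Red Hat tooling.

```python
import re

# Linux errno values, hardcoded so the result does not depend on the
# platform running this script. Only a few network-related codes are listed.
LINUX_ERRNO = {
    104: "ECONNRESET",
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}

# Matches both message variants shown above: with and without "at <address>".
DLM_SOCKET_ERROR = re.compile(
    r"dlm: node (?P<local>\d+): socket error sending to node (?P<peer>\d+)"
    r"(?: at (?P<addr>\S+))?, port (?P<port>\d+), "
    r"sk_err=(?P<err>\d+)/(?P<soft>\d+)"
)


def _errno_name(code):
    """Return a symbolic name for a Linux errno, or None when the code is 0."""
    code = int(code)
    if code == 0:
        return None
    # Fall back to the raw number for codes missing from the table.
    return LINUX_ERRNO.get(code, str(code))


def parse_dlm_socket_error(line):
    """Hypothetical helper: extract fields from a dlm socket error log line.

    Returns a dict, or None if the line is not a dlm socket error message.
    Assumption: sk_err=A/B holds the hard and soft socket error codes.
    """
    match = DLM_SOCKET_ERROR.search(line)
    if match is None:
        return None
    return {
        "local": int(match.group("local")),
        "peer": int(match.group("peer")),
        "addr": match.group("addr"),  # None when the message has no address
        "port": int(match.group("port")),
        "sk_err": _errno_name(match.group("err")),
        "sk_err_soft": _errno_name(match.group("soft")),
    }


if __name__ == "__main__":
    sample = (
        "Oct 26 08:07:16 node2 kernel: dlm: node 2: socket error "
        "sending to node 1, port 21064, sk_err=104/0"
    )
    print(parse_dlm_socket_error(sample))
```

Run against a saved copy of /var/log/messages, a helper like this shows at a glance which peer node reset or refused the dlm connection on port 21064.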

Environment

  • Red Hat Enterprise Linux 6 or 7 (with the High Availability Add-on)
  • dlm and clvmd
