clvmd blocks on one node while rejoining the cluster and the kernel shows backtraces for it waiting in dlm_new_lockspace in RHEL 6
Issue
- A node was fenced, and when starting back up, clvmd becomes blocked and never completes activation of volumes
- I can't mount GFS2 file systems and lvm devices can't be found on the system, and /var/log/messages shows clvmd backtraces
- clvmd is stuck waiting in dlm_new_lockspace when starting, and lvm commands block throughout the cluster
Jul 8 05:25:18 node1 kernel: dlm: Using TCP for communications
Jul 8 05:25:18 node1 kernel: dlm: connecting to 4
Jul 8 05:25:18 node1 kernel: dlm: connecting to 3
Jul 8 05:25:18 node1 kernel: dlm: connecting to 2
Jul 8 05:27:50 node1 kernel: INFO: task clvmd:28851 blocked for more than 120 seconds.
Jul 8 05:27:50 node1 kernel: Tainted: P --------------- 2.6.32-431.17.1.el6.x86_64 #1
Jul 8 05:27:50 node1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 05:27:50 node1 kernel: clvmd D 0000000000000011 0 28851 1 0x00000080
Jul 8 05:27:50 node1 kernel: ffff880829fe7c98 0000000000000086 0000000000000000 ffffffff810699d3
Jul 8 05:27:50 node1 kernel: ffff880829fe7c38 ffff88082ea6f538 ffff88082f340ae8 ffff88185c4168a8
Jul 8 05:27:50 node1 kernel: ffff88082ea6fab8 ffff880829fe7fd8 000000000000fbc8 ffff88082ea6fab8
Jul 8 05:27:50 node1 kernel: Call Trace:
Jul 8 05:27:50 node1 kernel: [<ffffffff810699d3>] ? dequeue_entity+0x113/0x2e0
Jul 8 05:27:50 node1 kernel: [<ffffffff81528a95>] schedule_timeout+0x215/0x2e0
Jul 8 05:27:50 node1 kernel: [<ffffffff81527bfe>] ? thread_return+0x4e/0x760
Jul 8 05:27:50 node1 kernel: [<ffffffff81285172>] ? kobject_uevent_env+0x202/0x620
Jul 8 05:27:50 node1 kernel: [<ffffffff81528713>] wait_for_common+0x123/0x180
Jul 8 05:27:50 node1 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
Jul 8 05:27:50 node1 kernel: [<ffffffff8152882d>] wait_for_completion+0x1d/0x20
Jul 8 05:27:50 node1 kernel: [<ffffffffa054cf79>] dlm_new_lockspace+0x999/0xa30 [dlm]
Jul 8 05:27:50 node1 kernel: [<ffffffffa0554ff1>] device_write+0x311/0x720 [dlm]
Jul 8 05:27:50 node1 kernel: [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
Jul 8 05:27:50 node1 kernel: [<ffffffff81226056>] ? security_file_permission+0x16/0x20
Jul 8 05:27:50 node1 kernel: [<ffffffff81188c38>] vfs_write+0xb8/0x1a0
Jul 8 05:27:50 node1 kernel: [<ffffffff81189531>] sys_write+0x51/0x90
Jul 8 05:27:50 node1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
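The hung state shown in the backtrace can also be confirmed directly, without waiting for the kernel's hung-task messages, by checking the process state and wait channel of clvmd. A minimal sketch (the bracketed grep pattern is just a trick to keep grep from matching its own process):

```shell
# A clvmd blocked in the kernel will show state "D" (uninterruptible sleep)
# and a WCHAN such as dlm_new_lockspace or wait_for_completion.
ps -eo pid,stat,wchan:32,comm | grep '[c]lvmd'
```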
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the Resilient Storage Add On
- lvm2-cluster is installed and its clvmd daemon is running, with 'locking_type = 3' set in /etc/lvm/lvm.conf
- There are no ongoing problems in the cluster that would cause cluster services to become blocked, such as a loss of quorum or failed fencing
- One or more nodes shows a consistently non-zero Recv-Q value on the DLM connection between itself and the node attempting to rejoin, as described in the Diagnostic Steps below
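The Recv-Q condition above can be checked from each surviving node by watching the DLM TCP connections in netstat. A minimal sketch, assuming DLM is using its default TCP port, 21064:

```shell
# Show the netstat headers plus any DLM TCP connections (default port 21064).
# The Recv-Q column holds received bytes not yet consumed by the local node;
# a value that stays above 0 across repeated runs indicates the stuck
# connection described in the Environment section.
netstat -tn | awk 'NR <= 2 || /:21064/'
```

Running this a few seconds apart on each node shows whether the Recv-Q value is draining or stuck.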