Cluster processes such as rgmanager, gfs2_quotad, etc. are reported as "blocked for more than 120 seconds" in /var/log/messages when using fence_kdump in a RHEL 6 or 7 High Availability cluster
Issue
- Why do I see hung tasks in /var/log/messages when using fence_kdump?
- When fence_kdump is configured, it repeatedly times out and never falls back to the other configured fence device (a sample configuration is sketched after the log excerpts below):
Apr 4 13:33:14 node1 fenced[6379]: fencing node node2
Apr 4 13:33:14 node1 fence_kdump[27549]: waiting for message from '192.168.2.12'
Apr 4 13:35:14 node1 fence_kdump[27549]: timeout after 120 seconds
Apr 4 13:35:14 node1 fenced[6379]: fence node2 dev 0.0 agent fence_kdump result: error from agent
Apr 4 13:35:14 node1 fenced[6379]: fence node2 failed
Apr 4 13:35:17 node1 fenced[6379]: fencing node node2
- Cluster processes become blocked whenever a node must be fenced
Apr 4 13:35:54 node1 kernel: INFO: task gfs2_quotad:6826 blocked for more than 120 seconds.
Apr 4 13:35:54 node1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 4 13:35:54 node1 kernel: gfs2_quotad D 0000000000000005 0 6826 2 0x00000080
Apr 4 13:35:54 node1 kernel: ffff88031dee5a18 0000000000000046 ffff88033585c080 0000000000000002
Apr 4 13:35:54 node1 kernel: 0000000000000000 ffff88031dee59c0 ffffffff81090d8d 0000000000000078
Apr 4 13:35:54 node1 kernel: ffff88031dee1098 ffff88031dee5fd8 000000000000fb88 ffff88031dee1098
Apr 4 13:35:54 node1 kernel: Call Trace:
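For context, the behavior described above typically involves a fence configuration along the lines of the following cluster.conf fragment for a cman-based cluster, in which fence_kdump is the first fence method and a power device is the fallback. This is only an illustrative sketch: the node name, device names, IPMI address, and credentials are hypothetical placeholders, not values taken from this article.

<clusternode name="node2" nodeid="2">
    <fence>
        <!-- First method: wait for a fence_kdump_send message from the crashing node -->
        <method name="kdump">
            <device name="kdump"/>
        </method>
        <!-- Fallback method: power fencing via IPMI (hypothetical device) -->
        <method name="power">
            <device name="ipmi-node2"/>
        </method>
    </fence>
</clusternode>
...
<fencedevices>
    <fencedevice agent="fence_kdump" name="kdump"/>
    <fencedevice agent="fence_ipmilan" name="ipmi-node2" ipaddr="192.168.2.112" login="admin" passwd="password"/>
</fencedevices>

In the log excerpt above, fence_kdump waits for a message from the failed node and reports "timeout after 120 seconds", after which fenced logs a failed fence attempt and starts the cycle again; the hung-task warnings for gfs2_quotad and other cluster processes appear while fencing remains incomplete.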
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add On
- Applicable to both cman- and pacemaker/corosync-based clusters
- One or more nodes configured to use fence_kdump as the agent for one of their fence devices
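For a pacemaker-based cluster, a comparable fence topology can be sketched with pcs, again using hypothetical resource names, node names, and IPMI details; treat this as an illustrative outline rather than a recommended configuration:

# Hypothetical stonith resources: fence_kdump first, IPMI power fencing as fallback
pcs stonith create kdump-fence fence_kdump pcmk_host_list="node1 node2"
pcs stonith create ipmi-node2 fence_ipmilan ipaddr="192.168.2.112" login="admin" passwd="password" pcmk_host_list="node2"

# Fencing levels: level 1 (fence_kdump) is attempted before level 2 (power fencing)
pcs stonith level add 1 node2 kdump-fence
pcs stonith level add 2 node2 ipmi-node2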