Multiple nodes in the cluster panicked after hung task warnings
Issue
- Multiple servers in GFS2 cluster crashed
- Several nodes panicked after fencing failed for several minutes.
- Several nodes kernel panicked after reporting hung task warnings such as:
Jan 5 23:30:45 node1 kernel: INFO: task gfs2_quotad:25161 blocked for more than 120 seconds.
Jan 5 23:30:45 node1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 5 23:30:45 node1 kernel: gfs2_quotad D ffffffff80156347 0 25161 1291 25169 25160 (L-TLB)
Jan 5 23:30:45 node1 kernel: ffff81105c2cdcc0 0000000000000046 0000000000000000 ffff810838eb3000
Jan 5 23:30:45 node1 kernel: 0000000000000018 000000000000000a ffff81102eefb7e0 ffff81103f88c040
Jan 5 23:30:45 node1 kernel: 00040df462aee4fd 0000000000007eb8 ffff81102eefb9c8 00000028889bb97c
Jan 5 23:30:45 node1 kernel: Call Trace:
Jan 5 23:30:45 node1 kernel: [<ffffffff889ba00f>] :dlm:dlm_lock+0x117/0x129
[...]
Environment
- Red Hat Enterprise Linux (RHEL) 5 (Update 5 or later)
- Red Hat Enterprise Linux (RHEL) 6, 7, 8, 9
- High Availability Add On
sysctl
parameterkernel.hung_task_panic = 1
(/proc/sys/kernel/hung_task_panic
contains value1
)
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.