gfs2:gfs2_dirent_find error and system hang on RHEL 5
Issue
- The load on one of our nodes went up to about 41 and Java processes on the GFS2 file system would hang (local apps started ok), but I could touch files on the shared file system.
- The other node was not having any issues.
- Rebooting the troublesome node did not seem to resolve the issue even though it seemed to rejoin the cluster successfully.
- I then rebooted the whole cluster and the application on the host started correctly.
- When a Java application started, it would hang, and not even a kill -9 would terminate it (leading me to think it was blocked on I/O).
- The following error was output to STDOUT:
kernel: INFO: task cmuat1:17670 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: cmuat1 D ffff81021b56ed80 0 17670 1 12861 (NOTLB)
kernel: ffff810213bcbc58 0000000000000086 ffff810217d9e810 ffff810218cbdba0
kernel: ffffffff88821bfe 0000000000000001 ffff810213dfb0c0 ffff8104276a9820
kernel: 0000008f0f1e9c56 0000000000086ab1 ffff810213dfb2a8 000000008002d0ee
kernel: Call Trace:
kernel: [<ffffffff88821bfe>] :gfs2:gfs2_dirent_find+0x0/0x4e
kernel: [<ffffffff88826dfc>] :gfs2:do_promote+0xfa/0x188
kernel: [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
kernel: [<ffffffff888368ba>] :gfs2:gfs2_permission+0xaf/0xd5
kernel: [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
kernel: [<ffffffff8000cfb9>] do_lookup+0x90/0x1e6
kernel: [<ffffffff8000a2cb>] __link_path_walk+0xa2a/0xfb9
kernel: [<ffffffff8000ea7a>] link_path_walk+0x42/0xb2
kernel: [<ffffffff8000cda9>] do_path_lookup+0x275/0x2f1
kernel: [<ffffffff800237e6>] __path_lookup_intent_open+0x56/0x97
kernel: [<ffffffff8003c1fd>] open_exec+0x24/0xc0
kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
kernel: [<ffffffff8003eece>] do_execve+0x46/0x1ed
kernel: [<ffffffff80055086>] sys_execve+0x36/0x4c
kernel: [<ffffffff8005d4d3>] stub_execve+0x67/0xb0
Environment
- Red Hat Enterprise Linux 5
- Kernel 2.6.18-238.1.1.el5
- Red Hat Cluster Suite - 2 node cluster
- Red Hat GFS2 filesystem
- Mounted on both nodes simultaneously
- Active/active workload.
- On one node, processes accessing the GFS2 filesystem begin going into "D" state.
- Able to create files on the GFS2 filesystem.
- There are no issues with filesystem access on the other node that has the filesystem mounted.
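
The "D" (uninterruptible sleep) state mentioned above can be confirmed on the affected node by reading the state field of each /proc/<pid>/stat file. The following is only a minimal sketch, assuming a Python 3 interpreter is available (RHEL 5 ships an older Python 2 release, so it would need adapting there). The names parse_stat and find_d_state are hypothetical helpers for illustration, not part of any Red Hat tool.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    # The process state is the first field after the command name.
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the file was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

Run on the node showing the hang, this would print one line per blocked process. It only reports which processes are in "D" state; it does not show what they are waiting on.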