gfs2:gfs2_dirent_find error and system hangs on RHEL 5
Issue
- The load on one of our nodes rose to about 41, and Java processes on the GFS2 file system would hang (local applications started fine), yet I could still touch files on the shared file system.
- The other node did not have any issues.
- Rebooting the troublesome node did not resolve the issue, even though the node appeared to rejoin the cluster successfully.
- I then rebooted the whole cluster, and the application on the host started correctly.
- When a Java application started, it would hang, and not even kill -9 would terminate it (which led me to think it was I/O bound).
- The following error was output to STDOUT:
kernel: INFO: task cmuat1:17670 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: cmuat1 D ffff81021b56ed80 0 17670 1 12861 (NOTLB)
kernel: ffff810213bcbc58 0000000000000086 ffff810217d9e810 ffff810218cbdba0
kernel: ffffffff88821bfe 0000000000000001 ffff810213dfb0c0 ffff8104276a9820
kernel: 0000008f0f1e9c56 0000000000086ab1 ffff810213dfb2a8 000000008002d0ee
kernel: Call Trace:
kernel: [<ffffffff88821bfe>] :gfs2:gfs2_dirent_find+0x0/0x4e
kernel: [<ffffffff88826dfc>] :gfs2:do_promote+0xfa/0x188
kernel: [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
kernel: [<ffffffff888368ba>] :gfs2:gfs2_permission+0xaf/0xd5
kernel: [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
kernel: [<ffffffff8000cfb9>] do_lookup+0x90/0x1e6
kernel: [<ffffffff8000a2cb>] __link_path_walk+0xa2a/0xfb9
kernel: [<ffffffff8000ea7a>] link_path_walk+0x42/0xb2
kernel: [<ffffffff8000cda9>] do_path_lookup+0x275/0x2f1
kernel: [<ffffffff800237e6>] __path_lookup_intent_open+0x56/0x97
kernel: [<ffffffff8003c1fd>] open_exec+0x24/0xc0
kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
kernel: [<ffffffff8003eece>] do_execve+0x46/0x1ed
kernel: [<ffffffff80055086>] sys_execve+0x36/0x4c
kernel: [<ffffffff8005d4d3>] stub_execve+0x67/0xb0
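When several hung-task reports pile up in the logs, it can help to pull out which task blocked and which GFS2 functions appear in its call trace. Frames written as ":gfs2:symbol" belong to the gfs2 kernel module; frames without a prefix are core kernel symbols. The following is a minimal Python 3 sketch that assumes the report format shown above (other kernel versions may print it differently). The helper names parse_hung_task, module_frames, Frame and HungTask are hypothetical and not part of any Red Hat tool.

```python
import re
from typing import List, NamedTuple, Optional

# Header printed by the kernel's hung-task detector (format assumed from the log above).
TASK_RE = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")
# One call-trace frame, e.g. "[<ffffffff88821bfe>] :gfs2:gfs2_dirent_find+0x0/0x4e".
# The optional ":module:" prefix marks symbols that live in a loadable module.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?::(\w+):)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


class Frame(NamedTuple):
    module: Optional[str]  # None for core kernel symbols
    symbol: str


class HungTask(NamedTuple):
    comm: str
    pid: int
    seconds: int
    frames: List[Frame]


def parse_hung_task(text: str) -> Optional[HungTask]:
    """Parse one hung-task report; return None if no header is found."""
    header = TASK_RE.search(text)
    if header is None:
        return None
    # Only scan for frames after the header so unrelated text is ignored.
    frames = [Frame(m.group(1), m.group(2))
              for m in FRAME_RE.finditer(text, header.end())]
    return HungTask(header.group(1), int(header.group(2)),
                    int(header.group(3)), frames)


def module_frames(task: HungTask, module: str) -> List[str]:
    """Return, in trace order, the symbols that belong to the given module."""
    return [f.symbol for f in task.frames if f.module == module]
```

Applied to the report above, module_frames(task, "gfs2") would list gfs2_dirent_find, do_promote and gfs2_permission, which places the blocked exec() inside a GFS2 directory lookup rather than in the Java application itself.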
Environment
- Red Hat Enterprise Linux 5
- Kernel 2.6.18-238.1.1.el5
- Red Hat Cluster Suite - 2 node cluster
- Red Hat GFS2 filesystem
- Mounted on both nodes simultaneously
- Active/active workload
- On one node, processes accessing the GFS2 filesystem go into "D" (uninterruptible sleep) state.
- Able to create files on the GFS2 filesystem.
- There are no issues with filesystem access on the other node that has the filesystem mounted.
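To confirm which processes on the affected node are stuck in "D" state, one can read the single-character state field from /proc/<pid>/stat (described in proc(5)); ps -eo pid,stat,comm shows the same information. The following is a minimal sketch, assuming a Python 3 interpreter is available (the stock RHEL 5 Python is older and would need the type hints removed). The helper names parse_stat and d_state_tasks are hypothetical.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # the state letter follows ") "
    return pid, comm, state


def d_state_tasks(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every process in uninterruptible sleep ('D')."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "D":
            found.append((pid, comm))
    return found
```

For example, running for pid, comm in d_state_tasks(): print(pid, comm) on the troubled node would be expected to list the hung Java processes, while the same check on the healthy node should come back empty.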
