Mounting a GFS2 file system blocks after a reboot of one node, and one or more other nodes do not show the gfs mountgroup for that file system in 'cman_tool services' in RHEL 6
Issue
- mount.gfs2 blocked for more than 120 seconds and showed a backtrace in the logs:
Mar 7 23:49:32 node2 kernel: INFO: task mount.gfs2:8129 blocked for more than 120 seconds.
Mar 7 23:49:32 node2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 7 23:49:32 node2 kernel: mount.gfs2 D 0000000000000000 0 8129 1 0x00000080
Mar 7 23:49:32 node2 kernel: ffff880229ebd9d8 0000000000000086 ffff880200000008 ffffffffa0490520
Mar 7 23:49:32 node2 kernel: ffff88023a385950 ffffffffa04904d0 ffff880200000005 ffff88023a385a00
Mar 7 23:49:32 node2 kernel: ffff88023b7efaf8 ffff880229ebdfd8 000000000000fb88 ffff88023b7efaf8
Mar 7 23:49:32 node2 kernel: Call Trace:
Mar 7 23:49:32 node2 kernel: [<ffffffffa0490520>] ? gdlm_ast+0x0/0x210 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa04904d0>] ? gdlm_bast+0x0/0x50 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa0470870>] ? gfs2_glock_holder_wait+0x0/0x20 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa047087e>] gfs2_glock_holder_wait+0xe/0x20 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffff8150e82f>] __wait_on_bit+0x5f/0x90
Mar 7 23:49:32 node2 kernel: [<ffffffffa0470870>] ? gfs2_glock_holder_wait+0x0/0x20 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffff8150e8d8>] out_of_line_wait_on_bit+0x78/0x90
Mar 7 23:49:32 node2 kernel: [<ffffffff81096cc0>] ? wake_bit_function+0x0/0x50
Mar 7 23:49:32 node2 kernel: [<ffffffffa0472ae5>] gfs2_glock_wait+0x45/0x90 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa0473f83>] gfs2_glock_nq+0x2d3/0x3e0 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa0474271>] gfs2_glock_nq_num+0x61/0xa0 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa047fbff>] init_journal+0x14f/0x4d0 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa047fa24>] ? gfs2_jindex_hold+0x1a4/0x230 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa047ffb7>] init_inodes+0x37/0x170 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa0480b98>] gfs2_get_sb+0x828/0xa00 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffffa0474269>] ? gfs2_glock_nq_num+0x59/0xa0 [gfs2]
Mar 7 23:49:32 node2 kernel: [<ffffffff8116087a>] ? alloc_pages_current+0xaa/0x110
Mar 7 23:49:32 node2 kernel: [<ffffffff8118381b>] vfs_kern_mount+0x7b/0x1b0
Mar 7 23:49:32 node2 kernel: [<ffffffff811839c2>] do_kern_mount+0x52/0x130
Mar 7 23:49:32 node2 kernel: [<ffffffff811a3c12>] do_mount+0x2d2/0x8d0
Mar 7 23:49:32 node2 kernel: [<ffffffff811a42a0>] sys_mount+0x90/0xe0
Mar 7 23:49:32 node2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
- After a hard reboot of one node, that node hangs when trying to mount a GFS2 file system after rejoining the cluster. The last message in the log is:
Mar 7 23:46:50 node2 kernel: GFS2: fsid=myCluster:fs1.0: Joined cluster. Now mounting FS...
- One or more other nodes in the cluster have a GFS2 file system mounted and show an entry for it under "dlm lockspaces" in the cman_tool services output, but do not show an entry for it under "gfs mountgroups":
# cman_tool services
fence domain
member count 2
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 1,1
members 1 2
name fs1
id 0x58aa977e
flags 0x00000008 fs_reg
change member 2 joined 1 remove 0 failed 0 seq 4,4
members 1 2
name fs2
id 0xe22d3136
flags 0x00000000
change member 2 joined 1 remove 0 failed 0 seq 4,4
members 1 2
gfs mountgroups
name fs2
id 0x9661e92d
flags 0x00000048 mounted
change member 2 joined 1 remove 0 failed 0 seq 4,4
members 1 2
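The mismatch above can be spotted by comparing the names listed under "dlm lockspaces" against those under "gfs mountgroups". A minimal sketch of that comparison is below; it parses a captured copy of the output (embedded here as sample data mirroring the listing above) rather than calling cman_tool directly, and it assumes clvmd is the only lockspace that legitimately has no mountgroup:

```shell
# Sample cman_tool services output; on a live node replace the
# here-document with:  cman_tool services > /tmp/services.txt
cat > /tmp/services.txt <<'EOF'
fence domain
member count  2
dlm lockspaces
name          clvmd
id            0x4104eefa
name          fs1
id            0x58aa977e
flags         0x00000008 fs_reg
name          fs2
id            0xe22d3136
gfs mountgroups
name          fs2
id            0x9661e92d
EOF

# A mounted GFS2 file system normally has both a DLM lockspace and a
# gfs mountgroup; clvmd has only a lockspace by design, so skip it.
missing=$(awk '
    $1 == "dlm" && $2 == "lockspaces"  { sec = "dlm"; next }
    $1 == "gfs" && $2 == "mountgroups" { sec = "gfs"; next }
    $1 == "name" && sec == "dlm"       { ls[$2] = 1 }
    $1 == "name" && sec == "gfs"       { mg[$2] = 1 }
    END { for (n in ls) if (n != "clvmd" && !(n in mg)) print n }
' /tmp/services.txt)
echo "lockspaces without a mountgroup: $missing"
```

With the sample data this reports fs1, matching the symptom described above: the fs1 lockspace exists but its mountgroup is missing.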
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the Resilient Storage Add On
- GFS2
- gfs2-utils and cman releases prior to 3.0.12.1-59.el6_5.3 in RHEL 6 Update 5, or prior to 3.0.12.1-68.el6 in other RHEL 6 updates
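One way to check whether an installed release predates the fixed build is a simple version comparison with sort -V. A minimal sketch, using a hard-coded sample version (on a real node, the installed value would come from something like rpm -q --qf '%{VERSION}-%{RELEASE}' cman):

```shell
# Sample installed version; "3.0.12.1-68.el6" is the fixed release for
# RHEL 6 updates other than Update 5, as noted above.
installed="3.0.12.1-59.el6"
fixed="3.0.12.1-68.el6"

# sort -V orders version strings; if the installed release sorts first
# and differs from the fix, it predates the fix.
oldest=$(printf '%s\n%s\n' "$installed" "$fixed" | sort -V | head -n1)
if [ "$oldest" = "$installed" ] && [ "$installed" != "$fixed" ]; then
    verdict="outdated"
else
    verdict="ok"
fi
echo "cman $installed: $verdict (fixed in $fixed)"
```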