RHEL6: system hang with one process holding s_umount and waiting for unfreeze in __sb_start_write, prevents unfreeze due to thaw_bdev needing s_umount
Issue
- Hang while doing a backup / snapshot procedure which freezes / unfreezes filesystems, with symptoms similar to those described in "RHEL6: Deadlock on frozen ext4 filesystem, one process stuck in thaw_bdev waiting on semaphore held by flush thread doing writeback and stuck in start_this_handle", but on a kernel later than kernel-2.6.32-358.el6.
- System hang due to a process holding the s_umount semaphore and getting stuck in __sb_start_write, which prevents any other process from completing thaw_bdev. The process holding the semaphore may be one of several things, such as kswapd, a process doing a sync, or a process doing drop_caches.
- Here is an example of a kswapd backtrace:
PID: 40 TASK: ffff880079f08ae0 CPU: 1 COMMAND: "kswapd0"
#0 [ffff880079f0d970] schedule at ffffffff81529990
#1 [ffff880079f0da48] __sb_start_write at ffffffff811900ec
#2 [ffff880079f0dad8] ext4_delete_inode at ffffffffa00a1ef4 [ext4]
#3 [ffff880079f0daf8] generic_delete_inode at ffffffff811ac07e
#4 [ffff880079f0db28] generic_drop_inode at ffffffff811ac1d5
#5 [ffff880079f0db48] iput at ffffffff811ab022
#6 [ffff880079f0db68] dentry_iput at ffffffff811a7c10
#7 [ffff880079f0db88] d_kill at ffffffff811a7d71
#8 [ffff880079f0dba8] __shrink_dcache_sb at ffffffff811a8106
#9 [ffff880079f0dc48] shrink_dcache_memory at ffffffff811a8289
#10 [ffff880079f0dca8] shrink_slab at ffffffff8113d4ba
#11 [ffff880079f0dd08] balance_pgdat at ffffffff8114082a
#12 [ffff880079f0de28] kswapd at ffffffff81140be4
#13 [ffff880079f0dee8] kthread at ffffffff8109e66e
- Here is an example of a 'sync' process backtrace:
PID: 23562 TASK: ffff8800253c4040 CPU: 2 COMMAND: "sync_proc"
#0 [ffff880026a7bc78] schedule at ffffffff81534790
#1 [ffff880026a7bd50] __sb_start_write at ffffffff8119322c
#2 [ffff880026a7bde0] ext4_delete_inode at ffffffffa01ddf84 [ext4]
#3 [ffff880026a7be00] generic_delete_inode at ffffffff811af25e
#4 [ffff880026a7be30] generic_drop_inode at ffffffff811af3b5
#5 [ffff880026a7be50] iput at ffffffff811ae202
#6 [ffff880026a7be70] sync_inodes_sb at ffffffff811bd61c
#7 [ffff880026a7bf00] __sync_filesystem at ffffffff811c3d02
#8 [ffff880026a7bf20] sync_filesystem at ffffffff811c3f0b
#9 [ffff880026a7bf40] sys_syncfs at ffffffff811c3f80
#10 [ffff880026a7bf80] tracesys at ffffffff8100b288 (via system_call)
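The circular wait described in this section can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not kernel code: a plain threading.Lock stands in for the s_umount semaphore, an Event stands in for the "filesystem is thawed" condition that __sb_start_write waits on, and the function name simulate_freeze_deadlock and the timeouts are hypothetical additions that exist only so the demonstration terminates. In the kernel there is no timeout, so both sides would wait forever.

```python
import threading


def simulate_freeze_deadlock(timeout=0.4):
    """Toy model of the hang: report which of the two parties made progress."""
    s_umount = threading.Lock()    # stand-in for the superblock's s_umount semaphore
    thawed = threading.Event()     # set only if the thaw path manages to complete
    lock_held = threading.Event()  # makes the thaw thread start after the lock is taken
    outcome = {"holder_resumed": False, "thaw_completed": False}

    def holder():
        # Models kswapd / sync / drop_caches: takes s_umount, then waits
        # for the filesystem to be unfrozen (the __sb_start_write wait).
        with s_umount:
            lock_held.set()
            outcome["holder_resumed"] = thawed.wait(timeout)

    def thaw():
        # Models the thaw path: it needs s_umount before it can unfreeze.
        lock_held.wait()
        if s_umount.acquire(timeout=timeout / 2):
            try:
                thawed.set()
                outcome["thaw_completed"] = True
            finally:
                s_umount.release()

    threads = [threading.Thread(target=holder), threading.Thread(target=thaw)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(simulate_freeze_deadlock())
```

Neither side makes progress in this model: the holder cannot continue until the thaw happens, and the thaw cannot happen until the holder releases the lock.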
Environment
- Red Hat Enterprise Linux 6
- kernels prior to 2.6.32-642.el6
- kernels prior to 2.6.32-573.18.1.el6
- seen on kernel 2.6.32-504*el6
- seen with ext4 (but any filesystem is likely affected)
- Any tool which does freeze / thaw ioctl may be affected by this problem
- seen with VMware snapshots / vmtoolsd
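For reference, the freeze / thaw ioctl pair mentioned above can be sketched as follows. This is a minimal illustration, assuming the FIFREEZE and FITHAW request numbers defined in linux/fs.h and the generic Linux ioctl number encoding used on x86; the helper names _iowr and freeze_then_thaw are hypothetical and do not come from any backup tool. Freezing requires root privileges and blocks writes to the filesystem until it is thawed, so this should only be tried on a test system.

```python
import fcntl
import os


def _iowr(type_char, nr, size):
    # Generic Linux _IOWR encoding: direction (read|write = 3) in the top
    # two bits, then argument size, ioctl type character, and number.
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# From <linux/fs.h>: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = _iowr("X", 119, 4)
FITHAW = _iowr("X", 120, 4)


def freeze_then_thaw(mountpoint, work):
    """Freeze the filesystem at mountpoint, run work(), then always thaw."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)
        try:
            work()  # e.g. take the snapshot while writes are blocked
        finally:
            # On an affected kernel, this thaw step is where a tool could
            # block if another process holds s_umount while waiting for
            # the unfreeze.
            fcntl.ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

The thaw is placed in a finally block so that a failure during the snapshot step does not leave the filesystem frozen.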