System locked up when I/O to "/" was frozen for a long time.

Solution Unverified

Environment

  • Red Hat Enterprise Linux 5
  • kernel-2.6.18-238.45.1.el5

Issue

  • A series of hung-task messages (the ones that mention hung_task_timeout_secs) was found in the log file, e.g.:
INFO: task java:7132 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff810009006420     0  7132   4485          7169  7090 (NOTLB)
 ffff81200cd7bde8 0000000000200082 ffff81200d6f6ff8 0000000000000002
 000000000cd7be48 000000000000000a ffff81201f42e820 ffffffff80312b60
 0000461a9731e5c5 0000000000013b63 ffff81201f42ea08 000000008806190d
Call Trace:
 [<ffffffff8803badd>] :jbd2:start_this_handle+0x128/0x3b3
 [<ffffffff800a2964>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8012e3ee>] file_has_perm+0x94/0xa3
 [<ffffffff8803be09>] :jbd2:jbd2_journal_start+0xa1/0xd8
 [<ffffffff8805e0d8>] :ext4:ext4_dirty_inode+0x1a/0x46
 [<ffffffff80013d67>] __mark_inode_dirty+0x29/0x16e
 [<ffffffff800fc1a1>] compat_filldir+0x0/0xc9
 [<ffffffff80035566>] vfs_readdir+0x8c/0xa9
 [<ffffffff800fdb8a>] compat_sys_getdents+0x75/0xbd
 [<ffffffff8006153d>] sysenter_tracesys+0x48/0x83
 [<ffffffff8006149d>] sysenter_do_call+0x1e/0x76
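
When many of these reports pile up, it can help to list which tasks were reported and for how long. The following is a minimal sketch, assuming the log text has already been read into a string; the function name find_hung_tasks and the sample variable are illustrative and not part of any Red Hat tool.

```python
import re

# Matches the kernel's hung-task report line as it appears in the excerpt above.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)

def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) tuples, one per hung-task report."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return reports

# Sample input copied from the log excerpt in this article.
sample = """INFO: task java:7132 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff810009006420     0  7132   4485          7169  7090 (NOTLB)
"""

print(find_hung_tasks(sample))  # [('java', 7132, 120)]
```

If most reported tasks show jbd2 start_this_handle in their call traces, as in the trace above, that points at the journal of one filesystem rather than at the individual processes.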

Resolution

Check whether the system takes part in any backup-related activity (snapshots in particular) and whether a third-party utility is intentionally freezing the filesystem.

Root Cause

The superblock in question was intentionally frozen, so any subsequent I/O to the filesystem has to queue until a "thaw" operation takes place.
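
The queueing behaviour can be pictured with a toy model. This is only an illustration written in Python with a threading.Event standing in for the frozen state; it is not how the kernel implements freezing, and the class ToyFilesystem and its methods are hypothetical names.

```python
import threading

class ToyFilesystem:
    """Toy model only: writers wait while the filesystem is frozen."""

    def __init__(self):
        self._unfrozen = threading.Event()
        self._unfrozen.set()      # start in the thawed state
        self.completed = []       # names of writers whose write got through

    def freeze(self):
        self._unfrozen.clear()

    def thaw(self):
        self._unfrozen.set()

    def write(self, name, timeout=None):
        """Block until thawed; return False if the timeout expires first."""
        if not self._unfrozen.wait(timeout):
            return False
        self.completed.append(name)
        return True

fs = ToyFilesystem()
fs.freeze()

# While frozen, a write does not complete (short timeout so the demo does not hang).
blocked_result = fs.write("java", timeout=0.1)

# A writer without a timeout stays blocked until thaw() is called.
writer = threading.Thread(target=fs.write, args=("perl",))
writer.start()
fs.thaw()
writer.join()

print(blocked_result, fs.completed)  # False ['perl']
```

In the real system there is no timeout on the waiting side, which is why a freeze that is never followed by a thaw looks like a system lockup.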

Diagnostic Steps

  • Take a look at a random blocked process:
crash> bt ffff811d64897820
PID: 26106  TASK: ffff811d64897820  CPU: 3   COMMAND: "perl"
 #0 [ffff81201a7b7ab8] schedule at ffffffff80062f8e
 #1 [ffff81201a7b7b90] start_this_handle at ffffffff8803badd [jbd2]
 #2 [ffff81201a7b7c10] jbd2_journal_start at ffffffff8803be09 [jbd2]
 #3 [ffff81201a7b7c40] ext4_dirty_inode at ffffffff8805e0d8 [ext4]
 #4 [ffff81201a7b7c60] __mark_inode_dirty at ffffffff80013d67
 #5 [ffff81201a7b7c90] do_generic_mapping_read at ffffffff8000c5f5
 #6 [ffff81201a7b7d70] __generic_file_aio_read at ffffffff8000c753
 #7 [ffff81201a7b7de0] generic_file_aio_read at ffffffff80016ed0
 #8 [ffff81201a7b7e00] do_sync_read at ffffffff8000cfa2
 #9 [ffff81201a7b7f10] vfs_read at ffffffff8000b78d
#10 [ffff81201a7b7f40] sys_read at ffffffff80011d2f
#11 [ffff81201a7b7f80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 00000034d9a0d9b0  RSP: 00007fff20f8c808  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 0000000000001000  RSI: 00000000208f5c80  RDI: 0000000000000003
    RBP: 00000000208f5c80   R8: 000000002085aa50   R9: 000000002044ac40
    R10: 000000002090ac18  R11: 0000000000000246  R12: 000000001fd0e010
    R13: 0000000000000003  R14: 0000000000000000  R15: 0000000020816800
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
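  • The first argument to sys_read (RDI) is 3, so the file descriptor being read is 3. Look up the file behind it, then its inode and superblock: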
crash> files ffff811d64897820
PID: 26106  TASK: ffff811d64897820  CPU: 3   COMMAND: "perl"
ROOT: /    CWD: /root
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff811f1439c680 ffff811fa0760c48 ffff811e3c8ae280 PIPE 
  1 ffff811f563e0680 ffff811fa07603d8 ffff812010e3b050 PIPE 
  2 ffff811f563e0680 ffff811fa07603d8 ffff812010e3b050 PIPE 
  3 ffff811dd6dbf2c0 ffff811e228e4228 ffff811ed98d7898 REG  /var/opt/SCpacct/SCpacct.201308141756.gpd-6f9-be50
  4 ffff811e344f6380 ffff8118d1646a98 ffff811e3c88c110 SOCK socket:/[516172]
crash> inode ffff811ed98d7898 | grep sb
  i_sb_list = {
  i_sb = 0xffff81203e103000, 
crash> super_block 0xffff81203e103000 | grep frozen
  s_frozen = 0x2, 
  s_wait_unfrozen = {
  • s_frozen is non-zero, so I/O to / was frozen, most likely by fsfreeze(8) or some other userspace tool. This is not a kernel problem. Whatever froze the filesystem should thaw it again within a short period of time.
  • Because the superblock was intentionally frozen, any subsequent I/O to the filesystem has to queue until a "thaw" operation takes place.
  • Check whether the system takes part in any backup-related activity (snapshots in particular) and whether the filesystem was intentionally frozen for it.
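
The two decoding steps above (file descriptor from the register dump, freeze state from the superblock) can be sketched as a small helper that works on saved crash output. This is a sketch under assumptions: the state names in SB_FROZEN_STATES are taken from upstream 2.6-era include/linux/fs.h and should be checked against the headers of the kernel that produced the vmcore, and the function names are illustrative.

```python
import re

# Assumption: freeze-level names as in upstream 2.6-era include/linux/fs.h.
SB_FROZEN_STATES = {0: "SB_UNFROZEN", 1: "SB_FREEZE_WRITE", 2: "SB_FREEZE_TRANS"}

def read_fd_from_registers(bt_text):
    """Return RDI, the first syscall argument on x86_64 (the fd for sys_read)."""
    match = re.search(r"\bRDI:\s*([0-9a-fA-F]+)", bt_text)
    if match is None:
        raise ValueError("no RDI value found")
    return int(match.group(1), 16)

def frozen_state(super_block_text):
    """Return (value, name) for the s_frozen field in crash's super_block output."""
    match = re.search(r"s_frozen\s*=\s*(0x[0-9a-fA-F]+|\d+)", super_block_text)
    if match is None:
        raise ValueError("no s_frozen field found")
    text = match.group(1)
    value = int(text, 16) if text.lower().startswith("0x") else int(text)
    return value, SB_FROZEN_STATES.get(value, "unknown")

# Sample lines copied from the crash output in this article.
bt_regs = "    RDX: 0000000000001000  RSI: 00000000208f5c80  RDI: 0000000000000003"
sb_out = "  s_frozen = 0x2, \n  s_wait_unfrozen = {"

print(read_fd_from_registers(bt_regs))  # 3
print(frozen_state(sb_out))             # (2, 'SB_FREEZE_TRANS')
```

Any non-zero s_frozen value means the filesystem is frozen; the name lookup is only a convenience for reading the dump.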

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
