Server goes into a hung state
Environment
- Red Hat Enterprise Linux 5.6
Issue
- Server goes into a hung state and it is not possible to open a PuTTY (SSH) session.
Resolution
Check the load on the host system; a heavily loaded hypervisor can starve the guest of I/O throughput.
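As a first step, look at load and storage latency on the hypervisor rather than inside the guest. A minimal sketch, assuming a Linux host with the sysstat package installed; on a VMware ESX host the equivalent view is esxtop:

```shell
# Load averages on the host: sustained values well above the CPU count
# suggest the host is oversubscribed.
uptime

# Per-device latency and utilization, if sysstat is installed (the exact
# tooling depends on the hypervisor in use):
if command -v iostat >/dev/null; then iostat -x 5 3; fi
```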
Root Cause
A process, in this case the filesystem journal thread (kjournald), was blocked for more than 120 seconds while waiting for I/O to complete, which caused the hung task watchdog to trigger. Because the hung_task_panic setting was enabled, the system panicked and dumped core. The process was delayed because the device /dev/sda had a full request queue and was very slow to complete I/O requests. There is also evidence that the guest was running low on available memory and swapping to disk. The slow I/O is unlikely to be caused by the guest itself; rather, a busy host system is starving the guest of throughput.
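The watchdog behaviour described above is controlled by two sysctls (present on RHEL 5 kernels with hung task detection compiled in); this is a configuration sketch for inspecting them, not a fix for the underlying slow storage:

```shell
# Timeout (seconds) before a task stuck in uninterruptible sleep is reported;
# 120 matches the "blocked for more than 120 seconds" message in the log.
cat /proc/sys/kernel/hung_task_timeout_secs

# When set to 1, the kernel panics (and dumps core) instead of only warning,
# which is what produced this vmcore.
cat /proc/sys/kernel/hung_task_panic
```

Setting kernel.hung_task_panic=0 would log the warning without panicking, at the cost of losing the vmcore that made this diagnosis possible.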
Diagnostic Steps
Looking at the vmcore:
KERNEL: usr/lib/debug/lib/modules/2.6.18-238.el5PAE/vmlinux
DUMPFILE: vmcore
CPUS: 4
DATE: Mon Feb 3 20:05:32 2014
UPTIME: 2 days, 18:13:05
LOAD AVERAGE: 6.74, 5.23, 3.42
TASKS: 193
NODENAME: <hostname>
RELEASE: 2.6.18-238.el5PAE
VERSION: #1 SMP Sun Dec 19 14:42:44 EST 2010
MACHINE: i686 (2666 Mhz)
MEMORY: 9 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
INFO: task kjournald:654 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D 0000D823 2860 654 15 680 618 (L-TLB)
f7db9ed4 00000046 855989f2 0000d823 f1bc4e94 00000000 053de60d 0000000a
f7888550 85599e7a 0000d823 00001488 00000001 f788865c cd819744 f16d3e40
1adf714e f788ae84 00000000 f788ae9c 1adf714e c042d4db f788ae7c 1adf714e
Call Trace:
[<c042d4db>] getnstimeofday+0x30/0xb6
[<c061ede6>] io_schedule+0x36/0x59
[<c04786f5>] sync_buffer+0x30/0x33
[<c061efbd>] __wait_on_bit+0x33/0x58
[<c04786c5>] sync_buffer+0x0/0x33
[<c04786c5>] sync_buffer+0x0/0x33
[<c061f044>] out_of_line_wait_on_bit+0x62/0x6a
[<c0436fec>] wake_bit_function+0x0/0x3c
[<c0478672>] __wait_on_buffer+0x1c/0x1f
[<f8866422>] journal_commit_transaction+0x4be/0xefc [jbd]
[<c042e32b>] lock_timer_base+0x15/0x2f
[<c042e3aa>] try_to_del_timer_sync+0x65/0x6c
[<f8869c21>] kjournald+0xa1/0x1c2 [jbd]
[<c0436fbf>] autoremove_wake_function+0x0/0x2d
[<f8869b80>] kjournald+0x0/0x1c2 [jbd]
[<c0436efb>] kthread+0xc0/0xed
[<c0436e3b>] kthread+0x0/0xed
[<c0405c87>] kernel_thread_helper+0x7/0x10
=======================
Kernel panic - not syncing: hung_task: blocked tasks
crash> set 654
PID: 654
COMMAND: "kjournald"
TASK: f7888550 [THREAD_INFO: f7db9000]
CPU: 1
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 654 TASK: f7888550 CPU: 1 COMMAND: "kjournald"
#0 [f7db9e60] schedule at c061e742
#1 [f7db9ed8] io_schedule at c061ede1
#2 [f7db9ee4] sync_buffer at c04786f0
#3 [f7db9ee8] __wait_on_bit at c061efba
#4 [f7db9f00] out_of_line_wait_on_bit at c061f03f
#5 [f7db9f38] __wait_on_buffer at c047866d
#6 [f7db9f44] journal_commit_transaction at f886641d [jbd]
#7 [f7db9fa8] kjournald at f8869c1c [jbd]
#8 [f7db9fcc] kthread at c0436ef9
#9 [f7db9fe4] kernel_thread_helper at c0405c85
crash> task -R last_ran
PID: 654 TASK: f7888550 CPU: 1 COMMAND: "kjournald"
last_ran = 237647072697978,
crash> runq | grep "CPU 1"
CPU 1 RUNQUEUE: cd819744
crash> rq.timestamp_last_tick cd819744
timestamp_last_tick = 237773097130245
crash> pd (237773097130245-237647072697978)
$2 = 126024432267
crash> pd (237773097130245-237647072697978)/1000000000
$3 = 126
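The pd arithmetic above can be reproduced outside crash; both last_ran and the runqueue's timestamp_last_tick are scheduler clock values in nanoseconds, so their difference divided by 10^9 gives the number of seconds the task has not been scheduled:

```shell
# Values taken from the crash session above (nanoseconds).
last_ran=237647072697978        # task.last_ran
last_tick=237773097130245       # rq.timestamp_last_tick
echo "$(( (last_tick - last_ran) / 1000000000 )) seconds since kjournald last ran"
```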
The task has not run for 126 seconds. Where is it stalled?
        /*
         * Wait for all previously submitted IO to complete.
         */
        spin_lock(&journal->j_list_lock);
        while (commit_transaction->t_locked_list) {
                struct buffer_head *bh;
                jh = commit_transaction->t_locked_list->b_tprev;
                bh = jh2bh(jh);
                get_bh(bh);
                if (buffer_locked(bh)) {
                        spin_unlock(&journal->j_list_lock);
                        wait_on_buffer(bh); <----
The thread is waiting for I/O to complete: it has to flush the dirty buffers of a file before it can commit the transaction that involves that inode.
Check what the storage is doing:
crash> whatis journal_commit_transaction
void journal_commit_transaction(journal_t *);
crash> dis -rl f8869c1c
...
/usr/src/debug/kernel-2.6.18/linux-2.6.18.i686/fs/jbd/journal.c: 151
0xf8869c1a <kjournald+154>: mov %esi,%eax
0xf8869c1c <kjournald+156>: call 0xf8865f64 <journal_commit_transaction>
crash> dis -rl f886641d
/usr/src/debug/kernel-2.6.18/linux-2.6.18.i686/fs/jbd/commit.c: 303
0xf8865f64 <journal_commit_transaction>: push %ebp
0xf8865f65 <journal_commit_transaction+1>: mov %eax,%ebp
0xf8865f67 <journal_commit_transaction+3>: push %edi
0xf8865f68 <journal_commit_transaction+4>: push %esi
...
crash> bt -f
PID: 654 TASK: f7888550 CPU: 1 COMMAND: "kjournald"
...
#6 [f7db9f44] journal_commit_transaction at f886641d [jbd]
[RA: f8869c21 SP: f7db9f48 FP: f7db9fa8 SIZE: 100]
f7db9f48: eedb21c0 00000000 cdb66000 00000000
f7db9f58: 00000000 dfda8034 00000fcc 00000000
f7db9f68: cdb66000 00000005 f7ca2650 00000000
f7db9f78: 00000286 c042e32b ffffffff f7ca26ec
f7db9f88: 00000286 c042e3aa 00000286 f7ca26ec
f7db9f98: f7ca2614 f7ca2600 f7db9fc0 00000000
f7db9fa8: f8869c21
#7 [f7db9fa8] kjournald at f8869c1c [jbd]
...
crash> journal_t.j_dev,j_fs_dev f7ca2600
j_dev = 0xcd90da40
j_fs_dev = 0xcd90da40
crash> block_device.bd_disk 0xcd90da40
bd_disk = 0xcdbd1780
crash> gendisk.disk_name,queue,in_flight 0xcdbd1780
disk_name = "sda\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
queue = 0xcdbd9be0
in_flight = 71
crash> request_queue.rq,queuedata,queue_flags,nr_sorted,in_flight 0xcdbd9be0
rq = {
count = {2, 69},
starved = {0, 0},
elvpriv = 71,
...
queuedata = 0xcdbd8c00
queue_flags = 32769
nr_sorted = 37
in_flight = 32
crash> dev -d
MAJOR GENDISK NAME REQUEST_QUEUE TOTAL READ WRITE DRV
8 cdbd1780 sda cdbd9be0 71 2 69 32
2 cda93ac0 fd0 cdbd90ac 0 0 0 0
3 cdaaa3c0 hda cda24ba0 0 0 0 0
9 cdaaa7c0 md0 cdbd9824 0 0 0 0
253 f1bc67c0 dm-0 f16ecb60 0 0 0 0
This looks like slow or stalled storage; check the timestamp on the oldest in-flight I/O.
Check the timestamps when the current inflight I/Os were issued:
crash> scsi_device.cmd_list,vendor 0xcdbd8c00
cmd_list = {
next = 0xf2cfb6c4,
prev = 0xcdbd36c4
}
vendor = 0xcdbcdbc8 "VMware Virtual disk 1.0 "
crash> list -H cdbd8c20 -o scsi_cmnd.list -s scsi_cmnd.jiffies_at_alloc,request
f2cfb6c0 // oldest command
jiffies_at_alloc = 238022696
request = 0xe52131d0
...
cdbd36c0 // newest command
jiffies_at_alloc = 238085483
request = 0xf1bc4328
crash> jiffies
jiffies = $16 = 238085642
crash> pd (238085642-238022696)/1000
$17 = 62
crash> pd (238085642-238085483)/1000
$18 = 0
The oldest command in the queue was issued 62 seconds ago; the newest was just issued. 62 seconds is a long time at a queue depth of 32: it implies the device is satisfying only about 2 I/Os per second.
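The conversions above rely on RHEL 5 x86 kernels ticking at HZ=1000, so a jiffies delta divided by 1000 gives seconds. A sketch repeating the crash pd arithmetic:

```shell
# jiffies values from the crash session above.
now=238085642          # current jiffies
oldest=238022696       # jiffies_at_alloc of the oldest scsi_cmnd
newest=238085483       # jiffies_at_alloc of the newest scsi_cmnd
echo "oldest issued $(( (now - oldest) / 1000 ))s ago, newest $(( (now - newest) / 1000 ))s ago"
```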
crash> request.start_time 0xe52131d0
start_time = 237848662
crash> request.start_time 0xf1bc4328
start_time = 237848876
crash> pd (238085642-237848662)/1000
$19 = 236
crash> pd (238085642-237848876)/1000
$20 = 236
All of the in-flight requests examined were created 236 seconds ago, which means they sat queued for roughly 3 minutes before finally being handed off to the device driver.
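Combining request.start_time with jiffies_at_alloc separates the time each request spent queued in the block layer from the time it has spent at the device (values from the oldest command above):

```shell
now=238085642          # current jiffies
start=237848662        # request.start_time: when the block-layer request was created
alloc=238022696        # scsi_cmnd.jiffies_at_alloc: when it was handed to the SCSI driver
echo "queued $(( (alloc - start) / 1000 ))s in the block layer, then $(( (now - alloc) / 1000 ))s in flight"
```

Roughly 174 seconds queued plus 62 seconds in flight accounts for the 236-second age computed above.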
crash> request.bio 0xe52131d0
bio = 0xed15bec0
crash> bio.bi_end_io 0xed15bec0
bi_end_io = 0xc046b153 <end_swap_bio_write>
crash> request.bio 0xf1bc4328
bio = 0xe1e51c40
crash> bio.bi_end_io 0xe1e51c40
bi_end_io = 0xc046b153 <end_swap_bio_write>
Both the oldest and newest commands are writes to the swap device (end_swap_bio_write), so free memory is low on this system and the guest is actively swapping.
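On a live guest showing the same symptoms, swap pressure can be confirmed before a hang occurs. A diagnostic sketch using standard RHEL 5 tooling, not part of the original vmcore analysis:

```shell
# Low MemFree together with shrinking SwapFree indicates the guest is
# relying on the swap device.
grep -E '^(MemFree|SwapTotal|SwapFree)' /proc/meminfo

# Sustained non-zero si/so columns confirm active swap traffic.
if command -v vmstat >/dev/null; then vmstat 1 3; fi
```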
This is a virtual machine sending commands to a virtual storage device (probably a file-backed disk); a high load on the host system may be starving this guest of throughput.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
