RHEL 6.2: NFS stops responding for up to 20 seconds, then resumes, when copying large files
Issue
- A customer on RHEL 6.2 (kernel version 2.6.32-220.23.1.el6.x86_64) reports that NFS stops responding for up to 20 seconds and then resumes when copying large files.
- During this time the network responds to pings but no NFS traffic is seen.
- The problem occurred in three network configurations: bonded interfaces using only the tg3 driver, a bond combining one tg3 and one e1000e interface, and a standalone e1000e interface (no bonding).
- They usually see it once or twice a day. To provoke the issue faster, they run the following commands in a loop (it is not clear whether this is the exact command used or only an illustration of it):
cp -f /NFS/largefile /NFS/largefile.1
sleep 60
- The hung task watchdog (khungtaskd) logs the following backtrace for the blocked cp process and then panics the system:
INFO: task cp:24962 blocked for more than 10 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cp D 0000000000000002 0 24962 2659 0x00000080
ffff88081fc4dc78 0000000000000082 0000000000000000 ffff8800366128a8
ffff88081fc4dbe8 ffffffff81012b59 ffff88081fc4dc28 ffffffff8109b949
ffff8808328805f8 ffff88081fc4dfd8 000000000000f4e8 ffff8808328805f8
Call Trace:
[<ffffffff81012b59>] ? read_tsc+0x9/0x20
[<ffffffff8109b949>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff81110d60>] ? sync_page+0x0/0x50
[<ffffffff814ed9e3>] io_schedule+0x73/0xc0
[<ffffffff81110d9d>] sync_page+0x3d/0x50
[<ffffffff814ee39f>] __wait_on_bit+0x5f/0x90
[<ffffffff81110f53>] wait_on_page_bit+0x73/0x80
[<ffffffff81090d70>] ? wake_bit_function+0x0/0x50
[<ffffffff811273f5>] ? pagevec_lookup_tag+0x25/0x40
[<ffffffff8111136b>] wait_on_page_writeback_range+0xfb/0x190
[<ffffffff81111538>] filemap_write_and_wait_range+0x78/0x90
[<ffffffff811a589e>] vfs_fsync_range+0x7e/0xe0
[<ffffffff811a596d>] vfs_fsync+0x1d/0x20
[<ffffffffa02f96d0>] nfs_file_flush+0x70/0xa0 [nfs]
[<ffffffff81173e4c>] filp_close+0x3c/0x90
[<ffffffff81173f45>] sys_close+0xa5/0x100
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 177, comm: khungtaskd Tainted: G W ---------------- 2.6.32-220.23.1.el6.x86_64 #1
Call Trace:
[<ffffffff814ecb34>] ? panic+0x78/0x143
[<ffffffff810d8bd7>] ? watchdog+0x217/0x220
[<ffffffff810d89c0>] ? watchdog+0x0/0x220
[<ffffffff810909c6>] ? kthread+0x96/0xa0
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffff81090930>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
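The copy loop described above can be sketched as a small script that also times each copy. In the backtrace, cp is blocked in sys_close (via nfs_file_flush), so the stall should show up in the wall-clock time of the cp command itself. This is only an illustrative sketch, not the customer's actual script: the paths come from the example command, while the THRESHOLD and INTERVAL variables, the RUN_LOOP switch, and the function names are assumptions introduced here.

```shell
#!/bin/bash
# Hypothetical reproducer sketch; variable and function names are assumptions.
SRC="${SRC:-/NFS/largefile}"      # source file on the NFS mount (path from the example command)
DST="${DST:-/NFS/largefile.1}"    # destination on the same mount
THRESHOLD="${THRESHOLD:-10}"      # seconds; assumed cut-off for reporting a slow copy
INTERVAL="${INTERVAL:-60}"        # pause between copies, as in the example command

# Copy $1 to $2 once and print the elapsed wall-clock time in whole seconds.
timed_copy() {
    local start end
    start=$(date +%s)
    cp -f "$1" "$2" || return 1
    end=$(date +%s)
    echo $(( end - start ))
}

# Copy in a loop and log a timestamped line for every copy that takes
# THRESHOLD seconds or longer.
run_loop() {
    local elapsed
    while true; do
        elapsed=$(timed_copy "$SRC" "$DST") || { echo "copy failed" >&2; return 1; }
        if [ "$elapsed" -ge "$THRESHOLD" ]; then
            echo "$(date '+%F %T') slow copy: ${elapsed}s"
        fi
        sleep "$INTERVAL"
    done
}

# Start the loop only when RUN_LOOP=1 is set, so the file can be sourced safely.
if [ "${RUN_LOOP:-0}" = "1" ]; then
    run_loop
fi
```

Running it as `RUN_LOOP=1 ./nfs-copy-loop.sh` (a hypothetical file name) would leave timestamps of slow copies that can be compared against packet captures or server logs from the same period.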
Environment
- Red Hat Enterprise Linux 6.2
- RHEL 6.2 kernels (2.6.32-220.*el6); seen on 2.6.32-220.23.1.el6
- NFS client
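To check whether an NFS client is running a kernel in the range listed above, the release string can be compared against the 2.6.32-220.*el6 pattern. The sketch below is a convenience check only: the function name is hypothetical, and a match shows that the kernel belongs to the RHEL 6.2 series, not that the system is actually affected.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 when the given kernel release string matches
# the RHEL 6.2 pattern from the Environment list (2.6.32-220.*el6).
is_rhel62_kernel() {
    case "$1" in
        2.6.32-220.*el6*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the running kernel.
if is_rhel62_kernel "$(uname -r)"; then
    echo "Running kernel $(uname -r) is in the RHEL 6.2 range"
else
    echo "Running kernel $(uname -r) is outside the RHEL 6.2 range"
fi
```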