RHEL6 / MRG2.x: NFS client kernel panic in rpciod, kernel BUG at net/sunrpc/sched.c:616, RIP: __rpc_execute + 0x278
Issue
- NFS client kernel crash because an async task is already queued: rpciod hits BUG_ON(RPC_IS_QUEUED(task)); in __rpc_execute
kernel BUG at net/sunrpc/sched.c:616!
invalid opcode: 0000 [#1] SMP
...
Pid: 2256, comm: rpciod/8 Not tainted 2.6.32-220.el6.x86_64 #1 HP ProLiant SL250s Gen8/
RIP: 0010:[<ffffffffa01fe458>] [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
...
Process rpciod/8 (pid: 2256, threadinfo ffff882016152000, task ffff8820162e80c0)
...
Call Trace:
[<ffffffffa01fe4d0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[<ffffffffa01fe4e5>] rpc_async_schedule+0x15/0x20 [sunrpc]
[<ffffffff8108b2b0>] worker_thread+0x170/0x2a0
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
Code: db df 2e e1 f6 05 e0 26 02 00 40 0f 84 48 fe ff ff 0f b7 b3 d4 00 00 00 48 c7 c7 94 39 21 a0 31 c0 e8 b9 df 2e e1 e9 2e fe ff ff <0f> 0b eb fe 0f b7 b7 d4 00 00 00 31 c0 48 c7 c7 60 63 21 a0 e8
RIP [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
- A second panic is similar in that an rpciod thread panics, but at a different place: it hits kernel BUG at kernel/workqueue.c, which is BUG_ON(get_wq_data(work) != cwq);. Prior to the oops, there are warnings about list corruption, triggered from __list_add in the xprt_reserve_xprt path. Based on the location in the code, the list corruption is being flagged on the rpc_xprt's 'sending' or 'resend' queue.
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Not tainted)
...
list_add corruption. prev->next should be next (ffff88201900e998), but was ffffe8efec8232c1. (prev=ffff881bc0b5c150).
...
Pid: 16460, comm: 10.2.8.2-m Not tainted 2.6.32-220.el6.x86_64 #1
Call Trace:
[<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff81069c66>] ? warn_slowpath_fmt+0x46/0x50
[<ffffffff8127b86f>] ? __list_add+0x8f/0xa0
[<ffffffffa01fe7db>] ? rpc_sleep_on+0x10b/0x2f0 [sunrpc]
[<ffffffffa01f8cf3>] ? xprt_reserve_xprt+0x83/0x120 [sunrpc]
[<ffffffffa01f8173>] ? xprt_prepare_transmit+0x63/0xb0 [sunrpc]
[<ffffffffa01f5ab7>] ? call_transmit+0x47/0x2c0 [sunrpc]
[<ffffffffa01fe23e>] ? __rpc_execute+0x5e/0x2a0 [sunrpc]
[<ffffffffa01fe4c3>] ? rpc_execute+0x43/0x50 [sunrpc]
[<ffffffffa01f6cc5>] ? rpc_run_task+0x75/0x90 [sunrpc]
[<ffffffffa01f6de2>] ? rpc_call_sync+0x42/0x70 [sunrpc]
[<ffffffffa0298d2d>] ? nfs4_proc_renew+0x4d/0xa0 [nfs]
[<ffffffffa02a935e>] ? nfs4_run_state_manager+0x3fe/0x5e0 [nfs]
[<ffffffffa02a8f60>] ? nfs4_run_state_manager+0x0/0x5e0 [nfs]
[<ffffffff81090886>] ? kthread+0x96/0xa0
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end
------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:287!
...
Pid: 2338, comm: rpciod/1 Tainted: G W ---------------- 2.6.32-220.el6.x86_64 #1 HP ProLiant SL250s Gen8/
RIP: 0010:[<ffffffff8108b38d>] [<ffffffff8108b38d>] worker_thread+0x24d/0x2a0
...
Process rpciod/1 (pid: 2338, threadinfo ffff8810041f4000, task ffff8810182fa080)
...
Call Trace:
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 48 89 95 70 ff ff ff 4c 89 e7 ff d1 48 8b 85 68 ff ff ff 48 8b 95 70 ff ff ff 48 83 c0 08 48 8b 08 48 85 c9 75 d0 e9 df fe ff ff <0f> 0b eb fe 48 8b 45 80 48 8b b5 78 ff ff ff 48 c7 c7 58 0e 79
RIP [<ffffffff8108b38d>] worker_thread+0x24d/0x2a0
RSP <ffff8810041f5e40>
- Here is another trace, where the list corruption warning is triggered from a list_del called from rpc_wake_up_task_queue_locked
------------[ cut here ]------------
WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Tainted: G W --------------- )
...
list_del corruption. prev->next should be ffff8807aaf49150, but was ffff880762d71718
...
Pid: 2305, comm: rpciod/6 Tainted: G W --------------- 2.6.32-279.el6.x86_64 #1
Call Trace:
[<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff8106b836>] ? warn_slowpath_fmt+0x46/0x50
[<ffffffff81282f5e>] ? list_del+0x6e/0xa0
[<ffffffffa0247b42>] ? rpc_wake_up_task_queue_locked+0x172/0x270 [sunrpc]
[<ffffffffa023c480>] ? call_reserve+0x0/0x60 [sunrpc]
[<ffffffffa02482cb>] ? rpc_wake_up_next+0xfb/0x1e0 [sunrpc]
[<ffffffffa023f78f>] ? __xprt_lock_write_next_cong+0x4f/0x130 [sunrpc]
[<ffffffffa023fa05>] ? xprt_release_xprt_cong+0x35/0x40 [sunrpc]
[<ffffffffa023f5ab>] ? xprt_release_write+0x3b/0x60 [sunrpc]
[<ffffffffa023ff47>] ? xprt_reserve+0x1c7/0x370 [sunrpc]
[<ffffffffa023c480>] ? call_reserve+0x0/0x60 [sunrpc]
[<ffffffffa023c4b4>] ? call_reserve+0x34/0x60 [sunrpc]
[<ffffffffa0247e37>] ? __rpc_execute+0x77/0x350 [sunrpc]
[<ffffffffa02481b0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[<ffffffffa02481c5>] ? rpc_async_schedule+0x15/0x20 [sunrpc]
[<ffffffff8108c760>] ? worker_thread+0x170/0x2a0
[<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108c5f0>] ? worker_thread+0x0/0x2a0
[<ffffffff81091d66>] ? kthread+0x96/0xa0
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffff81091cd0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end trace baaa6021e99dd240 ]---
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Tainted: G W --------------- )
...
Pid: 2305, comm: rpciod/6 Tainted: G W --------------- 2.6.32-279.el6.x86_64 #1 Dell Inc.
...
RIP: 0010:[<ffffffffa02481b4>] [<ffffffffa02481b4>] rpc_async_schedule+0x4/0x20 [sunrpc]
...
Process rpciod/6 (pid: 2305, threadinfo ffff88081b0a0000, task ffff8808196fb540)
...
Call Trace:
[<ffffffff8108c760>] ? worker_thread+0x170/0x2a0
[<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108c5f0>] ? worker_thread+0x0/0x2a0
[<ffffffff81091d66>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff81091cd0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 74 b1 49 8b 45 00 49 83 c5 08 31 d2 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 e9 eb 94 0f 1f 84 00 00 00 00 00 b0 81 24 a0 <ff> ff ff ff b0 81 24 a0 ff ff ff ff e8 fb fb ff ff c9 c3 66 0f
RIP [<ffffffffa02481b4>] rpc_async_schedule+0x4/0x20 [sunrpc]
RSP <ffff88081b0a1e38>
- kernel 2.6.32-358.6.2.el6.x86_64 crashed due to kernel BUG at net/sunrpc/sched.c:695!
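The signatures quoted in the traces above can be searched for in a saved console log or dmesg capture. The following is a minimal sketch in Python, assuming the log is available as plain text. The function name find_signatures and the labels are illustrative only and are not part of any Red Hat tool; the patterns are taken solely from the messages quoted in this article, so other variants of the crash may not match.

```python
import re

# Patterns taken from the messages quoted in this article.
# The labels are hypothetical names chosen for this sketch.
SIGNATURES = [
    ("sunrpc sched.c BUG", re.compile(r"kernel BUG at net/sunrpc/sched\.c:\d+!")),
    ("workqueue.c BUG", re.compile(r"kernel BUG at kernel/workqueue\.c:\d+!")),
    ("list_add corruption", re.compile(r"list_add corruption\.")),
    ("list_del corruption", re.compile(r"list_del corruption\.")),
]


def find_signatures(log_text):
    """Return (line_number, label) pairs for every matching log line."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for label, pattern in SIGNATURES:
            if pattern.search(line):
                hits.append((number, label))
    return hits


if __name__ == "__main__":
    # Sample lines copied from the traces in this article.
    sample = "\n".join([
        "kernel BUG at net/sunrpc/sched.c:616!",
        "invalid opcode: 0000 [#1] SMP",
        "list_del corruption. prev->next should be ffff8807aaf49150, but was ffff880762d71718",
    ])
    for number, label in find_signatures(sample):
        print("line %d: %s" % (number, label))
```

A match only indicates that the log contains one of the messages shown here; confirming the actual cause still requires comparing the full backtrace and the kernel version.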
Environment
- Red Hat Enterprise Linux 6
- kernel prior to kernel-2.6.32-358.14.1.el6
- Seen on at least 2.6.32-220.el6, 2.6.32-279.9.1.el6, 2.6.32-279.14.1.el6, and 2.6.32-358.6.1.el6
- MRG 2.x
- Seen on kernel 2.6.33.9-rt31.66.el6rt
- NFS Client
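To check whether a given RHEL 6 kernel falls in the range listed above, its release string can be compared against kernel-2.6.32-358.14.1.el6. The sketch below is one way to do that in Python. It assumes, based only on the Environment list above, that releases older than 2.6.32-358.14.1.el6 are the ones of interest. The names release_key and is_older_than_listed_kernel are hypothetical, and the MRG realtime kernels (for example 2.6.33.9-rt31.66.el6rt) use a different versioning scheme that this sketch deliberately rejects.

```python
import platform
import re

# Threshold taken from the Environment list in this article.
LISTED_RELEASE = "2.6.32-358.14.1.el6"


def release_key(release):
    """Turn '2.6.32-358.6.2.el6.x86_64' into ((2, 6, 32), (358, 6, 2))."""
    match = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el6(?:\.|$)", release)
    if match is None:
        raise ValueError("not a RHEL 6 kernel release string: %r" % release)
    version, build = match.groups()
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in build.split(".")),
    )


def is_older_than_listed_kernel(release):
    """True if the release sorts before 2.6.32-358.14.1.el6."""
    return release_key(release) < release_key(LISTED_RELEASE)


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    try:
        print(running, is_older_than_listed_kernel(running))
    except ValueError as err:
        print("cannot compare:", err)
```

The comparison is numeric per component, so 358.6.2 sorts before 358.14.1 even though a plain string comparison would order them the other way.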