RHEL6 / MRG2.x: NFS client kernel panic in rpciod, kernel BUG at net/sunrpc/sched.c:616, RIP: __rpc_execute + 0x278
Issue
- The NFS client kernel crashes because an async task is already queued: rpciod hits
BUG_ON(RPC_IS_QUEUED(task)); in __rpc_execute
kernel BUG at net/sunrpc/sched.c:616!
invalid opcode: 0000 [#1] SMP
...
Pid: 2256, comm: rpciod/8 Not tainted 2.6.32-220.el6.x86_64 #1 HP ProLiant SL250s Gen8/
RIP: 0010:[<ffffffffa01fe458>] [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
...
Process rpciod/8 (pid: 2256, threadinfo ffff882016152000, task ffff8820162e80c0)
...
Call Trace:
[<ffffffffa01fe4d0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[<ffffffffa01fe4e5>] rpc_async_schedule+0x15/0x20 [sunrpc]
[<ffffffff8108b2b0>] worker_thread+0x170/0x2a0
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
Code: db df 2e e1 f6 05 e0 26 02 00 40 0f 84 48 fe ff ff 0f b7 b3 d4 00 00 00 48 c7 c7 94 39 21 a0 31 c0 e8 b9 df 2e e1 e9 2e fe ff ff <0f> 0b eb fe 0f b7 b7 d4 00 00 00 31 c0 48 c7 c7 60 63 21 a0 e8
RIP [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
- A second panic is similar in that an rpciod thread panics, but at a different place: it hits
kernel BUG at kernel/workqueue.c:287, which is the check BUG_ON(get_wq_data(work) != cwq);. Also, prior to the oops, there are warnings about list corruption, triggered from a __list_add called from xprt_reserve_xprt. Based on the location in the code, the list corruption is being flagged on the rpc_xprt's 'sending' or 'resend' queue.
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Not tainted)
...
list_add corruption. prev->next should be next (ffff88201900e998), but was ffffe8efec8232c1. (prev=ffff881bc0b5c150).
...
Pid: 16460, comm: 10.2.8.2-m Not tainted 2.6.32-220.el6.x86_64 #1
Call Trace:
[<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff81069c66>] ? warn_slowpath_fmt+0x46/0x50
[<ffffffff8127b86f>] ? __list_add+0x8f/0xa0
[<ffffffffa01fe7db>] ? rpc_sleep_on+0x10b/0x2f0 [sunrpc]
[<ffffffffa01f8cf3>] ? xprt_reserve_xprt+0x83/0x120 [sunrpc]
[<ffffffffa01f8173>] ? xprt_prepare_transmit+0x63/0xb0 [sunrpc]
[<ffffffffa01f5ab7>] ? call_transmit+0x47/0x2c0 [sunrpc]
[<ffffffffa01fe23e>] ? __rpc_execute+0x5e/0x2a0 [sunrpc]
[<ffffffffa01fe4c3>] ? rpc_execute+0x43/0x50 [sunrpc]
[<ffffffffa01f6cc5>] ? rpc_run_task+0x75/0x90 [sunrpc]
[<ffffffffa01f6de2>] ? rpc_call_sync+0x42/0x70 [sunrpc]
[<ffffffffa0298d2d>] ? nfs4_proc_renew+0x4d/0xa0 [nfs]
[<ffffffffa02a935e>] ? nfs4_run_state_manager+0x3fe/0x5e0 [nfs]
[<ffffffffa02a8f60>] ? nfs4_run_state_manager+0x0/0x5e0 [nfs]
[<ffffffff81090886>] ? kthread+0x96/0xa0
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:287!
...
Pid: 2338, comm: rpciod/1 Tainted: G W ---------------- 2.6.32-220.el6.x86_64 #1 HP ProLiant SL250s Gen8/
RIP: 0010:[<ffffffff8108b38d>] [<ffffffff8108b38d>] worker_thread+0x24d/0x2a0
...
Process rpciod/1 (pid: 2338, threadinfo ffff8810041f4000, task ffff8810182fa080)
...
Call Trace:
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 48 89 95 70 ff ff ff 4c 89 e7 ff d1 48 8b 85 68 ff ff ff 48 8b 95 70 ff ff ff 48 83 c0 08 48 8b 08 48 85 c9 75 d0 e9 df fe ff ff <0f> 0b eb fe 48 8b 45 80 48 8b b5 78 ff ff ff 48 c7 c7 58 0e 79
RIP [<ffffffff8108b38d>] worker_thread+0x24d/0x2a0
RSP <ffff8810041f5e40>
- Here is another trace, where the list corruption warning is triggered from a
list_del called from rpc_wake_up_task_queue_locked
------------[ cut here ]------------
WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Tainted: G W --------------- )
...
list_del corruption. prev->next should be ffff8807aaf49150, but was ffff880762d71718
...
Pid: 2305, comm: rpciod/6 Tainted: G W --------------- 2.6.32-279.el6.x86_64 #1
Call Trace:
[<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
[<ffffffff8106b836>] ? warn_slowpath_fmt+0x46/0x50
[<ffffffff81282f5e>] ? list_del+0x6e/0xa0
[<ffffffffa0247b42>] ? rpc_wake_up_task_queue_locked+0x172/0x270 [sunrpc]
[<ffffffffa023c480>] ? call_reserve+0x0/0x60 [sunrpc]
[<ffffffffa02482cb>] ? rpc_wake_up_next+0xfb/0x1e0 [sunrpc]
[<ffffffffa023f78f>] ? __xprt_lock_write_next_cong+0x4f/0x130 [sunrpc]
[<ffffffffa023fa05>] ? xprt_release_xprt_cong+0x35/0x40 [sunrpc]
[<ffffffffa023f5ab>] ? xprt_release_write+0x3b/0x60 [sunrpc]
[<ffffffffa023ff47>] ? xprt_reserve+0x1c7/0x370 [sunrpc]
[<ffffffffa023c480>] ? call_reserve+0x0/0x60 [sunrpc]
[<ffffffffa023c4b4>] ? call_reserve+0x34/0x60 [sunrpc]
[<ffffffffa0247e37>] ? __rpc_execute+0x77/0x350 [sunrpc]
[<ffffffffa02481b0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[<ffffffffa02481c5>] ? rpc_async_schedule+0x15/0x20 [sunrpc]
[<ffffffff8108c760>] ? worker_thread+0x170/0x2a0
[<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108c5f0>] ? worker_thread+0x0/0x2a0
[<ffffffff81091d66>] ? kthread+0x96/0xa0
[<ffffffff8100c14a>] ? child_rip+0xa/0x20
[<ffffffff81091cd0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end trace baaa6021e99dd240 ]---
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Tainted: G W --------------- )
...
Pid: 2305, comm: rpciod/6 Tainted: G W --------------- 2.6.32-279.el6.x86_64 #1 Dell Inc.
...
RIP: 0010:[<ffffffffa02481b4>] [<ffffffffa02481b4>] rpc_async_schedule+0x4/0x20 [sunrpc]
...
Process rpciod/6 (pid: 2305, threadinfo ffff88081b0a0000, task ffff8808196fb540)
...
Call Trace:
[<ffffffff8108c760>] ? worker_thread+0x170/0x2a0
[<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108c5f0>] ? worker_thread+0x0/0x2a0
[<ffffffff81091d66>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff81091cd0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 74 b1 49 8b 45 00 49 83 c5 08 31 d2 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 e9 eb 94 0f 1f 84 00 00 00 00 00 b0 81 24 a0 <ff> ff ff ff b0 81 24 a0 ff ff ff ff e8 fb fb ff ff c9 c3 66 0f
RIP [<ffffffffa02481b4>] rpc_async_schedule+0x4/0x20 [sunrpc]
RSP <ffff88081b0a1e38>
- kernel 2.6.32-358.6.2.el6.x86_64 crashed due to kernel BUG at net/sunrpc/sched.c:695!
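When triaging a console log or a saved dmesg, the message strings quoted in the traces above can be searched for mechanically. The following is a minimal sketch, not a Red Hat tool: the function name match_signatures and the signature labels are hypothetical, and the patterns cover only the message strings that appear in this section. A match indicates that a log resembles these traces; it does not by itself confirm the same root cause.

```python
import re

# Hypothetical labels mapped to the message strings quoted in this article.
SIGNATURES = {
    "sunrpc_sched_bug": re.compile(r"kernel BUG at net/sunrpc/sched\.c:\d+!"),
    "workqueue_bug": re.compile(r"kernel BUG at kernel/workqueue\.c:\d+!"),
    "list_corruption": re.compile(r"list_(?:add|del) corruption\."),
}


def match_signatures(log_text):
    """Return the labels of all signatures found in log_text, in table order."""
    return [name for name, pattern in SIGNATURES.items()
            if pattern.search(log_text)]


if __name__ == "__main__":
    sample = (
        "list_del corruption. prev->next should be ffff8807aaf49150, "
        "but was ffff880762d71718\n"
        "kernel BUG at net/sunrpc/sched.c:695!\n"
    )
    print(match_signatures(sample))
```

The line number after sched.c is matched loosely because the traces above show both sched.c:616 and sched.c:695 depending on the kernel build.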
Environment
- Red Hat Enterprise Linux 6
- kernel prior to kernel-2.6.32-358.14.1.el6
- Seen on at least 2.6.32-220.el6, 2.6.32-279.9.1.el6, and 2.6.32-358.6.1.el6
- Also seen on kernel 2.6.32-279.14.1.el6
- MRG 2.x
- Seen on kernel 2.6.33.9-rt31.66.el6rt
- NFS Client
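To check whether a given RHEL 6 kernel release string falls in the range listed above (prior to kernel-2.6.32-358.14.1.el6), the numeric fields can be compared. This is a simplified sketch under stated assumptions: it compares only the dotted numeric parts and is not a full RPM version comparison, the names kernel_key and is_in_affected_range are hypothetical, and MRG realtime kernels are rejected because this section does not state a boundary version for them.

```python
import re

# Boundary taken from the Environment list: kernels prior to this are listed.
FIXED_RELEASE = "2.6.32-358.14.1.el6"


def kernel_key(release):
    """Turn '2.6.32-358.6.2.el6.x86_64' into ((2, 6, 32), (358, 6, 2))."""
    version, _, rest = release.partition("-")
    # Leading dotted number of the release field, e.g. '358.6.2'.
    match = re.match(r"\d+(?:\.\d+)*", rest)
    # Require a plain '.el6' tag; '.el6rt' (MRG realtime) is not handled.
    if match is None or not re.search(r"\.el6(\.|$)", rest):
        raise ValueError("not a standard RHEL 6 kernel release: %r" % release)
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in match.group(0).split(".")),
    )


def is_in_affected_range(release):
    """True if release sorts before FIXED_RELEASE (simplified comparison)."""
    return kernel_key(release) < kernel_key(FIXED_RELEASE)


if __name__ == "__main__":
    for rel in ("2.6.32-220.el6.x86_64", "2.6.32-358.14.1.el6.x86_64"):
        print(rel, is_in_affected_range(rel))
```

On a running system the release string would typically come from uname -r. Being in the listed range only means the kernel predates the boundary version; it does not show that a particular crash is this issue.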
