RHEL6 / MRG2.x: NFS client kernel panic in rpciod, kernel BUG at net/sunrpc/sched.c:616, RIP: __rpc_execute + 0x278

Solution Verified - Updated -

Issue

  • The NFS client kernel crashes because an async RPC task is already queued when it is run: rpciod hits BUG_ON(RPC_IS_QUEUED(task)); in __rpc_execute
kernel BUG at net/sunrpc/sched.c:616!
invalid opcode: 0000 [#1] SMP 
...
Pid: 2256, comm: rpciod/8 Not tainted 2.6.32-220.el6.x86_64 #1 HP ProLiant SL250s Gen8/
RIP: 0010:[<ffffffffa01fe458>]  [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
...
Process rpciod/8 (pid: 2256, threadinfo ffff882016152000, task ffff8820162e80c0)
...
Call Trace:
 [<ffffffffa01fe4d0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
 [<ffffffffa01fe4e5>] rpc_async_schedule+0x15/0x20 [sunrpc]
 [<ffffffff8108b2b0>] worker_thread+0x170/0x2a0
 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
 [<ffffffff81090886>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
Code: db df 2e e1 f6 05 e0 26 02 00 40 0f 84 48 fe ff ff 0f b7 b3 d4 00 00 00 48 c7 c7 94 39 21 a0 31 c0 e8 b9 df 2e e1 e9 2e fe ff ff <0f> 0b eb fe 0f b7 b7 d4 00 00 00 31 c0 48 c7 c7 60 63 21 a0 e8 
RIP  [<ffffffffa01fe458>] __rpc_execute+0x278/0x2a0 [sunrpc]
  • A second panic is similar in that an rpciod thread panics, but at a different place: it hits a kernel BUG in kernel/workqueue.c, which is BUG_ON(get_wq_data(work) != cwq);. Prior to the oops, there are warnings about list corruption, triggered from a __list_add reached from xprt_reserve_xprt. Based on the location in the code, the list corruption is being flagged on the rpc_xprt's 'sending' or 'resend' queue.
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Not tainted)
...
list_add corruption. prev->next should be next (ffff88201900e998), but was ffffe8efec8232c1. (prev=ffff881bc0b5c150).
...
Pid: 16460, comm: 10.2.8.2-m Not tainted 2.6.32-220.el6.x86_64 #1
Call Trace:
 [<ffffffff81069b77>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff81069c66>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff8127b86f>] ? __list_add+0x8f/0xa0
 [<ffffffffa01fe7db>] ? rpc_sleep_on+0x10b/0x2f0 [sunrpc]
 [<ffffffffa01f8cf3>] ? xprt_reserve_xprt+0x83/0x120 [sunrpc]
 [<ffffffffa01f8173>] ? xprt_prepare_transmit+0x63/0xb0 [sunrpc]
 [<ffffffffa01f5ab7>] ? call_transmit+0x47/0x2c0 [sunrpc]
 [<ffffffffa01fe23e>] ? __rpc_execute+0x5e/0x2a0 [sunrpc]
 [<ffffffffa01fe4c3>] ? rpc_execute+0x43/0x50 [sunrpc]
 [<ffffffffa01f6cc5>] ? rpc_run_task+0x75/0x90 [sunrpc]
 [<ffffffffa01f6de2>] ? rpc_call_sync+0x42/0x70 [sunrpc]
 [<ffffffffa0298d2d>] ? nfs4_proc_renew+0x4d/0xa0 [nfs]
 [<ffffffffa02a935e>] ? nfs4_run_state_manager+0x3fe/0x5e0 [nfs]
 [<ffffffffa02a8f60>] ? nfs4_run_state_manager+0x0/0x5e0 [nfs]
 [<ffffffff81090886>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff810907f0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end------------[ cut here ]------------

kernel BUG at kernel/workqueue.c:287!
...
Pid: 2338, comm: rpciod/1 Tainted: G        W  ----------------   2.6.32-220.el6.x86_64 #1 HP ProLiant SL250s Gen8/
RIP: 0010:[<ffffffff8108b38d>]  [<ffffffff8108b38d>] worker_thread+0x24d/0x2a0
...
Process rpciod/1 (pid: 2338, threadinfo ffff8810041f4000, task ffff8810182fa080)
...
Call Trace:
 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
 [<ffffffff81090886>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff810907f0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 48 89 95 70 ff ff ff 4c 89 e7 ff d1 48 8b 85 68 ff ff ff 48 8b 95 70 ff ff ff 48 83 c0 08 48 8b 08 48 85 c9 75 d0 e9 df fe ff ff <0f> 0b eb fe 48 8b 45 80 48 8b b5 78 ff ff ff 48 c7 c7 58 0e 79 
RIP  [<ffffffff8108b38d>] worker_thread+0x24d/0x2a0
 RSP <ffff8810041f5e40>
  • Here is another trace, where a list corruption warning is triggered from a list_del called from rpc_wake_up_task_queue_locked
------------[ cut here ]------------
WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Tainted: G        W  ---------------   )
...
list_del corruption. prev->next should be ffff8807aaf49150, but was ffff880762d71718
...
Pid: 2305, comm: rpciod/6 Tainted: G        W  ---------------    2.6.32-279.el6.x86_64 #1
Call Trace:
 [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b836>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff81282f5e>] ? list_del+0x6e/0xa0
 [<ffffffffa0247b42>] ? rpc_wake_up_task_queue_locked+0x172/0x270 [sunrpc]
 [<ffffffffa023c480>] ? call_reserve+0x0/0x60 [sunrpc]
 [<ffffffffa02482cb>] ? rpc_wake_up_next+0xfb/0x1e0 [sunrpc]
 [<ffffffffa023f78f>] ? __xprt_lock_write_next_cong+0x4f/0x130 [sunrpc]
 [<ffffffffa023fa05>] ? xprt_release_xprt_cong+0x35/0x40 [sunrpc]
 [<ffffffffa023f5ab>] ? xprt_release_write+0x3b/0x60 [sunrpc]
 [<ffffffffa023ff47>] ? xprt_reserve+0x1c7/0x370 [sunrpc]
 [<ffffffffa023c480>] ? call_reserve+0x0/0x60 [sunrpc]
 [<ffffffffa023c4b4>] ? call_reserve+0x34/0x60 [sunrpc]
 [<ffffffffa0247e37>] ? __rpc_execute+0x77/0x350 [sunrpc]
 [<ffffffffa02481b0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
 [<ffffffffa02481c5>] ? rpc_async_schedule+0x15/0x20 [sunrpc]
 [<ffffffff8108c760>] ? worker_thread+0x170/0x2a0
 [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108c5f0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091d66>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81091cd0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
---[ end trace baaa6021e99dd240 ]---
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Tainted: G        W  ---------------   )
...
Pid: 2305, comm: rpciod/6 Tainted: G        W  ---------------    2.6.32-279.el6.x86_64 #1 Dell Inc.
...
RIP: 0010:[<ffffffffa02481b4>]  [<ffffffffa02481b4>] rpc_async_schedule+0x4/0x20 [sunrpc]
...
Process rpciod/6 (pid: 2305, threadinfo ffff88081b0a0000, task ffff8808196fb540)
...
Call Trace:
 [<ffffffff8108c760>] ? worker_thread+0x170/0x2a0
 [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108c5f0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091d66>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81091cd0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 74 b1 49 8b 45 00 49 83 c5 08 31 d2 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 e9 eb 94 0f 1f 84 00 00 00 00 00 b0 81 24 a0 <ff> ff ff ff b0 81 24 a0 ff ff ff ff e8 fb fb ff ff c9 c3 66 0f 
RIP  [<ffffffffa02481b4>] rpc_async_schedule+0x4/0x20 [sunrpc]
 RSP <ffff88081b0a1e38>
  • Kernel 2.6.32-358.6.2.el6.x86_64 crashed with kernel BUG at net/sunrpc/sched.c:695!
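
The "list_add corruption" warnings in the traces above come from a sanity check made before a node is linked into a doubly linked list: the two neighbours must still point at each other. The following is a minimal user-space sketch of that kind of check, not the kernel source. The names struct node, list_init, checked_list_add and checked_list_add_tail are hypothetical, and unlike the warnings in the traces (after which the machine keeps running) this sketch refuses the insertion when the check fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Make an empty circular list: the head points at itself. */
void list_init(struct node *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/*
 * Link 'item' between 'prev' and 'next', but only after verifying that
 * the two neighbours still reference each other. Returns false and
 * leaves the list untouched when they do not (a simplification: this
 * sketch rejects the insertion instead of only warning).
 */
bool checked_list_add(struct node *item, struct node *prev, struct node *next)
{
    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }
    item->next = next;
    item->prev = prev;
    next->prev = item;
    prev->next = item;
    return true;
}

/* Append 'item' at the tail of the list anchored at 'head'. */
bool checked_list_add_tail(struct node *item, struct node *head)
{
    return checked_list_add(item, head->prev, head);
}
```

If something overwrites a node's next pointer while the node is still on the list (for example, the same entry being queued in two places at once), the next checked_list_add_tail on that list fails the prev->next test, which is the same shape of message as the one logged at lib/list_debug.c:30 above.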

Environment

  • Red Hat Enterprise Linux 6
    • kernel prior to kernel-2.6.32-358.14.1.el6
    • Seen on at least 2.6.32-220.el6, 2.6.32-279.9.1.el6, and 2.6.32-358.6.1.el6
    • Also seen on kernel 2.6.32-279.14.1.el6
  • MRG 2.x
    • Seen on kernel 2.6.33.9-rt31.66.el6rt
  • NFS Client
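
To check whether a given RHEL 6 kernel release string falls before the kernel-2.6.32-358.14.1.el6 boundary listed above, the numeric fields of the release can be compared one by one. The following is a rough sketch under stated assumptions: parse_release and is_older_than_boundary are hypothetical helper names, the comparison looks only at the leading numeric fields and is not a full RPM version comparison, and anything that is not a 2.6.32 kernel (such as the MRG realtime kernel) is reported as not comparable rather than guessed at.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FIELDS 8
#define BOUNDARY "2.6.32-358.14.1.el6" /* boundary from the Environment list */

/*
 * Collect the leading numeric fields of a release string, separated by
 * '.' or '-', stopping at the first field that does not start with a
 * digit (for example "el6"). Unused slots are zeroed. Returns the
 * number of fields found.
 */
size_t parse_release(const char *s, long fields[MAX_FIELDS])
{
    size_t n = 0;

    assert(s != NULL);
    for (size_t i = 0; i < MAX_FIELDS; i++)
        fields[i] = 0;

    while (n < MAX_FIELDS && isdigit((unsigned char)*s)) {
        char *end;
        fields[n++] = strtol(s, &end, 10);
        s = end;
        if (*s != '.' && *s != '-')
            break;
        s++; /* skip the separator */
    }
    return n;
}

/*
 * Returns 1 if 'release' is a 2.6.32 kernel older than BOUNDARY,
 * 0 if it is a 2.6.32 kernel at or after BOUNDARY,
 * and -1 if it is not a 2.6.32 kernel and cannot be compared this way.
 */
int is_older_than_boundary(const char *release)
{
    long have[MAX_FIELDS], want[MAX_FIELDS];
    size_t n = parse_release(release, have);

    parse_release(BOUNDARY, want);
    if (n < 4 || have[0] != 2 || have[1] != 6 || have[2] != 32)
        return -1;

    for (size_t i = 0; i < MAX_FIELDS; i++) {
        if (have[i] < want[i])
            return 1;
        if (have[i] > want[i])
            return 0;
    }
    return 0; /* identical numeric fields: not older */
}
```

The string passed in would typically be the output of uname -r on the NFS client, for example is_older_than_boundary("2.6.32-220.el6.x86_64").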
