RHEL7: Server crashed with "kernel BUG at fs/nfs/pagelist.c:387!" and "NFS: Invalid unlock attempted"
Environment
- Red Hat Enterprise Linux 7
- seen on 3.10.0-229.el7.x86_64
Issue
- Server crashed with "kernel BUG at fs/nfs/pagelist.c:387!"
- Server was logging "nfs
not responding, timed out" errors prior to crash - Server crashes with "NFS: Invalid unlock attempted"
Resolution
- There is currently no resolution to this issue. The issue is tracked in a private bug, which is currently CLOSED WONTFIX; engineering evaluated possible solutions and found all of them too disruptive.
- RHEL 7 entered Maintenance Phase 2 on August 6th, 2020; we recommend migrating to a newer version of RHEL.
Root Cause
- The crash is due to a struct nfs_page being freed and then accessed after the free. The root cause of the use-after-free is still under investigation.
Diagnostic Steps
Simple analysis from oops / kernel log message
- Kernel oops / log message shows 'NFS: Invalid unlock attempted' followed by 'kernel BUG at fs/nfs/pagelist.c:387'
[1621342.293360] NFS: Invalid unlock attempted
[1621342.293410] ------------[ cut here ]------------
[1621342.293458] kernel BUG at fs/nfs/pagelist.c:387!
[1621342.293498] invalid opcode: 0000 [#1] SMP
[1621342.293538] Modules linked in: binfmt_misc nfsv3 nfs_acl tcp_lp bnep bluetooth rfkill fuse rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp dcdbas coretemp intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac lpc_ich edac_core mfd_core ipmi_si mei_me ipmi_msghandler mei wmi acpi_power_meter shpchp uinput xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt drm_kms_helper ttm ixgbe drm ahci igb libahci mdio libata ptp i2c_algo_bit megaraid_sas pps_core i2c_core dca dm_mirror dm_region_hash dm_log dm_mod
[1621342.294182] CPU: 3 PID: 820 Comm: kworker/3:1 Not tainted 3.10.0-229.el7.x86_64 #1
[1621342.294240] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
[1621342.294316] Workqueue: nfsiod rpc_async_release [sunrpc]
[1621342.294361] task: ffff882024660b60 ti: ffff882024690000 task.ti: ffff882024690000
[1621342.294418] RIP: 0010:[<ffffffffa05de7d7>] [<ffffffffa05de7d7>] nfs_unlock_request+0x47/0x50 [nfs]
[1621342.294506] RSP: 0018:ffff882024693d60 EFLAGS: 00010246
[1621342.294548] RAX: 000000000000001d RBX: ffff8810264baa80 RCX: 0000000000000000
[1621342.294603] RDX: 0000000000000000 RSI: ffff88203ec2d488 RDI: 0000000000000246
[1621342.294657] RBP: ffff882024693d70 R08: 0000000000000092 R09: 0000000000000b70
[1621342.294711] R10: 0000000000000000 R11: ffff882024693a76 R12: ffff881f95114d00
[1621342.294765] R13: 0000000000008000 R14: ffff881f95114d10 R15: 00000000000000c0
[1621342.294820] FS: 0000000000000000(0000) GS:ffff88203ec20000(0000) knlGS:0000000000000000
[1621342.294881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1621342.294927] CR2: 00007fe8841eb018 CR3: 000000000190a000 CR4: 00000000001407e0
[1621342.294981] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1621342.295035] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[1621342.295089] Stack:
[1621342.295109] ffff8810264baa80 ffff881f95114d00 ffff882024693dd0 ffffffffa05e28e0
[1621342.295182] ffff880834a6f1b8 ffff880834a6f070 0000000000000000 0000000000000000
[1621342.295254] ffffffffa05ec2c0 000000001601239f ffff881f95114d00 0000000000000001
[1621342.295325] Call Trace:
[1621342.295367] [<ffffffffa05e28e0>] nfs_write_completion+0x90/0x150 [nfs]
[1621342.295430] [<ffffffffa05dd62e>] nfs_pgio_release+0x2e/0x40 [nfs]
[1621342.295493] [<ffffffffa057c87a>] rpc_free_task+0x2a/0x70 [sunrpc]
[1621342.295552] [<ffffffffa057c8d5>] rpc_async_release+0x15/0x20 [sunrpc]
[1621342.295612] [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
[1621342.295661] [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
[1621342.295709] [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
[1621342.295761] [<ffffffff8109726f>] kthread+0xcf/0xe0
[1621342.295804] [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
[1621342.295863] [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
[1621342.295909] [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
[1621342.295961] Code: 4c 8d 63 40 f0 80 63 40 fe e8 e6 1c ac e0 4c 89 e7 31 f6 e8 3c 9a ab e0 5b 41 5c 5d c3 48 c7 c7 58 0e 5f a0 31 c0 e8 a1 f1 01 e1 <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41
[1621342.296231] RIP [<ffffffffa05de7d7>] nfs_unlock_request+0x47/0x50 [nfs]
[1621342.296297] RSP <ffff882024693d60>
- If we map the BUG line to the source code, we can see where we crashed:
[1621342.293458] kernel BUG at fs/nfs/pagelist.c:387!
379 /**
380 * nfs_unlock_request - Unlock request and wake up sleepers.
381 * @req:
382 */
383 void nfs_unlock_request(struct nfs_page *req)
384 {
385 if (!NFS_WBACK_BUSY(req)) {
386 printk(KERN_ERR "NFS: Invalid unlock attempted\n");
> 387 BUG();
388 }
389 smp_mb__before_clear_bit();
390 clear_bit(PG_BUSY, &req->wb_flags);
391 smp_mb__after_clear_bit();
392 wake_up_bit(&req->wb_flags, PG_BUSY);
393 }
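- On x86_64, BUG() is implemented with the ud2 instruction (opcode 0f 0b), which the CPU faults on as an invalid opcode; that is why the oops reports "invalid opcode: 0000". The bytes marked at the faulting RIP in the "Code:" line of the oops are exactly this instruction:
[1621342.295961] Code: ... 31 c0 e8 a1 f1 01 e1 <0f> 0b 0f 1f 80 00 00 00 00 ...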
Detailed analysis from vmcore
- Note that the system crashed due to a kernel assertion (BUG) and the vmcore is untainted, so unless something unusual such as a hardware issue is involved, this is likely a kernel bug.
crash> sys
CPUS: 48
DATE: Tue May 5 16:47:54 2015
UPTIME: 18 days, 18:11:59
LOAD AVERAGE: 4.89, 3.56, 3.11
TASKS: 1044
NODENAME: foo.bar.com
RELEASE: 3.10.0-229.el7.x86_64
VERSION: #1 SMP Thu Jan 29 18:37:38 EST 2015
MACHINE: x86_64 (2599 Mhz)
MEMORY: 127.9 GB
PANIC: "kernel BUG at fs/nfs/pagelist.c:387!"
crash> mod -t
no tainted modules
crash> sys -t
TAINTED_MASK: 0
- Map the crash to code
[1621342.293360] NFS: Invalid unlock attempted
[1621342.293410] ------------[ cut here ]------------
[1621342.293458] kernel BUG at fs/nfs/pagelist.c:387!
...
[1621342.294182] CPU: 3 PID: 820 Comm: kworker/3:1 Not tainted 3.10.0-229.el7.x86_64 #1
[1621342.294240] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.0.4 08/28/2014
[1621342.294316] Workqueue: nfsiod rpc_async_release [sunrpc]
[1621342.294361] task: ffff882024660b60 ti: ffff882024690000 task.ti: ffff882024690000
[1621342.294418] RIP: 0010:[<ffffffffa05de7d7>] [<ffffffffa05de7d7>] nfs_unlock_request+0x47/0x50 [nfs]
...
[1621342.295325] Call Trace:
[1621342.295367] [<ffffffffa05e28e0>] nfs_write_completion+0x90/0x150 [nfs]
[1621342.295430] [<ffffffffa05dd62e>] nfs_pgio_release+0x2e/0x40 [nfs]
[1621342.295493] [<ffffffffa057c87a>] rpc_free_task+0x2a/0x70 [sunrpc]
[1621342.295552] [<ffffffffa057c8d5>] rpc_async_release+0x15/0x20 [sunrpc]
[1621342.295612] [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
[1621342.295661] [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
include/linux/nfs_page.h
95
96-->#define NFS_WBACK_BUSY(req) (test_bit(PG_BUSY,&(req)->wb_flags))
97
fs/nfs/pagelist.c
378
379 /**
380 * nfs_unlock_request - Unlock request and wake up sleepers.
381 * @req:
382 */
383 void nfs_unlock_request(struct nfs_page *req)
384 {
385 if (!NFS_WBACK_BUSY(req)) {
386 printk(KERN_ERR "NFS: Invalid unlock attempted\n");
387--> BUG();
388 }
389 smp_mb__before_clear_bit();
390 clear_bit(PG_BUSY, &req->wb_flags);
391 smp_mb__after_clear_bit();
392 wake_up_bit(&req->wb_flags, PG_BUSY);
393 }
Now obtain the struct nfs_page * from the stack. The disassembly below shows that the caller, nfs_write_completion, keeps the request pointer in %rbx and copies it into %rdi immediately before calling nfs_unlock_request, and that nfs_unlock_request pushes %rbp, %r12 and %rbx on entry. The caller's %rbx is therefore saved 3*8 bytes below the return address into nfs_write_completion, which is where the pointer can be read back.
crash> dis -r ffffffffa05e28e0 | tail
0xffffffffa05e28c1 <nfs_write_completion+113>: mov 0x10(%r12),%rbx
0xffffffffa05e28c6 <nfs_write_completion+118>: cmp %rbx,%r14
0xffffffffa05e28c9 <nfs_write_completion+121>: je 0xffffffffa05e2881
0xffffffffa05e28cb <nfs_write_completion+123>: xor %r13d,%r13d
0xffffffffa05e28ce <nfs_write_completion+126>: jmp 0xffffffffa05e28fa
0xffffffffa05e28d0 <nfs_write_completion+128>: mov %rbx,%rdi
0xffffffffa05e28d3 <nfs_write_completion+131>: callq 0xffffffffa05e0eb0 <nfs_inode_remove_request>
0xffffffffa05e28d8 <nfs_write_completion+136>: mov %rbx,%rdi
0xffffffffa05e28db <nfs_write_completion+139>: callq 0xffffffffa05de790 <nfs_unlock_request>
0xffffffffa05e28e0 <nfs_write_completion+144>: mov %rbx,%rdi
crash> dis nfs_unlock_request | head
0xffffffffa05de790 <nfs_unlock_request>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa05de795 <nfs_unlock_request+5>: push %rbp
0xffffffffa05de796 <nfs_unlock_request+6>: mov %rsp,%rbp
0xffffffffa05de799 <nfs_unlock_request+9>: push %r12
0xffffffffa05de79b <nfs_unlock_request+11>: push %rbx
0xffffffffa05de79c <nfs_unlock_request+12>: mov 0x40(%rdi),%rax
0xffffffffa05de7a0 <nfs_unlock_request+16>: mov %rdi,%rbx
0xffffffffa05de7a3 <nfs_unlock_request+19>: test $0x1,%al
0xffffffffa05de7a5 <nfs_unlock_request+21>: je 0xffffffffa05de7c9
0xffffffffa05de7a7 <nfs_unlock_request+23>: callq 0xffffffff810a0490 <__smp_mb__before_atomic>
crash> bt -f | grep nfs_write_completion
#7 [ffff882024693d78] nfs_write_completion at ffffffffa05e28e0 [nfs]
crash> px (0xffff882024693d78 - 3*8)
$2 = 0xffff882024693d60
crash> rd 0xffff882024693d60
ffff882024693d60: ffff8810264baa80 ..K&....
crash> print nfs_unlock_request
$3 = {void (struct nfs_page *)} 0xffffffffa05de790
crash> kmem ffff8810264baa80
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff88103f007900 kmalloc-128 128 12564 17600 275 8k
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffea0040992e80 ffff8810264ba000 0 64 62 2
FREE / [ALLOCATED]
ffff8810264baa80
crash> p sizeof(struct nfs_page)
$4 = 96
crash> nfs_page ffff8810264baa80
struct nfs_page {
wb_list = {
next = 0xffff8810264ba880,
prev = 0xffff8810264baa80
},
wb_page = 0x0,
wb_context = 0x0,
wb_lock_context = 0x0,
wb_index = 465,
wb_offset = 0,
wb_pgbase = 0,
wb_bytes = 313,
wb_kref = {
refcount = {
counter = 0
}
},
wb_flags = 0,
wb_verf = {
data = "\000\000\000\000\000\000\000"
},
wb_this_page = 0xffff8810264baa80,
wb_head = 0xffff8810264baa80
}
The nfs_page * looks like it has been freed: wb_page, wb_context and wb_lock_context are NULL, the wb_kref refcount is 0, and wb_flags is 0, hence the BUG (the PG_BUSY bit is not set, nor is any other bit).
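The exact kernel path that drops the last reference too early is still under investigation, but the signature above (wb_flags = 0, refcount = 0, NULL wb_page/wb_context) matches a request that was already released and whose memory was reused. Purely as an illustration of the pattern, and not the RHEL 7 NFS code, here is a minimal userspace model (struct fake_req, req_alloc, req_release and req_unlock are invented names for this sketch) that fails with the same message for the same reason:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PG_BUSY 0                        /* models PG_BUSY in wb_flags */

struct fake_req {
        unsigned long wb_flags;          /* models nfs_page.wb_flags */
        int refcount;                    /* models nfs_page.wb_kref  */
};

static struct fake_req *req_alloc(void)
{
        struct fake_req *req = calloc(1, sizeof(*req));
        req->refcount = 1;
        req->wb_flags |= 1UL << PG_BUSY; /* request starts out locked (busy) */
        return req;
}

static void req_release(struct fake_req *req)
{
        /* models dropping the last reference: the kernel would free the
         * slab object here; zeroing it stands in for the memory being
         * freed and reused, which is what the vmcore shows */
        if (--req->refcount == 0)
                memset(req, 0, sizeof(*req));
}

static void req_unlock(struct fake_req *req)
{
        /* mirrors nfs_unlock_request(): complain and abort if PG_BUSY
         * is no longer set, as BUG() does in the kernel */
        if (!(req->wb_flags & (1UL << PG_BUSY))) {
                fprintf(stderr, "NFS: Invalid unlock attempted\n");
                abort();
        }
        req->wb_flags &= ~(1UL << PG_BUSY);
}

int main(void)
{
        struct fake_req *req = req_alloc();

        req_release(req);                /* last reference dropped first...   */
        req_unlock(req);                 /* ...then the stale pointer is used */
        free(req);
        return 0;
}

Because the request is torn down before the unlock runs, the busy bit is already gone when the check fires, exactly as in the crashed kernel.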