Kernel crash in unhash_delegation while attempting to expire a client
Environment
- Red Hat Enterprise Linux 6
- Seen on kernel 2.6.32-431.11.2.el6 (all RHEL 6 kernels are believed to be affected)
- NFS4
- Third-party modules are loaded, as shown in the vmcore:
crash> mod -t
NAME TAINTS
vxfs P(U)
vxodm P(U)
fdd P(U)
vxportal P(U)
vxdmp P(U)
vxio P(U)
vxspec P(U)
dmpap P(U)
dmpalua P(U)
dmpCLARiiON P(U)
dmpaa P(U)
dmpjbod P(U)
Issue
- The kernel crashed with the following message:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
PGD 1951953067 PUD 1b2a207067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 9
Modules linked in: mptctl mptbase vxodm(P)(U) nfsd lockd nfs_acl auth_rpcgss sunrpc dmpjbod(P)(U) dmpaa(P)(U) dmpCLARiiON(P)(U) dmpalua(P)(U) dmpap(P)(U) vxspec(P)(U) vxio(P)(U) vxdmp(P)(U) bonding 8021q garp stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vxportal(P)(U) fdd(P)(U) vxfs(P)(U) exportfs serio_raw fam15h_power k10temp amd64_edac_mod edac_core edac_mce_amd sg power_meter hpilo hpwdt be2net i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif hpsa qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic pata_atiixp ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 5347, comm: nfsd4 Tainted: P --------------- 2.6.32-431.11.2.el6.x86_64 #1 HP ProLiant BL685c G7
RIP: 0010:[<ffffffffa0cfcba6>] [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
RSP: 0018:ffff887028c9bd80 EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffffffffa0d12180 RCX: ffff880f7e23b1d8
RDX: 0000000000000000 RSI: ffff880f7e23b1b8 RDI: ffffffffa0d12180
RBP: ffff887028c9bd90 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880040409380 R11: 0000000000000000 R12: ffff881028db6800
R13: ffff887028c9bda0 R14: ffff881028db6830 R15: 000000000000005a
FS: 00007f474b9f5700(0000) GS:ffff882078900000(0000) knlGS:00000000f777c8e0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000001a19c3f000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd4 (pid: 5347, threadinfo ffff887028c9a000, task ffff887028b84040)
Stack:
ffff881028db6830 ffff881028db6800 ffff887028c9bdd0 ffffffffa0d020a0
<d> ffff880f7e23b1d8 ffff880c6b619c98 ffff887028c9bdf0 ffff881028db6800
<d> ffff887028c9bdf0 00000000537ddd10 ffff887028c9be30 ffffffffa0d02597
Call Trace:
[<ffffffffa0d020a0>] expire_client+0xd0/0x250 [nfsd]
[<ffffffffa0d02597>] laundromat_main+0x1c7/0x3f0 [nfsd]
[<ffffffffa0d023d0>] ? laundromat_main+0x0/0x3f0 [nfsd]
[<ffffffff81094d10>] worker_thread+0x170/0x2a0
[<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81094ba0>] ? worker_thread+0x0/0x2a0
[<ffffffff8109aee6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ae50>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: 89 fe 31 c0 48 c7 c7 38 af d0 a0 e8 60 ad 82 e0 eb 80 0f 1f 00 55 48 89 e5 41 54 53 0f 1f 44 00 00 48 8b 17 48 8b 47 08 48 89 fb <48> 89 42 08 48 89 10 48 8d 47 10 48 89 3b 48 89 7b 08 48 8b 4f
RIP [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
RSP <ffff887028c9bd80>
CR2: 0000000000000008
Resolution
- Tracked by private Red Hat Bug 1076663 - "Fix race between expire_client and nfsd_break_deleg_cb".
- A patch to fix this issue is currently under development and testing. For more information, contact your support representative.
Workarounds
- The only known workarounds are:
1) Avoid using NFSv4 delegations. Delegations must be disabled on the NFS server; the following article provides the steps:
Is it possible to disable NFSv4 delegations on a RHEL NFS server?
2) Avoid using NFSv4: mount the share with NFSv3 by passing the "nfsvers=3" option when mounting the share on the client. See the nfs(5) man page for more information.
Root Cause
- The following race in the NFS4 server implementation may cause a kernel crash:
  - The NFS4 server's laundromat thread calls expire_client, and an nfs4_delegation structure gets added to its local struct list_head called 'reaplist'.
  - The NFS4 server's laundromat thread drops the recall_lock spinlock, which protects adding and removing entries on the global struct list_head called del_recall_lru.
  - Some other thread calls nfsd_break_deleg_cb, where file_lock.fl_owner is equal to the above nfs4_delegation structure pointer, which was added to the laundromat thread's 'reaplist'. This results in the nfs4_delegation structure getting added to the del_recall_lru list. The laundromat's 'reaplist' then contains an entry pointing at the global del_recall_lru list_head structure, which is not embedded inside an nfs4_delegation structure. In essence, this one nfs4_delegation structure is now on two different lists, which is not intended.
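  - The NFS4 server's laundromat thread continues processing 'reaplist' in expire_client. The oops above shows the crash in unhash_delegation called from expire_client, with RDI holding a kernel-module address (ffffffffa0d12180) rather than a slab-allocated nfs4_delegation. This is consistent with the bogus entry that points at del_recall_lru being treated as a delegation.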
- The code in expire_client assumes that no other thread can reach the delegations it has moved onto 'reaplist'. That assumption is invalid for two reasons: the recall_lock spinlock is dropped before 'reaplist' is emptied, and an nfs4_delegation structure that has been added to 'reaplist' may still be accessed by another thread calling nfsd_break_deleg_cb.
Diagnostic Steps
- Configure kdump and gather a vmcore. Full verification may not be possible without a vmcore.
Simple analysis steps (from kernel crash / oops message)
- Check the Environment and Resolution sections to make sure the kernel version in the crash message is an affected kernel.
- Look for all of the following in the crash message:
  - the BUG line reports a NULL pointer dereference at 0000000000000008
  - the RIP line contains unhash_delegation
  - the RDI register contains a value starting with 8 'f's (for example, ffffffffa0d12180). On x86_64 this range holds kernel module code and data, whereas a dynamically allocated nfs4_delegation would normally have an address starting with ffff88, like the values in RSI and R12
  - the Process is 'nfsd4'
  - the Call Trace contains expire_client
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
RIP: 0010:[<ffffffffa0cfcba6>] [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
...
RDX: 0000000000000000 RSI: ffff880f7e23b1b8 RDI: ffffffffa0d12180 <------------ RDI contains a value starting with 8 'f's
Process nfsd4 (pid: 5347, threadinfo ffff887028c9a000, task ffff887028b84040) <-- nfsd4 process
...
Call Trace:
[<ffffffffa0d020a0>] expire_client+0xd0/0x250 [nfsd] <------------------------ matches
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
