Kernel crash in unhash_delegation while attempting to expire a client

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 6
    • seen on kernel 2.6.32-431.11.2.el6 (all RHEL6 kernels believed affected)
  • NFS4
  • 3rd party modules in the vmcore
crash> mod -t
NAME         TAINTS
vxfs         P(U)
vxodm        P(U)
fdd          P(U)
vxportal     P(U)
vxdmp        P(U)
vxio         P(U)
vxspec       P(U)
dmpap        P(U)
dmpalua      P(U)
dmpCLARiiON  P(U)
dmpaa        P(U)
dmpjbod      P(U)

Issue

  • The kernel crashes with the following message:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
PGD 1951953067 PUD 1b2a207067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 9
Modules linked in: mptctl mptbase vxodm(P)(U) nfsd lockd nfs_acl auth_rpcgss sunrpc dmpjbod(P)(U) dmpaa(P)(U) dmpCLARiiON(P)(U) dmpalua(P)(U) dmpap(P)(U) vxspec(P)(U) vxio(P)(U) vxdmp(P)(U) bonding 8021q garp stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vxportal(P)(U) fdd(P)(U) vxfs(P)(U) exportfs serio_raw fam15h_power k10temp amd64_edac_mod edac_core edac_mce_amd sg power_meter hpilo hpwdt be2net i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif hpsa qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic pata_atiixp ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 5347, comm: nfsd4 Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1 HP ProLiant BL685c G7
RIP: 0010:[<ffffffffa0cfcba6>]  [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
RSP: 0018:ffff887028c9bd80  EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffffffffa0d12180 RCX: ffff880f7e23b1d8
RDX: 0000000000000000 RSI: ffff880f7e23b1b8 RDI: ffffffffa0d12180
RBP: ffff887028c9bd90 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880040409380 R11: 0000000000000000 R12: ffff881028db6800
R13: ffff887028c9bda0 R14: ffff881028db6830 R15: 000000000000005a
FS:  00007f474b9f5700(0000) GS:ffff882078900000(0000) knlGS:00000000f777c8e0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000001a19c3f000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd4 (pid: 5347, threadinfo ffff887028c9a000, task ffff887028b84040)
Stack:
 ffff881028db6830 ffff881028db6800 ffff887028c9bdd0 ffffffffa0d020a0
<d> ffff880f7e23b1d8 ffff880c6b619c98 ffff887028c9bdf0 ffff881028db6800
<d> ffff887028c9bdf0 00000000537ddd10 ffff887028c9be30 ffffffffa0d02597
Call Trace:
 [<ffffffffa0d020a0>] expire_client+0xd0/0x250 [nfsd]
 [<ffffffffa0d02597>] laundromat_main+0x1c7/0x3f0 [nfsd]
 [<ffffffffa0d023d0>] ? laundromat_main+0x0/0x3f0 [nfsd]
 [<ffffffff81094d10>] worker_thread+0x170/0x2a0
 [<ffffffff8109b290>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81094ba0>] ? worker_thread+0x0/0x2a0
 [<ffffffff8109aee6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ae50>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: 89 fe 31 c0 48 c7 c7 38 af d0 a0 e8 60 ad 82 e0 eb 80 0f 1f 00 55 48 89 e5 41 54 53 0f 1f 44 00 00 48 8b 17 48 8b 47 08 48 89 fb <48> 89 42 08 48 89 10 48 8d 47 10 48 89 3b 48 89 7b 08 48 8b 4f
RIP  [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
 RSP <ffff887028c9bd80>
CR2: 0000000000000008

Resolution

  • Tracked by private Red Hat Bug 1076663 - "Fix race between expire_client and nfsd_break_deleg_cb".
  • A patch to fix this issue is currently under development and testing. For more information, contact your support representative.

Workarounds

  • The only known workarounds are:
    1) Avoid using NFSv4 delegations. These must be disabled on the NFS server. The following link provides the steps to do this:
    Is it possible to disable NFSv4 delegations on a RHEL NFS server?
    2) Avoid using NFSv4. Mount with NFSv3 by passing the "nfsvers=3" option when mounting the share on the client. See the nfs(5) man page for more information.
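
To find client mounts that still use NFSv4 before applying the second workaround, something like the following sketch may help. It assumes the usual /proc/mounts layout (device, mount point, filesystem type, options) and that an NFSv4 mount shows either the filesystem type "nfs4" or a "vers=4"/"nfsvers=4" option; the function name nfs4_mounts is illustrative only.

```python
def nfs4_mounts(mounts_text):
    """Return (device, mountpoint) pairs for mounts that appear to use NFSv4.

    mounts_text is expected to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        opts = options.split(",")
        # Assumption: a v4 mount carries vers=4... or nfsvers=4... in its options.
        v4_option = any(o.startswith(("vers=4", "nfsvers=4")) for o in opts)
        if fstype == "nfs4" or (fstype == "nfs" and v4_option):
            found.append((device, mountpoint))
    return found
```

On a client, the function can be fed the contents of /proc/mounts, for example nfs4_mounts(open("/proc/mounts").read()). Any share it reports would need to be remounted with "nfsvers=3" for the workaround to apply.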

Root Cause

  • The following race in the NFSv4 server implementation can cause a kernel crash:

    • The NFSv4 server's laundromat thread calls expire_client, which adds an nfs4_delegation structure to its local struct list_head named 'reaplist'.
    • The laundromat thread drops the recall_lock spinlock, which protects additions to and removals from the global struct list_head named del_recall_lru.
    • Another thread calls nfsd_break_deleg_cb with file_lock.fl_owner equal to the pointer to that same nfs4_delegation structure, which is still on the laundromat thread's reaplist. The structure is added to the del_recall_lru list, so it is now linked into two different lists at once, which is not intended. As a result, the laundromat thread's 'reaplist' ends up containing an entry that points at the global del_recall_lru list_head structure, which is not embedded inside an nfs4_delegation structure.
  • The assumption made by the code inside expire_client is invalid for two reasons: it drops the recall_lock spinlock before emptying the reaplist, and an nfs4_delegation structure that has been added to reaplist can still be accessed by another thread calling nfsd_break_deleg_cb.
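
The list corruption described above can be reproduced outside the kernel with a small model. The following is only a sketch, not the kernel code: ListHead stands in for struct list_head, the helper functions mimic the kernel helpers of the same names, and the two delegations dp_a and dp_b, their insertion order, and the function simulate_race are assumptions made for illustration.

```python
class ListHead:
    """Minimal stand-in for the kernel's circular, doubly linked struct list_head."""
    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self

def list_empty(head):
    return head.next is head

def list_add_tail(new, head):
    # Like the kernel helper, this does not check whether 'new' is already on a list.
    prev = head.prev
    new.next = head
    new.prev = prev
    prev.next = new
    head.prev = new

def list_del_init(entry):
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.next = entry
    entry.prev = entry

def simulate_race():
    del_recall_lru = ListHead("del_recall_lru")  # global list, protected by recall_lock
    reaplist = ListHead("reaplist")              # local to expire_client
    dp_a = ListHead("dp_a.dl_recall_lru")        # hypothetical delegation A
    dp_b = ListHead("dp_b.dl_recall_lru")        # hypothetical delegation B

    # Laundromat thread, holding recall_lock: collect the client's delegations.
    list_add_tail(dp_a, reaplist)
    list_add_tail(dp_b, reaplist)
    # recall_lock is dropped here.

    # Other thread (nfsd_break_deleg_cb): queues dp_b for recall
    # even though it is still linked on reaplist.
    list_add_tail(dp_b, del_recall_lru)

    # Laundromat thread resumes and drains reaplist without the lock.
    visited = []
    while not list_empty(reaplist):
        entry = reaplist.next
        visited.append(entry.name)
        if entry is del_recall_lru:
            # The kernel has no such check: it would treat the global list head
            # as if it were embedded in an nfs4_delegation.
            break
        list_del_init(entry)
    return visited
```

In this model simulate_race() ends with reaplist.next pointing at del_recall_lru, so the last "delegation" visited is the global list head itself. That is consistent with the RDI value in the oops being an address inside the nfsd module rather than an allocated nfs4_delegation.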

Diagnostic Steps

  • Configure kdump and gather a vmcore. Full verification may not be possible without a vmcore.

Simple analysis steps (from kernel crash / oops message)

  1. Check the Environment and Resolution sections to confirm that the kernel version in the crash message is an affected kernel.

  2. Look for the following in the crash message: the BUG line reports a NULL pointer dereference at 0000000000000008, the RIP line contains unhash_delegation, the RDI register holds a value starting with eight 'f's (for example, ffffffffa0d12180), the Process is 'nfsd4', and the Call Trace contains expire_client.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

RIP: 0010:[<ffffffffa0cfcba6>]  [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
...
RDX: 0000000000000000 RSI: ffff880f7e23b1b8 RDI: ffffffffa0d12180  <------------ RDI contains a value starting with 8 'f's

Process nfsd4 (pid: 5347, threadinfo ffff887028c9a000, task ffff887028b84040) <-- nfsd4 process
...
Call Trace:
 [<ffffffffa0d020a0>] expire_client+0xd0/0x250 [nfsd]  <------------------------ matches
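
The checks in the simple analysis steps can also be scripted against a saved oops message. The following is a rough sketch under stated assumptions: the function name check_oops and the dictionary keys are made up for illustration, the sample text is trimmed from the oops shown in this article, and the kernel version test only looks for a 2.6.32 el6 version string. A full match is a hint, not a confirmation; a vmcore is still needed for full verification.

```python
import re

# Lines taken from the oops in this article, trimmed to the relevant ones.
SAMPLE = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Pid: 5347, comm: nfsd4 Tainted: P           ---------------    2.6.32-431.11.2.el6.x86_64 #1 HP ProLiant BL685c G7
RIP: 0010:[<ffffffffa0cfcba6>]  [<ffffffffa0cfcba6>] unhash_delegation+0x16/0xe0 [nfsd]
RDX: 0000000000000000 RSI: ffff880f7e23b1b8 RDI: ffffffffa0d12180
Process nfsd4 (pid: 5347, threadinfo ffff887028c9a000, task ffff887028b84040)
Call Trace:
 [<ffffffffa0d020a0>] expire_client+0xd0/0x250 [nfsd]
"""

def check_oops(oops_text):
    """Return one boolean per check from the simple analysis steps."""
    rdi = re.search(r"RDI: ([0-9a-f]{16})", oops_text)
    return {
        # Step 1: kernel version string looks like a RHEL 6 kernel.
        "el6_kernel": re.search(r"2\.6\.32-[0-9.]+\.el6", oops_text) is not None,
        # Step 2: the individual signature elements.
        "null_deref_at_8": "NULL pointer dereference at 0000000000000008" in oops_text,
        "rip_unhash_delegation": re.search(r"RIP: .*unhash_delegation\+", oops_text) is not None,
        "rdi_starts_ffffffff": bool(rdi and rdi.group(1).startswith("ffffffff")),
        "process_nfsd4": re.search(r"Process nfsd4 \(pid", oops_text) is not None,
        "expire_client_in_trace": re.search(r"\] expire_client\+", oops_text) is not None,
    }

def looks_like_this_issue(oops_text):
    """True only if every check passes."""
    return all(check_oops(oops_text).values())
```

Calling looks_like_this_issue(SAMPLE) exercises the checks on the trimmed oops above; check_oops returns the individual results so that a partial match can be inspected.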

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
