RHEL7.4: NFS4 client hung due to NFS4 state manager thread stuck inside nfs_reap_expired_delegations infinite loop

Solution Verified

Issue

  • After updating to a RHEL 7.4 kernel, an NFSv4.1 client hangs with the NFSv4 state manager thread stuck inside nfs_reap_expired_delegations. A tcpdump shows a constant stream of TEST_STATEID requests carrying the same stateid, each answered with NFS4ERR_BAD_STATEID.
  • On an NFSv4.1 client, after updating to RHEL 7.4, processes hang and/or generate hung task error messages. In addition, top shows that the process using the most CPU is one named after the IP address of the NFS server (10.#.#.# is the IP address of the NAS):

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
        12073 root      20   0       0      0      0 D  16.7  0.0 525:23.29 10.#.#.#-manag
  • After updating to a RHEL 7.4 kernel, an NFSv4.0 client reports soft lockups in nfs_reap_expired_delegations, and then either hangs until it is rebooted or panics because of the soft lockup:
[17596.853096] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [10.1.1.xx-ma:11637]
[17596.853853] Modules linked in: tcp_diag inet_diag rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock sb_edac edac_core coretemp iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev vmw_balloon joydev pcspkr sg parport_pc parport shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper sd_mod syscopyarea crc_t10dif sysfillrect crct10dif_generic sysimgblt fb_sys_fops ttm drm crct10dif_pclmul ata_piix crct10dif_common crc32c_intel libata serio_raw vmxnet3 vmw_pvscsi i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod
[17596.853900] CPU: 1 PID: 11637 Comm: 172.32.xx.xx-ma Tainted: G             L ------------   3.10.0-693.1.1.el7.x86_64 #1
[17596.853901] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
[17596.853903] task: ffff8804242f5ee0 ti: ffff8802cc220000 task.ti: ffff8802cc220000
[17596.853904] RIP: 0010:[<ffffffffc058489a>]  [<ffffffffc058489a>] nfs_reap_expired_delegations+0x9a/0x220 [nfsv4]
[17596.853921] RSP: 0018:ffff8802cc223df8  EFLAGS: 00000206
[17596.853922] RAX: 0000000000000004 RBX: ffff88041ce0d000 RCX: 0000000000000003
[17596.853923] RDX: 0000000000000000 RSI: ffff8800b769d848 RDI: ffff8800bb556000
[17596.853924] RBP: ffff8802cc223e58 R08: ffff88041be93540 R09: 0000000000000000
[17596.853925] R10: 0000000000000000 R11: 7fffffffffffffff R12: ffff88041ce0d000
[17596.853926] R13: ffffffffc0584a6d R14: ffff8802cc223d78 R15: ffff8800b769d7c0
[17596.853927] FS:  0000000000000000(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
[17596.853928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[17596.853929] CR2: 00007fd8449a7000 CR3: 00000000019f2000 CR4: 00000000000407e0
[17596.853932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17596.853934] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17596.853935] Stack:
[17596.853936]  ffffffffc059d3c0 ffff88041be93540 ffff88041329a000 0000000000000000
[17596.853937]  04cdd20102072112 0000000400000000 00000000f389a2ad ffff88042c49a400
[17596.853939]  ffff88042c49a400 ffff88042c49a4c8 ffff88042c49a530 0000000000000000
[17596.853940] Call Trace:
[17596.853949]  [<ffffffffc0580c22>] nfs4_state_manager+0x5f2/0x8c0 [nfsv4]
[17596.853955]  [<ffffffffc0580ef0>] ? nfs4_state_manager+0x8c0/0x8c0 [nfsv4]
[17596.853961]  [<ffffffffc0580f0f>] nfs4_run_state_manager+0x1f/0x40 [nfsv4]
[17596.853964]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[17596.853966]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[17596.853970]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[17596.853972]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[17596.853972] Code: 24 10 4c 8b 7c 24 10 49 39 df 75 1b e9 e8 00 00 00 49 8b 07 48 89 44 24 10 4c 8b 7c 24 10 49 39 df 0f 84 d2 00 00 00 49 8b 47 48 <a8> 10 75 e2 49 8b 47 48 a8 40 74 da 49 8b be 70 03 00 00 e8 8e 
  • Another example of this issue triggering soft lockups, this time on kernel 3.10.0-693.el7.x86_64, through the call path
    nfs4_state_manager() => nfs_reap_expired_delegations() => nfs_revoke_delegation():
[949664.745423] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [192.168.0.xx-m:17637]
[949664.745429] Modules linked in: binfmt_misc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev crc32_pclmul sg ghash_clmulni_intel virtio_balloon joydev virtio_rng aesni_intel lrw gf128mul glue_helper ablk_helper cryptd parport_pc i2c_piix4 parport pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi virtio_net virtio_console virtio_scsi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata i2c_core crct10dif_pclmul
[949664.745491]  crct10dif_common crc32c_intel serio_raw virtio_pci virtio_ring floppy virtio dm_mirror dm_region_hash dm_log dm_mod
[949664.745501] CPU: 5 PID: 17637 Comm: 192.168.0.xx-m Tainted: G             L ------------   3.10.0-693.el7.x86_64 #1
[949664.745504] Hardware name: Red Hat RHEV Hypervisor, BIOS 1.9.1-5.el7_3.2 04/01/2014
[949664.745506] task: ffff880fb3299fa0 ti: ffff880fe12a8000 task.ti: ffff880fe12a8000
[949664.745508] RIP: 0010:[<ffffffffc0507429>]  [<ffffffffc0507429>] nfs_mark_return_delegation.isra.4+0x19/0x20 [nfsv4]
[949664.745535] RSP: 0018:ffff880fe12abdc0  EFLAGS: 00000202
[949664.745537] RAX: ffff880fe8760800 RBX: 00000000f50f06d8 RCX: 000000000000000f
[949664.745539] RDX: 000000000000000f RSI: ffff880411667100 RDI: ffff880fe6a99800
[949664.745540] RBP: ffff880fe12abdc0 R08: 0000000000000000 R09: 0000000000000000
[949664.745541] R10: ffff880fff359c40 R11: ffffea003c3f8980 R12: 0000000000000010
[949664.745543] R13: ffffffffc05074de R14: ffffffffffffff10 R15: ffff880fe6a99800
[949664.745545] FS:  0000000000000000(0000) GS:ffff880fff340000(0000) knlGS:0000000000000000
[949664.745547] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[949664.745548] CR2: 0000000000000004 CR3: 00000000019f2000 CR4: 00000000000006e0
[949664.745555] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[949664.745556] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[949664.745557] Stack:
[949664.745559]  ffff880fe12abde8 ffffffffc0507551 ffff880fb2f5aed0 ffff880fe87608c8
[949664.745561]  ffff880fe8760800 ffff880fe12abe58 ffffffffc05089ae ffffffffc05213c0
[949664.745564]  ffff880ffa8e2180 ffff880411667100 0000000000000000 00592902022a921d
[949664.745566] Call Trace:
[949664.745580]  [<ffffffffc0507551>] nfs_revoke_delegation+0x71/0x90 [nfsv4]
[949664.745592]  [<ffffffffc05089ae>] nfs_reap_expired_delegations+0x1ae/0x220 [nfsv4]
[949664.745603]  [<ffffffffc0504c22>] nfs4_state_manager+0x5f2/0x8c0 [nfsv4]
[949664.745626]  [<ffffffffc0504ef0>] ? nfs4_state_manager+0x8c0/0x8c0 [nfsv4]
[949664.745637]  [<ffffffffc0504f0f>] nfs4_run_state_manager+0x1f/0x40 [nfsv4]
[949664.745643]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[949664.745647]  [<ffffffff8108ddeb>] ? do_exit+0x6bb/0xa40
[949664.745649]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[949664.745654]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[949664.745656]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[949664.745657] Code: 48 8b 07 f0 80 88 28 01 00 00 20 5d c3 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 f0 80 4e 48 02 48 8b 07 f0 80 88 28 01 00 00 20 <5d> c3 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 55 4c 8d af 
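
To check whether a saved kernel log contains this signature, something like the following Python sketch may help. It is only an illustration, not a Red Hat tool: the function name find_reaper_lockups and the 60-line search window are assumptions made here. It looks for a "BUG: soft lockup" line and then for nfs_reap_expired_delegations in the register dump or call trace that follows it.

```python
import re

# Matches lines such as:
#   NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [10.1.1.xx-ma:11637]
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+):(\d+)\]"
)
SYMBOL = "nfs_reap_expired_delegations"


def find_reaper_lockups(lines, window=60):
    """Return soft lockups followed by SYMBOL within `window` lines.

    `window` is an arbitrary limit (an assumption of this sketch) so that
    an unrelated lockup is not paired with a much later trace.
    """
    hits = []
    pending = None      # details of the most recent soft lockup line
    remaining = 0       # lines left to search for SYMBOL
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            pending = {
                "cpu": int(match.group(1)),
                "seconds": int(match.group(2)),
                "comm": match.group(3),
                "pid": int(match.group(4)),
            }
            remaining = window
            continue
        if pending is None:
            continue
        if SYMBOL in line:
            hits.append(pending)
            pending = None
        else:
            remaining -= 1
            if remaining <= 0:
                pending = None
    return hits
```

A possible use is to pass the lines of a saved dmesg or /var/log/messages file to find_reaper_lockups() and review the reported thread names, which should resemble the truncated "<server IP>-manager" names shown above.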

Environment

  • Red Hat Enterprise Linux 7.4 (NFS client)
    • kernels from 3.10.0-693.el7 up to, but not including, 3.10.0-693.5.2.el7
  • NFS4 with delegations enabled (both NFSv4.0 and NFSv4.1 are affected)
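
As a rough aid for checking a client against the kernel range listed above, the following Python sketch compares a kernel release string with that range. It is a simplified illustration: the helper names parse_release and is_affected are hypothetical, and the comparison uses plain numeric tuples rather than full RPM version ordering, which is assumed to be adequate only for release strings of the form shown in this article.

```python
import re

# Range taken from the Environment list: 3.10.0-693.el7 inclusive,
# 3.10.0-693.5.2.el7 exclusive.
FIRST_AFFECTED = (693,)
FIRST_FIXED = (693, 5, 2)


def parse_release(release):
    """Split e.g. '3.10.0-693.1.1.el7.x86_64' into ((3, 10, 0), (693, 1, 1)).

    Returns None for strings that are not el7 kernel releases.
    """
    match = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el7", release)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group(1).split("."))
    build = tuple(int(part) for part in match.group(2).split("."))
    return version, build


def is_affected(release):
    """True if the release string falls inside the affected range."""
    parsed = parse_release(release)
    if parsed is None:
        return False
    version, build = parsed
    return version == (3, 10, 0) and FIRST_AFFECTED <= build < FIRST_FIXED
```

On a live system the release string could be obtained with platform.release() from the Python standard library, which returns the same value as uname -r. Being in the affected range matters only on clients that also use NFSv4 with delegations enabled.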
