RHEL7.4: NFS4 client hung due to NFS4 state manager thread stuck inside nfs_reap_expired_delegations infinite loop

Solution In Progress - Updated -

Issue

  • After update to RHEL7.4 kernel, NFS4.1 client hangs with the NFS4 state manager thread inside nfs_reap_expired_delegations and a tcpdump shows a constant stream of TEST_STATEID with the same stateid sent, and a NFS4ERR_BAD_STATEID response.
  • With NFS4.1 NFS client, after update to RHEL7.4, processes are hung and/or generate hung task error messages. Also, top shows the following process which has a name based on the IP address of the NFS server, is using the most CPU
(10.#.#.# is IP address of NAS):

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
        12073 root      20   0       0      0      0 D  16.7  0.0 525:23.29 10.#.#.#-manag
  • After update to RHEL7.4 kernel, NFS4.0 seeing soft lockups in nfs_reap_expired_delegations and the NFS4.0 client hangs requiring a reboot or panics due to soft lockup
[17596.853096] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [10.1.1.42-ma:11637]
[17596.853853] Modules linked in: tcp_diag inet_diag rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock sb_edac edac_core coretemp iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev vmw_balloon joydev pcspkr sg parport_pc parport shpchp vmw_vmci i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper sd_mod syscopyarea crc_t10dif sysfillrect crct10dif_generic sysimgblt fb_sys_fops ttm drm crct10dif_pclmul ata_piix crct10dif_common crc32c_intel libata serio_raw vmxnet3 vmw_pvscsi i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod
[17596.853900] CPU: 1 PID: 11637 Comm: 172.32.55.22-ma Tainted: G             L ------------   3.10.0-693.1.1.el7.x86_64 #1
[17596.853901] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
[17596.853903] task: ffff8804242f5ee0 ti: ffff8802cc220000 task.ti: ffff8802cc220000
[17596.853904] RIP: 0010:[<ffffffffc058489a>]  [<ffffffffc058489a>] nfs_reap_expired_delegations+0x9a/0x220 [nfsv4]
[17596.853921] RSP: 0018:ffff8802cc223df8  EFLAGS: 00000206
[17596.853922] RAX: 0000000000000004 RBX: ffff88041ce0d000 RCX: 0000000000000003
[17596.853923] RDX: 0000000000000000 RSI: ffff8800b769d848 RDI: ffff8800bb556000
[17596.853924] RBP: ffff8802cc223e58 R08: ffff88041be93540 R09: 0000000000000000
[17596.853925] R10: 0000000000000000 R11: 7fffffffffffffff R12: ffff88041ce0d000
[17596.853926] R13: ffffffffc0584a6d R14: ffff8802cc223d78 R15: ffff8800b769d7c0
[17596.853927] FS:  0000000000000000(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
[17596.853928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[17596.853929] CR2: 00007fd8449a7000 CR3: 00000000019f2000 CR4: 00000000000407e0
[17596.853932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17596.853934] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17596.853935] Stack:
[17596.853936]  ffffffffc059d3c0 ffff88041be93540 ffff88041329a000 0000000000000000
[17596.853937]  04cdd20102072112 0000000400000000 00000000f389a2ad ffff88042c49a400
[17596.853939]  ffff88042c49a400 ffff88042c49a4c8 ffff88042c49a530 0000000000000000
[17596.853940] Call Trace:
[17596.853949]  [<ffffffffc0580c22>] nfs4_state_manager+0x5f2/0x8c0 [nfsv4]
[17596.853955]  [<ffffffffc0580ef0>] ? nfs4_state_manager+0x8c0/0x8c0 [nfsv4]
[17596.853961]  [<ffffffffc0580f0f>] nfs4_run_state_manager+0x1f/0x40 [nfsv4]
[17596.853964]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[17596.853966]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[17596.853970]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[17596.853972]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[17596.853972] Code: 24 10 4c 8b 7c 24 10 49 39 df 75 1b e9 e8 00 00 00 49 8b 07 48 89 44 24 10 4c 8b 7c 24 10 49 39 df 0f 84 d2 00 00 00 49 8b 47 48 <a8> 10 75 e2 49 8b 47 48 a8 40 74 da 49 8b be 70 03 00 00 e8 8e 

Environment

  • Red Hat Enterprise Linux 7.4 (NFS client)
    • kernels between 3.10.0-693.el7 and prior to 3.10.0-693.5.2.el7
  • NFS4 with delegations enabled (both NFSv4.0 and NFSv4.1 are affected)

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase of over 48,000 articles and solutions.

Current Customers and Partners

Log in for full access

Log In
Close

Welcome! Check out the Getting Started with Red Hat page for quick tours and guides for common tasks.