RHEL7.2: NFS4 server repeated soft lockups due to laundromat_main kworker process stuck in __destroy_client
Issue
- A RHEL6.6 NFS4 client hung and had to be rebooted. That issue is described in RHEL6.6: NFS4 client hangs with repeated 'Callback slot table overflowed' and constantly running rpciod and NFS4 state manager thread.
- The NFS client was rebooted at Apr 9 18:12. At the same time, the "soft lockup" messages below began appearing on the NFS server, and they kept repeating until I was forced to fail over to the second node in the NFS cluster. The first node had to be fenced to complete the failover, because it was stuck on I/O while trying to stop the NFS server component.
Apr 9 18:12:16 nfs-server kernel: BUG: soft lockup - CPU#3 stuck for 23s! [kworker/u64:0:14786]
Apr 9 18:12:16 nfs-server kernel: Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill bnx2i libiscsi nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache iptable_filter fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 bridge ses dm_service_time enclosure mptctl mptbase bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt scsi_transport_iscsi 8021q garp stp mrp llc bonding intel_powerclamp coretemp intel_rapl kvm_intel kvm iTCO_wdt crc32_pclmul ghash_clmulni_intel iTCO_vendor_support aesni_intel ipmi_devintf lrw gf128mul glue_helper ablk_helper cryptd sg hpwdt hpilo sb_edac i2c_i801 pcspkr edac_core ioatdma shpchp lpc_ich mfd_core dca ipmi_si wmi ipmi_msghandler pcc_cpufreq acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables
Apr 9 18:12:16 nfs-server kernel: xfs sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect crct10dif_pclmul crct10dif_common sysimgblt crc32c_intel i2c_algo_bit serio_raw drm_kms_helper ttm bnx2x drm mdio i2c_core ptp pps_core hpsa libcrc32c dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_dbb9ab46bcd5709557faf4d2945b181f_13752]
Apr 9 18:12:16 nfs-server kernel: CPU: 3 PID: 14786 Comm: kworker/u64:0 Tainted: G OE ------------ 3.10.0-327.10.1.el7.x86_64 #1
Apr 9 18:12:16 nfs-server kernel: Hardware name: HP ProLiant BL460c Gen9, BIOS I36 05/06/2015
Apr 9 18:12:16 nfs-server kernel: Workqueue: nfsd4 laundromat_main [nfsd]
Apr 9 18:12:16 nfs-server kernel: task: ffff8810514f5c00 ti: ffff880924968000 task.ti: ffff880924968000
Apr 9 18:12:16 nfs-server kernel: RIP: 0010:[<ffffffff8163cb0b>] [<ffffffff8163cb0b>] _raw_spin_unlock_irqrestore+0x1b/0x40
Apr 9 18:12:16 nfs-server kernel: RSP: 0018:ffff88092496bcc8 EFLAGS: 00000246
Apr 9 18:12:16 nfs-server kernel: RAX: ffffffffa046b970 RBX: ffffffff810b0d54 RCX: ffffffffa046b988
Apr 9 18:12:16 nfs-server kernel: RDX: ffffffffa046b988 RSI: 0000000000000246 RDI: 0000000000000246
Apr 9 18:12:16 nfs-server kernel: RBP: ffff88092496bcd0 R08: 0000000000000000 R09: ffff88085fc77540
Apr 9 18:12:16 nfs-server kernel: R10: ffffea00054dd240 R11: ffffffffa045f6bb R12: ffffffff81983f60
Apr 9 18:12:16 nfs-server kernel: R13: 0000000000000246 R14: 0000000000000003 R15: 0000000000000246
Apr 9 18:12:16 nfs-server kernel: FS: 0000000000000000(0000) GS:ffff88085fc60000(0000) knlGS:0000000000000000
Apr 9 18:12:16 nfs-server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 9 18:12:16 nfs-server kernel: CR2: 00007fb61f554140 CR3: 000000000194a000 CR4: 00000000001407e0
Apr 9 18:12:16 nfs-server kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 9 18:12:16 nfs-server kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 9 18:12:16 nfs-server kernel: Stack:
Apr 9 18:12:16 nfs-server kernel: ffffffffa046b980 ffff88092496bd08 ffffffff810b0d54 ffff88092496bcf8
Apr 9 18:12:16 nfs-server kernel: ffffffffa04538d5 ffffffffa0453595 ffff88092496bd48 ffff8810528af888
Apr 9 18:12:16 nfs-server kernel: ffff88092496bd38 ffffffffa0453595 ffff88092496bd48 ffff8810528af800
Apr 9 18:12:16 nfs-server kernel: Call Trace:
Apr 9 18:12:16 nfs-server kernel: [<ffffffff810b0d54>] __wake_up+0x44/0x50
Apr 9 18:12:16 nfs-server kernel: [<ffffffffa04538d5>] ? __destroy_client+0x135/0x180 [nfsd]
Apr 9 18:12:16 nfs-server kernel: [<ffffffffa0453595>] ? nfs4_put_stid+0x75/0x80 [nfsd]
Apr 9 18:12:16 nfs-server kernel: [<ffffffffa0453595>] nfs4_put_stid+0x75/0x80 [nfsd]
Apr 9 18:12:16 nfs-server kernel: [<ffffffffa045388f>] __destroy_client+0xef/0x180 [nfsd]
Apr 9 18:12:16 nfs-server kernel: [<ffffffffa0453942>] expire_client+0x22/0x30 [nfsd]
Apr 9 18:12:16 nfs-server kernel: [<ffffffffa0457506>] laundromat_main+0x166/0x4e0 [nfsd]
Apr 9 18:12:16 nfs-server kernel: [<ffffffff8109d5db>] process_one_work+0x17b/0x470
Apr 9 18:12:16 nfs-server kernel: [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
...
Apr 9 18:12:44 nfs-server kernel: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/u64:0:14786]
Apr 9 18:12:44 nfs-server kernel: Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill bnx2i libiscsi nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache iptable_filter fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 bridge ses dm_service_time enclosure mptctl mptbase bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt scsi_transport_iscsi 8021q garp stp mrp llc bonding intel_powerclamp coretemp intel_rapl kvm_intel kvm iTCO_wdt crc32_pclmul ghash_clmulni_intel iTCO_vendor_support aesni_intel ipmi_devintf lrw gf128mul glue_helper ablk_helper cryptd sg hpwdt hpilo sb_edac i2c_i801 pcspkr edac_core ioatdma shpchp lpc_ich mfd_core dca ipmi_si wmi ipmi_msghandler pcc_cpufreq acpi_power_meter binfmt_misc nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables
Apr 9 18:12:44 nfs-server kernel: xfs sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect crct10dif_pclmul crct10dif_common sysimgblt crc32c_intel i2c_algo_bit serio_raw drm_kms_helper ttm bnx2x drm mdio i2c_core ptp pps_core hpsa libcrc32c dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_dbb9ab46bcd5709557faf4d2945b181f_13752]
Apr 9 18:12:44 nfs-server kernel: CPU: 3 PID: 14786 Comm: kworker/u64:0 Tainted: G OEL ------------ 3.10.0-327.10.1.el7.x86_64 #1
Apr 9 18:12:44 nfs-server kernel: Hardware name: HP ProLiant BL460c Gen9, BIOS I36 05/06/2015
Apr 9 18:12:44 nfs-server kernel: Workqueue: nfsd4 laundromat_main [nfsd]
Apr 9 18:12:44 nfs-server kernel: task: ffff8810514f5c00 ti: ffff880924968000 task.ti: ffff880924968000
Apr 9 18:12:44 nfs-server kernel: RIP: 0010:[<ffffffff812f2a0c>] [<ffffffff812f2a0c>] _atomic_dec_and_lock+0x1c/0x70
Apr 9 18:12:44 nfs-server kernel: RSP: 0018:ffff88092496bcf8 EFLAGS: 00000246
Apr 9 18:12:44 nfs-server kernel: RAX: 000000002496bd48 RBX: ffffffffa046b988 RCX: 000000002496bd47
Apr 9 18:12:44 nfs-server kernel: RDX: 000000002496bd48 RSI: ffffffffa04538d5 RDI: ffff88092496bcf8
Apr 9 18:12:44 nfs-server kernel: RBP: ffff88092496bd08 R08: ffff88092496bd48 R09: ffff88085fc77540
Apr 9 18:12:44 nfs-server kernel: R10: ffffea00054dd240 R11: ffffffffa045f6bb R12: 0000000000000000
Apr 9 18:12:44 nfs-server kernel: R13: ffff88085fc77540 R14: ffffea00054dd240 R15: ffffffffa045f6bb
Apr 9 18:12:44 nfs-server kernel: FS: 0000000000000000(0000) GS:ffff88085fc60000(0000) knlGS:0000000000000000
Apr 9 18:12:44 nfs-server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 9 18:12:44 nfs-server kernel: CR2: 00007fb61f554140 CR3: 000000000194a000 CR4: 00000000001407e0
Apr 9 18:12:44 nfs-server kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 9 18:12:44 nfs-server kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 9 18:12:44 nfs-server kernel: Stack:
Apr 9 18:12:44 nfs-server kernel: ffff88092496bd47 ffff88092496bcf8 ffff88092496bd38 ffffffffa045354a
Apr 9 18:12:44 nfs-server kernel: ffff88092496bd48 ffff8810528af800 ffff88092496bcf8 ffff8810528af878
Apr 9 18:12:44 nfs-server kernel: ffff88092496bd80 ffffffffa045388f ffff88092496bd48 ffff88092496bd48
Apr 9 18:12:44 nfs-server kernel: Call Trace:
Apr 9 18:12:44 nfs-server kernel: [<ffffffffa045354a>] nfs4_put_stid+0x2a/0x80 [nfsd]
Apr 9 18:12:44 nfs-server kernel: [<ffffffffa045388f>] __destroy_client+0xef/0x180 [nfsd]
Apr 9 18:12:44 nfs-server kernel: [<ffffffffa0453942>] expire_client+0x22/0x30 [nfsd]
Apr 9 18:12:44 nfs-server kernel: [<ffffffffa0457506>] laundromat_main+0x166/0x4e0 [nfsd]
Apr 9 18:12:44 nfs-server kernel: [<ffffffff8109d5db>] process_one_work+0x17b/0x470
Apr 9 18:12:44 nfs-server kernel: [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
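To check a saved log for the same signature, the reports above can be summarized with a short script. This is a minimal sketch, not a supported tool: the function name `summarize_lockups` and the dictionary keys are hypothetical, and the patterns assume the message layout shown in the excerpt above (other kernels may format these lines differently).

```python
import re

# Watchdog line, as formatted in the excerpt above (assumed layout).
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)
# "Workqueue: nfsd4 laundromat_main [nfsd]" -> queue name and work function.
WORKQUEUE_RE = re.compile(r"Workqueue: (?P<wq>\S+) (?P<func>\S+)")


def summarize_lockups(lines):
    """Return one dict per soft-lockup report found in `lines`.

    Lines that follow a "BUG: soft lockup" line are attributed to that
    report until the next "BUG: soft lockup" line starts a new one.
    """
    reports = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            reports.append({
                "cpu": int(m.group("cpu")),
                "secs": int(m.group("secs")),
                "comm": m.group("comm"),
                "pid": int(m.group("pid")),
                "workqueue": None,
                "in_destroy_client": False,
            })
            continue
        if not reports:
            continue  # nothing to attach this line to yet
        current = reports[-1]
        w = WORKQUEUE_RE.search(line)
        if w:
            current["workqueue"] = (w.group("wq"), w.group("func"))
        elif "__destroy_client+" in line:
            # Any frame mentioning __destroy_client counts, including
            # the "?"-marked (unreliable) frames.
            current["in_destroy_client"] = True
    return reports


def matches_signature(report):
    """True if a report looks like the one in this article."""
    return (report["workqueue"] == ("nfsd4", "laundromat_main")
            and report["in_destroy_client"])
```

A typical use would be to open the server's log file (for example `/var/log/messages`, if that is where kernel messages are written on the system in question), pass the file object to `summarize_lockups`, and count the reports for which `matches_signature` returns true.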
Environment
- Red Hat Enterprise Linux 7.2 (NFS server)
- kernel prior to kernel-3.10.0-327.18.2.el7
- reported on kernel-3.10.0-327.10.1.el7
- NFS4.1
- Connected to RHEL6.6 NFS client that rebooted
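The kernel condition in the list above ("prior to kernel-3.10.0-327.18.2.el7") can be checked roughly as follows. This is a simplified sketch under stated assumptions: it compares only the numeric fields before `.el`, it is not a full RPM version comparison, and it does not check the RHEL minor release or the other conditions in the list. The names `release_key` and `is_prior_to_fixed` are hypothetical.

```python
import re

# First kernel release that the Environment list does not count as affected.
FIXED_RELEASE = "3.10.0-327.18.2.el7"


def release_key(kernel_release):
    """Convert e.g. '3.10.0-327.10.1.el7.x86_64' to (3, 10, 0, 327, 10, 1).

    Everything from '.el' onward (dist tag, architecture) is ignored.
    """
    head = kernel_release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", head))


def is_prior_to_fixed(kernel_release, fixed=FIXED_RELEASE):
    """True if `kernel_release` sorts before `fixed` by its numeric fields."""
    return release_key(kernel_release) < release_key(fixed)
```

On a running system the input could come from `platform.release()` in the Python standard library, which returns the same string as `uname -r`.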