NFS4.1: pNFS client crashes or hangs due to a kmalloc-64 slub corruption

Solution Unverified - Updated -

Issue

  • NFS client crashes with kernel BUG at mm/slub.c:3601! or hangs due to soft lockup.
  • NFS client crashes with kernel BUG at mm/slub.c:3601! and a call trace containing filelayout_pg_init_write:
[4203384.815215] ------------[ cut here ]------------
[4203384.817459] kernel BUG at mm/slub.c:3601!
[4203384.819195] invalid opcode: 0000 [#1] SMP 
[4203384.820873] Modules linked in: iscsi_target_mod target_core_mod macsec vsock_diag sctp_diag sctp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock ext4 mbcache jbd2 sb_edac coretemp iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg vmw_vmci i2c_piix4 shpchp parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi vmwgfx sd_mod crc_t10dif crct10dif_generic drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm ata_piix libahci crct10dif_pclmul libata crct10dif_common crc32c_intel vmxnet3 serio_raw i2c_core
[4203384.830479]  vmw_pvscsi dm_mirror dm_region_hash dm_log dm_mod
[4203384.832504] CPU: 8 PID: 57891 Comm: kworker/u256:2 Kdump: loaded Tainted: G    B        L ------------   3.10.0-862.2.3.el7.x86_64 #1
[4203384.834577] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[4203384.836685] Workqueue: writeback bdi_writeback_workfn (flush-0:43)
[4203384.838786] task: ffff92c352352f70 ti: ffff92bc7f82c000 task.ti: ffff92bc7f82c000
[4203384.840870] RIP: 0010:[<ffffffff8a3f6bbc>]  [<ffffffff8a3f6bbc>] kfree+0x13c/0x140
[4203384.843000] RSP: 0018:ffff92bc7f82f850  EFLAGS: 00010246
[4203384.845070] RAX: 002fffff00000000 RBX: ffff92bf1320bf78 RCX: 0000000000000000
[4203384.847150] RDX: 002fffff00000000 RSI: ffff92ca8920d898 RDI: ffff92bf1320bf78
[4203384.849218] RBP: ffff92bc7f82f868 R08: ffff92c4b6408630 R09: 0000000000000000
[4203384.851280] R10: ffff92c4b64086e0 R11: fffff6925f4c82c0 R12: ffff92bf1320bf78
[4203384.853302] R13: ffffffffc06cec5f R14: ffff92bc7f82fb38 R15: ffff92c1d5684100
[4203384.855343] FS:  0000000000000000(0000) GS:ffff92cb2d800000(0000) knlGS:0000000000000000
[4203384.857352] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4203384.859350] CR2: 00007fca7abd7000 CR3: 00000013a1f44000 CR4: 00000000003607e0
[4203384.861475] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4203384.863542] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[4203384.865596] Call Trace:
[4203384.867634]  [<ffffffffc06cec5f>] filelayout_pg_init_write+0x1ef/0x280 [nfs_layout_nfsv41_files]
[4203384.869731]  [<ffffffffc084271f>] __nfs_pageio_add_request+0x12f/0x440 [nfs]
[4203384.871799]  [<ffffffffc08431ba>] nfs_pageio_add_request+0x19a/0x340 [nfs]
[4203384.874216]  [<ffffffffc0846ecd>] nfs_do_writepage+0x10d/0x2c0 [nfs]
[4203384.876253]  [<ffffffffc0847096>] nfs_writepages_callback+0x16/0x30 [nfs]
[4203384.878260]  [<ffffffff8a39fa44>] write_cache_pages+0x254/0x4e0
[4203384.880248]  [<ffffffffc0847080>] ? nfs_do_writepage+0x2c0/0x2c0 [nfs]
[4203384.882257]  [<ffffffffc08474b4>] nfs_writepages+0xa4/0x140 [nfs]
[4203384.884214]  [<ffffffff8a3a0dc1>] do_writepages+0x21/0x50
[4203384.886150]  [<ffffffff8a448a90>] __writeback_single_inode+0x40/0x260
[4203384.888058]  [<ffffffff8a449524>] writeback_sb_inodes+0x1c4/0x490
[4203384.889966]  [<ffffffff8a44988f>] __writeback_inodes_wb+0x9f/0xd0
[4203384.891842]  [<ffffffff8a44a0c3>] wb_writeback+0x263/0x2f0
[4203384.893704]  [<ffffffff8a3a0470>] ? bdi_dirty_limit+0x40/0xe0
[4203384.895564]  [<ffffffff8a44a94c>] bdi_writeback_workfn+0x1cc/0x460
[4203384.897442]  [<ffffffff8a2b2dff>] process_one_work+0x17f/0x440
[4203384.899354]  [<ffffffff8a2b3ac6>] worker_thread+0x126/0x3c0
[4203384.901224]  [<ffffffff8a2b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[4203384.903128]  [<ffffffff8a2bae31>] kthread+0xd1/0xe0
[4203384.904984]  [<ffffffff8a2bad60>] ? insert_kthread_work+0x40/0x40
[4203384.906863]  [<ffffffff8a91f5f7>] ret_from_fork_nospec_begin+0x21/0x21
[4203384.908709]  [<ffffffff8a2bad60>] ? insert_kthread_work+0x40/0x40
[4203384.910532] Code: 49 8b 03 31 f6 f6 c4 40 74 04 41 8b 73 68 4c 89 df e8 89 63 fa ff eb 84 4c 8b 58 30 48 8b 10 80 e6 80 4c 0f 44 d8 e9 28 ff ff ff <0f> 0b 66 90 0f 1f 44 00 00 55 89 f1 48 89 e5 41 57 41 56 41 55 
[4203384.914517] RIP  [<ffffffff8a3f6bbc>] kfree+0x13c/0x140
  • NFS client hangs due to a soft lockup. Kernel log from the collected vmcore:

    crash> log | grep "soft lockup"
    [5805930.128614] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
    [5805958.128585] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
    [5805986.128568] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
    ...
    [5813186.124150] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
    [5813214.124132] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
    [5813242.124113] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
    

  • The kmalloc-64 slab cache was corrupted:

~~~
crash> kmem -s kmalloc-64
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
kmem: kmalloc-64: slab: ffffea35beed0640 invalid freepointer: bf8dcb2f933400
ffff8ab63fc03b00       64    1353792   1523840  23810     4k  kmalloc-64
~~~
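The symptoms listed above can be checked for mechanically in a saved kernel log or crash session transcript. The following is a minimal sketch, not a diagnostic tool: the function name `classify_log` and the label strings are hypothetical, and matching any `mm/slub.c` line number (rather than only 3601) is an assumption made because the line number may differ between kernel builds. A soft lockup on its own is not specific to this issue, so that label is only a hint.

```python
import re

# Patterns taken from the log excerpts in this article.
# Assumption: the slub.c line number is allowed to vary between builds.
SLUB_BUG_RE = re.compile(r"kernel BUG at mm/slub\.c:\d+!")
SOFT_LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s!")
KMALLOC64_RE = re.compile(r"kmem: kmalloc-64: slab: \S+ invalid freepointer")
PNFS_FRAME = "filelayout_pg_init_write"


def classify_log(text):
    """Return the set of symptom labels (hypothetical names) found in text."""
    found = set()
    # Crash signature: slub BUG together with the pNFS files-layout frame.
    if SLUB_BUG_RE.search(text) and PNFS_FRAME in text:
        found.add("slub-bug-in-pnfs-write")
    # Hang signature: NMI watchdog soft lockup messages.
    if SOFT_LOCKUP_RE.search(text):
        found.add("soft-lockup")
    # crash> kmem -s kmalloc-64 reporting a bad freepointer.
    if KMALLOC64_RE.search(text):
        found.add("kmalloc-64-corruption")
    return found
```

For example, passing the output of `log` from a crash session, or the contents of a vmcore-dmesg.txt file read into a string, returns the labels whose patterns were seen.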

Environment

  • Red Hat Enterprise Linux 7 (NFS client)
    • Kernels from 3.10.0-229.el7 to 3.10.0-1127.el7 are affected.
  • Red Hat Enterprise Linux 8 (NFS client)
    • Kernels from 4.18.0-80 to 4.18.0-240 are affected.
  • NFS v4.1 pNFS
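As a rough aid, the kernel ranges above can be compared against a running kernel's release string. This is a sketch under stated assumptions: the function name `possibly_affected` is hypothetical, only the leading build number (for example 862 in 3.10.0-862.2.3.el7) is compared, and the RHEL 8 range is assumed to carry an el8 dist tag. Errata (z-stream) kernels within a build may differ, so treat the result as a hint rather than a verdict.

```python
import re

# Ranges copied from the Environment list above, keyed by
# (kernel version, dist tag). The "el8" key is an assumption, since the
# article lists the RHEL 8 builds without a dist suffix.
AFFECTED = {
    ("3.10.0", "el7"): (229, 1127),
    ("4.18.0", "el8"): (80, 240),
}

# version - build [.z-stream numbers] . dist tag, e.g. 3.10.0-862.2.3.el7
_RELEASE_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)((?:\.\d+)*)\.(el\d+)")


def possibly_affected(release):
    """Return True if the release string falls inside a listed range."""
    match = _RELEASE_RE.match(release)
    if match is None:
        return False
    version, build, _zstream, dist = match.groups()
    bounds = AFFECTED.get((version, dist))
    if bounds is None:
        return False
    low, high = bounds
    # Assumption: only the leading build number is compared.
    return low <= int(build) <= high
```

On a live system the input could come from `platform.release()` in the Python standard library, which returns the same string as `uname -r`. The check says nothing about whether NFS v4.1 pNFS is actually in use on the client.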
