NFS v4.1: pNFS client crashes or hangs due to kmalloc-64 slab cache corruption
Issue
- NFS client crashes with "kernel BUG at mm/slub.c:3601!" or hangs due to a soft lockup.
- NFS client crashes with "kernel BUG at mm/slub.c:3601!" and a call trace containing filelayout_pg_init_write:
~~~
[4203384.815215] ------------[ cut here ]------------
[4203384.817459] kernel BUG at mm/slub.c:3601!
[4203384.819195] invalid opcode: 0000 [#1] SMP
[4203384.820873] Modules linked in: iscsi_target_mod target_core_mod macsec vsock_diag sctp_diag sctp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock ext4 mbcache jbd2 sb_edac coretemp iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr sg vmw_vmci i2c_piix4 shpchp parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi vmwgfx sd_mod crc_t10dif crct10dif_generic drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm ata_piix libahci crct10dif_pclmul libata crct10dif_common crc32c_intel vmxnet3 serio_raw i2c_core
[4203384.830479] vmw_pvscsi dm_mirror dm_region_hash dm_log dm_mod
[4203384.832504] CPU: 8 PID: 57891 Comm: kworker/u256:2 Kdump: loaded Tainted: G B L ------------ 3.10.0-862.2.3.el7.x86_64 #1
[4203384.834577] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[4203384.836685] Workqueue: writeback bdi_writeback_workfn (flush-0:43)
[4203384.838786] task: ffff92c352352f70 ti: ffff92bc7f82c000 task.ti: ffff92bc7f82c000
[4203384.840870] RIP: 0010:[<ffffffff8a3f6bbc>] [<ffffffff8a3f6bbc>] kfree+0x13c/0x140
[4203384.843000] RSP: 0018:ffff92bc7f82f850 EFLAGS: 00010246
[4203384.845070] RAX: 002fffff00000000 RBX: ffff92bf1320bf78 RCX: 0000000000000000
[4203384.847150] RDX: 002fffff00000000 RSI: ffff92ca8920d898 RDI: ffff92bf1320bf78
[4203384.849218] RBP: ffff92bc7f82f868 R08: ffff92c4b6408630 R09: 0000000000000000
[4203384.851280] R10: ffff92c4b64086e0 R11: fffff6925f4c82c0 R12: ffff92bf1320bf78
[4203384.853302] R13: ffffffffc06cec5f R14: ffff92bc7f82fb38 R15: ffff92c1d5684100
[4203384.855343] FS: 0000000000000000(0000) GS:ffff92cb2d800000(0000) knlGS:0000000000000000
[4203384.857352] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4203384.859350] CR2: 00007fca7abd7000 CR3: 00000013a1f44000 CR4: 00000000003607e0
[4203384.861475] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4203384.863542] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[4203384.865596] Call Trace:
[4203384.867634] [<ffffffffc06cec5f>] filelayout_pg_init_write+0x1ef/0x280 [nfs_layout_nfsv41_files]
[4203384.869731] [<ffffffffc084271f>] __nfs_pageio_add_request+0x12f/0x440 [nfs]
[4203384.871799] [<ffffffffc08431ba>] nfs_pageio_add_request+0x19a/0x340 [nfs]
[4203384.874216] [<ffffffffc0846ecd>] nfs_do_writepage+0x10d/0x2c0 [nfs]
[4203384.876253] [<ffffffffc0847096>] nfs_writepages_callback+0x16/0x30 [nfs]
[4203384.878260] [<ffffffff8a39fa44>] write_cache_pages+0x254/0x4e0
[4203384.880248] [<ffffffffc0847080>] ? nfs_do_writepage+0x2c0/0x2c0 [nfs]
[4203384.882257] [<ffffffffc08474b4>] nfs_writepages+0xa4/0x140 [nfs]
[4203384.884214] [<ffffffff8a3a0dc1>] do_writepages+0x21/0x50
[4203384.886150] [<ffffffff8a448a90>] __writeback_single_inode+0x40/0x260
[4203384.888058] [<ffffffff8a449524>] writeback_sb_inodes+0x1c4/0x490
[4203384.889966] [<ffffffff8a44988f>] __writeback_inodes_wb+0x9f/0xd0
[4203384.891842] [<ffffffff8a44a0c3>] wb_writeback+0x263/0x2f0
[4203384.893704] [<ffffffff8a3a0470>] ? bdi_dirty_limit+0x40/0xe0
[4203384.895564] [<ffffffff8a44a94c>] bdi_writeback_workfn+0x1cc/0x460
[4203384.897442] [<ffffffff8a2b2dff>] process_one_work+0x17f/0x440
[4203384.899354] [<ffffffff8a2b3ac6>] worker_thread+0x126/0x3c0
[4203384.901224] [<ffffffff8a2b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[4203384.903128] [<ffffffff8a2bae31>] kthread+0xd1/0xe0
[4203384.904984] [<ffffffff8a2bad60>] ? insert_kthread_work+0x40/0x40
[4203384.906863] [<ffffffff8a91f5f7>] ret_from_fork_nospec_begin+0x21/0x21
[4203384.908709] [<ffffffff8a2bad60>] ? insert_kthread_work+0x40/0x40
[4203384.910532] Code: 49 8b 03 31 f6 f6 c4 40 74 04 41 8b 73 68 4c 89 df e8 89 63 fa ff eb 84 4c 8b 58 30 48 8b 10 80 e6 80 4c 0f 44 d8 e9 28 ff ff ff <0f> 0b 66 90 0f 1f 44 00 00 55 89 f1 48 89 e5 41 57 41 56 41 55
[4203384.914517] RIP [<ffffffff8a3f6bbc>] kfree+0x13c/0x140
~~~
- NFS client hangs due to a soft lockup. The dmesg from the collected vmcore shows the same CPU stuck repeatedly:
~~~
crash> log | grep "soft lockup"
[5805930.128614] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
[5805958.128585] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
[5805986.128568] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
...
[5813186.124150] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
[5813214.124132] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
[5813242.124113] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [xxx.xxx.xxx.xxx:15613]
~~~
The kmalloc-64 slab cache was corrupted:
~~~
crash> kmem -s kmalloc-64
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
kmem: kmalloc-64: slab: ffffea35beed0640 invalid freepointer: bf8dcb2f933400
ffff8ab63fc03b00 64 1353792 1523840 23810 4k kmalloc-64
~~~
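The "invalid freepointer" message refers to the SLUB free list. In SLUB, each free object stores the address of the next free object inside the object itself. A stray write into an already-freed 64-byte object, such as a use-after-free or an overflow from a neighbouring object, can therefore overwrite that pointer. A later allocation from the cache may then follow a bogus address, which is one plausible route to the crash and hang symptoms above.

The sketch below is a toy user-space model in Python, not kernel code. Keep these assumptions in mind:
- ToySlab and its methods are hypothetical.
- The free pointer is assumed to sit at offset 0 of each object.
- "Valid" is reduced to "inside the slab and on an object boundary". This only approximates what crash's kmem -s checks.
- The bogus value from the kmem output above is reused purely for illustration.

~~~python
import struct

OBJ_SIZE = 64      # object size of the kmalloc-64 cache
SLAB_SIZE = 4096   # one 4k slab, matching the SSIZE column in the kmem output
NULL = 0


class ToySlab:
    """Hypothetical toy model of one SLUB slab. Each free object keeps the
    address of the next free object in its first 8 bytes (assumed offset 0)."""

    def __init__(self, base: int):
        self.base = base
        self.mem = bytearray(SLAB_SIZE)
        nobj = SLAB_SIZE // OBJ_SIZE
        # Chain every object: obj[0] -> obj[1] -> ... -> obj[nobj-1] -> NULL
        for i in range(nobj):
            nxt = base + (i + 1) * OBJ_SIZE if i + 1 < nobj else NULL
            self._write_ptr(base + i * OBJ_SIZE, nxt)
        self.freelist = base

    def _write_ptr(self, addr: int, value: int) -> None:
        struct.pack_into("<Q", self.mem, addr - self.base, value)

    def _read_ptr(self, addr: int) -> int:
        return struct.unpack_from("<Q", self.mem, addr - self.base)[0]

    def alloc(self):
        """Pop the freelist head and follow its embedded next pointer."""
        obj = self.freelist
        if obj == NULL:
            return None
        self.freelist = self._read_ptr(obj)
        return obj

    def free(self, obj: int) -> None:
        """Push obj back; its first 8 bytes now hold the old freelist head."""
        self._write_ptr(obj, self.freelist)
        self.freelist = obj

    def stray_write(self, addr: int, data: bytes) -> None:
        """Unchecked write, standing in for a use-after-free or an overflow."""
        off = addr - self.base
        self.mem[off:off + len(data)] = data

    def _valid(self, ptr: int) -> bool:
        # Simplified check: inside this slab and on an object boundary.
        off = ptr - self.base
        return 0 <= off < SLAB_SIZE and off % OBJ_SIZE == 0

    def check_freelist(self):
        """Walk the freelist; return the first invalid free pointer, or None."""
        ptr = self.freelist
        for _ in range(SLAB_SIZE // OBJ_SIZE + 1):  # bound the walk
            if ptr == NULL:
                return None
            if not self._valid(ptr):
                return ptr
            ptr = self._read_ptr(ptr)
        raise RuntimeError("freelist has more entries than objects: cycle")


if __name__ == "__main__":
    slab = ToySlab(base=0xFFFF8AB600000000)
    a = slab.alloc()
    slab.free(a)
    print(slab.check_freelist())        # None: the list is intact
    # A stale user of "a" writes into it after it was freed.
    slab.stray_write(a, struct.pack("<Q", 0xBF8DCB2F933400))
    print(hex(slab.check_freelist()))   # 0xbf8dcb2f933400
~~~

The walk stops at the first pointer that falls outside the slab. That is roughly analogous to the situation kmem -s reports for slab ffffea35beed0640.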
Environment
- Red Hat Enterprise Linux 7 (NFS client)
  - Kernels from 3.10.0-229.el7 to 3.10.0-1127.el7 are affected.
- Red Hat Enterprise Linux 8 (NFS client)
  - Kernels from 4.18.0-80 to 4.18.0-240 are affected.
- NFS v4.1 pNFS
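When triaging many NFS clients, it can help to check two things at once: whether the running kernel falls inside the ranges listed above, and whether the kernel log shows either symptom from the Issue section. The sketch below is a hypothetical Python helper, not a Red Hat tool. Its assumptions:
- The kernel version is a uname -r style release string.
- Only the first release number after the dash is compared, so a z-stream build such as 3.10.0-1127.19.1.el7 counts as release 1127.
- Any mm/slub.c line number is matched, because 3601 comes from the 3.10.0-862.2.3.el7 trace and may differ on other builds.

A kernel outside these ranges is not necessarily fixed or unaffected; the helper only mirrors the list above.

~~~python
import re

# Affected ranges as listed in the Environment section, keyed by base version.
# Assumption: only the first release number after the dash is compared,
# so 3.10.0-1127.19.1.el7 is treated as release 1127.
AFFECTED = {
    "3.10.0": (229, 1127),  # RHEL 7
    "4.18.0": (80, 240),    # RHEL 8
}

# The slub.c line number (3601 in the example trace) varies between builds.
SLUB_BUG = re.compile(r"kernel BUG at mm/slub\.c:\d+!")
PNFS_FRAME = "filelayout_pg_init_write"
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for \d+s!")


def kernel_in_affected_range(release: str) -> bool:
    """release is a 'uname -r' string such as 3.10.0-862.2.3.el7.x86_64."""
    m = re.match(r"(\d+\.\d+\.\d+)-(\d+)", release)
    if m is None or m.group(1) not in AFFECTED:
        return False
    low, high = AFFECTED[m.group(1)]
    return low <= int(m.group(2)) <= high


def log_signatures(log: str) -> dict:
    """Report which of the two symptoms from the Issue section appear in a log."""
    lockups = {}
    for cpu in SOFT_LOCKUP.findall(log):
        lockups[int(cpu)] = lockups.get(int(cpu), 0) + 1
    return {
        # Both the slub BUG and the pNFS file layout frame must be present.
        "slub_bug_in_pnfs_write": bool(SLUB_BUG.search(log)) and PNFS_FRAME in log,
        "soft_lockups_per_cpu": lockups,
    }


if __name__ == "__main__":
    print(kernel_in_affected_range("3.10.0-862.2.3.el7.x86_64"))  # True
    print(kernel_in_affected_range("4.18.0-305.el8.x86_64"))      # False
~~~

The log text can come from dmesg on a live client or from the log command in crash when working from a vmcore.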