RHEL 7 server hangs with soft lockup errors in a DAX-mounted Kafka environment
Issue
- The server hangs and reports multiple soft lockup messages in /var/log/messages:
[ 2319.596588] NMI watchdog: BUG: soft lockup - CPU#55 stuck for 23s! [java:24212]
[ 2319.596617] Modules linked in: tcp_diag inet_diag nfsv3 nfs_acl nfs lockd grace fscache bonding sunrpc vfat fat iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel raid456 aesni_intel ast i2c_algo_bit lrw gf128mul ttm async_raid6_recov glue_helper ablk_helper async_memcpy cryptd async_pq raid6_pq drm_kms_helper async_xor syscopyarea sysfillrect pcspkr xor sysimgblt fb_sys_fops async_tx drm i2c_i801 drm_panel_orientation_quirks joydev mei_me sg mei lpc_ich wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem device_dax acpi_power_meter acpi_pad tcp_htcp binfmt_misc ip_tables xfs libcrc32c nd_pmem nd_btt sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel nvme i40e nvme_core ahci libahci libata
[ 2319.596664] ptp nfit pps_core libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[ 2319.596670] CPU: 55 PID: 24212 Comm: java Kdump: loaded Tainted: G ------------ T 3.10.0-1127.18.2.el7.x86_64 #1
[ 2319.596672] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0009.092820190230 09/28/2019
[ 2319.596673] task: ffff93df73a041c0 ti: ffff93df434f4000 task.ti: ffff93df434f4000
[ 2319.596674] RIP: 0010:[<ffffffffbabc127e>] [<ffffffffbabc127e>] __find_get_pages+0xee/0x1c0
[ 2319.596681] RSP: 0018:ffff93df434f7b90 EFLAGS: 00000296
[ 2319.596682] RAX: 000000000000000d RBX: 0000000000000010 RCX: 0000000000000040
[ 2319.596684] RDX: ffff94de00cd97f8 RSI: 0000000000000c80 RDI: ffff94de00cd9890
[ 2319.596685] RBP: ffff93df434f7bf8 R08: ffff93df434f7c20 R09: 0000000000000000
[ 2319.596686] R10: 0000000000000000 R11: fffff6ef334cea00 R12: ffffffffffffff10
[ 2319.596687] R13: ffff94ddf3f48298 R14: 0000000000000001 R15: ffff94de00cd96d0
[ 2319.596689] FS: 00007f4084f2c700(0000) GS:ffff94debecc0000(0000) knlGS:0000000000000000
[ 2319.596690] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2319.596691] CR2: 00007f408420b158 CR3: 0000005eb864c000 CR4: 00000000007607e0
[ 2319.596693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2319.596694] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2319.596695] PKRU: 55555554
[ 2319.596696] Call Trace:
[ 2319.596701] [<ffffffffbaca7400>] ? dax_iomap_actor_write+0x30/0x30
[ 2319.596704] [<ffffffffbabce08e>] __pagevec_lookup+0x1e/0x30
[ 2319.596706] [<ffffffffbaca79bc>] dax_layout_busy_page+0xac/0x280
[ 2319.596708] [<ffffffffbabc8f26>] ? __alloc_pages_nodemask+0x166/0x450
[ 2319.596740] [<ffffffffc03938b8>] xfs_break_layouts+0x78/0x1b0 [xfs]
[ 2319.596745] [<ffffffffbb1850e2>] ? down_write+0x12/0x3d
[ 2319.596760] [<ffffffffc039f623>] xfs_vn_setattr+0x63/0xc0 [xfs]
[ 2319.596764] [<ffffffffbac6cecc>] notify_change+0x30c/0x4d0
[ 2319.596767] [<ffffffffbac4af05>] do_truncate+0x75/0xc0
[ 2319.596769] [<ffffffffbac50118>] ? __sb_start_write+0x58/0x120
[ 2319.596771] [<ffffffffbac4b329>] do_sys_ftruncate.constprop.14+0x139/0x1a0
[ 2319.596773] [<ffffffffbac4b3ce>] SyS_ftruncate+0xe/0x10
[ 2319.596776] [<ffffffffbb192ed2>] system_call_fastpath+0x25/0x2a
[ 2319.596777] Code: 74 6b 8b 4d c8 48 8b 45 b8 4c 89 ef 48 d3 e7 48 29 f0 48 d3 e8 48 89 f9 48 89 d7 48 83 e8 01 48 85 c0 7e 7a 48 83 c7 08 4c 8b 0f <48> 01 ce 4d 89 ca 41 83 e2 03 49 83 fa 01 75 52 49 83 e1 fe 4c
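When many of these reports are logged, it can help to summarize which CPUs and processes they name. The following is a minimal sketch, not part of the original solution: it assumes the log lines follow the "BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]" layout shown above, and the function names `find_soft_lockups` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Matches watchdog lines such as:
# [ 2319.596588] NMI watchdog: BUG: soft lockup - CPU#55 stuck for 23s! [java:24212]
# The greedy comm group tolerates task names that contain colons (e.g. kworker/0:1).
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft lockup report found in an iterable of log lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


def summarize(events):
    """Count soft lockup reports per process name."""
    return Counter(event["comm"] for event in events)
```

A possible usage is to open /var/log/messages, pass the file object to `find_soft_lockups`, and print `summarize(events).most_common()` to see whether the reports concentrate on one process, such as the java process in the trace above.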
Environment
- Red Hat Enterprise Linux 7.
- Intel Optane persistent memory (PMEM) devices.
- Apache Kafka.
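To check whether a system matches this environment, one option is to look for file systems mounted with DAX. The sketch below is illustrative only: it assumes the mount table uses the /proc/mounts field layout (device, mount point, file system type, options) and that the option appears as a plain `dax` entry, as it does for XFS on RHEL 7 kernels. The function name `find_dax_mounts` is hypothetical.

```python
def find_dax_mounts(mount_lines):
    """Return (device, mountpoint, fstype) for entries mounted with the dax option.

    mount_lines is an iterable of lines in /proc/mounts format.
    """
    results = []
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Assumption: DAX is reported as the bare option "dax".
        if "dax" in options.split(","):
            results.append((device, mountpoint, fstype))
    return results
```

For example, calling `find_dax_mounts(open("/proc/mounts"))` on an affected host would be expected to list the PMEM-backed XFS file system that holds the Kafka data.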