RHEL 7 server hangs with soft lockup errors in a DAX-mounted Kafka environment

Solution Verified - Updated -

Issue

  • The server hangs and logs multiple soft lockup messages in /var/log/messages:
[ 2319.596588] NMI watchdog: BUG: soft lockup - CPU#55 stuck for 23s! [java:24212]
[ 2319.596617] Modules linked in: tcp_diag inet_diag nfsv3 nfs_acl nfs lockd grace fscache bonding sunrpc vfat fat iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel raid456 aesni_intel ast i2c_algo_bit lrw gf128mul ttm async_raid6_recov glue_helper ablk_helper async_memcpy cryptd async_pq raid6_pq drm_kms_helper async_xor syscopyarea sysfillrect pcspkr xor sysimgblt fb_sys_fops async_tx drm i2c_i801 drm_panel_orientation_quirks joydev mei_me sg mei lpc_ich wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem device_dax acpi_power_meter acpi_pad tcp_htcp binfmt_misc ip_tables xfs libcrc32c nd_pmem nd_btt sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel nvme i40e nvme_core ahci libahci libata
[ 2319.596664]  ptp nfit pps_core libnvdimm dm_mirror dm_region_hash dm_log dm_mod
[ 2319.596670] CPU: 55 PID: 24212 Comm: java Kdump: loaded Tainted: G               ------------ T 3.10.0-1127.18.2.el7.x86_64 #1
[ 2319.596672] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0009.092820190230 09/28/2019
[ 2319.596673] task: ffff93df73a041c0 ti: ffff93df434f4000 task.ti: ffff93df434f4000
[ 2319.596674] RIP: 0010:[<ffffffffbabc127e>]  [<ffffffffbabc127e>] __find_get_pages+0xee/0x1c0
[ 2319.596681] RSP: 0018:ffff93df434f7b90  EFLAGS: 00000296
[ 2319.596682] RAX: 000000000000000d RBX: 0000000000000010 RCX: 0000000000000040
[ 2319.596684] RDX: ffff94de00cd97f8 RSI: 0000000000000c80 RDI: ffff94de00cd9890
[ 2319.596685] RBP: ffff93df434f7bf8 R08: ffff93df434f7c20 R09: 0000000000000000
[ 2319.596686] R10: 0000000000000000 R11: fffff6ef334cea00 R12: ffffffffffffff10
[ 2319.596687] R13: ffff94ddf3f48298 R14: 0000000000000001 R15: ffff94de00cd96d0
[ 2319.596689] FS:  00007f4084f2c700(0000) GS:ffff94debecc0000(0000) knlGS:0000000000000000
[ 2319.596690] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2319.596691] CR2: 00007f408420b158 CR3: 0000005eb864c000 CR4: 00000000007607e0
[ 2319.596693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2319.596694] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2319.596695] PKRU: 55555554
[ 2319.596696] Call Trace:
[ 2319.596701]  [<ffffffffbaca7400>] ? dax_iomap_actor_write+0x30/0x30
[ 2319.596704]  [<ffffffffbabce08e>] __pagevec_lookup+0x1e/0x30
[ 2319.596706]  [<ffffffffbaca79bc>] dax_layout_busy_page+0xac/0x280
[ 2319.596708]  [<ffffffffbabc8f26>] ? __alloc_pages_nodemask+0x166/0x450
[ 2319.596740]  [<ffffffffc03938b8>] xfs_break_layouts+0x78/0x1b0 [xfs]
[ 2319.596745]  [<ffffffffbb1850e2>] ? down_write+0x12/0x3d
[ 2319.596760]  [<ffffffffc039f623>] xfs_vn_setattr+0x63/0xc0 [xfs]
[ 2319.596764]  [<ffffffffbac6cecc>] notify_change+0x30c/0x4d0
[ 2319.596767]  [<ffffffffbac4af05>] do_truncate+0x75/0xc0
[ 2319.596769]  [<ffffffffbac50118>] ? __sb_start_write+0x58/0x120
[ 2319.596771]  [<ffffffffbac4b329>] do_sys_ftruncate.constprop.14+0x139/0x1a0
[ 2319.596773]  [<ffffffffbac4b3ce>] SyS_ftruncate+0xe/0x10
[ 2319.596776]  [<ffffffffbb192ed2>] system_call_fastpath+0x25/0x2a
[ 2319.596777] Code: 74 6b 8b 4d c8 48 8b 45 b8 4c 89 ef 48 d3 e7 48 29 f0 48 d3 e8 48 89 f9 48 89 d7 48 83 e8 01 48 85 c0 7e 7a 48 83 c7 08 4c 8b 0f <48> 01 ce 4d 89 ca 41 83 e2 03 49 83 fa 01 75 52 49 83 e1 fe 4c 
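
When many of these messages accumulate, it can help to pull out the CPU, stall duration, and process from each one. The following is a minimal sketch, not part of the original solution; the names `SoftLockup` and `parse_soft_lockup` are hypothetical, and the pattern assumes the message layout shown in the log above.

```python
import re
from typing import NamedTuple, Optional


class SoftLockup(NamedTuple):
    """Fields extracted from one soft lockup message (hypothetical helper type)."""
    cpu: int
    seconds: int
    comm: str
    pid: int


# Assumes the layout seen in the log above:
#   "... BUG: soft lockup - CPU#55 stuck for 23s! [java:24212]"
# The greedy ".+" lets the command name itself contain colons.
_SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft lockup message."""
    match = _SOFT_LOCKUP.search(line)
    if match is None:
        return None
    return SoftLockup(
        cpu=int(match.group("cpu")),
        seconds=int(match.group("seconds")),
        comm=match.group("comm"),
        pid=int(match.group("pid")),
    )


if __name__ == "__main__":
    sample = (
        "[ 2319.596588] NMI watchdog: BUG: soft lockup - "
        "CPU#55 stuck for 23s! [java:24212]"
    )
    print(parse_soft_lockup(sample))
```

To scan a whole log, the function can be applied to each line of /var/log/messages and the non-None results counted per process or per CPU.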

Environment

  • Red Hat Enterprise Linux 7.
  • Intel Optane persistent memory (PMEM) devices.
  • Kafka Framework.
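
To check whether a system matches this environment, one option is to look for file systems mounted with the dax option. The sketch below is an illustration, not part of the original solution: `find_dax_mounts` is a hypothetical helper, it assumes the /proc/mounts field layout (device, mount point, type, options), and it treats only the `dax` and `dax=always` options as DAX-enabled.

```python
from typing import List, Tuple


def find_dax_mounts(mounts_text: str) -> List[Tuple[str, str, str]]:
    """Return (device, mountpoint, fstype) for entries mounted with DAX.

    mounts_text is expected to be in /proc/mounts format:
        device mountpoint fstype options dump pass
    """
    dax_mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Assumption: only "dax" and "dax=always" count as DAX-enabled here.
        if any(opt in ("dax", "dax=always") for opt in options.split(",")):
            dax_mounts.append((device, mountpoint, fstype))
    return dax_mounts


if __name__ == "__main__":
    # On a live system, read the kernel's mount table.
    with open("/proc/mounts") as handle:
        for device, mountpoint, fstype in find_dax_mounts(handle.read()):
            print(f"{device} on {mountpoint} ({fstype}) is mounted with DAX")
```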
