RHEL 8: perpetual soft lockup while walking THP-backed shared memory

Solution Verified - Updated -

Issue

  • A perpetual soft lockup has been seen on systems using shared memory backed by Transparent Huge Pages (THP), including systems that do not use the i915 driver.
  • The i915 graphics driver can force the use of THP-backed shared memory, which can lead to a perpetual soft lockup of the system:
    [94859.997138] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/1:0:105968]
    [94859.997144] Modules linked in: seqiv ip_vti ip_tunnel ah4 esp4 xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp tunnel6 chacha20poly1305 cmac camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 ccm xcbc des_generic uinput nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables_set nf_tables libcrc32c nfnetlink sunrpc vfat fat intel_rapl_msr pmt_telemetry wmi_bmof pmt_class i2c_designware_platform i2c_designware_core snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_cadence snd_sof_intel_hda_mlink intel_rapl_common snd_sof_intel_hda snd_sof_pci x86_pkg_temp_thermal snd_hda_codec_hdmi intel_powerclamp snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_hda_ext_core coretemp snd_hda_codec_realtek snd_soc_acpi_intel_match kvm_intel snd_soc_acpi snd_hda_codec_generic soundwire_generic_allocation ledtrig_audio soundwire_bus snd_soc_core kvm snd_compress irqbypass snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec
    [94859.997193]  snd_hda_core snd_hwdep snd_seq snd_seq_device intel_cstate snd_pcm joydev intel_uncore wdat_wdt pcspkr snd_timer snd soundcore i2c_i801 mei_me idma64 intel_lpss_pci intel_vsec intel_lpss mei wmi serial_multi_instantiate intel_pmc_core acpi_pad acpi_tad binfmt_misc ext4 mbcache jbd2 dm_crypt sd_mod t10_pi sg i915 i2c_algo_bit cec drm_buddy intel_gtt drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt ttm ahci crct10dif_pclmul crc32_pclmul libahci crc32c_intel drm libata ghash_clmulni_intel r8169 igc realtek video hid_multitouch dm_mirror dm_region_hash dm_log dm_mod
    [94859.997234] CPU: 1 PID: 105968 Comm: kworker/1:0 Kdump: loaded Tainted: G     U            -------- -  - 4.18.0-553.37.1.el8_10.x86_64 #1
    [94859.997244] Hardware name: Draeger Infinity CentralStation Gen5 CPU/K3931-Nx, BIOS V5.0.0.26 R1.4.0 for K3931-Nxx                     08/14/2024
    [94859.997247] Workqueue: events delayed_fput
    [94859.997251] RIP: 0010:xas_load+0x53/0x80
    [94859.997255] Code: 41 38 48 10 77 ed 49 8b 50 08 48 d3 ea 83 e2 3f 89 d0 48 8d 44 c6 28 48 8b 00 49 89 70 18 48 89 c1 83 e1 03 48 83 f9 02 75 18 <48> 3d fd 00 00 00 77 10 48 c1 e8 02 89 c2 89 c0 48 8d 44 c6 28 48
    [94859.997256] RSP: 0018:ffffa871815abb30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    [94859.997258] RAX: ffff9be1de0c3232 RBX: 000000000000000f RCX: 0000000000000002
    [94859.997259] RDX: 0000000000000000 RSI: ffff9be3efa61240 RDI: ffffa871815abb48
    [94859.997260] RBP: 0000000000000001 R08: ffffa871815abb48 R09: ffffa871815abb48
    [94859.997261] R10: ffffffffffffffff R11: 0000000000000001 R12: ffffffffffffffff
    [94859.997262] R13: ffffa871815abc78 R14: ffffa871815abbf8 R15: 0000000000000000
    [94859.997263] FS:  0000000000000000(0000) GS:ffff9be3f7c40000(0000) knlGS:0000000000000000
    [94859.997264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [94859.997265] CR2: 00007f09f61741a0 CR3: 0000000263010000 CR4: 0000000000750ee0
    [94859.997266] PKRU: 55555554
    [94859.997267] Call Trace:
    [94859.997269]  <IRQ>
    [94859.997271]  ? watchdog_timer_fn.cold.10+0x46/0x9e
    [94859.997273]  ? watchdog+0x30/0x30
    [94859.997275]  ? __hrtimer_run_queues+0x101/0x280
    [94859.997278]  ? hrtimer_interrupt+0x100/0x220
    [94859.997279]  ? sched_clock+0x5/0x10
    [94859.997281]  ? smp_apic_timer_interrupt+0x6a/0x130
    [94859.997283]  ? apic_timer_interrupt+0xf/0x20
    [94859.997284]  </IRQ>
    [94859.997285]  ? xas_load+0x53/0x80
    [94859.997286]  xas_find+0x183/0x1c0
    [94859.997288]  find_get_entries+0x219/0x2d0
    [94859.997291]  shmem_undo_range+0xec/0x8c0
    [94859.997294]  ? current_time+0x4a/0x90
    [94859.997296]  shmem_truncate_range+0x14/0x40
    [94859.997298]  shmem_evict_inode+0xe7/0x240
    [94859.997299]  ? var_wake_function+0x30/0x30
    [94859.997302]  evict+0xd2/0x1a0
    [94859.997303]  __dentry_kill+0xd5/0x170
    [94859.997305]  dentry_kill+0x4d/0x1a0
    [94859.997306]  dput.part.33+0xff/0x150
    [94859.997308]  __fput+0x10b/0x250
    [94859.997310]  delayed_fput+0x1c/0x30
    [94859.997311]  process_one_work+0x1d3/0x390
    [94859.997314]  ? process_one_work+0x390/0x390
    [94859.997315]  worker_thread+0x30/0x390
    [94859.997317]  ? process_one_work+0x390/0x390
    [94859.997319]  kthread+0x134/0x150
    [94859.997321]  ? set_kthread_struct+0x50/0x50
    [94859.997322]  ret_from_fork+0x1f/0x40

    [94859.997326] Kernel panic - not syncing: softlockup: hung tasks
    [94859.997327] CPU: 1 PID: 105968 Comm: kworker/1:0 Kdump: loaded Tainted: G     U       L    -------- -  - 4.18.0-553.37.1.el8_10.x86_64 #1
    [94859.997329] Hardware name: Draeger Infinity CentralStation Gen5 CPU/K3931-Nx, BIOS V5.0.0.26 R1.4.0 for K3931-Nxx                     08/14/2024
    [94859.997330] Workqueue: events delayed_fput
    [94859.997332] Call Trace:
    [94859.997333]  <IRQ>
    [94859.997334]  dump_stack+0x41/0x60
    [94859.997336]  panic+0xe7/0x2ac
    [94859.997338]  ? syscall_return_via_sysret+0x6e/0x94
    [94859.997340]  watchdog_timer_fn.cold.10+0x85/0x9e
    [94859.997341]  ? watchdog+0x30/0x30
    [94859.997343]  __hrtimer_run_queues+0x101/0x280
    [94859.997345]  hrtimer_interrupt+0x100/0x220
    [94859.997347]  ? sched_clock+0x5/0x10
    [94859.997349]  smp_apic_timer_interrupt+0x6a/0x130
    [94859.997350]  apic_timer_interrupt+0xf/0x20
    [94859.997352]  </IRQ>
    [94859.997352] RIP: 0010:xas_load+0x53/0x80
    [94859.997354] Code: 41 38 48 10 77 ed 49 8b 50 08 48 d3 ea 83 e2 3f 89 d0 48 8d 44 c6 28 48 8b 00 49 89 70 18 48 89 c1 83 e1 03 48 83 f9 02 75 18 <48> 3d fd 00 00 00 77 10 48 c1 e8 02 89 c2 89 c0 48 8d 44 c6 28 48
    [94859.997355] RSP: 0018:ffffa871815abb30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    [94859.997357] RAX: ffff9be1de0c3232 RBX: 000000000000000f RCX: 0000000000000002
    [94859.997358] RDX: 0000000000000000 RSI: ffff9be3efa61240 RDI: ffffa871815abb48
    [94859.997359] RBP: 0000000000000001 R08: ffffa871815abb48 R09: ffffa871815abb48
    [94859.997360] R10: ffffffffffffffff R11: 0000000000000001 R12: ffffffffffffffff
    [94859.997361] R13: ffffa871815abc78 R14: ffffa871815abbf8 R15: 0000000000000000
    [94859.997362]  xas_find+0x183/0x1c0
    [94859.997364]  find_get_entries+0x219/0x2d0
    [94859.997366]  shmem_undo_range+0xec/0x8c0
    [94859.997368]  ? current_time+0x4a/0x90
    [94859.997370]  shmem_truncate_range+0x14/0x40
    [94859.997372]  shmem_evict_inode+0xe7/0x240
    [94859.997373]  ? var_wake_function+0x30/0x30
    [94859.997375]  evict+0xd2/0x1a0
    [94859.997376]  __dentry_kill+0xd5/0x170
    [94859.997378]  dentry_kill+0x4d/0x1a0
    [94859.997379]  dput.part.33+0xff/0x150
    [94859.997381]  __fput+0x10b/0x250
    [94859.997382]  delayed_fput+0x1c/0x30
    [94859.997384]  process_one_work+0x1d3/0x390
    [94859.997386]  ? process_one_work+0x390/0x390
    [94859.997388]  worker_thread+0x30/0x390
    [94859.997389]  ? process_one_work+0x390/0x390
    [94859.997391]  kthread+0x134/0x150
    [94859.997393]  ? set_kthread_struct+0x50/0x50
    [94859.997395]  ret_from_fork+0x1f/0x40
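
To check whether a saved kernel log shows the same pattern as the trace above, a small script along the following lines may help. It is only a sketch: the function name `find_shmem_soft_lockups` and the choice of frames used as a signature are assumptions made for illustration, taken from the call trace in this article, and are not part of any Red Hat tool. Frames marked with `?` are skipped because the kernel prints them as unreliable.

```python
import re

# Frames taken from the call trace in this article. Treating their joint
# presence as "the signature" is an assumption of this sketch.
SIGNATURE_FRAMES = (
    "xas_find",
    "find_get_entries",
    "shmem_undo_range",
    "shmem_evict_inode",
)

# Header line printed by the soft lockup watchdog.
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+)\]"
)

# A stack frame such as "[94859.997286]  xas_find+0x183/0x1c0".
# Group 1 captures the optional "? " marker for unreliable frames.
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def find_shmem_soft_lockups(log_text):
    """Return one dict per soft lockup report whose trace has every signature frame."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = SOFT_LOCKUP_RE.search(line)
        if header:
            current = {
                "cpu": int(header.group(1)),
                "seconds": int(header.group(2)),
                "task": header.group(3),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Collect only reliable frames that follow a soft lockup header.
        if current is not None and frame and not frame.group(1):
            current["frames"].append(frame.group(2))
    return [
        r for r in reports
        if all(name in r["frames"] for name in SIGNATURE_FRAMES)
    ]
```

Passing the text of a saved `dmesg` or vmcore-dmesg file to `find_shmem_soft_lockups()` returns the CPU, stuck time, and task name of each matching report. A match only indicates a similar call trace, not a confirmed diagnosis.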

Environment

  • Red Hat Enterprise Linux 8.10
  • Seen both with and without the i915 graphics driver
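
To see whether THP-backed shared memory is in use on a given system, the standard kernel interfaces `/sys/kernel/mm/transparent_hugepage/shmem_enabled` and the `ShmemHugePages` field of `/proc/meminfo` can be inspected. The sketch below reads both. The helper names (`active_thp_mode`, `shmem_huge_kb`, `collect`) are hypothetical, and the result is informational only: the sysfs setting applies to the kernel's internal shmem mount and may not reflect what an individual driver requests, so it neither proves nor rules out this issue.

```python
from pathlib import Path

# Standard kernel interfaces; the helper names below are illustrative only.
SHMEM_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/shmem_enabled")
MEMINFO = Path("/proc/meminfo")


def active_thp_mode(text):
    """Return the bracketed (active) mode from a THP sysfs value, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def shmem_huge_kb(meminfo_text):
    """Return the ShmemHugePages value in kB from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("ShmemHugePages:"):
            return int(line.split()[1])
    return None


def collect(sysfs_path=SHMEM_ENABLED, meminfo_path=MEMINFO):
    """Gather both values, using None where a file is not present."""
    mode = active_thp_mode(sysfs_path.read_text()) if sysfs_path.exists() else None
    huge_kb = shmem_huge_kb(meminfo_path.read_text()) if meminfo_path.exists() else None
    return {"shmem_enabled": mode, "ShmemHugePages_kB": huge_kb}
```

Calling `collect()` on the affected host returns the active `shmem_enabled` mode and the amount of shared memory currently backed by huge pages.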
