RHEL8: Many list_del corruption, soft lockup, and rcu_sched CPU stall messages followed by a crash

Solution Verified - Updated -

Issue

  • Many list_del corruption, soft lockup, and rcu_sched CPU stall messages are logged, followed by a crash:
    ...
[13424923.747327] ------------[ cut here ]------------
[13424923.747327] list_del corruption, ffff8e9cc5d0d0c0->prev is LIST_POISON2 (dead000000000200)
[13424923.747333] WARNING: CPU: 3 PID: 109042 at lib/list_debug.c:50 __list_del_entry_valid+0x62/0x90
[13424923.747334] Modules linked in: macsec tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink dm_mod arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm dns_resolver nfsv3 nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul sg i2c_piix4 virtio_balloon ghash_clmulni_intel pcspkr joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata virtio_net serio_raw net_failover virtio_scsi failover
[13424923.747352] CPU: 3 PID: 109042 Comm: httpd Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-147.8.1.el8_1.x86_64 #1
[13424923.747352] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[13424923.747353] RIP: 0010:__list_del_entry_valid+0x62/0x90
[13424923.747354] Code: 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 88 b1 2c 95 e8 ac 19 c9 ff 0f 0b 31 c0 c3 48 89 fe 48 c7 c7 c0 b1 2c 95 e8 98 19 c9 ff <0f> 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 f8 b1 2c 95 e8 81 19 c9
[13424923.747354] RSP: 0018:ffffa2e30844fcd8 EFLAGS: 00010286
[13424923.747355] RAX: 0000000000000000 RBX: ffff8e9cbede2000 RCX: 0000000000000006
[13424923.747355] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8e9da64d6a00
[13424923.747356] RBP: ffff8e9cc5d0d0c0 R08: 0000000005cb724d R09: 0000000000000004
[13424923.747356] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000004802c
[13424923.747357] R13: ffff8e9cecb49568 R14: ffff8e9cecb49560 R15: ffff8e9cecb49564
[13424923.747358] FS:  00007f40fcf42700(0000) GS:ffff8e9da64c0000(0000) knlGS:0000000000000000
[13424923.747358] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13424923.747359] CR2: 00007f410b026775 CR3: 000000056a60a003 CR4: 00000000007606e0
[13424923.747361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13424923.747361] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13424923.747361] PKRU: 55555554
[13424923.747362] Call Trace:
[13424923.747363]  exit_sem+0x1a2/0x5c1
[13424923.747364]  do_exit+0x2ac/0xb40
[13424923.747365]  ? __switch_to_asm+0x41/0x70
[13424923.747367]  ? __switch_to_asm+0x35/0x70
[13424923.747368]  do_group_exit+0x3a/0xa0
[13424923.747369]  get_signal+0x159/0x850
[13424923.747370]  do_signal+0x36/0x610
[13424923.747371]  ? _copy_to_user+0x26/0x30
[13424923.747372]  ? poll_select_copy_remaining+0xde/0x150
[13424923.747373]  ? kern_select+0xc7/0x110
[13424923.747375]  exit_to_usermode_loop+0x89/0xf0
[13424923.747376]  do_syscall_64+0x182/0x1b0
[13424923.747377]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[13424923.747378] RIP: 0033:0x7f410b02679f
[13424923.747379] Code: Bad RIP value.
[13424923.747380] RSP: 002b:00007f40fcf41c60 EFLAGS: 00000293 ORIG_RAX: 0000000000000017
[13424923.747380] RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: 00007f410b02679f
[13424923.747381] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[13424923.747381] RBP: 0000000000000000 R08: 00007f40fcf41ca0 R09: 0000000000000000
[13424923.747382] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[13424923.747382] R13: 00007f40fcf41ca0 R14: 0000000000000000 R15: 0000000000000001
[13424923.747383] ---[ end trace d19d31c98621d342 ]---
    ...
[13424923.747384] ------------[ cut here ]------------
[13424923.747384] list_del corruption, ffff8e9cc5d0d0e8->next is LIST_POISON1 (dead000000000100)
[13424923.747392] WARNING: CPU: 3 PID: 109042 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
[13424923.747392] Modules linked in: macsec tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink dm_mod arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm dns_resolver nfsv3 nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul sg i2c_piix4 virtio_balloon ghash_clmulni_intel pcspkr joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata virtio_net serio_raw net_failover virtio_scsi failover
[13424923.747409] CPU: 3 PID: 109042 Comm: httpd Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-147.8.1.el8_1.x86_64 #1
[13424923.747409] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[13424923.747410] RIP: 0010:__list_del_entry_valid+0x4e/0x90
[13424923.747411] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 88 b1 2c 95 e8 ac 19 c9 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 c0 b1 2c 95 e8 98 19 c9 ff 0f 0b
[13424923.747411] RSP: 0018:ffffa2e30844fcd8 EFLAGS: 00010286
[13424923.747413] RAX: 0000000000000000 RBX: ffff8e9cbede2000 RCX: 0000000000000006
[13424923.747413] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8e9da64d6a00
[13424923.747413] RBP: ffff8e9cc5d0d0c0 R08: 0000000005cb7278 R09: 0000000000000004
[13424923.747414] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000004802c
[13424923.747414] R13: ffff8e9cecb49568 R14: ffff8e9cecb49560 R15: ffff8e9cecb49564
[13424923.747415] FS:  00007f40fcf42700(0000) GS:ffff8e9da64c0000(0000) knlGS:0000000000000000
[13424923.747416] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13424923.747416] CR2: 00007f410b026775 CR3: 000000056a60a003 CR4: 00000000007606e0
[13424923.747418] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13424923.747419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13424923.747419] PKRU: 55555554
[13424923.747420] Call Trace:
[13424923.747421]  exit_sem+0x16f/0x5c1
[13424923.747422]  do_exit+0x2ac/0xb40
[13424923.747423]  ? __switch_to_asm+0x41/0x70
[13424923.747425]  ? __switch_to_asm+0x35/0x70
[13424923.747426]  do_group_exit+0x3a/0xa0
[13424923.747428]  get_signal+0x159/0x850
[13424923.747429]  do_signal+0x36/0x610
[13424923.747430]  ? _copy_to_user+0x26/0x30
[13424923.747431]  ? poll_select_copy_remaining+0xde/0x150
[13424923.747432]  ? kern_select+0xc7/0x110
[13424923.747434]  exit_to_usermode_loop+0x89/0xf0
[13424923.747435]  do_syscall_64+0x182/0x1b0
[13424923.747436]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[13424923.747437] RIP: 0033:0x7f410b02679f
[13424923.747438] Code: Bad RIP value.
[13424923.747439] RSP: 002b:00007f40fcf41c60 EFLAGS: 00000293 ORIG_RAX: 0000000000000017
[13424923.747440] RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: 00007f410b02679f
[13424923.747440] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[13424923.747441] RBP: 0000000000000000 R08: 00007f40fcf41ca0 R09: 0000000000000000
[13424923.747441] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[13424923.747442] R13: 00007f40fcf41ca0 R14: 0000000000000000 R15: 0000000000000001
[13424923.747442] ---[ end trace d19d31c98621d343 ]---
[13424923.747443] ------------[ cut here ]------------
    ...
[13424948.099646] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [httpd:109042]
[13424948.099712] Modules linked in: macsec tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink dm_mod arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm dns_resolver nfsv3 nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul sg i2c_piix4 virtio_balloon ghash_clmulni_intel pcspkr joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata virtio_net serio_raw net_failover virtio_scsi failover
[13424948.099750] CPU: 3 PID: 109042 Comm: httpd Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-147.8.1.el8_1.x86_64 #1
[13424948.099751] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[13424948.099759] RIP: 0010:__slab_free+0xf5/0x340
[13424948.099762] Code: 54 24 08 44 8b 44 24 18 48 8b 54 24 10 44 0f b6 5c 24 1f 48 89 04 24 0f b6 74 24 20 41 8b 47 08 4c 8b 4c 24 28 48 8b 4c 24 58 <a9> 00 00 00 40 74 45 4c 89 e8 f0 49 0f c7 4c 24 20 0f 94 c0 84 c0
[13424948.099763] RSP: 0018:ffff8e9da64c3e60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[13424948.099764] RAX: 0000000040000000 RBX: ffff8e9cc5d0d0c0 RCX: 00000000802a189b
[13424948.099765] RDX: 00000000802a189c RSI: 0000000000000001 RDI: ffff8e9907c0f400
[13424948.099765] RBP: ffff8e9da64c3f00 R08: 0000000000000001 R09: 0000000000000000
[13424948.099766] R10: ffff8e9cc5d0d0c0 R11: 002fb24fcd4df600 R12: ffffd22a13174340
[13424948.099767] R13: ffff8e9cc5d0d0c0 R14: 00000000802a189b R15: ffff8e9907c0f400
[13424948.099768] FS:  00007f40fcf42700(0000) GS:ffff8e9da64c0000(0000) knlGS:0000000000000000
[13424948.099769] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13424948.099769] CR2: 00007f410b026775 CR3: 000000056a60a003 CR4: 00000000007606e0
[13424948.099772] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13424948.099773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13424948.099773] PKRU: 55555554
[13424948.099774] Call Trace:
[13424948.099776]  <IRQ>
[13424948.099779]  ? apic_timer_interrupt+0xa/0x20
[13424948.099786]  rcu_process_callbacks+0x3fa/0x460
[13424948.099788]  __do_softirq+0xe3/0x30a
[13424948.099793]  irq_exit+0x100/0x110
[13424948.099794]  smp_apic_timer_interrupt+0x74/0x140
[13424948.099796]  apic_timer_interrupt+0xf/0x20
[13424948.099797]  </IRQ>
[13424948.099801] RIP: 0010:__list_del_entry_valid+0x50/0x90
[13424948.099802] Code: 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 88 b1 2c 95 e8 ac 19 c9 ff 0f 0b <31> c0 c3 48 89 fe 48 c7 c7 c0 b1 2c 95 e8 98 19 c9 ff 0f 0b 31 c0
[13424948.099803] RSP: 0018:ffffa2e30844fcd8 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
[13424948.099804] RAX: 0000000000000000 RBX: ffff8e9cbede2000 RCX: 0000000000000006
[13424948.099804] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8e9da64d6a00
[13424948.099805] RBP: ffff8e9cc5d0d0c0 R08: 0000000005cbab3e R09: 0000000000000004
[13424948.099805] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000004802c
[13424948.099806] R13: ffff8e9cecb49568 R14: ffff8e9cecb49560 R15: ffff8e9cecb49564
[13424948.099808]  ? __list_del_entry_valid+0x4e/0x90
[13424948.099811]  exit_sem+0x16f/0x5c1
[13424948.099813]  do_exit+0x2ac/0xb40
[13424948.099817]  ? __switch_to_asm+0x41/0x70
[13424948.099818]  ? __switch_to_asm+0x35/0x70
[13424948.099819]  do_group_exit+0x3a/0xa0
[13424948.099823]  get_signal+0x159/0x850
[13424948.099827]  do_signal+0x36/0x610
[13424948.099829]  ? _copy_to_user+0x26/0x30
[13424948.099832]  ? poll_select_copy_remaining+0xde/0x150
[13424948.099833]  ? kern_select+0xc7/0x110
[13424948.099837]  exit_to_usermode_loop+0x89/0xf0
[13424948.099839]  do_syscall_64+0x182/0x1b0
[13424948.099840]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[13424948.099842] RIP: 0033:0x7f410b02679f
[13424948.099848] Code: Bad RIP value.
[13424948.099848] RSP: 002b:00007f40fcf41c60 EFLAGS: 00000293 ORIG_RAX: 0000000000000017
[13424948.099849] RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: 00007f410b02679f
[13424948.099850] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[13424948.099850] RBP: 0000000000000000 R08: 00007f40fcf41ca0 R09: 0000000000000000
[13424948.099851] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[13424948.099851] R13: 00007f40fcf41ca0 R14: 0000000000000000 R15: 0000000000000001
    ...
[13430090.968834] general protection fault: 0000 [#1] SMP PTI
[13430090.969283] CPU: 3 PID: 109042 Comm: httpd Kdump: loaded Tainted: G        W    L   --------- -  - 4.18.0-147.8.1.el8_1.x86_64 #1
[13430090.970048] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[13430090.970430] RIP: 0010:__x86_indirect_thunk_rax+0x10/0x20
[13430090.970819] Code: ff ff ff 30 c0 e9 6b 6f c5 ff b9 f2 ff ff ff e9 6d 6f c5 ff 90 90 90 90 90 90 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 <c3> 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 e8 07 00 00 00 f3
[13430090.971641] RSP: 0018:ffff8e9da64c3f00 EFLAGS: 00010216
[13430090.972039] RAX: 0002000000000010 RBX: ffff8e9da64e3e40 RCX: 00000000802a000f
[13430090.972423] RDX: ffff8e9cc5d0d0d0 RSI: 0000000000000001 RDI: ffff8e9cc5d0d0d0
[13430090.972807] RBP: ffff8e9da64e3e70 R08: 0000000000000001 R09: 0000000000000000
[13430090.973201] R10: ffff8e9cc5d0d0c0 R11: 002fb24fcd4df600 R12: 7fffffffffffffff
[13430090.973601] R13: 0000000000000202 R14: 0000000000000009 R15: 0000000000000009
[13430090.973999] FS:  00007f40fcf42700(0000) GS:ffff8e9da64c0000(0000) knlGS:0000000000000000
[13430090.974399] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13430090.974805] CR2: 00007f410b026775 CR3: 000000056a60a003 CR4: 00000000007606e0
[13430090.975221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13430090.975618] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13430090.976010] PKRU: 55555554
[13430090.976387] Call Trace:
[13430090.976767]  <IRQ>
[13430090.977157]  ? rcu_process_callbacks+0x297/0x460
[13430090.977545]  ? __do_softirq+0xe3/0x30a
[13430090.977970]  ? irq_exit+0x100/0x110
[13430090.978375]  ? smp_apic_timer_interrupt+0x74/0x140
[13430090.978790]  ? apic_timer_interrupt+0xf/0x20
[13430090.979186]  </IRQ>
[13430090.979572]  ? __list_del_entry_valid+0x50/0x90
[13430090.979938]  ? __list_del_entry_valid+0x4e/0x90
[13430090.980315]  ? exit_sem+0x16f/0x5c1
[13430090.980683]  ? do_exit+0x2ac/0xb40
[13430090.981062]  ? __switch_to_asm+0x41/0x70
[13430090.981448]  ? __switch_to_asm+0x35/0x70
[13430090.981822]  ? do_group_exit+0x3a/0xa0
[13430090.982219]  ? get_signal+0x159/0x850
[13430090.982587]  ? do_signal+0x36/0x610
[13430090.982959]  ? _copy_to_user+0x26/0x30
[13430090.983300]  ? poll_select_copy_remaining+0xde/0x150
[13430090.983647]  ? kern_select+0xc7/0x110
[13430090.984023]  ? exit_to_usermode_loop+0x89/0xf0
[13430090.984389]  ? do_syscall_64+0x182/0x1b0
[13430090.984745]  ? entry_SYSCALL_64_after_hwframe+0x65/0xca
[13430090.985113] Modules linked in: macsec tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink dm_mod arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm dns_resolver nfsv3 nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul sg i2c_piix4 virtio_balloon ghash_clmulni_intel pcspkr joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata virtio_net serio_raw net_failover virtio_scsi failover
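
The message types in the excerpt above can be counted in a saved console log with a short script. The following is a minimal Python sketch and not part of the original diagnosis: the category names and the summarize helper are hypothetical, the patterns are derived only from the lines shown above, and the rcu_sched pattern is a loose assumption because the excerpt elides those messages.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt above. The category names are
# illustrative only. The rcu_sched pattern is an assumption: the excerpt
# does not show the exact wording of the stall messages.
SIGNATURES = {
    "list_del corruption": re.compile(
        r"list_del corruption, [0-9a-f]+->(?:prev|next) is LIST_POISON[12]"
    ),
    "soft lockup": re.compile(r"watchdog: BUG: soft lockup - CPU#\d+ stuck for \d+s!"),
    "rcu_sched stall": re.compile(r"rcu_sched.*stall"),
    "general protection fault": re.compile(r"general protection fault:"),
}


def summarize(lines):
    """Count how many lines match each known message signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the excerpt above, used as a small demo input.
    demo = [
        "[13424923.747327] list_del corruption, ffff8e9cc5d0d0c0->prev is LIST_POISON2 (dead000000000200)",
        "[13424948.099646] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [httpd:109042]",
    ]
    for name, count in summarize(demo).items():
        print(f"{name}: {count}")
```

To scan a real log, pass an open file object (or any iterable of lines) to summarize instead of the demo list. A count alone does not confirm this particular issue; it only shows that the same message types are present.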

Environment

  • Red Hat Enterprise Linux (RHEL) 8
  • As currently understood, kernels from 4.18.0-80 (RHEL 8.0 GA) up to and including 4.18.0-193.81.1.el8_2 (RHEL 8.2.z) are affected
  • This issue was diagnosed on:
    • kernel-4.18.0-147.8.1.el8_1 (rhel8.1.z)
    • kernel-4.18.0-193.6.3.el8_2 (rhel8.2.z)
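
The version range in the list above can be checked against a kernel release string with a small script. This is a minimal Python sketch under stated assumptions: the helper names release_tuple and in_listed_range are hypothetical, the bounds are copied from the list above, and the plain numeric tuple comparison is a simplification of RPM version comparison that should only be trusted for 4.18.0 el8 kernel strings.

```python
import platform
import re

# Bounds copied from the Environment list above (both inclusive).
FIRST_LISTED = (80,)         # 4.18.0-80 (rhel8.0GA)
LAST_LISTED = (193, 81, 1)   # 4.18.0-193.81.1.el8_2 (rhel8.2.z)


def release_tuple(kernel_release):
    """Return the numeric release fields of a 4.18.0 el8 kernel string, or None."""
    match = re.search(r"4\.18\.0-(\d+(?:\.\d+)*)\.el8", kernel_release)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def in_listed_range(kernel_release):
    """True if the release falls inside the range listed above.

    Assumption: element-wise tuple comparison stands in for RPM version
    comparison, which is adequate for purely numeric release fields.
    """
    fields = release_tuple(kernel_release)
    return fields is not None and FIRST_LISTED <= fields <= LAST_LISTED


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    print(running, "in listed range:", in_listed_range(running))
```

A match means only that the running kernel falls inside the range given above, which the article itself describes as the current understanding rather than a final boundary.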
