RHEL8: Lots of list_del corruption, soft lockup, and rcu_sched CPU stall messages followed by the crash
Issue
- Many list_del corruption warnings, soft lockup messages, and rcu_sched CPU stall messages are logged, followed by a kernel crash:
...
[13424923.747327] ------------[ cut here ]------------
[13424923.747327] list_del corruption, ffff8e9cc5d0d0c0->prev is LIST_POISON2 (dead000000000200)
[13424923.747333] WARNING: CPU: 3 PID: 109042 at lib/list_debug.c:50 __list_del_entry_valid+0x62/0x90
[13424923.747334] Modules linked in: macsec tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink dm_mod arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm dns_resolver nfsv3 nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul sg i2c_piix4 virtio_balloon ghash_clmulni_intel pcspkr joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata virtio_net serio_raw net_failover virtio_scsi failover
[13424923.747352] CPU: 3 PID: 109042 Comm: httpd Kdump: loaded Tainted: G W --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[13424923.747352] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[13424923.747353] RIP: 0010:__list_del_entry_valid+0x62/0x90
[13424923.747354] Code: 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 88 b1 2c 95 e8 ac 19 c9 ff 0f 0b 31 c0 c3 48 89 fe 48 c7 c7 c0 b1 2c 95 e8 98 19 c9 ff <0f> 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 f8 b1 2c 95 e8 81 19 c9
[13424923.747354] RSP: 0018:ffffa2e30844fcd8 EFLAGS: 00010286
[13424923.747355] RAX: 0000000000000000 RBX: ffff8e9cbede2000 RCX: 0000000000000006
[13424923.747355] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8e9da64d6a00
[13424923.747356] RBP: ffff8e9cc5d0d0c0 R08: 0000000005cb724d R09: 0000000000000004
[13424923.747356] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000004802c
[13424923.747357] R13: ffff8e9cecb49568 R14: ffff8e9cecb49560 R15: ffff8e9cecb49564
[13424923.747358] FS: 00007f40fcf42700(0000) GS:ffff8e9da64c0000(0000) knlGS:0000000000000000
[13424923.747358] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13424923.747359] CR2: 00007f410b026775 CR3: 000000056a60a003 CR4: 00000000007606e0
[13424923.747361] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13424923.747361] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13424923.747361] PKRU: 55555554
[13424923.747362] Call Trace:
[13424923.747363] exit_sem+0x1a2/0x5c1
[13424923.747364] do_exit+0x2ac/0xb40
[13424923.747365] ? __switch_to_asm+0x41/0x70
[13424923.747367] ? __switch_to_asm+0x35/0x70
[13424923.747368] do_group_exit+0x3a/0xa0
[13424923.747369] get_signal+0x159/0x850
[13424923.747370] do_signal+0x36/0x610
[13424923.747371] ? _copy_to_user+0x26/0x30
[13424923.747372] ? poll_select_copy_remaining+0xde/0x150
[13424923.747373] ? kern_select+0xc7/0x110
[13424923.747375] exit_to_usermode_loop+0x89/0xf0
[13424923.747376] do_syscall_64+0x182/0x1b0
[13424923.747377] entry_SYSCALL_64_after_hwframe+0x65/0xca
[13424923.747378] RIP: 0033:0x7f410b02679f
[13424923.747379] Code: Bad RIP value.
[13424923.747380] RSP: 002b:00007f40fcf41c60 EFLAGS: 00000293 ORIG_RAX: 0000000000000017
[13424923.747380] RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: 00007f410b02679f
[13424923.747381] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[13424923.747381] RBP: 0000000000000000 R08: 00007f40fcf41ca0 R09: 0000000000000000
[13424923.747382] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[13424923.747382] R13: 00007f40fcf41ca0 R14: 0000000000000000 R15: 0000000000000001
[13424923.747383] ---[ end trace d19d31c98621d342 ]---
...
[13424923.747384] ------------[ cut here ]------------
[13424923.747384] list_del corruption, ffff8e9cc5d0d0e8->next is LIST_POISON1 (dead000000000100)
[13424923.747392] WARNING: CPU: 3 PID: 109042 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
[13424923.747392] Modules linked in: macsec tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink dm_mod arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm dns_resolver nfsv3 nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul sg i2c_piix4 virtio_balloon ghash_clmulni_intel pcspkr joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata virtio_net serio_raw net_failover virtio_scsi failover
[13424923.747409] CPU: 3 PID: 109042 Comm: httpd Kdump: loaded Tainted: G W --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[13424923.747409] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[13424923.747410] RIP: 0010:__list_del_entry_valid+0x4e/0x90
[13424923.747411] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 88 b1 2c 95 e8 ac 19 c9 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 c0 b1 2c 95 e8 98 19 c9 ff 0f 0b
[13424923.747411] RSP: 0018:ffffa2e30844fcd8 EFLAGS: 00010286
[13424923.747413] RAX: 0000000000000000 RBX: ffff8e9cbede2000 RCX: 0000000000000006
[13424923.747413] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8e9da64d6a00
[13424923.747413] RBP: ffff8e9cc5d0d0c0 R08: 0000000005cb7278 R09: 0000000000000004
[13424923.747414] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000004802c
[13424923.747414] R13: ffff8e9cecb49568 R14: ffff8e9cecb49560 R15: ffff8e9cecb49564
[13424923.747415] FS: 00007f40fcf42700(0000) GS:ffff8e9da64c0000(0000) knlGS:0000000000000000
[13424923.747416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13424923.747416] CR2: 00007f410b026775 CR3: 000000056a60a003 CR4: 00000000007606e0
[13424923.747418] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13424923.747419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13424923.747419] PKRU: 55555554
[13424923.747420] Call Trace:
[13424923.747421] exit_sem+0x16f/0x5c1
[13424923.747422] do_exit+0x2ac/0xb40
[13424923.747423] ? __switch_to_asm+0x41/0x70
[13424923.747425] ? __switch_to_asm+0x35/0x70
[13424923.747426] do_group_exit+0x3a/0xa0
[13424923.747428] get_signal+0x159/0x850
[13424923.747429] do_signal+0x36/0x610
[13424923.747430] ? _copy_to_user+0x26/0x30
[13424923.747431] ? poll_select_copy_remaining+0xde/0x150
[13424923.747432] ? kern_select+0xc7/0x110
[13424923.747434] exit_to_usermode_loop+0x89/0xf0
[13424923.747435] do_syscall_64+0x182/0x1b0
[13424923.747436] entry_SYSCALL_64_after_hwframe+0x65/0xca
[13424923.747437] RIP: 0033:0x7f410b02679f
[13424923.747438] Code: Bad RIP value.
[13424923.747439] RSP: 002b:00007f40fcf41c60 EFLAGS: 00000293 ORIG_RAX: 0000000000000017
[13424923.747440] RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: 00007f410b02679f
[13424923.747440] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[13424923.747441] RBP: 0000000000000000 R08: 00007f40fcf41ca0 R09: 0000000000000000
[13424923.747441] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[13424923.747442] R13: 00007f40fcf41ca0 R14: 0000000000000000 R15: 0000000000000001
[13424923.747442] ---[ end trace d19d31c98621d343 ]---
[13424923.747443] ------------[ cut here ]------------
...
[13424948.099646] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [httpd:109042]
[13424948.099712] Modules linked in: macsec tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink dm_mod arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm dns_resolver nfsv3 nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul sg i2c_piix4 virtio_balloon ghash_clmulni_intel pcspkr joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata virtio_net serio_raw net_failover virtio_scsi failover
[13424948.099750] CPU: 3 PID: 109042 Comm: httpd Kdump: loaded Tainted: G W --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[13424948.099751] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[13424948.099759] RIP: 0010:__slab_free+0xf5/0x340
[13424948.099762] Code: 54 24 08 44 8b 44 24 18 48 8b 54 24 10 44 0f b6 5c 24 1f 48 89 04 24 0f b6 74 24 20 41 8b 47 08 4c 8b 4c 24 28 48 8b 4c 24 58 <a9> 00 00 00 40 74 45 4c 89 e8 f0 49 0f c7 4c 24 20 0f 94 c0 84 c0
[13424948.099763] RSP: 0018:ffff8e9da64c3e60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[13424948.099764] RAX: 0000000040000000 RBX: ffff8e9cc5d0d0c0 RCX: 00000000802a189b
[13424948.099765] RDX: 00000000802a189c RSI: 0000000000000001 RDI: ffff8e9907c0f400
[13424948.099765] RBP: ffff8e9da64c3f00 R08: 0000000000000001 R09: 0000000000000000
[13424948.099766] R10: ffff8e9cc5d0d0c0 R11: 002fb24fcd4df600 R12: ffffd22a13174340
[13424948.099767] R13: ffff8e9cc5d0d0c0 R14: 00000000802a189b R15: ffff8e9907c0f400
[13424948.099768] FS: 00007f40fcf42700(0000) GS:ffff8e9da64c0000(0000) knlGS:0000000000000000
[13424948.099769] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13424948.099769] CR2: 00007f410b026775 CR3: 000000056a60a003 CR4: 00000000007606e0
[13424948.099772] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13424948.099773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13424948.099773] PKRU: 55555554
[13424948.099774] Call Trace:
[13424948.099776] <IRQ>
[13424948.099779] ? apic_timer_interrupt+0xa/0x20
[13424948.099786] rcu_process_callbacks+0x3fa/0x460
[13424948.099788] __do_softirq+0xe3/0x30a
[13424948.099793] irq_exit+0x100/0x110
[13424948.099794] smp_apic_timer_interrupt+0x74/0x140
[13424948.099796] apic_timer_interrupt+0xf/0x20
[13424948.099797] </IRQ>
[13424948.099801] RIP: 0010:__list_del_entry_valid+0x50/0x90
[13424948.099802] Code: 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 88 b1 2c 95 e8 ac 19 c9 ff 0f 0b <31> c0 c3 48 89 fe 48 c7 c7 c0 b1 2c 95 e8 98 19 c9 ff 0f 0b 31 c0
[13424948.099803] RSP: 0018:ffffa2e30844fcd8 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
[13424948.099804] RAX: 0000000000000000 RBX: ffff8e9cbede2000 RCX: 0000000000000006
[13424948.099804] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8e9da64d6a00
[13424948.099805] RBP: ffff8e9cc5d0d0c0 R08: 0000000005cbab3e R09: 0000000000000004
[13424948.099805] R10: 0000000000000000 R11: 0000000000000001 R12: 000000000004802c
[13424948.099806] R13: ffff8e9cecb49568 R14: ffff8e9cecb49560 R15: ffff8e9cecb49564
[13424948.099808] ? __list_del_entry_valid+0x4e/0x90
[13424948.099811] exit_sem+0x16f/0x5c1
[13424948.099813] do_exit+0x2ac/0xb40
[13424948.099817] ? __switch_to_asm+0x41/0x70
[13424948.099818] ? __switch_to_asm+0x35/0x70
[13424948.099819] do_group_exit+0x3a/0xa0
[13424948.099823] get_signal+0x159/0x850
[13424948.099827] do_signal+0x36/0x610
[13424948.099829] ? _copy_to_user+0x26/0x30
[13424948.099832] ? poll_select_copy_remaining+0xde/0x150
[13424948.099833] ? kern_select+0xc7/0x110
[13424948.099837] exit_to_usermode_loop+0x89/0xf0
[13424948.099839] do_syscall_64+0x182/0x1b0
[13424948.099840] entry_SYSCALL_64_after_hwframe+0x65/0xca
[13424948.099842] RIP: 0033:0x7f410b02679f
[13424948.099848] Code: Bad RIP value.
[13424948.099848] RSP: 002b:00007f40fcf41c60 EFLAGS: 00000293 ORIG_RAX: 0000000000000017
[13424948.099849] RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: 00007f410b02679f
[13424948.099850] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[13424948.099850] RBP: 0000000000000000 R08: 00007f40fcf41ca0 R09: 0000000000000000
[13424948.099851] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[13424948.099851] R13: 00007f40fcf41ca0 R14: 0000000000000000 R15: 0000000000000001
...
[13430090.968834] general protection fault: 0000 [#1] SMP PTI
[13430090.969283] CPU: 3 PID: 109042 Comm: httpd Kdump: loaded Tainted: G W L --------- - - 4.18.0-147.8.1.el8_1.x86_64 #1
[13430090.970048] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[13430090.970430] RIP: 0010:__x86_indirect_thunk_rax+0x10/0x20
[13430090.970819] Code: ff ff ff 30 c0 e9 6b 6f c5 ff b9 f2 ff ff ff e9 6d 6f c5 ff 90 90 90 90 90 90 e8 07 00 00 00 f3 90 0f ae e8 eb f9 48 89 04 24 <c3> 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 e8 07 00 00 00 f3
[13430090.971641] RSP: 0018:ffff8e9da64c3f00 EFLAGS: 00010216
[13430090.972039] RAX: 0002000000000010 RBX: ffff8e9da64e3e40 RCX: 00000000802a000f
[13430090.972423] RDX: ffff8e9cc5d0d0d0 RSI: 0000000000000001 RDI: ffff8e9cc5d0d0d0
[13430090.972807] RBP: ffff8e9da64e3e70 R08: 0000000000000001 R09: 0000000000000000
[13430090.973201] R10: ffff8e9cc5d0d0c0 R11: 002fb24fcd4df600 R12: 7fffffffffffffff
[13430090.973601] R13: 0000000000000202 R14: 0000000000000009 R15: 0000000000000009
[13430090.973999] FS: 00007f40fcf42700(0000) GS:ffff8e9da64c0000(0000) knlGS:0000000000000000
[13430090.974399] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13430090.974805] CR2: 00007f410b026775 CR3: 000000056a60a003 CR4: 00000000007606e0
[13430090.975221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13430090.975618] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[13430090.976010] PKRU: 55555554
[13430090.976387] Call Trace:
[13430090.976767] <IRQ>
[13430090.977157] ? rcu_process_callbacks+0x297/0x460
[13430090.977545] ? __do_softirq+0xe3/0x30a
[13430090.977970] ? irq_exit+0x100/0x110
[13430090.978375] ? smp_apic_timer_interrupt+0x74/0x140
[13430090.978790] ? apic_timer_interrupt+0xf/0x20
[13430090.979186] </IRQ>
[13430090.979572] ? __list_del_entry_valid+0x50/0x90
[13430090.979938] ? __list_del_entry_valid+0x4e/0x90
[13430090.980315] ? exit_sem+0x16f/0x5c1
[13430090.980683] ? do_exit+0x2ac/0xb40
[13430090.981062] ? __switch_to_asm+0x41/0x70
[13430090.981448] ? __switch_to_asm+0x35/0x70
[13430090.981822] ? do_group_exit+0x3a/0xa0
[13430090.982219] ? get_signal+0x159/0x850
[13430090.982587] ? do_signal+0x36/0x610
[13430090.982959] ? _copy_to_user+0x26/0x30
[13430090.983300] ? poll_select_copy_remaining+0xde/0x150
[13430090.983647] ? kern_select+0xc7/0x110
[13430090.984023] ? exit_to_usermode_loop+0x89/0xf0
[13430090.984389] ? do_syscall_64+0x182/0x1b0
[13430090.984745] ? entry_SYSCALL_64_after_hwframe+0x65/0xca
[13430090.985113] Modules linked in: macsec tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nf_tables nfnetlink dm_mod arc4 md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs ccm dns_resolver nfsv3 nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul sg i2c_piix4 virtio_balloon ghash_clmulni_intel pcspkr joydev binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel libata virtio_net serio_raw net_failover virtio_scsi failover
Environment
- Red Hat Enterprise Linux (RHEL) 8
- As currently understood, kernels from 4.18.0-80 (RHEL 8.0 GA) up to and including 4.18.0-193.81.1.el8_2 (RHEL 8.2.z) are affected
- This issue was diagnosed on:
  - kernel-4.18.0-147.8.1.el8_1 (RHEL 8.1.z)
  - kernel-4.18.0-193.6.3.el8_2 (RHEL 8.2.z)