Warnings from kernel/sched/core.c:1827, kernel/softirq.c:200, and others, followed by a server hang and ultimately a crash triggered by the HP watchdog timer
Issue
- Warnings from kernel/sched/core.c:1827 (set_task_cpu()), kernel/softirq.c:200 (__local_bh_enable()), kernel/rcu/tree_plugin.h:301 (rcu_note_context_switch()), and kernel/softirq.c:296 (run_timersd()) are logged, after which the server hangs and eventually crashes when the HP watchdog timer (hpwdt) receives an NMI and panics the kernel:
[494522.724120] WARNING: CPU: 21 PID: 0 at kernel/sched/core.c:1827 set_task_cpu+0x1cd/0x1e0
[494522.724130] Modules linked in: ...
...
[494522.724203] CPU: 21 PID: 0 Comm: swapper/21 Kdump: loaded Not tainted 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[494522.724207] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[494522.724208] RIP: 0010:set_task_cpu+0x1cd/0x1e0
[494522.724211] Code: 8b 50 0c 85 d2 0f 85 a8 fe ff ff 48 8b 00 a9 00 00 04 00 0f 84 9a fe ff ff e8 9f 05 ee ff e9 90 fe ff ff 0f 0b e9 6d fe ff ff <0f> 0b e9 74 fe ff ff 80 8b ec 08 00 00 04 e9 c1 fe ff ff 0f 1f 44
[494522.724214] RSP: 0018:ff6ab9acc6c34e20 EFLAGS: 00010002
[494522.724216] RAX: 0000000000000000 RBX: ff4089223c118000 RCX: dead000000000200
[494522.724218] RDX: ff40892199c6a9f8 RSI: 0000000000000001 RDI: ff4089223c118000
[494522.724219] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000029880
[494522.724221] R10: 0002a137b0ac7a4f R11: 0000000000000080 R12: 0000000000000001
[494522.724222] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000087
[494522.724224] FS: 0000000000000000(0000) GS:ff40892199940000(0000) knlGS:0000000000000000
[494522.724226] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494522.724227] CR2: 00007fed7ae6d300 CR3: 0000001cd621a001 CR4: 0000000000771ea0
[494522.724229] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[494522.724230] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[494522.724232] PKRU: 55555554
[494522.724233] Call Trace:
[494522.724235] <IRQ>
[494522.724238] push_rt_task.part.49+0x22c/0x2c0
[494522.724242] push_rt_tasks.part.50+0x1d/0x30
[494522.724245] ttwu_do_wakeup+0x53/0x1b0
[494522.724248] try_to_wake_up+0x23e/0x6c0
[494522.724251] __handle_irq_event_percpu+0x90/0x240
[494522.724255] ? intel_idle_irq+0x82/0xe0
[494522.724261] handle_irq_event_percpu+0x55/0xa0
[494522.724263] ? tick_nohz_next_event+0x86/0x180
[494522.724267] handle_irq_event+0x58/0x9d
[494522.724270] handle_edge_irq+0xb3/0x250
[494522.724274] handle_irq+0x1f/0x30
[494522.724278] do_IRQ+0x79/0x130
[494522.724282] common_interrupt+0xf/0xf
[494522.724284] </IRQ>
[494522.724285] RIP: 0010:intel_idle_irq+0x82/0xe0
[494522.724288] Code: 75 28 48 8b 00 a9 00 00 04 00 75 1e 8b 05 02 6d 82 01 85 c0 7e 07 0f 00 2d 97 32 4c 00 c1 ee 18 b9 01 00 00 00 89 f0 0f 01 c9 <65> 48 8b 04 25 40 6e 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
[494522.724290] RSP: 0018:ff6ab9acc63dfe70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[494522.724292] RAX: 0000000000000000 RBX: ffffffffa8eeace8 RCX: 0000000000000001
[494522.724293] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[494522.724294] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000029880
[494522.724295] R10: 0002a137b0abc85d R11: ff40892199968c44 R12: ff9cb9acbf341538
[494522.724296] R13: ffffffffa8eeac60 R14: 0000000000000001 R15: 0000000000000000
[494522.724300] cpuidle_enter_state+0x8c/0x470
[494522.724305] ? tick_nohz_stop_tick+0x1f/0x220
[494522.724308] cpuidle_enter+0x2c/0x40
[494522.724311] do_idle+0x2be/0x320
[494522.724316] cpu_startup_entry+0x46/0x50
[494522.724319] start_secondary+0x19f/0x1e0
[494522.724323] secondary_startup_64_no_verify+0xc2/0xcb
[494522.724329] ---[ end trace 0000000000000002 ]---
...
[494523.727659] ------------[ cut here ]------------
[494523.727662] DEBUG_LOCKS_WARN_ON(current->softirq_disable_cnt != this_cpu_read(softirq_ctrl.cnt))
[494523.727668] WARNING: CPU: 1 PID: 1296 at kernel/softirq.c:200 __local_bh_enable+0x101/0x110
[494523.727678] Modules linked in: ...
...
[494523.727759] CPU: 1 PID: 1296 Comm: irq/225-nvme1q2 Kdump: loaded Tainted: G W --------- - - 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[494523.727762] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[494523.727764] RIP: 0010:__local_bh_enable+0x101/0x110
[494523.727767] Code: 00 85 c0 0f 84 53 ff ff ff 8b 15 42 c9 ad 01 85 d2 0f 85 45 ff ff ff 48 c7 c6 30 bd 4c a8 48 c7 c7 e7 59 4b a8 e8 4c 93 ff ff <0f> 0b e9 2b ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48
[494523.727770] RSP: 0000:ff6ab9acca517e58 EFLAGS: 00010082
[494523.727772] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000001
[494523.727774] RDX: 0000000000000000 RSI: ffffffffa85088d7 RDI: 00000000ffffffff
[494523.727776] RBP: 0000000000000001 R08: ffffffffa8a6b3a0 R09: 0000000000000000
[494523.727777] R10: ffffffffa97cb3dd R11: 0000000000000000 R12: 0000000000000200
[494523.727778] R13: ff4089223dcf2000 R14: ffffffffa7560f20 R15: 0000000000000001
[494523.727780] FS: 0000000000000000(0000) GS:ff40892199440000(0000) knlGS:0000000000000000
[494523.727782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494523.727783] CR2: 0000000000000000 CR3: 0000001cd621a006 CR4: 0000000000771ea0
[494523.727785] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[494523.727786] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[494523.727787] PKRU: 55555554
[494523.727788] Call Trace:
[494523.727793] __local_bh_enable_ip+0x68/0x100
[494523.727797] irq_forced_thread_fn+0x6d/0x80
[494523.727803] irq_thread+0xd0/0x180
[494523.727805] ? preempt_count_add+0x70/0xa0
[494523.727810] ? wake_threads_waitq+0x30/0x30
[494523.727813] ? irq_thread_check_affinity+0x20/0x20
[494523.727815] kthread+0x151/0x170
[494523.727819] ? set_kthread_struct+0x50/0x50
[494523.727822] ret_from_fork+0x1f/0x40
[494523.727828] ---[ end trace 0000000000000003 ]---
...
[494523.929277] WARNING: CPU: 1 PID: 1296 at kernel/rcu/tree_plugin.h:301 rcu_note_context_switch+0x406/0x640
[494523.929286] Modules linked in: ...
...
[494523.929346] CPU: 1 PID: 1296 Comm: irq/225-nvme1q2 Kdump: loaded Tainted: G W --------- - - 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[494523.929348] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[494523.929349] RIP: 0010:rcu_note_context_switch+0x406/0x640
[494523.929351] Code: ff 45 85 e4 0f 8e 27 fe ff ff e9 80 fe ff ff c6 43 15 00 48 8b 73 20 ba 01 00 00 00 48 8b 7b 18 e8 0f ac ff ff e9 9c fc ff ff <0f> 0b e9 60 fc ff ff 48 8b 45 08 f0 83 44 24 fc 00 48 8b 75 20 48
[494523.929353] RSP: 0000:ff6ab9acca517e08 EFLAGS: 00010046
[494523.929355] RAX: ff4089223c118000 RBX: ff4089219946af40 RCX: 000000007dace0b8
[494523.929356] RDX: ff4089219941be00 RSI: ffffffffa85088d7 RDI: 0000000000000000
[494523.929357] RBP: ff6ab9acca517e98 R08: 0000000000000002 R09: ff4089223dcf2ce0
[494523.929358] R10: ff40890415e42e18 R11: 0000000000000444 R12: ff4089223c118000
[494523.929359] R13: 0000000000000000 R14: ff4089219946ac00 R15: 0000000000000000
[494523.929360] FS: 0000000000000000(0000) GS:ff40892199440000(0000) knlGS:0000000000000000
[494523.929361] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494523.929362] CR2: 0000000000000000 CR3: 0000001cd621a006 CR4: 0000000000771ea0
[494523.929364] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[494523.929364] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[494523.929365] PKRU: 55555554
[494523.929366] Call Trace:
[494523.929369] __schedule+0x9d/0x8e0
[494523.929374] ? __wake_up_common_lock+0x81/0xb0
[494523.929379] ? irq_finalize_oneshot.part.48+0xf0/0xf0
[494523.929382] schedule+0x6c/0x120
[494523.929385] irq_thread+0x8e/0x180
[494523.929387] ? preempt_count_add+0x70/0xa0
[494523.929390] ? wake_threads_waitq+0x30/0x30
[494523.929392] ? irq_thread_check_affinity+0x20/0x20
[494523.929394] kthread+0x151/0x170
[494523.929398] ? set_kthread_struct+0x50/0x50
[494523.929400] ret_from_fork+0x1f/0x40
[494523.929404] ---[ end trace 0000000000000004 ]---
...
[494524.934556] WARNING: CPU: 1 PID: 27 at kernel/softirq.c:296 run_timersd+0xa2/0xb0
[494524.934568] Modules linked in: ...
...
[494524.934648] CPU: 1 PID: 27 Comm: ktimers/1 Kdump: loaded Tainted: G W --------- - - 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[494524.934652] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[494524.934653] RIP: 0010:run_timersd+0xa2/0xb0
[494524.934656] Code: 00 8b 80 88 0c 00 00 65 8b 15 12 38 b2 58 25 00 ff 00 00 81 e2 00 00 ff 00 09 c2 75 0d fb 66 0f 1f 44 00 00 5b e9 2e de b0 00 <0f> 0b eb ef 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 3c
[494524.934659] RSP: 0000:ff6ab9acc6567ed8 EFLAGS: 00010006
[494524.934662] RAX: 000000000000fe00 RBX: 0000000000000102 RCX: 0000000000000001
[494524.934663] RDX: 000000000000fe00 RSI: 0000000000000001 RDI: 00000000ffffffff
[494524.934665] RBP: ff408902c00257d0 R08: ff408902c736c000 R09: 0000000000000001
[494524.934666] R10: 0002a13875d6bf6b R11: 0000000000000000 R12: ffffffffa8a59320
[494524.934667] R13: 0000000000000001 R14: 0000000000000001 R15: ff408902c00257d0
[494524.934669] FS: 0000000000000000(0000) GS:ff40892199440000(0000) knlGS:0000000000000000
[494524.934670] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494524.934671] CR2: 0000000000000000 CR3: 0000001cd621a006 CR4: 0000000000771ea0
[494524.934673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[494524.934674] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[494524.934675] PKRU: 55555554
[494524.934676] Call Trace:
[494524.934680] smpboot_thread_fn+0x1c1/0x2b0
[494524.934685] ? smpboot_register_percpu_thread_cpumask+0x140/0x140
[494524.934688] kthread+0x151/0x170
[494524.934693] ? set_kthread_struct+0x50/0x50
[494524.934696] ret_from_fork+0x1f/0x40
[494524.934703] ---[ end trace 0000000000000005 ]---
...
[514298.553054] Kernel panic - not syncing:
[514298.553058] 04: An NMI occurred. Depending on your system the reason for the NMI is logged in any one of the following resources:
1. Integrated Management Log (IML)
2. OA Syslog
3. OA Forward Progress Log
4. iLO Event Log
[514298.553061] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G W --------- - - 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[514298.553065] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[514298.553067] Call Trace:
[514298.553070] <NMI>
[514298.553072] dump_stack+0x41/0x60
[514298.553080] panic+0xb9/0x2cd
[514298.553084] ? intel_pmu_handle_irq+0x160/0x480
[514298.553092] nmi_panic.cold.8+0xc/0xc
[514298.553094] hpwdt_pretimeout+0x7f/0xc6 [hpwdt]
[514298.553100] nmi_handle+0x5b/0x160
[514298.553104] unknown_nmi_error+0x19/0xa0
[514298.553106] do_nmi+0x1b9/0x250
[514298.553109] end_repeat_nmi+0x16/0x69
[514298.553113] RIP: 0010:intel_idle_irq+0x82/0xe0
[514298.553119] Code: 75 28 48 8b 00 a9 00 00 04 00 75 1e 8b 05 02 6d 82 01 85 c0 7e 07 0f 00 2d 97 32 4c 00 c1 ee 18 b9 01 00 00 00 89 f0 0f 01 c9 <65> 48 8b 04 25 40 6e 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
[514298.553121] RSP: 0018:ffffffffa8a03e50 EFLAGS: 00000246
[514298.553124] RAX: 0000000000000000 RBX: ffffffffa8eeace8 RCX: 0000000000000001
[514298.553126] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[514298.553127] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000029880
[514298.553129] R10: 0002bc2221102315 R11: ff40892199428c44 R12: ff9cb9acbee01538
[514298.553130] R13: ffffffffa8eeac60 R14: 0000000000000001 R15: 0000000000000000
[514298.553134] ? intel_idle_irq+0x82/0xe0
[514298.553138] ? intel_idle_irq+0x82/0xe0
[514298.553141] </NMI>
[514298.553142] cpuidle_enter_state+0x8c/0x470
[514298.553148] cpuidle_enter+0x2c/0x40
[514298.553152] do_idle+0x2be/0x320
[514298.553158] cpu_startup_entry+0x46/0x50
[514298.553162] start_kernel+0x50c/0x530
[514298.553168] secondary_startup_64_no_verify+0xc2/0xcb
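To check whether a saved kernel log shows the same sequence, the WARNING sites and the panic timestamp can be pulled out with a short script. The following is a minimal sketch, not a supported tool: the function name summarize and the SAMPLE list are illustrative, and the regular expressions assume the log keeps the dmesg-style "[seconds.microseconds]" prefix seen in the excerpt above. In that excerpt, the panic follows the first warning by about 19,776 seconds (roughly 5.5 hours).

```python
import re

# Matches lines such as:
# [494522.724120] WARNING: CPU: 21 PID: 0 at kernel/sched/core.c:1827 set_task_cpu+0x1cd/0x1e0
WARNING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<src>\S+):(?P<line>\d+) (?P<func>\w+)\+"
)
PANIC_RE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s+Kernel panic - not syncing")


def summarize(log_lines):
    """Return (warnings, panic_ts) found in an iterable of kernel log lines."""
    warnings = []
    panic_ts = None
    for text in log_lines:
        m = WARNING_RE.search(text)
        if m:
            warnings.append({
                "ts": float(m["ts"]),
                "cpu": int(m["cpu"]),
                "pid": int(m["pid"]),
                "site": f"{m['src']}:{m['line']}",
                "func": m["func"],
            })
            continue
        m = PANIC_RE.search(text)
        if m and panic_ts is None:
            # Keep only the first panic line.
            panic_ts = float(m["ts"])
    return warnings, panic_ts


# Lines copied from the log excerpt in this article.
SAMPLE = [
    "[494522.724120] WARNING: CPU: 21 PID: 0 at kernel/sched/core.c:1827 set_task_cpu+0x1cd/0x1e0",
    "[494523.727668] WARNING: CPU: 1 PID: 1296 at kernel/softirq.c:200 __local_bh_enable+0x101/0x110",
    "[494523.929277] WARNING: CPU: 1 PID: 1296 at kernel/rcu/tree_plugin.h:301 rcu_note_context_switch+0x406/0x640",
    "[494524.934556] WARNING: CPU: 1 PID: 27 at kernel/softirq.c:296 run_timersd+0xa2/0xb0",
    "[514298.553054] Kernel panic - not syncing:",
]

if __name__ == "__main__":
    found, panic_ts = summarize(SAMPLE)
    for w in found:
        print(f"{w['ts']:.6f} CPU {w['cpu']} PID {w['pid']} {w['site']} {w['func']}()")
    if found and panic_ts is not None:
        print(f"panic {panic_ts - found[0]['ts']:.0f} s after the first warning")
```

To run it against a real log, pass an open file object (for example a saved dmesg or vmcore-dmesg.txt file) to summarize() instead of SAMPLE.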
Environment
- Red Hat OpenShift Container Platform 4.12.43 or older
- RHCOS kernel-rt version 4.18.0-372.78.1.rt7.237.el8_6 or older
- HPE ProLiant series servers (the issue can potentially occur on other hardware platforms as well)
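A rough way to test whether a node's kernel-rt build falls in the range listed above is to compare the numeric fields of the release string. This is only a sketch under a stated assumption: field-by-field integer comparison is not the RPM version comparison algorithm, although it should order builds within the same 4.18.0 stream correctly. The names version_key and looks_affected are illustrative.

```python
import platform
import re

# Newest affected build named in the Environment section.
LAST_AFFECTED = "4.18.0-372.78.1.rt7.237.el8_6"


def version_key(release):
    """Turn a kernel-rt release string into a tuple of integers.

    Assumption: dropping the architecture suffix and comparing the numeric
    fields in order is sufficient here. This is NOT rpmvercmp.
    """
    release = re.sub(r"\.(x86_64|aarch64|ppc64le|s390x)$", "", release)
    return tuple(int(n) for n in re.findall(r"\d+", release))


def looks_affected(release):
    """True if the release is at or below the newest affected build."""
    return version_key(release) <= version_key(LAST_AFFECTED)


if __name__ == "__main__":
    # Kernel from the log excerpt in this article.
    print(looks_affected("4.18.0-372.71.1.rt7.230.el8_6.x86_64"))
    # Kernel of the machine running this script.
    print(platform.release(), looks_affected(platform.release()))
```

The kernel shown in the log excerpt, 4.18.0-372.71.1.rt7.230.el8_6, sorts below the listed build, which is consistent with it being affected.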