Warnings from kernel/sched/core.c:1827, kernel/softirq.c:200, and other locations, followed by a server hang and ultimately a crash triggered by the HP watchdog timer

Solution Verified

Issue

  • The kernel logs warnings from kernel/sched/core.c:1827 (set_task_cpu()), kernel/softirq.c:200 (__local_bh_enable()), kernel/rcu/tree_plugin.h:301 (rcu_note_context_switch()), and kernel/softirq.c:296 (run_timersd()). The server then becomes unresponsive and eventually crashes when the HP watchdog timer (hpwdt) raises an NMI:
[494522.724120] WARNING: CPU: 21 PID: 0 at kernel/sched/core.c:1827 set_task_cpu+0x1cd/0x1e0
[494522.724130] Modules linked in: ...
    ...
[494522.724203] CPU: 21 PID: 0 Comm: swapper/21 Kdump: loaded Not tainted 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[494522.724207] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[494522.724208] RIP: 0010:set_task_cpu+0x1cd/0x1e0
[494522.724211] Code: 8b 50 0c 85 d2 0f 85 a8 fe ff ff 48 8b 00 a9 00 00 04 00 0f 84 9a fe ff ff e8 9f 05 ee ff e9 90 fe ff ff 0f 0b e9 6d fe ff ff <0f> 0b e9 74 fe ff ff 80 8b ec 08 00 00 04 e9 c1 fe ff ff 0f 1f 44
[494522.724214] RSP: 0018:ff6ab9acc6c34e20 EFLAGS: 00010002
[494522.724216] RAX: 0000000000000000 RBX: ff4089223c118000 RCX: dead000000000200
[494522.724218] RDX: ff40892199c6a9f8 RSI: 0000000000000001 RDI: ff4089223c118000
[494522.724219] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000029880
[494522.724221] R10: 0002a137b0ac7a4f R11: 0000000000000080 R12: 0000000000000001
[494522.724222] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000087
[494522.724224] FS:  0000000000000000(0000) GS:ff40892199940000(0000) knlGS:0000000000000000
[494522.724226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494522.724227] CR2: 00007fed7ae6d300 CR3: 0000001cd621a001 CR4: 0000000000771ea0
[494522.724229] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[494522.724230] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[494522.724232] PKRU: 55555554
[494522.724233] Call Trace:
[494522.724235]  <IRQ>
[494522.724238]  push_rt_task.part.49+0x22c/0x2c0
[494522.724242]  push_rt_tasks.part.50+0x1d/0x30
[494522.724245]  ttwu_do_wakeup+0x53/0x1b0
[494522.724248]  try_to_wake_up+0x23e/0x6c0
[494522.724251]  __handle_irq_event_percpu+0x90/0x240
[494522.724255]  ? intel_idle_irq+0x82/0xe0
[494522.724261]  handle_irq_event_percpu+0x55/0xa0
[494522.724263]  ? tick_nohz_next_event+0x86/0x180
[494522.724267]  handle_irq_event+0x58/0x9d
[494522.724270]  handle_edge_irq+0xb3/0x250
[494522.724274]  handle_irq+0x1f/0x30
[494522.724278]  do_IRQ+0x79/0x130
[494522.724282]  common_interrupt+0xf/0xf
[494522.724284]  </IRQ>
[494522.724285] RIP: 0010:intel_idle_irq+0x82/0xe0
[494522.724288] Code: 75 28 48 8b 00 a9 00 00 04 00 75 1e 8b 05 02 6d 82 01 85 c0 7e 07 0f 00 2d 97 32 4c 00 c1 ee 18 b9 01 00 00 00 89 f0 0f 01 c9 <65> 48 8b 04 25 40 6e 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
[494522.724290] RSP: 0018:ff6ab9acc63dfe70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[494522.724292] RAX: 0000000000000000 RBX: ffffffffa8eeace8 RCX: 0000000000000001
[494522.724293] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[494522.724294] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000029880
[494522.724295] R10: 0002a137b0abc85d R11: ff40892199968c44 R12: ff9cb9acbf341538
[494522.724296] R13: ffffffffa8eeac60 R14: 0000000000000001 R15: 0000000000000000
[494522.724300]  cpuidle_enter_state+0x8c/0x470
[494522.724305]  ? tick_nohz_stop_tick+0x1f/0x220
[494522.724308]  cpuidle_enter+0x2c/0x40
[494522.724311]  do_idle+0x2be/0x320
[494522.724316]  cpu_startup_entry+0x46/0x50
[494522.724319]  start_secondary+0x19f/0x1e0
[494522.724323]  secondary_startup_64_no_verify+0xc2/0xcb
[494522.724329] ---[ end trace 0000000000000002 ]---
    ...
[494523.727659] ------------[ cut here ]------------
[494523.727662] DEBUG_LOCKS_WARN_ON(current->softirq_disable_cnt != this_cpu_read(softirq_ctrl.cnt))
[494523.727668] WARNING: CPU: 1 PID: 1296 at kernel/softirq.c:200 __local_bh_enable+0x101/0x110
[494523.727678] Modules linked in: ...
    ...
[494523.727759] CPU: 1 PID: 1296 Comm: irq/225-nvme1q2 Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[494523.727762] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[494523.727764] RIP: 0010:__local_bh_enable+0x101/0x110
[494523.727767] Code: 00 85 c0 0f 84 53 ff ff ff 8b 15 42 c9 ad 01 85 d2 0f 85 45 ff ff ff 48 c7 c6 30 bd 4c a8 48 c7 c7 e7 59 4b a8 e8 4c 93 ff ff <0f> 0b e9 2b ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48
[494523.727770] RSP: 0000:ff6ab9acca517e58 EFLAGS: 00010082
[494523.727772] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000001
[494523.727774] RDX: 0000000000000000 RSI: ffffffffa85088d7 RDI: 00000000ffffffff
[494523.727776] RBP: 0000000000000001 R08: ffffffffa8a6b3a0 R09: 0000000000000000
[494523.727777] R10: ffffffffa97cb3dd R11: 0000000000000000 R12: 0000000000000200
[494523.727778] R13: ff4089223dcf2000 R14: ffffffffa7560f20 R15: 0000000000000001
[494523.727780] FS:  0000000000000000(0000) GS:ff40892199440000(0000) knlGS:0000000000000000
[494523.727782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494523.727783] CR2: 0000000000000000 CR3: 0000001cd621a006 CR4: 0000000000771ea0
[494523.727785] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[494523.727786] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[494523.727787] PKRU: 55555554
[494523.727788] Call Trace:
[494523.727793]  __local_bh_enable_ip+0x68/0x100
[494523.727797]  irq_forced_thread_fn+0x6d/0x80
[494523.727803]  irq_thread+0xd0/0x180
[494523.727805]  ? preempt_count_add+0x70/0xa0
[494523.727810]  ? wake_threads_waitq+0x30/0x30
[494523.727813]  ? irq_thread_check_affinity+0x20/0x20
[494523.727815]  kthread+0x151/0x170
[494523.727819]  ? set_kthread_struct+0x50/0x50
[494523.727822]  ret_from_fork+0x1f/0x40
[494523.727828] ---[ end trace 0000000000000003 ]---
    ...
[494523.929277] WARNING: CPU: 1 PID: 1296 at kernel/rcu/tree_plugin.h:301 rcu_note_context_switch+0x406/0x640
[494523.929286] Modules linked in: ...
    ...
[494523.929346] CPU: 1 PID: 1296 Comm: irq/225-nvme1q2 Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[494523.929348] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[494523.929349] RIP: 0010:rcu_note_context_switch+0x406/0x640
[494523.929351] Code: ff 45 85 e4 0f 8e 27 fe ff ff e9 80 fe ff ff c6 43 15 00 48 8b 73 20 ba 01 00 00 00 48 8b 7b 18 e8 0f ac ff ff e9 9c fc ff ff <0f> 0b e9 60 fc ff ff 48 8b 45 08 f0 83 44 24 fc 00 48 8b 75 20 48
[494523.929353] RSP: 0000:ff6ab9acca517e08 EFLAGS: 00010046
[494523.929355] RAX: ff4089223c118000 RBX: ff4089219946af40 RCX: 000000007dace0b8
[494523.929356] RDX: ff4089219941be00 RSI: ffffffffa85088d7 RDI: 0000000000000000
[494523.929357] RBP: ff6ab9acca517e98 R08: 0000000000000002 R09: ff4089223dcf2ce0
[494523.929358] R10: ff40890415e42e18 R11: 0000000000000444 R12: ff4089223c118000
[494523.929359] R13: 0000000000000000 R14: ff4089219946ac00 R15: 0000000000000000
[494523.929360] FS:  0000000000000000(0000) GS:ff40892199440000(0000) knlGS:0000000000000000
[494523.929361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494523.929362] CR2: 0000000000000000 CR3: 0000001cd621a006 CR4: 0000000000771ea0
[494523.929364] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[494523.929364] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[494523.929365] PKRU: 55555554
[494523.929366] Call Trace:
[494523.929369]  __schedule+0x9d/0x8e0
[494523.929374]  ? __wake_up_common_lock+0x81/0xb0
[494523.929379]  ? irq_finalize_oneshot.part.48+0xf0/0xf0
[494523.929382]  schedule+0x6c/0x120
[494523.929385]  irq_thread+0x8e/0x180
[494523.929387]  ? preempt_count_add+0x70/0xa0
[494523.929390]  ? wake_threads_waitq+0x30/0x30
[494523.929392]  ? irq_thread_check_affinity+0x20/0x20
[494523.929394]  kthread+0x151/0x170
[494523.929398]  ? set_kthread_struct+0x50/0x50
[494523.929400]  ret_from_fork+0x1f/0x40
[494523.929404] ---[ end trace 0000000000000004 ]---
    ...
[494524.934556] WARNING: CPU: 1 PID: 27 at kernel/softirq.c:296 run_timersd+0xa2/0xb0
[494524.934568] Modules linked in: ...
    ...
[494524.934648] CPU: 1 PID: 27 Comm: ktimers/1 Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[494524.934652] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[494524.934653] RIP: 0010:run_timersd+0xa2/0xb0
[494524.934656] Code: 00 8b 80 88 0c 00 00 65 8b 15 12 38 b2 58 25 00 ff 00 00 81 e2 00 00 ff 00 09 c2 75 0d fb 66 0f 1f 44 00 00 5b e9 2e de b0 00 <0f> 0b eb ef 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 3c
[494524.934659] RSP: 0000:ff6ab9acc6567ed8 EFLAGS: 00010006
[494524.934662] RAX: 000000000000fe00 RBX: 0000000000000102 RCX: 0000000000000001
[494524.934663] RDX: 000000000000fe00 RSI: 0000000000000001 RDI: 00000000ffffffff
[494524.934665] RBP: ff408902c00257d0 R08: ff408902c736c000 R09: 0000000000000001
[494524.934666] R10: 0002a13875d6bf6b R11: 0000000000000000 R12: ffffffffa8a59320
[494524.934667] R13: 0000000000000001 R14: 0000000000000001 R15: ff408902c00257d0
[494524.934669] FS:  0000000000000000(0000) GS:ff40892199440000(0000) knlGS:0000000000000000
[494524.934670] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494524.934671] CR2: 0000000000000000 CR3: 0000001cd621a006 CR4: 0000000000771ea0
[494524.934673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[494524.934674] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[494524.934675] PKRU: 55555554
[494524.934676] Call Trace:
[494524.934680]  smpboot_thread_fn+0x1c1/0x2b0
[494524.934685]  ? smpboot_register_percpu_thread_cpumask+0x140/0x140
[494524.934688]  kthread+0x151/0x170
[494524.934693]  ? set_kthread_struct+0x50/0x50
[494524.934696]  ret_from_fork+0x1f/0x40
[494524.934703] ---[ end trace 0000000000000005 ]---
    ...
[514298.553054] Kernel panic - not syncing:
[514298.553058] 04: An NMI occurred. Depending on your system the reason for the NMI is logged in any one of the following resources:
                1. Integrated Management Log (IML)
                2. OA Syslog
                3. OA Forward Progress Log
                4. iLO Event Log
[514298.553061] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-372.71.1.rt7.230.el8_6.x86_64 #1
[514298.553065] Hardware name: HPE ProLiant DL110 Gen10 Plus/ProLiant DL110 Gen10 Plus, BIOS U56 04/20/2023
[514298.553067] Call Trace:
[514298.553070]  <NMI>
[514298.553072]  dump_stack+0x41/0x60
[514298.553080]  panic+0xb9/0x2cd
[514298.553084]  ? intel_pmu_handle_irq+0x160/0x480
[514298.553092]  nmi_panic.cold.8+0xc/0xc
[514298.553094]  hpwdt_pretimeout+0x7f/0xc6 [hpwdt]
[514298.553100]  nmi_handle+0x5b/0x160
[514298.553104]  unknown_nmi_error+0x19/0xa0
[514298.553106]  do_nmi+0x1b9/0x250
[514298.553109]  end_repeat_nmi+0x16/0x69
[514298.553113] RIP: 0010:intel_idle_irq+0x82/0xe0
[514298.553119] Code: 75 28 48 8b 00 a9 00 00 04 00 75 1e 8b 05 02 6d 82 01 85 c0 7e 07 0f 00 2d 97 32 4c 00 c1 ee 18 b9 01 00 00 00 89 f0 0f 01 c9 <65> 48 8b 04 25 40 6e 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b
[514298.553121] RSP: 0018:ffffffffa8a03e50 EFLAGS: 00000246
[514298.553124] RAX: 0000000000000000 RBX: ffffffffa8eeace8 RCX: 0000000000000001
[514298.553126] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[514298.553127] RBP: 0000000000000001 R08: 0000000000000002 R09: 0000000000029880
[514298.553129] R10: 0002bc2221102315 R11: ff40892199428c44 R12: ff9cb9acbee01538
[514298.553130] R13: ffffffffa8eeac60 R14: 0000000000000001 R15: 0000000000000000
[514298.553134]  ? intel_idle_irq+0x82/0xe0
[514298.553138]  ? intel_idle_irq+0x82/0xe0
[514298.553141]  </NMI>
[514298.553142]  cpuidle_enter_state+0x8c/0x470
[514298.553148]  cpuidle_enter+0x2c/0x40
[514298.553152]  do_idle+0x2be/0x320
[514298.553158]  cpu_startup_entry+0x46/0x50
[514298.553162]  start_kernel+0x50c/0x530
[514298.553168]  secondary_startup_64_no_verify+0xc2/0xcb
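
To check whether a saved kernel log shows the same signature, the warning sites and the watchdog panic above can be searched for mechanically. The following is a minimal sketch, not a Red Hat tool: the function name `scan_log` and the set `KNOWN_SITES` are hypothetical names chosen for illustration, and the matching assumes that the log lines have the same layout as the excerpt above.

```python
import re

# Matches kernel warning headers such as:
# [494522.724120] WARNING: CPU: 21 PID: 0 at kernel/sched/core.c:1827 set_task_cpu+0x1cd/0x1e0
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+0x"
)

# Warning sites named in this article, as (source file, line number).
# Assumption: line numbers only match on the affected kernel builds.
KNOWN_SITES = {
    ("kernel/sched/core.c", 1827),
    ("kernel/softirq.c", 200),
    ("kernel/rcu/tree_plugin.h", 301),
    ("kernel/softirq.c", 296),
}


def scan_log(text):
    """Return (matched_sites, hpwdt_panic_seen) for a kernel log string."""
    matched = set()
    panic_seen = False
    hpwdt_seen = False
    for line in text.splitlines():
        m = WARNING_RE.search(line)
        if m:
            site = (m.group("file"), int(m.group("line")))
            if site in KNOWN_SITES:
                matched.add(site)
        # The panic header and the hpwdt frame appear on separate lines.
        if "Kernel panic - not syncing" in line:
            panic_seen = True
        if "hpwdt_pretimeout" in line:
            hpwdt_seen = True
    return matched, panic_seen and hpwdt_seen
```

A log that reports several of the known sites together with the hpwdt panic is a candidate for this issue. A match is only a hint, because the same warnings could in principle be produced by unrelated problems.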

Environment

  • Red Hat OpenShift Container Platform 4.12.43 or older
    • RHCOS kernel-rt version 4.18.0-372.78.1.rt7.237.el8_6 or older
  • HPE ProLiant series servers (the issue can potentially occur on other hardware platforms as well)
