Server hung and then crashed with blocked tasks while real-time application threads consumed 100% CPU, even though stalld was installed and running

Solution Verified

Issue

  • The server hung and then crashed with blocked tasks while real-time application threads were consuming 100% of CPU time, even though stalld was installed and running. The kernel log contains hung-task reports such as the following:
[250396.966263] systemd[1]: systemd-journald.service: State 'stop-sigabrt' timed out. Terminating.
[250418.116445] INFO: task systemd-journal:1729 blocked for more than 180 seconds.
[250418.117922]       Tainted: G        W I      --------- -  - 4.18.0-348.2.1.rt7.132.el8_5.x86_64 #1
[250418.118535] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[250418.119154] task:systemd-journal state:D stack:    0 pid: 1729 ppid:     1 flags:0x00080324
[250418.119157] Call Trace:
[250418.119166]  __schedule+0x34d/0x880
[250418.119171]  ? _raw_spin_lock+0x13/0x40
[250418.119173]  preempt_schedule_lock+0x19/0x40
[250418.119176]  rt_spin_lock_slowlock_locked+0x10e/0x2c0
[250418.119179]  rt_read_lock+0xaf/0xf0
[250418.119186]  do_prlimit+0x14c/0x1e0
[250418.119188]  __x64_sys_prlimit64+0x133/0x270
[250418.119192]  ? syscall_trace_enter+0x144/0x310
[250418.119195]  do_syscall_64+0x87/0x1a0
[250418.119198]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[250418.119201] RIP: 0033:0x7f48f41f3064
[250418.119206] Code: Unable to access opcode bytes at RIP 0x7f48f41f303a.
[250418.119207] RSP: 002b:00007ffc55a2a148 EFLAGS: 00000246 ORIG_RAX: 000000000000012e
[250418.119210] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f48f41f3064
[250418.119211] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000000
[250418.119212] RBP: 00007ffc55a2a250 R08: 0000000000000000 R09: 0000000000000001
[250418.119213] R10: 00007ffc55a2a150 R11: 0000000000000246 R12: 00007ffc55a2a580
[250418.119214] R13: 0000000000000000 R14: 0000555fe4ebdef0 R15: 0000000000000000
[250418.119225] INFO: task lsmd:2466 blocked for more than 180 seconds.
[250418.132840]       Tainted: G        W I      --------- -  - 4.18.0-348.2.1.rt7.132.el8_5.x86_64 #1
[...]
[...]
[...]
[250418.160175] INFO: task runsvdir:8888 blocked for more than 180 seconds.
[250418.161214]       Tainted: G        W I      --------- -  - 4.18.0-348.2.1.rt7.132.el8_5.x86_64 #1
[250418.162291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[250418.163404] task:runsvdir        state:D stack:    0 pid: 8888 ppid:  8871 flags:0x00080000
[250418.163406] Call Trace:
[250418.163410]  __schedule+0x34d/0x880
[250418.163412]  ? _raw_spin_lock+0x13/0x40
[250418.163414]  preempt_schedule_lock+0x19/0x40
[250418.163416]  rt_spin_lock_slowlock_locked+0x10e/0x2c0
[250418.163419]  rt_read_lock+0xaf/0xf0
[250418.163422]  do_wait+0xe9/0x280
[250418.163425]  kernel_wait4+0xa6/0x140
[250418.163428]  ? task_stopped_code+0x90/0x90
[250418.163430]  __do_sys_wait4+0x83/0x90
[250418.163433]  ? preempt_count_add+0x49/0xa0
[250418.163436]  ? migrate_enable+0x118/0x3a0
[250418.163439]  ? __do_sys_newstat+0x48/0x70
[250418.163441]  ? recalc_sigpending+0x17/0x50
[250418.163444]  ? sched_clock+0x5/0x10
[250418.163445]  ? get_vtime_delta+0x13/0xc0
[250418.163449]  do_syscall_64+0x87/0x1a0
[250418.163452]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[250418.163454] RIP: 0033:0x7ff79f0fbaab
[250418.163457] Code: Unable to access opcode bytes at RIP 0x7ff79f0fba81.
[250418.163458] RSP: 002b:00007ffc099698a8 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
[250418.163459] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff79f0fbaab
[250418.163461] RDX: 0000000000000001 RSI: 00007ffc099698bc RDI: 00000000ffffffff
[250418.163462] RBP: 0000000000000005 R08: 00000000171055c4 R09: 000000001dd14a3e
[250418.163463] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000061ec0d20
[250418.163464] R13: 400000000000000a R14: 0000000000000000 R15: 0000000000000000
[250418.163470] INFO: task calico-node:10065 blocked for more than 180 seconds.
[250418.164571]       Tainted: G        W I      --------- -  - 4.18.0-348.2.1.rt7.132.el8_5.x86_64 #1
[250418.165682] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[250418.166811] task:calico-node     state:D stack:    0 pid:10065 ppid:  9921 flags:0x00080000
[250418.166813] Call Trace:
[250418.166817]  __schedule+0x34d/0x880
[250418.166820]  ? _raw_spin_lock+0x13/0x40
[250418.166822]  preempt_schedule_lock+0x19/0x40
[250418.166824]  rt_spin_lock_slowlock_locked+0x10e/0x2c0
[250418.166827]  rt_spin_lock_slowlock+0x50/0x80
[250418.166830]  rt_write_lock+0x1e/0x1d0
[250418.166833]  copy_process+0x11e4/0x1bf0
[250418.166837]  _do_fork+0x8e/0x3a0
[250418.166840]  ? sched_clock+0x5/0x10
[250418.166842]  ? get_vtime_delta+0x13/0xc0
[250418.166845]  do_syscall_64+0x87/0x1a0
[250418.166848]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[250418.166849] RIP: 0033:0x4c7351
[250418.166852] Code: Unable to access opcode bytes at RIP 0x4c7327.
[250418.166853] RSP: 002b:000000c000674208 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
[250418.166855] RAX: ffffffffffffffda RBX: 000000c000788800 RCX: 00000000004c7351
[250418.166856] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000004111
[250418.166857] RBP: 000000c0006743a0 R08: 0000000000000000 R09: 0000000000000000
[250418.166858] R10: 0000000000000000 R11: 0000000000000206 R12: 00000000004bd3b8
[250418.166859] R13: 0000000000000018 R14: 0000000000000017 R15: 0000000000000100
[250418.166926] NMI backtrace for cpu 73
[250418.166928] CPU: 73 PID: 825 Comm: khungtaskd Kdump: loaded Tainted: G        W I      --------- -  - 4.18.0-348.2.1.rt7.132.el8_5.x86_64 #1
[250418.166929] Hardware name: Intel Corporation S2600WFQ/S2600WFQ, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
[250418.166929] Call Trace:
[250418.166935]  dump_stack+0x5c/0x80
[250418.166937]  nmi_cpu_backtrace.cold.4+0x13/0x4e
[250418.166939]  nmi_trigger_cpumask_backtrace+0x116/0x129
[250418.166943]  watchdog+0x23b/0x360
[250418.166946]  ? hungtask_pm_notify+0x40/0x40
[250418.166948]  kthread+0x15d/0x180
[250418.166949]  ? __kthread_parkme+0xa0/0xa0
[250418.166952]  ret_from_fork+0x1f/0x40
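
When a log contains many of these reports, it can help to list the blocked tasks and count how many of them are waiting in the same lock path. The following is a minimal sketch in Python, not a Red Hat tool: the function name `summarize_hung_tasks` and both regular expressions are assumptions written against the message format shown above. Frames prefixed with `? ` are skipped because the kernel marks them as unreliable.

```python
import re
from collections import Counter

# Hung-task detector header, e.g.
# [250418.116445] INFO: task systemd-journal:1729 blocked for more than 180 seconds.
HUNG_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)

# Call-trace frame, e.g.
# [250418.119179]  rt_read_lock+0xaf/0xf0
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?P<unreliable>\? )?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Lock entry points seen in the traces above (illustrative choice).
LOCK_SYMBOLS = ("rt_read_lock", "rt_write_lock")


def summarize_hung_tasks(log_text):
    """Return (tasks, lock_counts) for the hung-task reports in log_text."""
    tasks = []
    lock_counts = Counter()
    current = None  # task whose call trace is being collected
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header["comm"].strip(),
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            }
            tasks.append(current)
            continue
        if "] RIP: " in line:
            # The register dump marks the end of this task's kernel frames.
            current = None
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame["unreliable"]:
            current["frames"].append(frame["sym"])
            if frame["sym"] in LOCK_SYMBOLS:
                lock_counts[frame["sym"]] += 1
    return tasks, lock_counts
```

Passing the saved kernel log text to `summarize_hung_tasks` returns one entry per reported task together with a count of `rt_read_lock` and `rt_write_lock` waiters. In the excerpt above, every complete trace is waiting in one of those two functions.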

Environment

  • Red Hat Enterprise Linux 8.5 Realtime
    • kernel-rt-4.18.0-348.2.1.rt7.132.el8_5
