System hangs after "kernel: BUG: scheduling while atomic" printed in the messages when "traps" kernel module used.
Issue
-
At random systems crash with very high CPU usage with the following error message on the console
Mar 15 01:10:50 abc.123.xxx.yyy systemd[1]: Started Firewall Company Networks. Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: — starting — Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [fs] strategy [2] Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [stats] not exporting stats Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [callbacks] data-proto version [2] supported [1:2] Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [callbacks] enabled providers [0x7f] requested [0x7f] supported [0x3fff] Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [callbacks] enabled features [0x0] requested [0x0] supported [0x3] Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [auth-link] strategy [0] Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [auth-link] enabled flows [0x50] requested [0x50] supported [0x50] Mar 15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [auth-link] connected to pid [203773] Mar 15 01:10:55 abc.123.xxx.yyy kernel: TRAPS: [auth-link] flow registered [6] Mar 15 01:10:56 abc.123.xxx.yyy kernel: TRAPS: [data-link] mapped by [203773] Mar 15 01:10:56 abc.123.xxx.yyy kernel: TRAPS: [auth-link] flow registered [4] Mar 15 01:10:56 abc.123.xxx.yyy kernel: BUG: scheduling while atomic: grep/204090/0x00000002 Mar 15 01:10:56 abc.123.xxx.yyy kernel: Modules linked in: traps(O) tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nft_ct nft_chain_nat nf_nat nf_conntrack rfkill nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vfat fat vmw_balloon intel_rapl_msr intel_rapl_common rapl pcspkr vmw_vmci i2c_piix4 joydev ext4 mbcache jbd2 sr_mod cdrom vmwgfx drm_ttm_helper ata_generic ttm drm_kms_helper syscopyarea sysfillrect sysimgblt drm ahci sd_mod libahci ata_piix t10_pi sg libata crc32_pclmul vmxnet3 vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod xxhash_generic ansi_cprng des_generic des3_ede_x86_64 libdes ghash_clmulni_intel crct10dif_pclmul crc32c_intel fuse [last unloaded: traps] Mar 15 01:10:56 abc.123.xxx.yyy kernel: CPU: 0 PID: 204090 Comm: grep Kdump: loaded Tainted: G O ------- --- 5.14.0-362.8.1.el9_3.x86_64 #1 Mar 15 01:10:56 abc.123.xxx.yyy kernel: Hardware name: VMware, Inc. VMware9,2/555BZ Desktop Reference Platform, BIOS ABC43.55X.34210386.X11.6230101555 10/25/2021 Mar 15 01:10:56 abc.123.xxx.yyy kernel: Call Trace: Mar 15 01:10:56 abc.123.xxx.yyy kernel: <TASK> Mar 15 01:10:56 abc.123.xxx.yyy kernel: dump_stack_lvl+0x34/0x48 Mar 15 01:10:56 abc.123.xxx.yyy kernel: __schedule_bug.cold+0x47/0x53 Mar 15 01:10:56 abc.123.xxx.yyy kernel: schedule_debug.constprop.0+0xc5/0x100 Mar 15 01:10:56 abc.123.xxx.yyy kernel: __schedule+0x48/0x550 Mar 15 01:10:56 abc.123.xxx.yyy kernel: ? __mod_timer+0x286/0x3d0 Mar 15 01:10:56 abc.123.xxx.yyy kernel: schedule+0x2d/0x70 Mar 15 01:10:56 abc.123.xxx.yyy kernel: schedule_timeout+0x88/0x160 Mar 15 01:10:56 abc.123.xxx.yyy kernel: ? __pfx_process_timeout+0x10/0x10 Mar 15 01:10:56 abc.123.xxx.yyy kernel: wait_queue_wait_on_ticket+0x126/0x200 [traps] Mar 15 01:10:56 abc.123.xxx.yyy kernel: ? __pfx_autoremove_wake_function+0x10/0x10 Mar 15 01:10:56 abc.123.xxx.yyy kernel: auth_link_send_request+0x94/0x190 [traps] Mar 15 01:10:56 abc.123.xxx.yyy kernel: exec_tracker_process_exec_sync+0x6c/0x90 [traps] Mar 15 01:10:56 abc.123.xxx.yyy kernel: probe_fini+0x1196/0x12f0 [traps] Mar 15 01:10:56 abc.123.xxx.yyy kernel: probe_fini+0x1299/0x12f0 [traps] Mar 15 01:10:56 abc.123.xxx.yyy kernel: exec_binprm+0x14e/0x1a0 Mar 15 01:10:56 abc.123.xxx.yyy kernel: bprm_execve.part.0+0x170/0x220 Mar 15 01:10:56 abc.123.xxx.yyy kernel: do_execveat_common.isra.0+0x1a2/0x220 Mar 15 01:10:56 abc.123.xxx.yyy kernel: __x64_sys_execve+0x33/0x40 Mar 15 01:10:56 abc.123.xxx.yyy kernel: do_syscall_64+0x5c/0x90 Mar 15 01:10:56 abc.123.xxx.yyy kernel: ? exc_page_fault+0x62/0x150 Mar 15 01:10:56 abc.123.xxx.yyy kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc Mar 15 01:10:56 abc.123.xxx.yyy kernel: RIP: 0033:0x7fabcb3efe70 Mar 15 01:10:56 abc.123.xxx.yyy kernel: Code: Unable to access opcode bytes at RIP 0x7fabcb3efe46. Mar 15 01:10:56 abc.123.xxx.yyy kernel: RSP: 002b:00007ffed9f19f00 EFLAGS: 00000200 ORIG_RAX: 000000000000003b Mar 15 01:10:56 abc.123.xxx.yyy kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000 Mar 15 01:10:56 abc.123.xxx.yyy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 Mar 15 01:10:56 abc.123.xxx.yyy kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 Mar 15 01:10:56 abc.123.xxx.yyy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 Mar 15 01:10:56 abc.123.xxx.yyy kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Mar 15 01:10:56 abc.123.xxx.yyy kernel: </TASK>
-
Shortly thereafter the Soft Lockup is detected, and is continuously reported until the system is rebooted.
Mar 15 01:35:51 abc.123.xxx.yyy kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [systemd-coredum:204096] Mar 15 01:35:51 abc.123.xxx.yyy kernel: Modules linked in: traps(O) tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nft_ct nft_chain_nat nf_nat nf_conntrack rfkill nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vfat fat vmw_balloon intel_rapl_msr intel_rapl_common rapl pcspkr vmw_vmci i2c_piix4 joydev ext4 mbcache jbd2 sr_mod cdrom vmwgfx drm_ttm_helper ata_generic ttm drm_kms_helper syscopyarea sysfillrect sysimgblt drm ahci sd_mod libahci ata_piix t10_pi sg libata crc32_pclmul vmxnet3 vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod xxhash_generic ansi_cprng des_generic des3_ede_x86_64 libdes ghash_clmulni_intel crct10dif_pclmul crc32c_intel fuse [last unloaded: traps] Mar 15 01:35:51 abc.123.xxx.yyy kernel: CPU: 0 PID: 204096 Comm: systemd-coredum Kdump: loaded Tainted: G W O ------- --- 5.14.0-362.8.1.el9_3.x86_64 #1 Mar 15 01:35:51 abc.123.xxx.yyy kernel: Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.21100432.B64.2301110304 01/11/2023 Mar 15 01:35:51 abc.123.xxx.yyy kernel: RIP: 0010:_raw_spin_lock+0x17/0x30 Mar 15 01:35:51 abc.123.xxx.yyy kernel: Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 ff 05 5c ae bc 6f 31 c0 ba 01 00 00 00 f0 0f b1 17 <75> 05 c3 cc cc cc cc 89 c6 e8 cb 01 00 00 66 90 c3 cc cc cc cc 0f Mar 15 01:35:51 abc.123.xxx.yyy kernel: RSP: 0000:ffffa448c1d27b10 EFLAGS: 00000246 Mar 15 01:35:51 abc.123.xxx.yyy kernel: RAX: 0000000000000000 RBX: 0000558c95d72000 RCX: 0000000197321067 Mar 15 01:35:51 abc.123.xxx.yyy kernel: RDX: 0000000000000001 RSI: ffffd4b100000000 RDI: ffffd4b1065cc868 Mar 15 01:35:51 abc.123.xxx.yyy kernel: RBP: ffff942417321b88 R08: ffff942280000b88 R09: ffffa448c1d27c70 Mar 15 01:35:51 abc.123.xxx.yyy kernel: R10: 0000558c95d70000 R11: ffff9423af439000 R12: 0000558c95d71000 Mar 15 01:35:51 abc.123.xxx.yyy kernel: R13: ffffa448c1d27ce0 R14: 0000000000000004 R15: ffff94238602e658 Mar 15 01:35:51 abc.123.xxx.yyy kernel: FS: 0000000000000000(0000) GS:ffff9426afc00000(0000) knlGS:0000000000000000 Mar 15 01:35:51 abc.123.xxx.yyy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 15 01:35:51 abc.123.xxx.yyy kernel: CR2: 00007f7c0de9ce46 CR3: 0000000273c10004 CR4: 00000000001706f0 Mar 15 01:35:51 abc.123.xxx.yyy kernel: Call Trace: Mar 15 01:35:51 abc.123.xxx.yyy kernel: <IRQ> Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? show_trace_log_lvl+0x1c4/0x2df Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? show_trace_log_lvl+0x1c4/0x2df Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? zap_pte_range+0xec/0xc30 Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? watchdog_timer_fn+0x1b2/0x210 Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? __pfx_watchdog_timer_fn+0x10/0x10 Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? __hrtimer_run_queues+0x12a/0x2c0 Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? hrtimer_interrupt+0xfc/0x210 Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? __sysvec_apic_timer_interrupt+0x5f/0x110 Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? sysvec_apic_timer_interrupt+0x6d/0x90 Mar 15 01:35:51 abc.123.xxx.yyy kernel: </IRQ> Mar 15 01:35:51 abc.123.xxx.yyy kernel: <TASK> Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20 Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? _raw_spin_lock+0x17/0x30 Mar 15 01:35:51 abc.123.xxx.yyy kernel: zap_pte_range+0xec/0xc30 Mar 15 01:35:51 abc.123.xxx.yyy kernel: zap_pmd_range.isra.0+0x141/0x310 Mar 15 01:35:51 abc.123.xxx.yyy kernel: unmap_page_range+0x2b1/0x4c0 Mar 15 01:35:51 abc.123.xxx.yyy kernel: unmap_vmas+0x92/0x110 Mar 15 01:35:51 abc.123.xxx.yyy kernel: exit_mmap+0x90/0x210 Mar 15 01:35:51 abc.123.xxx.yyy kernel: ? mutex_lock+0xe/0x30 Mar 15 01:35:51 abc.123.xxx.yyy kernel: mmput+0x5a/0x140 Mar 15 01:35:51 abc.123.xxx.yyy kernel: exit_mm+0xb8/0x120 Mar 15 01:35:51 abc.123.xxx.yyy kernel: do_exit+0x212/0x4b0 Mar 15 01:35:51 abc.123.xxx.yyy kernel: do_group_exit+0x2d/0x90 Mar 15 01:35:51 abc.123.xxx.yyy kernel: get_signal+0x880/0x8d0 Mar 15 01:35:51 abc.123.xxx.yyy kernel: arch_do_signal_or_restart+0x25/0x100 Mar 15 01:35:51 abc.123.xxx.yyy kernel: exit_to_user_mode_loop+0x9c/0x130 Mar 15 01:35:51 abc.123.xxx.yyy kernel: exit_to_user_mode_prepare+0xb6/0x100 Mar 15 01:35:51 abc.123.xxx.yyy kernel: irqentry_exit_to_user_mode+0x5/0x30 Mar 15 01:35:51 abc.123.xxx.yyy kernel: asm_exc_page_fault+0x22/0x30 Mar 15 01:35:51 abc.123.xxx.yyy kernel: RIP: 0033:0x7f7c0de9ce70 Mar 15 01:35:51 abc.123.xxx.yyy kernel: Code: Unable to access opcode bytes at RIP 0x7f7c0de9ce46. Mar 15 01:35:51 abc.123.xxx.yyy kernel: RSP: 002b:00007ffe30d0f360 EFLAGS: 00010202 Mar 15 01:35:51 abc.123.xxx.yyy kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 Mar 15 01:35:51 abc.123.xxx.yyy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 Mar 15 01:35:51 abc.123.xxx.yyy kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 Mar 15 01:35:51 abc.123.xxx.yyy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 Mar 15 01:35:51 abc.123.xxx.yyy kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Mar 15 01:35:51 abc.123.xxx.yyy kernel: </TASK> $ grep BUG var/log/messages | head Mar 15 01:23:32 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: spmd/350651/0x00000002 Mar 15 01:23:32 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: spmd/350651/0x00000000 Mar 15 01:23:32 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: systemd-coredum/350657/0x00000002 Mar 15 01:23:32 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: systemd-coredum/350657/0x00000000 Mar 15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: sh/350664/0x00000002 Mar 15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: sh/350664/0x00000000 Mar 15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: systemd-coredum/350665/0x00000002 Mar 15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: sh/350672/0x00000002 Mar 15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: sh/350672/0x00000000 Mar 15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: systemd-coredum/350673/0x00000002 .. snip .. Mar 15 06:36:50 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 4095s! [systemd-coredum:350691] Mar 15 06:37:14 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 4120s! [systemd-coredum:350665] Mar 15 06:37:18 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 4121s! [systemd-coredum:350683] Mar 15 06:37:18 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 4121s! [systemd-coredum:350691] Mar 15 06:37:42 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 4146s! [systemd-coredum:350665] Mar 15 06:37:46 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 4147s! [systemd-coredum:350683] Mar 15 06:37:46 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 4147s! [systemd-coredum:350691] Mar 15 06:38:10 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 4172s! [systemd-coredum:350665] Mar 15 06:38:14 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 4173s! [systemd-coredum:350683] Mar 15 06:38:14 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 4173s! [systemd-coredum:350691] $ grep ( proc/modules traps 184320 2 - Live 0xffffffffc0975000 (O)
Environment
- Red Hat Enterprise Linux 9.3
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.