System hangs after "kernel: BUG: scheduling while atomic" printed in the messages when "traps" kernel module used.

Solution Verified - Updated -

Issue

  • At random systems crash with very high CPU usage with the following error message on the console

    Mar  15 01:10:50 abc.123.xxx.yyy systemd[1]: Started Firewall Company Networks.
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: — starting —
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [fs] strategy [2]
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [stats] not exporting stats
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [callbacks] data-proto version [2] supported [1:2]
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [callbacks] enabled providers [0x7f] requested [0x7f] supported 
                                                    [0x3fff]
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [callbacks] enabled features [0x0] requested [0x0] supported [0x3]
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [auth-link] strategy [0]
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [auth-link] enabled flows [0x50] requested [0x50] supported [0x50]
    Mar  15 01:10:52 abc.123.xxx.yyy kernel: TRAPS: [auth-link] connected to pid [203773]
    Mar  15 01:10:55 abc.123.xxx.yyy kernel: TRAPS: [auth-link] flow registered [6]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: TRAPS: [data-link] mapped by [203773]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: TRAPS: [auth-link] flow registered [4]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: BUG: scheduling while atomic: grep/204090/0x00000002
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: Modules linked in: traps(O) tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6  
    nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nft_ct nft_chain_nat nf_nat 
    nf_conntrack rfkill nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink vsock_loopback 
    vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vfat fat vmw_balloon intel_rapl_msr intel_rapl_common 
    rapl pcspkr vmw_vmci i2c_piix4 joydev ext4 mbcache jbd2 sr_mod cdrom vmwgfx drm_ttm_helper ata_generic ttm 
    drm_kms_helper syscopyarea sysfillrect sysimgblt drm ahci sd_mod libahci ata_piix t10_pi sg libata crc32_pclmul 
    vmxnet3 vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod xxhash_generic ansi_cprng des_generic 
    des3_ede_x86_64 libdes ghash_clmulni_intel crct10dif_pclmul crc32c_intel fuse [last unloaded: traps]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: CPU: 0 PID: 204090 Comm: grep Kdump: loaded Tainted: G           O         
    -------  ---  5.14.0-362.8.1.el9_3.x86_64 #1
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: Hardware name: VMware, Inc. VMware9,2/555BZ Desktop Reference Platform, BIOS 
    ABC43.55X.34210386.X11.6230101555 10/25/2021
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: Call Trace:
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: <TASK>
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: dump_stack_lvl+0x34/0x48
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: __schedule_bug.cold+0x47/0x53
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: schedule_debug.constprop.0+0xc5/0x100
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: __schedule+0x48/0x550
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: ? __mod_timer+0x286/0x3d0
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: schedule+0x2d/0x70
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: schedule_timeout+0x88/0x160
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: ? __pfx_process_timeout+0x10/0x10
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: wait_queue_wait_on_ticket+0x126/0x200 [traps]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: ? __pfx_autoremove_wake_function+0x10/0x10
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: auth_link_send_request+0x94/0x190 [traps]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: exec_tracker_process_exec_sync+0x6c/0x90 [traps]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: probe_fini+0x1196/0x12f0 [traps]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: probe_fini+0x1299/0x12f0 [traps]
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: exec_binprm+0x14e/0x1a0
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: bprm_execve.part.0+0x170/0x220
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: do_execveat_common.isra.0+0x1a2/0x220
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: __x64_sys_execve+0x33/0x40
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: do_syscall_64+0x5c/0x90
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: ? exc_page_fault+0x62/0x150
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: RIP: 0033:0x7fabcb3efe70
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: Code: Unable to access opcode bytes at RIP 0x7fabcb3efe46.
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: RSP: 002b:00007ffed9f19f00 EFLAGS: 00000200 ORIG_RAX: 000000000000003b
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    Mar  15 01:10:56 abc.123.xxx.yyy kernel: </TASK>
    
  • Shortly thereafter the Soft Lockup is detected, and is continuously reported until the system is rebooted.

    Mar  15 01:35:51 abc.123.xxx.yyy kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [systemd-coredum:204096]
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: Modules linked in: traps(O) tls nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 
    nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nft_ct nft_chain_nat nf_nat 
    nf_conntrack rfkill nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink vsock_loopback 
    vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vfat fat vmw_balloon intel_rapl_msr intel_rapl_common 
    rapl pcspkr vmw_vmci i2c_piix4 joydev ext4 mbcache jbd2 sr_mod cdrom vmwgfx drm_ttm_helper ata_generic ttm 
    drm_kms_helper syscopyarea sysfillrect sysimgblt drm ahci sd_mod libahci ata_piix t10_pi sg libata crc32_pclmul 
    vmxnet3 vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod xxhash_generic ansi_cprng des_generic 
    des3_ede_x86_64 libdes ghash_clmulni_intel crct10dif_pclmul crc32c_intel fuse [last unloaded: traps]
    
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: CPU: 0 PID: 204096 Comm: systemd-coredum Kdump: loaded Tainted: G        W  O       
    -------  ---  5.14.0-362.8.1.el9_3.x86_64 #1
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS 
    VMW71.00V.21100432.B64.2301110304 01/11/2023
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RIP: 0010:_raw_spin_lock+0x17/0x30
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: Code: 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 
    65 ff 05 5c ae bc 6f 31 c0 ba 01 00 00 00 f0 0f b1 17 <75> 05 c3 cc cc cc cc 89 c6 e8 cb 01 00 00 66 90 c3 cc cc cc cc 
    0f
    
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RSP: 0000:ffffa448c1d27b10 EFLAGS: 00000246
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RAX: 0000000000000000 RBX: 0000558c95d72000 RCX: 0000000197321067
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RDX: 0000000000000001 RSI: ffffd4b100000000 RDI: ffffd4b1065cc868
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RBP: ffff942417321b88 R08: ffff942280000b88 R09: ffffa448c1d27c70
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: R10: 0000558c95d70000 R11: ffff9423af439000 R12: 0000558c95d71000
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: R13: ffffa448c1d27ce0 R14: 0000000000000004 R15: ffff94238602e658
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: FS:  0000000000000000(0000) GS:ffff9426afc00000(0000) knlGS:0000000000000000
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: CR2: 00007f7c0de9ce46 CR3: 0000000273c10004 CR4: 00000000001706f0
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: Call Trace:
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: <IRQ>
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? show_trace_log_lvl+0x1c4/0x2df
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? show_trace_log_lvl+0x1c4/0x2df
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? zap_pte_range+0xec/0xc30
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? watchdog_timer_fn+0x1b2/0x210
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? __hrtimer_run_queues+0x12a/0x2c0
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? hrtimer_interrupt+0xfc/0x210
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? __sysvec_apic_timer_interrupt+0x5f/0x110
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? sysvec_apic_timer_interrupt+0x6d/0x90
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: </IRQ>
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: <TASK>
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? _raw_spin_lock+0x17/0x30
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: zap_pte_range+0xec/0xc30
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: zap_pmd_range.isra.0+0x141/0x310
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: unmap_page_range+0x2b1/0x4c0
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: unmap_vmas+0x92/0x110
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: exit_mmap+0x90/0x210
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: ? mutex_lock+0xe/0x30
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: mmput+0x5a/0x140
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: exit_mm+0xb8/0x120
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: do_exit+0x212/0x4b0
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: do_group_exit+0x2d/0x90
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: get_signal+0x880/0x8d0
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: arch_do_signal_or_restart+0x25/0x100
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: exit_to_user_mode_loop+0x9c/0x130
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: exit_to_user_mode_prepare+0xb6/0x100
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: irqentry_exit_to_user_mode+0x5/0x30
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: asm_exc_page_fault+0x22/0x30
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RIP: 0033:0x7f7c0de9ce70
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: Code: Unable to access opcode bytes at RIP 0x7f7c0de9ce46.
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RSP: 002b:00007ffe30d0f360 EFLAGS: 00010202
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    Mar  15 01:35:51 abc.123.xxx.yyy kernel: </TASK>
    
    $ grep BUG  var/log/messages | head
    
    Mar  15 01:23:32 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: spmd/350651/0x00000002
    Mar  15 01:23:32 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: spmd/350651/0x00000000
    Mar  15 01:23:32 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: systemd-coredum/350657/0x00000002
    Mar  15 01:23:32 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: systemd-coredum/350657/0x00000000
    Mar  15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: sh/350664/0x00000002
    Mar  15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: sh/350664/0x00000000
    Mar  15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: systemd-coredum/350665/0x00000002
    Mar  15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: sh/350672/0x00000002
    Mar  15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: sh/350672/0x00000000
    Mar  15 01:23:33 a-gitlab1.123.xxxx.yyyy kernel: BUG: scheduling while atomic: systemd-coredum/350673/0x00000002
    
    .. snip ..
    
    Mar  15 06:36:50 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 4095s! [systemd-coredum:350691]
    Mar  15 06:37:14 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 4120s! [systemd-coredum:350665]
    Mar  15 06:37:18 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 4121s! [systemd-coredum:350683]
    Mar  15 06:37:18 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 4121s! [systemd-coredum:350691]
    Mar  15 06:37:42 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 4146s! [systemd-coredum:350665]
    Mar  15 06:37:46 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 4147s! [systemd-coredum:350683]
    Mar  15 06:37:46 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 4147s! [systemd-coredum:350691]
    Mar  15 06:38:10 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 4172s! [systemd-coredum:350665]
    Mar  15 06:38:14 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 4173s! [systemd-coredum:350683]
    Mar  15 06:38:14 a-gitlab1.123.xxxx.yyyy kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 4173s! [systemd-coredum:350691]
    
    $ grep ( proc/modules
    traps 184320 2 - Live 0xffffffffc0975000 (O)
    

Environment

  • Red Hat Enterprise Linux 9.3

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content