Every time when I try to login to the vKVM console on Cisco IMC, the UCS server is getting hung up due to rcu_sched CPU stall that happens on a CPU that is busy writing lengthy output of Show_State to a slow serial console
Issue
- Every time when I try to login to a vKVM console on Cisco IMC, UCS server is getting hung up due to rcu_sched CPU stall that happens on a CPU that is busy writing lengthy output of Show_State to a slow serial console.
[ 1807.573590] task:pmpause state:S stack: 0 pid: 6300 ppid: 1 flags:0x00000080
[ 1807.583266] Call Trace:
[ 1807.586343] __schedule+0x2d1/0x840
[ 1807.590585] schedule+0x35/0xa0
[ 1807.594438] __ia32_sys_pause+0x2b/0x40
[ 1807.599068] do_syscall_64+0x5b/0x1b0
[ 1807.603503] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 1807.609496] RIP: 0033:0x7fb0aa8c85a8
[ 1807.613841] Code: Unable to access opcode bytes at RIP 0x7fb0aa8c857e.
[ 1807.621496] RSP: 002b:00007ffe4ec26378 EFLAGS: 00000246 ORIG_RAX: 0000000000000022
[ 1807.630323] RAX: ffffffffffffffda RBX: 00007ffe4ec26390 RCX: 00007fb0aa8c85a8
[ 1807.638668] RDX: 0000000000000000 RSI: 00007ffe4ec26390 RDI: 0000000000000000
[ 1807.647007] RBP: 00007ffe4ec26410 R08: 0000000000000000 R09: 00007fb0ab05c980
[ 1807.655335] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffe4ec26390
[ 1807.663674] R13: 000000000000000f R14: 0000560243b65f50 R15: 00007ffe4ec26390
[ 1807.672012] Sched Debug Version: v0.11, 4.18.0-372.32.1.el8_6.x86_64 #1
[ 1807.679772] ktime : 1807550.685323
[ 1807.687245] sched_clk : 1819351.499416
[ 1807.694715] cpu_clk : 1807672.011877
[ 1807.702184] jiffies : 4296474877
[ 1807.709264] sched_clock_stable() : 1
[ 1807.715468]
[ 1807.717499] sysctl_sched
[ 1807.720698] .sysctl_sched_latency : 24.000000
[ 1807.727982] .sysctl_sched_min_granularity : 10.000000
[ 1807.735265] .sysctl_sched_wakeup_granularity : 15.000000
[ 1807.742541] .sysctl_sched_child_runs_first : 0
[ 1807.749040] .sysctl_sched_features : 16723515
[ 1807.756219] .sysctl_sched_tunable_scaling : 1 (logaritmic)
[ 1807.763979]
[ 1807.766014] cpu#0, 3300.000 MHz
[ 1807.769884] .nr_running : 0
[ 1807.775395] .nr_switches : 23123
[ 1807.781298] .nr_load_updates : 0
[ 1807.786813] .nr_uninterruptible : 2
[ 1807.792326] .next_balance : 4296.474969
[ 1807.798810] .curr->pid : 0
[ 1807.804321] .clock : 1807803.325501
[ 1807.811093] .clock_task : 1799465.935201
[ 1807.817861] .avg_idle : 889756
[ 1807.823850] .max_idle_balance_cost : 500000
[ 1807.829834]
[ 1807.831829] rt_rq[0]:/system.slice/pmlogger_farm.service
[ 1807.838094] .rt_nr_running : 0
[ 1807.843583] .rt_nr_migratory : 0
[ 1807.849070] .rt_throttled : 0
[ 1807.854557] .rt_time : 0.000000
[ 1807.860722] .rt_runtime : 0.000000
[ 1807.866876]
[ 1807.868837] rt_rq[0]:/system.slice/pmlogger.service
[ 1807.874595] .rt_nr_running : 0
[ 1807.880067] .rt_nr_migratory : 0
[ 1807.885533] .rt_throttled : 0
[ 1807.890999] .rt_time : 0.000000
[ 1807.897141] .rt_runtime : 0.000000
[ 1807.903279]
[ 1807.905233] rt_rq[0]:/system.slice/run-user-0.mount
[ 1807.910971] .rt_nr_running : 0
[ 1807.916411] .rt_nr_migratory : 0
[ 1807.921838] .rt_throttled : 0
[ 1807.927268] .rt_time : 0.000000
[ 1807.933376] .rt_runtime : 0.000000
[ 1807.939483]
...
[ 1814.108126] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1814.115690] rcu: 70-....: (59995 ticks this GP) idle=1aa/1/0x4000000000000002 softirq=31365/31365 fqs=13248
[ 1814.132507] (t=60024 jiffies g=31637 q=36407)
[ 1814.143222] Sending NMI from CPU 70 to CPUs 13:
[ 1814.155749] NMI backtrace for cpu 13
[ 1814.155750] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Tainted: G OE --------- - - 4.18.0-372.32.1.el8_6.x86_64 #1
[ 1814.155751] Hardware name: Cisco Systems Inc UCSC-C480-M5/UCSC-C480-M5, BIOS C480M5.4.2.2d.0.0808221350 08/08/2022
[ 1814.155751] RIP: 0010:io_serial_in+0x14/0x20
[ 1814.155752] Code: 00 00 d3 e6 48 63 f6 48 03 77 10 8b 06 e9 14 55 60 00 0f 1f 40 00 0f 1f 44 00 00 0f b6 8f b1 00 00 00 8b 57 08 d3 e6 01 f2 ec <0f> b6 c0 e9 f4 54 60 00 0f 1f 40 00 0f 1f 44 00 00 0f b6 8f b1 00
[ 1814.155752] RSP: 0018:ffffc08831c2c948 EFLAGS: 00000006
[ 1814.155753] RAX: ffffffff851fbf05 RBX: ffffffff87b0a9e0 RCX: 0000000000000000
[ 1814.155754] RDX: 00000000000003f9 RSI: 0000000000000001 RDI: ffffffff87b0a9e0
[ 1814.155754] RBP: 0000000000000000 R08: ffffffff851fa610 R09: 2020202020202020
[ 1814.155755] R10: 69746e75725f7472 R11: 75725f74722e2020 R12: 0000000000000001
[ 1814.155755] R13: 000000000000003b R14: ffffffff8707db40 R15: 000000000000003b
[ 1814.155756] FS: 0000000000000000(0000) GS:ffff9ba03e140000(0000) knlGS:0000000000000000
[ 1814.155756] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1814.155756] CR2: 00007fb0aa8c857e CR3: 0000004ed5210006 CR4: 00000000007706e0
[ 1814.155757] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1814.155757] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1814.155758] PKRU: 55555554
[ 1814.155758] Call Trace:
[ 1814.155758] <IRQ>
[ 1814.155759] serial8250_console_write+0xa1/0x320
[ 1814.155759] console_unlock+0x363/0x4b0
[ 1814.155759] vprintk_emit+0x151/0x250
[ 1814.155760] printk+0x58/0x73
[ 1814.155760] ? irq_work_queue+0x9/0x30
[ 1814.155761] ? wake_up_klogd.part.30+0x30/0x40
[ 1814.155761] ? vprintk_emit+0x125/0x250
[ 1814.155761] ? printk+0x58/0x73
[ 1814.155762] print_cpu+0x24c/0x4a0
[ 1814.155762] sysrq_sched_debug_show+0x34/0x50
[ 1814.155762] show_state_filter+0xb9/0xd0
[ 1814.155763] kbd_event+0x2da/0xc40
[ 1814.155763] input_to_handler+0xd3/0xf0
[ 1814.155763] input_pass_values.part.7+0x11b/0x140
[ 1814.155764] input_handle_event+0x138/0x5a0
[ 1814.155764] input_event+0x4f/0x80
[ 1814.155765] hid_input_report+0xfe/0x160
[ 1814.155766] hid_irq_in+0x185/0x1b0
[ 1814.155766] __usb_hcd_giveback_urb+0x6f/0x110
[ 1814.155767] usb_giveback_urb_bh+0xb0/0x120
[ 1814.155767] tasklet_action_common.isra.17+0x5a/0x110
[ 1814.155768] __do_softirq+0xd7/0x2c8
[ 1814.155768] irq_exit_rcu+0xd7/0xe0
[ 1814.155769] irq_exit+0xa/0x10
[ 1814.155769] do_IRQ+0x7f/0xd0
[ 1814.155769] common_interrupt+0xf/0xf
[ 1814.155770] </IRQ>
[ 1814.155770] RIP: 0010:cpu_idle_poll+0x35/0x110
[ 1814.155771] Code: 2d b0 7b a3 7a 0f 1f 44 00 00 e8 76 d5 79 ff fb 66 0f 1f 44 00 00 65 48 8b 04 25 40 5c 01 00 48 8b 00 a8 08 74 14 eb 25 f3 90 <65> 48 8b 04 25 40 5c 01 00 48 8b 00 a8 08 75 13 8b 05 c1 65 3d 01
[ 1814.155771] RSP: 0018:ffffc088003ebea8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde
[ 1814.155772] RAX: 0000000000000001 RBX: 000000000000000d RCX: ffff9ba03e16bc00
[ 1814.155772] RDX: 000000000142571e RSI: 0000000026d489c6 RDI: ffff9ba03e16bc00
[ 1814.155773] RBP: 000000000000000d R08: 0000000000000002 R09: 000000000002a680
[ 1814.155773] R10: 0003d09e1de75cda R11: 0000000000000001 R12: 0000000000000000
[ 1814.155774] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
[ 1814.155774] ? cpu_idle_poll+0x1a/0x110
[ 1814.155775] do_idle+0x64/0x2d0
[ 1814.155775] cpu_startup_entry+0x6f/0x80
[ 1814.155776] secondary_startup_64_no_verify+0xc2/0xcb
[ 1814.155823] NMI backtrace for cpu 70
[ 1814.160991]
[ 1814.165302] CPU: 70 PID: 3759 Comm: pmdalinux Kdump: loaded Tainted: G OE --------- - - 4.18.0-372.32.1.el8_6.x86_64 #1
[ 1814.191107] Hardware name: Cisco Systems Inc UCSC-C480-M5/UCSC-C480-M5, BIOS C480M5.4.2.2d.0.0808221350 08/08/2022
[ 1814.217998] Call Trace:
[ 1814.232623] <IRQ>
[ 1814.249390] dump_stack+0x41/0x60
[ 1814.266173] nmi_cpu_backtrace.cold.8+0x13/0x4f
[ 1814.282494] ? lapic_can_unplug_cpu.cold.30+0x43/0x43
[ 1814.290908]
[ 1814.299317] nmi_trigger_cpumask_backtrace+0xe9/0xee
[ 1814.311179] rcu_dump_cpu_stacks+0xc8/0xfc
[ 1814.317033] rcu_sched_clock_irq.cold.99+0x116/0x395
[ 1814.327359] ? tick_sched_do_timer+0x50/0x50
[ 1814.335624] ? tick_sched_do_timer+0x50/0x50
[ 1814.345452] update_process_times+0x55/0x80
[ 1814.355582] tick_sched_handle+0x22/0x60
[ 1814.359524]
[ 1814.364049] tick_sched_timer+0x37/0x80
[ 1814.373575] __hrtimer_run_queues+0x100/0x280
[ 1814.382622] hrtimer_interrupt+0x100/0x220
[ 1814.393041] smp_apic_timer_interrupt+0x6a/0x130
[ 1814.402384] apic_timer_interrupt+0xf/0x20
[ 1814.412875] </IRQ>
[ 1814.421966] RIP: 0010:smp_call_function_single+0x8c/0xf0
[ 1814.427323]
[ 1814.432396] Code: 1f 44 00 00 f6 c4 02 74 72 85 c9 75 4a 49 c7 c0 80 bf 02 00 65 4c 03 05 8a c8 26 7b 41 8b 40 18 a8 01 74 0a f3 90 41 8b 40 18 <a8> 01 75 f6 41 83 48 18 01 48 89 d1 48 89 f2 4c 89 c6 e8 8d fe ff
[ 1814.442828] RSP: 0018:ffffc08834a97c00 EFLAGS: 00000202
[ 1814.450906] ORIG_RAX: ffffffffffffff13
[ 1814.459012] RAX: 0000000000000001 RBX: 00000002540bb3a0 RCX: 0000000000000000
[ 1814.466976] RDX: 0000000000000000 RSI: ffffffff84c3c410 RDI: 0000000000000012
[ 1814.497400] RBP: ffffc08834a97c48 R08: ffff9ba03efabf80 R09: ffff9ba03e280000
[ 1814.514010] R10: ffffc08834a97cd8 R11: 0000000000000000 R12: 0000019861c2cd57
[ 1814.522311]
[ 1814.530609] R13: 0000000000000001 R14: ffff98adf9dc3ed8 R15: ffff9bac3e895100
[ 1814.543510] ? cpu_bugs_smt_update.cold.11+0xaa/0xaa
[ 1814.552029] aperfmperf_snapshot_cpu+0x57/0x70
[ 1814.562680] arch_freq_prepare_all+0x6c/0xb0
[ 1814.568970] cpuinfo_open+0xe/0x20
[ 1814.589430] proc_reg_open+0x71/0x140
[ 1814.606893] ? proc_put_link+0x10/0x10
[ 1814.609997]
[ 1814.615527] do_dentry_open+0x132/0x350
[ 1814.623675] path_openat+0x542/0x14f0
[ 1814.633964] do_filp_open+0x93/0x100
[ 1814.645612] ? getname_flags+0x4a/0x1e0
[ 1814.653690] ? __check_object_size+0xac/0x173
[ 1814.665274] do_sys_open+0x188/0x220
[ 1814.675796] do_syscall_64+0x5b/0x1b0
[ 1814.681740]
[ 1814.687300] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 1814.698040] RIP: 0033:0x7fb0d8231252
[ 1814.709462] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 55 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[ 1814.720795] RSP: 002b:00007fff4167b620 EFLAGS: 00000246
[ 1814.727680] ORIG_RAX: 0000000000000101
[ 1814.739328] RAX: ffffffffffffffda RBX: 0000563b5b50da10 RCX: 00007fb0d8231252
[ 1814.750200] RDX: 0000000000000000 RSI: 00007fff4167b7c0 RDI: 00000000ffffff9c
[ 1814.755181]
[ 1814.760736] RBP: 0000000000000008 R08: 0000000000000008 R09: 0000000000000001
[ 1814.771841] R10: 0000000000000000 R11: 0000000000000246 R12: 0000563b5a49c660
[ 1814.783038] R13: 0000563b5a49c660 R14: 0000000000000001 R15: 00007fff4167c8a0
[ 1814.792035] Kernel panic - not syncing: RCU Stall
- Neither [Ctrl]+[t] nor [Ctrl]+[Scroll Lock] is pressed when I try to login to vKVM console.
- I just press [Ctrl]+[c] when I try to login to the vKVM console.
Environment
- Red Hat Enterprise Linux 8
- vKVM console on Cisco Integrated Managment Controller
- Windows 10 on VMware VDI Desktop
- VMware Horizon Client
Subscriber exclusive content
A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.