Every time when I try to login to the vKVM console on Cisco IMC, the UCS server is getting hung up due to rcu_sched CPU stall that happens on a CPU that is busy writing lengthy output of Show_State to a slow serial console

Solution Verified - Updated -

Issue

  • Every time when I try to login to a vKVM console on Cisco IMC, UCS server is getting hung up due to rcu_sched CPU stall that happens on a CPU that is busy writing lengthy output of Show_State to a slow serial console.
[ 1807.573590] task:pmpause         state:S stack:    0 pid: 6300 ppid:     1 flags:0x00000080
[ 1807.583266] Call Trace:
[ 1807.586343]  __schedule+0x2d1/0x840
[ 1807.590585]  schedule+0x35/0xa0
[ 1807.594438]  __ia32_sys_pause+0x2b/0x40
[ 1807.599068]  do_syscall_64+0x5b/0x1b0
[ 1807.603503]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 1807.609496] RIP: 0033:0x7fb0aa8c85a8
[ 1807.613841] Code: Unable to access opcode bytes at RIP 0x7fb0aa8c857e.
[ 1807.621496] RSP: 002b:00007ffe4ec26378 EFLAGS: 00000246 ORIG_RAX: 0000000000000022
[ 1807.630323] RAX: ffffffffffffffda RBX: 00007ffe4ec26390 RCX: 00007fb0aa8c85a8
[ 1807.638668] RDX: 0000000000000000 RSI: 00007ffe4ec26390 RDI: 0000000000000000
[ 1807.647007] RBP: 00007ffe4ec26410 R08: 0000000000000000 R09: 00007fb0ab05c980
[ 1807.655335] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffe4ec26390
[ 1807.663674] R13: 000000000000000f R14: 0000560243b65f50 R15: 00007ffe4ec26390
[ 1807.672012] Sched Debug Version: v0.11, 4.18.0-372.32.1.el8_6.x86_64 #1
[ 1807.679772] ktime                                   : 1807550.685323
[ 1807.687245] sched_clk                               : 1819351.499416
[ 1807.694715] cpu_clk                                 : 1807672.011877
[ 1807.702184] jiffies                                 : 4296474877
[ 1807.709264] sched_clock_stable()                    : 1
[ 1807.715468] 
[ 1807.717499] sysctl_sched
[ 1807.720698]   .sysctl_sched_latency                    : 24.000000
[ 1807.727982]   .sysctl_sched_min_granularity            : 10.000000
[ 1807.735265]   .sysctl_sched_wakeup_granularity         : 15.000000
[ 1807.742541]   .sysctl_sched_child_runs_first           : 0
[ 1807.749040]   .sysctl_sched_features                   : 16723515
[ 1807.756219]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[ 1807.763979] 
[ 1807.766014] cpu#0, 3300.000 MHz
[ 1807.769884]   .nr_running                    : 0
[ 1807.775395]   .nr_switches                   : 23123
[ 1807.781298]   .nr_load_updates               : 0
[ 1807.786813]   .nr_uninterruptible            : 2
[ 1807.792326]   .next_balance                  : 4296.474969
[ 1807.798810]   .curr->pid                     : 0
[ 1807.804321]   .clock                         : 1807803.325501
[ 1807.811093]   .clock_task                    : 1799465.935201
[ 1807.817861]   .avg_idle                      : 889756
[ 1807.823850]   .max_idle_balance_cost         : 500000
[ 1807.829834] 
[ 1807.831829] rt_rq[0]:/system.slice/pmlogger_farm.service
[ 1807.838094]   .rt_nr_running                 : 0
[ 1807.843583]   .rt_nr_migratory               : 0
[ 1807.849070]   .rt_throttled                  : 0
[ 1807.854557]   .rt_time                       : 0.000000
[ 1807.860722]   .rt_runtime                    : 0.000000
[ 1807.866876] 
[ 1807.868837] rt_rq[0]:/system.slice/pmlogger.service
[ 1807.874595]   .rt_nr_running                 : 0
[ 1807.880067]   .rt_nr_migratory               : 0
[ 1807.885533]   .rt_throttled                  : 0
[ 1807.890999]   .rt_time                       : 0.000000
[ 1807.897141]   .rt_runtime                    : 0.000000
[ 1807.903279] 
[ 1807.905233] rt_rq[0]:/system.slice/run-user-0.mount
[ 1807.910971]   .rt_nr_running                 : 0
[ 1807.916411]   .rt_nr_migratory               : 0
[ 1807.921838]   .rt_throttled                  : 0
[ 1807.927268]   .rt_time                       : 0.000000
[ 1807.933376]   .rt_runtime                    : 0.000000
[ 1807.939483] 
    ...
[ 1814.108126] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1814.115690] rcu:     70-....: (59995 ticks this GP) idle=1aa/1/0x4000000000000002 softirq=31365/31365 fqs=13248 
[ 1814.132507]  (t=60024 jiffies g=31637 q=36407)
[ 1814.143222] Sending NMI from CPU 70 to CPUs 13:
[ 1814.155749] NMI backtrace for cpu 13
[ 1814.155750] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1
[ 1814.155751] Hardware name: Cisco Systems Inc UCSC-C480-M5/UCSC-C480-M5, BIOS C480M5.4.2.2d.0.0808221350 08/08/2022
[ 1814.155751] RIP: 0010:io_serial_in+0x14/0x20
[ 1814.155752] Code: 00 00 d3 e6 48 63 f6 48 03 77 10 8b 06 e9 14 55 60 00 0f 1f 40 00 0f 1f 44 00 00 0f b6 8f b1 00 00 00 8b 57 08 d3 e6 01 f2 ec <0f> b6 c0 e9 f4 54 60 00 0f 1f 40 00 0f 1f 44 00 00 0f b6 8f b1 00
[ 1814.155752] RSP: 0018:ffffc08831c2c948 EFLAGS: 00000006
[ 1814.155753] RAX: ffffffff851fbf05 RBX: ffffffff87b0a9e0 RCX: 0000000000000000
[ 1814.155754] RDX: 00000000000003f9 RSI: 0000000000000001 RDI: ffffffff87b0a9e0
[ 1814.155754] RBP: 0000000000000000 R08: ffffffff851fa610 R09: 2020202020202020
[ 1814.155755] R10: 69746e75725f7472 R11: 75725f74722e2020 R12: 0000000000000001
[ 1814.155755] R13: 000000000000003b R14: ffffffff8707db40 R15: 000000000000003b
[ 1814.155756] FS:  0000000000000000(0000) GS:ffff9ba03e140000(0000) knlGS:0000000000000000
[ 1814.155756] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1814.155756] CR2: 00007fb0aa8c857e CR3: 0000004ed5210006 CR4: 00000000007706e0
[ 1814.155757] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1814.155757] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1814.155758] PKRU: 55555554
[ 1814.155758] Call Trace:
[ 1814.155758]  <IRQ>
[ 1814.155759]  serial8250_console_write+0xa1/0x320
[ 1814.155759]  console_unlock+0x363/0x4b0
[ 1814.155759]  vprintk_emit+0x151/0x250
[ 1814.155760]  printk+0x58/0x73
[ 1814.155760]  ? irq_work_queue+0x9/0x30
[ 1814.155761]  ? wake_up_klogd.part.30+0x30/0x40
[ 1814.155761]  ? vprintk_emit+0x125/0x250
[ 1814.155761]  ? printk+0x58/0x73
[ 1814.155762]  print_cpu+0x24c/0x4a0
[ 1814.155762]  sysrq_sched_debug_show+0x34/0x50
[ 1814.155762]  show_state_filter+0xb9/0xd0
[ 1814.155763]  kbd_event+0x2da/0xc40
[ 1814.155763]  input_to_handler+0xd3/0xf0
[ 1814.155763]  input_pass_values.part.7+0x11b/0x140
[ 1814.155764]  input_handle_event+0x138/0x5a0
[ 1814.155764]  input_event+0x4f/0x80
[ 1814.155765]  hid_input_report+0xfe/0x160
[ 1814.155766]  hid_irq_in+0x185/0x1b0
[ 1814.155766]  __usb_hcd_giveback_urb+0x6f/0x110
[ 1814.155767]  usb_giveback_urb_bh+0xb0/0x120
[ 1814.155767]  tasklet_action_common.isra.17+0x5a/0x110
[ 1814.155768]  __do_softirq+0xd7/0x2c8
[ 1814.155768]  irq_exit_rcu+0xd7/0xe0
[ 1814.155769]  irq_exit+0xa/0x10
[ 1814.155769]  do_IRQ+0x7f/0xd0
[ 1814.155769]  common_interrupt+0xf/0xf
[ 1814.155770]  </IRQ>
[ 1814.155770] RIP: 0010:cpu_idle_poll+0x35/0x110
[ 1814.155771] Code: 2d b0 7b a3 7a 0f 1f 44 00 00 e8 76 d5 79 ff fb 66 0f 1f 44 00 00 65 48 8b 04 25 40 5c 01 00 48 8b 00 a8 08 74 14 eb 25 f3 90 <65> 48 8b 04 25 40 5c 01 00 48 8b 00 a8 08 75 13 8b 05 c1 65 3d 01
[ 1814.155771] RSP: 0018:ffffc088003ebea8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffde
[ 1814.155772] RAX: 0000000000000001 RBX: 000000000000000d RCX: ffff9ba03e16bc00
[ 1814.155772] RDX: 000000000142571e RSI: 0000000026d489c6 RDI: ffff9ba03e16bc00
[ 1814.155773] RBP: 000000000000000d R08: 0000000000000002 R09: 000000000002a680
[ 1814.155773] R10: 0003d09e1de75cda R11: 0000000000000001 R12: 0000000000000000
[ 1814.155774] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
[ 1814.155774]  ? cpu_idle_poll+0x1a/0x110
[ 1814.155775]  do_idle+0x64/0x2d0
[ 1814.155775]  cpu_startup_entry+0x6f/0x80
[ 1814.155776]  secondary_startup_64_no_verify+0xc2/0xcb
[ 1814.155823] NMI backtrace for cpu 70
[ 1814.160991] 
[ 1814.165302] CPU: 70 PID: 3759 Comm: pmdalinux Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1
[ 1814.191107] Hardware name: Cisco Systems Inc UCSC-C480-M5/UCSC-C480-M5, BIOS C480M5.4.2.2d.0.0808221350 08/08/2022
[ 1814.217998] Call Trace:
[ 1814.232623]  <IRQ>
[ 1814.249390]  dump_stack+0x41/0x60
[ 1814.266173]  nmi_cpu_backtrace.cold.8+0x13/0x4f
[ 1814.282494]  ? lapic_can_unplug_cpu.cold.30+0x43/0x43
[ 1814.290908] 
[ 1814.299317]  nmi_trigger_cpumask_backtrace+0xe9/0xee
[ 1814.311179]  rcu_dump_cpu_stacks+0xc8/0xfc
[ 1814.317033]  rcu_sched_clock_irq.cold.99+0x116/0x395
[ 1814.327359]  ? tick_sched_do_timer+0x50/0x50
[ 1814.335624]  ? tick_sched_do_timer+0x50/0x50
[ 1814.345452]  update_process_times+0x55/0x80
[ 1814.355582]  tick_sched_handle+0x22/0x60
[ 1814.359524] 
[ 1814.364049]  tick_sched_timer+0x37/0x80
[ 1814.373575]  __hrtimer_run_queues+0x100/0x280
[ 1814.382622]  hrtimer_interrupt+0x100/0x220
[ 1814.393041]  smp_apic_timer_interrupt+0x6a/0x130
[ 1814.402384]  apic_timer_interrupt+0xf/0x20
[ 1814.412875]  </IRQ>
[ 1814.421966] RIP: 0010:smp_call_function_single+0x8c/0xf0
[ 1814.427323] 
[ 1814.432396] Code: 1f 44 00 00 f6 c4 02 74 72 85 c9 75 4a 49 c7 c0 80 bf 02 00 65 4c 03 05 8a c8 26 7b 41 8b 40 18 a8 01 74 0a f3 90 41 8b 40 18 <a8> 01 75 f6 41 83 48 18 01 48 89 d1 48 89 f2 4c 89 c6 e8 8d fe ff
[ 1814.442828] RSP: 0018:ffffc08834a97c00 EFLAGS: 00000202
[ 1814.450906]  ORIG_RAX: ffffffffffffff13
[ 1814.459012] RAX: 0000000000000001 RBX: 00000002540bb3a0 RCX: 0000000000000000
[ 1814.466976] RDX: 0000000000000000 RSI: ffffffff84c3c410 RDI: 0000000000000012
[ 1814.497400] RBP: ffffc08834a97c48 R08: ffff9ba03efabf80 R09: ffff9ba03e280000
[ 1814.514010] R10: ffffc08834a97cd8 R11: 0000000000000000 R12: 0000019861c2cd57
[ 1814.522311] 
[ 1814.530609] R13: 0000000000000001 R14: ffff98adf9dc3ed8 R15: ffff9bac3e895100
[ 1814.543510]  ? cpu_bugs_smt_update.cold.11+0xaa/0xaa
[ 1814.552029]  aperfmperf_snapshot_cpu+0x57/0x70
[ 1814.562680]  arch_freq_prepare_all+0x6c/0xb0
[ 1814.568970]  cpuinfo_open+0xe/0x20
[ 1814.589430]  proc_reg_open+0x71/0x140
[ 1814.606893]  ? proc_put_link+0x10/0x10
[ 1814.609997] 
[ 1814.615527]  do_dentry_open+0x132/0x350
[ 1814.623675]  path_openat+0x542/0x14f0
[ 1814.633964]  do_filp_open+0x93/0x100
[ 1814.645612]  ? getname_flags+0x4a/0x1e0
[ 1814.653690]  ? __check_object_size+0xac/0x173
[ 1814.665274]  do_sys_open+0x188/0x220
[ 1814.675796]  do_syscall_64+0x5b/0x1b0
[ 1814.681740] 
[ 1814.687300]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 1814.698040] RIP: 0033:0x7fb0d8231252
[ 1814.709462] Code: 25 00 00 41 00 3d 00 00 41 00 74 4c 48 8d 05 55 43 2a 00 8b 00 85 c0 75 6d 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a2 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[ 1814.720795] RSP: 002b:00007fff4167b620 EFLAGS: 00000246
[ 1814.727680]  ORIG_RAX: 0000000000000101
[ 1814.739328] RAX: ffffffffffffffda RBX: 0000563b5b50da10 RCX: 00007fb0d8231252
[ 1814.750200] RDX: 0000000000000000 RSI: 00007fff4167b7c0 RDI: 00000000ffffff9c
[ 1814.755181] 
[ 1814.760736] RBP: 0000000000000008 R08: 0000000000000008 R09: 0000000000000001
[ 1814.771841] R10: 0000000000000000 R11: 0000000000000246 R12: 0000563b5a49c660
[ 1814.783038] R13: 0000563b5a49c660 R14: 0000000000000001 R15: 00007fff4167c8a0
[ 1814.792035] Kernel panic - not syncing: RCU Stall
  • Neither [Ctrl]+[t] nor [Ctrl]+[Scroll Lock] is pressed when I try to login to vKVM console.
  • I just press [Ctrl]+[c] when I try to login to the vKVM console.

Environment

  • Red Hat Enterprise Linux 8
  • vKVM console on Cisco Integrated Managment Controller
  • Windows 10 on VMware VDI Desktop
  • VMware Horizon Client

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content