Timer tree corruption leads to missing wakeup and system freeze


Issue

  • What is CVE-2021-20317?
  • The host server hung in certain situations while a VM was running or being destroyed.
  • The following messages are logged when the issue occurs:
[ 1940.772191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1940.780019] kworker/37:2    D    0  2875      2 0x80084080
[ 1940.780028] Workqueue: events slab_caches_to_rcu_destroy_workfn
[ 1940.780029] Call Trace:
[ 1940.780036]  ? __schedule+0x26d/0x660
[ 1940.780040]  schedule+0x2f/0xa0
[ 1940.780042]  schedule_timeout+0x246/0x2f0
[ 1940.780047]  ? __queue_work+0x103/0x3f0
[ 1940.780049]  ? __switch_to_asm+0x41/0x70
[ 1940.780051]  wait_for_completion+0x11f/0x190
[ 1940.780054]  ? wake_up_q+0x70/0x70
[ 1940.780058]  rcu_barrier+0x17e/0x1e0
[ 1940.780060]  slab_caches_to_rcu_destroy_workfn+0x8f/0xe0
[ 1940.780062]  process_one_work+0x1a7/0x3b0
[ 1940.780063]  worker_thread+0x30/0x390
[ 1940.780066]  ? create_worker+0x1a0/0x1a0
[ 1940.780067]  kthread+0x112/0x130
[ 1940.780069]  ? kthread_flush_work_fn+0x10/0x10
[ 1940.780070]  ret_from_fork+0x1f/0x40

[ 2142.705206] INFO: task perf:3201 blocked for more than 120 seconds.
[ 2142.711493]       Tainted: G        W        --------- -  - 4.18.0-193.64.1.el8_2.x86_64 #1
[ 2142.719838] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2142.727673] perf            D    0  3201   2724 0x00080084
[ 2142.727674] Call Trace:
[ 2142.727677]  ? __schedule+0x26d/0x660
[ 2142.727678]  schedule+0x2f/0xa0
[ 2142.727681]  schedule_timeout+0x193/0x2f0
[ 2142.727687]  ? __next_timer_interrupt+0xf0/0xf0
[ 2142.727688]  msleep+0x29/0x30
[ 2142.727693]  cpuinfo_open+0xe/0x20
[ 2142.727698]  proc_reg_open+0x71/0x130
[ 2142.727699]  ? proc_alloc_inode+0x60/0x60
[ 2142.727702]  do_dentry_open+0x132/0x330
[ 2142.727705]  path_openat+0x573/0x14d0
[ 2142.727708]  ? iomap_file_buffered_write+0x62/0x90
[ 2142.727709]  do_filp_open+0x93/0x100
[ 2142.727712]  ? __check_object_size+0xa8/0x16b
[ 2142.727714]  do_sys_open+0x184/0x220
[ 2142.727716]  do_syscall_64+0x5b/0x1a0
[ 2142.727717]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 2142.727718] RIP: 0033:0x7fc2c1dfa861

[ 2142.659039] Workqueue: events slab_caches_to_rcu_destroy_workfn
[ 2142.659043] Call Trace:
[ 2142.659045]  ? rcu_barrier+0x1e0/0x1e0
[ 2142.659051]  kthread+0x112/0x130
[ 2142.659055]  ? kthread_flush_work_fn+0x10/0x10
[ 2142.659057]  ret_from_fork+0x1f/0x40
[ 2142.659063]  ? __schedule+0x26d/0x660
[ 2142.659066]  schedule+0x2f/0xa0
[ 2142.659068]  schedule_timeout+0x246/0x2f0

[ 2798.316557] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2798.317524] INFO: task kworker/37:2:2875 blocked for more than 120 seconds.
[ 2798.322480] rcu:     0-...!: (22 GPs behind) idle=a42/1/0x4000000000000000 softirq=42781/42781 fqs=0 
[ 2798.322482] rcu:     2-...!: (10873 GPs behind) idle=dd4/0/0x0 softirq=0/0 fqs=0 
[ 2798.322484] rcu:     3-...!: (10872 GPs behind) idle=d00/0/0x0 softirq=0/0 fqs=0 
[ 2798.322485] rcu:     4-...!: (10871 GPs behind) idle=d1c/0/0x0 softirq=0/0 fqs=0 
[ 2798.322487] rcu:     5-...!: (10870 GPs behind) idle=cf8/0/0x0 softirq=0/0 fqs=0 
[ 2798.322488] rcu:     6-...!: (10869 GPs behind) idle=cbc/0/0x0 softirq=0/0 fqs=0 
[ 2798.322490] rcu:     7-...!: (10869 GPs behind) idle=c98/0/0x0 softirq=0/0 fqs=0 
[ 2798.322491] rcu:     8-...!: (10868 GPs behind) idle=cb0/0/0x0 softirq=0/0 fqs=0 
[ 2798.322492] rcu:     9-...!: (10867 GPs behind) idle=c80/0/0x0 softirq=0/0 fqs=0 
[ 2798.322497] rcu:     10-...!: (10866 GPs behind) idle=c30/0/0x0 softirq=0/0 fqs=0 

[ 2798.656675] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2798.663984]  dump_stack+0x5c/0x80
[ 2798.663988]  nmi_cpu_backtrace.cold.5+0x13/0x4e
[ 2798.671287] kworker/0:1     D    0  2881      2 0x80084080
[ 2798.678595]  ? lapic_can_unplug_cpu.cold.25+0x3b/0x3b
[ 2798.678598]  nmi_trigger_cpumask_backtrace+0xde/0xe0
[ 2798.685932] Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
[ 2798.692081]  rcu_dump_cpu_stacks+0x9c/0xca
[ 2798.692085]  rcu_sched_clock_irq.cold.69+0x29b/0x35e
[ 2798.698949] Call Trace:
[ 2798.707298]  ? tick_sched_do_timer+0x60/0x60
[ 2798.707302]  update_process_times+0x28/0x60
[ 2798.715124]  tick_sched_handle+0x22/0x60
[ 2798.715126]  ? __schedule+0x26d/0x660
[ 2798.715129]  schedule+0x2f/0xa0
[ 2798.715132]  schedule_timeout+0x246/0x2f0
[ 2798.715134]  tick_sched_timer+0x37/0x70
[ 2798.715136]  __hrtimer_run_queues+0x100/0x280
[ 2798.715138]  ? internal_add_timer+0x42/0x60
[ 2798.715140]  ? add_timer+0x13f/0x1f0
[ 2798.715142]  wait_for_completion+0x11f/0x190
[ 2798.715145]  hrtimer_interrupt+0x100/0x220
[ 2798.715147]  ? wake_up_q+0x70/0x70
[ 2798.715151]  smp_apic_timer_interrupt+0x6a/0x140
[ 2798.715153]  __synchronize_srcu.part.16+0x81/0xb0
[ 2798.715156]  apic_timer_interrupt+0xf/0x20
[ 2798.715158]  ? __bpf_trace_rcu_utilization+0x10/0x10
[ 2798.715159]  </IRQ>
[ 2798.715163] RIP: 0010:cpuidle_enter_state+0xbc/0x420
[ 2798.715175]  irqfd_shutdown+0x38/0xa0 [kvm]
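
When checking whether another host shows the same symptoms, it can help to pull the hung-task reports and the per-CPU RCU stall counts out of a saved kernel log. The following is a minimal sketch only, assuming the log lines have the same format as the excerpt above; the function name `summarize` and both regular expressions are illustrative assumptions derived from that excerpt, not part of any Red Hat tool.

```python
import re

# Both patterns are assumptions based on the message formats in the excerpt above.
# Example: "INFO: task kworker/37:2:2875 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# Example: "rcu:     2-...!: (10873 GPs behind) idle=dd4/0/0x0 softirq=0/0 fqs=0"
RCU_STALL = re.compile(r"rcu:\s+(\d+)-\S*: \((\d+) GPs behind\)")


def summarize(log_text):
    """Return (hung_tasks, stalled_cpus) found in a kernel log.

    hung_tasks   -- list of (task name, pid) tuples, in order of appearance
    stalled_cpus -- dict mapping CPU number to the reported "GPs behind" count
    """
    hung_tasks = []
    stalled_cpus = {}
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            hung_tasks.append((match.group(1), int(match.group(2))))
            continue
        match = RCU_STALL.search(line)
        if match:
            stalled_cpus[int(match.group(1))] = int(match.group(2))
    return hung_tasks, stalled_cpus
```

A saved log (for example, the output of `dmesg` written to a file) can be read into a string and passed to `summarize`. Many CPUs reported as thousands of grace periods behind, together with tasks blocked in `schedule_timeout`, would match the pattern shown in the excerpt, but this alone does not confirm the same root cause.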

Environment

  • Red Hat Enterprise Linux 8
