[RT kernel] Process is set to RU state but stalled in CPU .

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 8.5
    • kernel-rt 4.18.0-348.23.1.rt7.153.el8_5
  • Red Hat Enterprise Linux 8.6
  • Red Hat Enterprise Linux 8.7

Issue

  • A specific process is set to RU state but is not executing on a CPU nor is it in any CPU's runqueue.
PID: 931918 COMMAND: "sh"        TASK: ffff959a97d63e80     CPU: 62   STATE: TASK_RUNNING 

  crash> runq -c 62
      CPU 62 RUNQUEUE: ffff959fbf9ac2c0
        CURRENT: PID: 0      TASK: ffff959003200000  COMMAND: "swapper/62"
        RT PRIO_ARRAY: ffff959fbf9ac500
           [no tasks queued]
        CFS RB_ROOT: ffff959fbf9ac370
           [no tasks queued]

Resolution

Red Hat Enterprise Linux 8.6
Red Hat Enterprise Linux 8.7

Root Cause

  • The deadly embrace is brought about by the CFS throttling mechanism waiting to add processes to the various CPU run queues which will never occur as the lock that prevents the softirqs from being auctioned is held waiting for another lock to be released once the owner gets CPU time.

Diagnostic Steps

       CPUS: 80
        DATE: Wed Jul 13 02:08:34 EDT 2022
      UPTIME: 19:23:20
LOAD AVERAGE: 94.33, 77.98, 48.58
       TASKS: 2808
    NODENAME: node1
     RELEASE: 4.18.0-348.23.1.rt7.153.el8_5.x86_64
     VERSION: #1 SMP PREEMPT_RT Tue Apr 12 14:01:36 EDT 2022
     MACHINE: x86_64  (1600 Mhz)
      MEMORY: 127.7 GB
       PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
         PID: 699
     COMMAND: "khungtaskd"
        TASK: ffff95881ac43e80  [THREAD_INFO: ffff95881ac43e80]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)
  • The sh process is currently in Runnable state [RU]. As it is the next waiter in the list, we know this task will acquire the lock the next time it runs on the CPU:

      PID: 931918   COMMAND: "sh"            TASK: ffff959a97d63e80         CPU: 62   STATE: TASK_RUNNING 
      PID: 517  COMMAND: "rcuop/62"      TASK: ffff959003778000         CPU: 1    STATE: TASK_UNINTERRUPTIBLE
      PID: 931805   COMMAND: "runc"          TASK: ffff959ab41cddc0         CPU: 61   STATE: TASK_UNINTERRUPTIBLE 
      PID: 931793   COMMAND: "conmon"        TASK: ffff9586bb0e3e80             CPU: 21   STATE: TASK_UNINTERRUPTIBLE 
      PID: 7983 COMMAND: "crio"          TASK: ffff959b012b3e80         CPU: 20   STATE: TASK_UNINTERRUPTIBLE
      PID: 771529   COMMAND: "crio"          TASK: ffff9580f5360000         CPU: 60   STATE: TASK_UNINTERRUPTIBLE 
      PID: 878  COMMAND: "kworker/2:1"   TASK: ffff95808cae9f40         CPU: 2    STATE: TASK_UNINTERRUPTIBLE 
      PID: 3578758  COMMAND: "kworker/0:2"   TASK: ffff9587a4945dc0         CPU: 0    STATE: TASK_UNINTERRUPTIBLE 
      PID: 926300   COMMAND: "kworker/41:2"  TASK: ffff9580e79fddc0         CPU: 41   STATE: TASK_UNINTERRUPTIBLE 
      PID: 1791015  COMMAND: "iperf"     TASK: ffff9590651d9f40         CPU: 36   STATE: TASK_UNINTERRUPTIBLE
    
  • Currently PID 931918 sh appears to still be waiting in schedule to be started on a CPU.

      crash> bt 931918
      PID: 931918   TASK: ffff959a97d63e80  CPU: 62   COMMAND: "sh"
       #0 [ffffb4de2886fa18] __schedule at ffffffffa5f85fdf
       #1 [ffffb4de2886faa8] preempt_schedule_lock at ffffffffa5f86739
       #2 [ffffb4de2886fab0] rt_spin_lock_slowlock_locked at ffffffffa5f8956e
       #3 [ffffb4de2886faf0] rt_spin_lock_slowlock at ffffffffa5f89770
       #4 [ffffb4de2886fb60] rt_spin_lock at ffffffffa5f8bb8a
       #5 [ffffb4de2886fb68] get_page_from_freelist at ffffffffa58db53f
       #6 [ffffb4de2886fc88] __alloc_pages_nodemask at ffffffffa58dd9e5
       #7 [ffffb4de2886fcf8] alloc_pages_vma at ffffffffa58f8ab5
       #8 [ffffb4de2886fd38] wp_page_copy at ffffffffa58b9a49
       #9 [ffffb4de2886fda8] do_wp_page at ffffffffa58bc33f
      #10 [ffffb4de2886fde8] __handle_mm_fault at ffffffffa58bded3
      #11 [ffffb4de2886fe90] handle_mm_fault at ffffffffa58be282
      #12 [ffffb4de2886feb8] __do_page_fault at ffffffffa566e776
      #13 [ffffb4de2886ff18] do_page_fault at ffffffffa566eb24
      #14 [ffffb4de2886ff50] page_fault at ffffffffa600109e
          RIP: 00007f0997da34a6  RSP: 00007fffd0f0bb60  RFLAGS: 00010202
          RAX: 000055bb5d8c1fa0  RBX: 000055bb5d8c39e0  RCX: 000055bb5dadbf50
          RDX: 000055bb5d8c4b20  RSI: 00007fffd0f0bc00  RDI: 00007f0997fc09f0
          RBP: 00007fffd0f0bc70   R8: 0000000000000000   R9: 0000000000000002
          R10: 00007f0997fc1170  R11: 0000000000000206  R12: 0000000000000000
          R13: 0000000000000000  R14: 000055bb5d8c0000  R15: 00007f0997fc1170
          ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
  • However, this task is not currently on the runq specifically for CPU 62:
      crash> runq -c 62
      CPU 62 RUNQUEUE: ffff959fbf9ac2c0
        CURRENT: PID: 0      TASK: ffff959003200000  COMMAND: "swapper/62"
        RT PRIO_ARRAY: ffff959fbf9ac500
           [no tasks queued]
        CFS RB_ROOT: ffff959fbf9ac370
           [no tasks queued]

     Nor in fact, is the process found on the runq for any CPU:

      crash> p runqueues:a | grep ffff959a97d63e80 | wc -l
      0

  • The task_struct for PID 931918 sh does have the on_rq flag set:
      crash> struct task_struct ffff959a97d63e80 
      struct task_struct {
        thread_info = {
          flags = 0x80000,
          status = 0x0,
          preempt_lazy_count = 0x1
        },
        state = 0x0,
        saved_state = 0x0,
        stack = 0xffffb4de2886c000,
        ---------<snip>-------------------          
        on_cpu = 0x0,
        cpu = 0x3e,
        wakee_flips = 0x5,
        wakee_flip_decay_ts = 0x1041cb050,
        last_wakee = 0xffff959046671f40,
        recent_used_cpu = 0x3e,
        wake_cpu = 0x3e,
        on_rq = 0x1,       <--------------------------
        prio = 0x78,
        static_prio = 0x78,
        normal_prio = 0x78,
        rt_priority = 0x0,
        sched_class = 0xffffffffa6423d80 <fair_sched_class>,
        se = {
          load = {
            weight = 0x100000,
            inv_weight = 0x400000
          },
          runnable_weight = 0x0,
          run_node = {
            __rb_parent_color = 0xffff959ab3c92018,
            rb_right = 0x0,
            rb_left = 0x0
          },
          group_node = {
            next = 0xffff959ab44bdeb0,
            prev = 0xffff959ab44b80f0
          },
          on_rq = 0x1,
          exec_start = 0x3f031dc6c510,
          sum_exec_runtime = 0xa8c64,
          vruntime = 0x17544c4559026,
          prev_sum_exec_runtime = 0x0,
          nr_migrations = 0x0,

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments