System hangs due to llt tasks going into blocked state

Solution Unverified

Environment

  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8
  • Third party module [llt]

Issue

  • System hangs due to llt tasks going into blocked state
  • Call traces of the llt tasks show them waiting in the llt_cv_wait_timeout() function.
  • The processes enter uninterruptible sleep (UN state) and remain stuck in the code path of the llt module.

crash> bt 3740
PID: 3740     TASK: ffff9b821934d280  CPU: 7    COMMAND: "llt_hb/7"
 #0 [ffff9b8219367d10] __schedule at ffffffff9a5b78d8
 #1 [ffff9b8219367d78] schedule at ffffffff9a5b7ca9
 #2 [ffff9b8219367d88] schedule_timeout at ffffffff9a5b5778
 #3 [ffff9b8219367e38] llt_cv_wait_timeout at ffffffffc0e82216 [llt]
 #4 [ffff9b8219367e98] llt_hb_thread at ffffffffc0e951ae [llt]
 #5 [ffff9b8219367ec8] kthread at ffffffff99ecb621
 #6 [ffff9b8219367f50] ret_from_fork_nospec_begin at ffffffff9a5c51dd

Resolution

  • The [llt] module is not shipped by Red Hat. Contact the supplier of the [llt] module for further troubleshooting and diagnosis.

Diagnostic Steps

The vmcore shows processes blocked in the [llt] module, as shown below:

  • System Information:
        CPUS: 8
        DATE: Mon Aug  7 01:04:51 EDT 2023
      UPTIME: 5 days, 21:15:15
LOAD AVERAGE: 8.29, 8.58, 8.77
       TASKS: 2055
    NODENAME: XXXXXXXXXX
     RELEASE: 3.10.0-1160.92.1.el7.x86_64
     VERSION: #1 SMP Thu May 18 11:23:40 UTC 2023
     MACHINE: x86_64  (2095 Mhz)
      MEMORY: 76 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 37678
     COMMAND: "bash"
        TASK: ffff9b7ec0e00000  [THREAD_INFO: ffff9b7d4a534000]
         CPU: 7
       STATE: TASK_RUNNING (SYSRQ)
  • Process(es) status:
crash> ps -S
  RU: 10
  IN: 2037
  UN: 8

crash> ps -S|grep UN
  UN: 8

crash> ps -m|grep UN
[0 00:00:01.376] [UN]  PID: 3736     TASK: ffff9b8219349080  CPU: 3    COMMAND: "llt_hb/3"
[0 00:00:02.115] [UN]  PID: 3737     TASK: ffff9b821934a100  CPU: 4    COMMAND: "llt_hb/4"
[0 00:00:02.605] [UN]  PID: 3734     TASK: ffff9b8219306300  CPU: 1    COMMAND: "llt_hb/1"
[0 00:00:02.723] [UN]  PID: 3733     TASK: ffff9b8219305280  CPU: 0    COMMAND: "llt_hb/0"
[0 00:00:03.045] [UN]  PID: 3739     TASK: ffff9b821934c200  CPU: 6    COMMAND: "llt_hb/6"
[0 00:00:03.173] [UN]  PID: 3738     TASK: ffff9b821934b180  CPU: 5    COMMAND: "llt_hb/5"
[0 00:00:03.536] [UN]  PID: 3735     TASK: ffff9b8219348000  CPU: 2    COMMAND: "llt_hb/2"
[0 00:00:03.746] [UN]  PID: 3740     TASK: ffff9b821934d280  CPU: 7    COMMAND: "llt_hb/7"
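
The `ps -m` listing above can also be summarised programmatically. The following is a minimal sketch, not part of the original analysis: it assumes the output has been saved as text, and the names `parse_ps_m` and `PS_M_OUTPUT` are hypothetical. The bracketed value is treated as the time since the task last ran on a CPU, which is how crash is understood to report it.

```python
import re

# Lines copied from the `ps -m | grep UN` output above.
PS_M_OUTPUT = """\
[0 00:00:01.376] [UN]  PID: 3736     TASK: ffff9b8219349080  CPU: 3    COMMAND: "llt_hb/3"
[0 00:00:02.115] [UN]  PID: 3737     TASK: ffff9b821934a100  CPU: 4    COMMAND: "llt_hb/4"
[0 00:00:02.605] [UN]  PID: 3734     TASK: ffff9b8219306300  CPU: 1    COMMAND: "llt_hb/1"
[0 00:00:02.723] [UN]  PID: 3733     TASK: ffff9b8219305280  CPU: 0    COMMAND: "llt_hb/0"
[0 00:00:03.045] [UN]  PID: 3739     TASK: ffff9b821934c200  CPU: 6    COMMAND: "llt_hb/6"
[0 00:00:03.173] [UN]  PID: 3738     TASK: ffff9b821934b180  CPU: 5    COMMAND: "llt_hb/5"
[0 00:00:03.536] [UN]  PID: 3735     TASK: ffff9b8219348000  CPU: 2    COMMAND: "llt_hb/2"
[0 00:00:03.746] [UN]  PID: 3740     TASK: ffff9b821934d280  CPU: 7    COMMAND: "llt_hb/7"
"""

# One `ps -m` line: [days hh:mm:ss.ms] [STATE] PID: n TASK: addr CPU: n COMMAND: "name"
LINE_RE = re.compile(
    r'\[(\d+) (\d+):(\d+):(\d+)\.(\d+)\]\s+\[(\w+)\]\s+PID:\s+(\d+)\s+'
    r'TASK:\s+([0-9a-f]+)\s+CPU:\s+(\d+)\s+COMMAND:\s+"([^"]+)"'
)


def parse_ps_m(text):
    """Return one dict per task line found in `ps -m` style output."""
    tasks = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip prompts, headers and blank lines
        days, hh, mm, ss, ms, state, pid, task, cpu, comm = match.groups()
        seconds = (int(days) * 86400 + int(hh) * 3600 + int(mm) * 60
                   + int(ss) + int(ms) / 1000)
        tasks.append({"seconds": seconds, "state": state, "pid": int(pid),
                      "task": task, "cpu": int(cpu), "comm": comm})
    return tasks


tasks = parse_ps_m(PS_M_OUTPUT)
blocked = [t for t in tasks if t["state"] == "UN"]
longest = max(blocked, key=lambda t: t["seconds"])
print(len(blocked), "UN tasks; longest wait:", longest["comm"], longest["seconds"], "s")
```

In this sample every UN task is an `llt_hb/N` thread, one per CPU, which matches the count reported by `ps -S`.
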
  • Backtraces of processes in UN state:
crash> bt 3735
PID: 3735     TASK: ffff9b8219348000  CPU: 2    COMMAND: "llt_hb/2"
 #0 [ffff9b8219353d10] __schedule at ffffffff9a5b78d8
 #1 [ffff9b8219353d78] schedule at ffffffff9a5b7ca9
 #2 [ffff9b8219353d88] schedule_timeout at ffffffff9a5b5778
 #3 [ffff9b8219353e38] llt_cv_wait_timeout at ffffffffc0e82216 [llt]
 #4 [ffff9b8219353e98] llt_hb_thread at ffffffffc0e951ae [llt]
 #5 [ffff9b8219353ec8] kthread at ffffffff99ecb621
 #6 [ffff9b8219353f50] ret_from_fork_nospec_begin at ffffffff9a5c51dd

crash> bt 3738
PID: 3738     TASK: ffff9b821934b180  CPU: 5    COMMAND: "llt_hb/5"
 #0 [ffff9b821935fd10] __schedule at ffffffff9a5b78d8
 #1 [ffff9b821935fd78] schedule at ffffffff9a5b7ca9
 #2 [ffff9b821935fd88] schedule_timeout at ffffffff9a5b5778
 #3 [ffff9b821935fe38] llt_cv_wait_timeout at ffffffffc0e82216 [llt]
 #4 [ffff9b821935fe98] llt_hb_thread at ffffffffc0e951ae [llt]
 #5 [ffff9b821935fec8] kthread at ffffffff99ecb621
 #6 [ffff9b821935ff50] ret_from_fork_nospec_begin at ffffffff9a5c51dd

crash> foreach UN bt | awk '/#4 / { print $3,$5 }' | sort | uniq -c | sort -nr | head -4
      8 llt_hb_thread ffffffffc0e951ae
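
For readers less familiar with awk, the grouping done by the pipeline above can be sketched in Python. This is an illustrative equivalent only: it assumes the `foreach UN bt` output has been captured as text, the name `count_frame` is hypothetical, and the sample contains just the two backtraces shown earlier rather than all eight. Unlike the awk pattern, which matches `#4 ` anywhere in a line, this sketch matches on the first field.

```python
from collections import Counter

# Two of the backtraces shown above, as captured text.
BT_OUTPUT = """\
PID: 3735     TASK: ffff9b8219348000  CPU: 2    COMMAND: "llt_hb/2"
 #0 [ffff9b8219353d10] __schedule at ffffffff9a5b78d8
 #1 [ffff9b8219353d78] schedule at ffffffff9a5b7ca9
 #2 [ffff9b8219353d88] schedule_timeout at ffffffff9a5b5778
 #3 [ffff9b8219353e38] llt_cv_wait_timeout at ffffffffc0e82216 [llt]
 #4 [ffff9b8219353e98] llt_hb_thread at ffffffffc0e951ae [llt]
 #5 [ffff9b8219353ec8] kthread at ffffffff99ecb621
 #6 [ffff9b8219353f50] ret_from_fork_nospec_begin at ffffffff9a5c51dd

PID: 3738     TASK: ffff9b821934b180  CPU: 5    COMMAND: "llt_hb/5"
 #0 [ffff9b821935fd10] __schedule at ffffffff9a5b78d8
 #1 [ffff9b821935fd78] schedule at ffffffff9a5b7ca9
 #2 [ffff9b821935fd88] schedule_timeout at ffffffff9a5b5778
 #3 [ffff9b821935fe38] llt_cv_wait_timeout at ffffffffc0e82216 [llt]
 #4 [ffff9b821935fe98] llt_hb_thread at ffffffffc0e951ae [llt]
 #5 [ffff9b821935fec8] kthread at ffffffff99ecb621
 #6 [ffff9b821935ff50] ret_from_fork_nospec_begin at ffffffff9a5c51dd
"""


def count_frame(bt_text, frame="#4"):
    """Count (function, address) pairs seen at the given frame number."""
    counts = Counter()
    for line in bt_text.splitlines():
        fields = line.split()
        # Frame lines look like: #4 [stack] function at address [module]
        if len(fields) >= 5 and fields[0] == frame:
            counts[(fields[2], fields[4])] += 1
    return counts.most_common()  # most frequent first, like `sort -nr`


for (func, addr), count in count_frame(BT_OUTPUT):
    print(count, func, addr)
```

A single row in the result means every sampled task is blocked at the same return address inside `llt_hb_thread`.
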
  • Symbol lookup:
crash> sym llt_hb_thread
ffffffffc0e95010 (t) llt_hb_thread [llt]  <<--
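
Combining the start address reported by `sym` with the return address seen at frame #4 gives the offset inside `llt_hb_thread` at which the tasks are waiting. The arithmetic is sketched below; the helper name `symbol_offset` is hypothetical, and the addresses are the ones from this vmcore.

```python
def symbol_offset(return_addr, symbol_start):
    """Return the byte offset of return_addr from the start of its symbol."""
    return int(return_addr, 16) - int(symbol_start, 16)


# Frame #4 return address and the start of llt_hb_thread, both from the output above.
offset = symbol_offset("ffffffffc0e951ae", "ffffffffc0e95010")
print(f"llt_hb_thread+{offset:#x}")  # prints llt_hb_thread+0x19e
```
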
  • Third party Modules:
crash> mod -t
NAME                      TAINTS
mfe_aac_100712495         OE
vxspec                    POE
veki                      POE
vxfs                      POE
vxportal                  POE
vxcafs                    POE
dmpaa                     POE
fdd                       POE
amf                       POE
llt                       POE  <<--
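
The letters in the TAINTS column are kernel taint flags: P marks a proprietary module, O an out-of-tree module, and E an unsigned module. The sketch below decodes the `mod -t` listing above under those meanings; the names `TAINT_FLAGS`, `decode_taints` and `parse_mod_t` are hypothetical, and only the three flags that appear in this output are covered.

```python
# Meanings of the taint letters that appear in this `mod -t` output.
TAINT_FLAGS = {
    "P": "proprietary module",
    "O": "out-of-tree module",
    "E": "unsigned module",
}

# Listing copied from the `mod -t` output above.
MOD_T_OUTPUT = """\
NAME                      TAINTS
mfe_aac_100712495         OE
vxspec                    POE
veki                      POE
vxfs                      POE
vxportal                  POE
vxcafs                    POE
dmpaa                     POE
fdd                       POE
amf                       POE
llt                       POE  <<--
"""


def decode_taints(taints):
    """Translate a string of taint letters into readable descriptions."""
    return [TAINT_FLAGS.get(flag, f"unknown flag {flag!r}") for flag in taints]


def parse_mod_t(text):
    """Map each module name to its decoded taint flags."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header; ignore trailing annotations such as "<<--".
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        modules[fields[0]] = decode_taints(fields[1])
    return modules


modules = parse_mod_t(MOD_T_OUTPUT)
print("llt:", ", ".join(modules["llt"]))
```
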

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
