System crashes when paths to the FC storage array are flapping.


Issue

  • The system crashes when paths to the FC storage array are flapping. The following messages are logged:
Jan  8 19:08:17 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 0) return x2002
Jan  8 19:08:17 hostname kernel: sd 16:0:3:0: Power-on or device reset occurred
Jan  8 19:08:17 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 2) return x2002
Jan  8 19:08:17 hostname kernel: sd 16:0:3:2: Power-on or device reset occurred
Jan  8 19:08:17 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 4) return x2002
Jan  8 19:08:17 hostname kernel: scsi 16:0:3:4: Power-on or device reset occurred
Jan  8 19:08:17 hostname kernel: scsi 16:0:3:4: Parameters changed
Jan  8 19:08:18 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0723 SCSI layer issued Target Reset (1, 1) return x2002
Jan  8 19:08:18 hostname kernel: sd 16:0:1:1: Power-on or device reset occurred
Jan  8 19:08:28 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0714 SCSI layer issued Bus Reset Data: x2002
Jan  8 19:08:30 hostname systemd-udevd: worker [55154] /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host16/rport-16:0-3/target16:0:0/16:0:0:0/block/sdda is taking a long time
Jan  8 19:08:37 hostname kernel: VXFEN INFO V-11-1-36 VxFEN configured at protocol version 30
Jan  8 19:08:48 hostname kernel: lpfc 0000:3b:00.1: 1:(0):3172 SCSI layer issued Host Reset Data:
Jan  8 19:08:58 hostname kernel: rport-16:0-4: blocked FC remote port time out: removing target and saving binding
Jan  8 19:08:58 hostname kernel: sd 16:0:1:3: rejecting I/O to offline device
Jan  8 19:08:58 hostname kernel: sd 16:0:1:3: killing request
Jan  8 19:08:58 hostname kernel: sd 16:0:1:4: rejecting I/O to offline device
Jan  8 19:08:58 hostname kernel: sd 16:0:1:4: [sdcy] killing request
Jan  8 19:08:58 hostname kernel: rport-16:0-3: blocked FC remote port time out: removing target and saving binding
Jan  8 19:08:58 hostname kernel: sd 16:0:1:4: [sdcy] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=158s
Jan  8 19:08:58 hostname kernel: sd 16:0:0:1: rejecting I/O to offline device
[......]
Jan  8 19:09:18 hostname kernel: lpfc 0000:3b:00.1: 1:1303 Link Up Event x1 received Data: x1 x0 x10 x0 x0 x0 0
[......]
Jan  8 19:10:22 hostname kernel: lpfc 0000:3b:00.1: 1:(0):2753 PLOGI failure DID:32C980 Status:x3/x31420002
Jan  8 19:10:22 hostname kernel: lpfc 0000:3b:00.1: 1:(0):2753 PLOGI failure DID:323C40 Status:x3/x31420002
Jan  8 19:10:22 hostname kernel: lpfc 0000:3b:00.1: 1:(0):2753 PLOGI failure DID:18FB00 Status:x3/x31420002
Jan  8 19:10:22 hostname kernel: lpfc 0000:3b:00.1: 1:(0):2753 PLOGI failure DID:189F00 Status:x3/x31420002
[......]
Jan  8 19:17:16 hostname kcm[9460]: shutting down
Jan  8 19:17:16 hostname systemd-logind: Removed session c5.
Jan  8 19:17:16 hostname sshd[9665]: Received signal 15; terminating.
Jan  8 19:17:16 hostname mono: /opt/nnt/gen7agent/bin/Gen7Agent.Service.exe: Stopping service
Jan  8 19:17:16 hostname ntpd[9923]: ntpd exiting on signal 15 
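
The messages above show the lpfc driver reporting resets issued by the SCSI layer at widening scope (Device, Target, Bus, Host) before the remote ports time out. As a rough aid for spotting the same pattern in a longer log, the following Python sketch tallies those reset messages by scope. The regular expression, the function name `count_resets`, and the `sample` list are illustrative assumptions built from the excerpt above; they are not part of any Red Hat or lpfc tooling.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the lpfc messages quoted in this article,
# e.g. "... lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 0) return x2002"
RESET_RE = re.compile(r"lpfc .*SCSI layer issued (Device|Target|Bus|Host) Reset")


def count_resets(lines):
    """Count lpfc 'SCSI layer issued ... Reset' messages per reset scope."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Lines copied from the log excerpt above (timestamp and hostname dropped).
sample = [
    "kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 0) return x2002",
    "kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 2) return x2002",
    "kernel: lpfc 0000:3b:00.1: 1:(0):0723 SCSI layer issued Target Reset (1, 1) return x2002",
    "kernel: lpfc 0000:3b:00.1: 1:(0):0714 SCSI layer issued Bus Reset Data: x2002",
    "kernel: lpfc 0000:3b:00.1: 1:(0):3172 SCSI layer issued Host Reset Data:",
    "kernel: rport-16:0-4: blocked FC remote port time out: removing target and saving binding",
]

if __name__ == "__main__":
    counts = count_resets(sample)
    for scope in ("Device", "Target", "Bus", "Host"):
        print(scope, counts[scope])
```

To run it against a real log, pass an open file object in place of `sample`; any line that does not match the assumed pattern is ignored.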

From the vmcore:

        CPUS: 24
        DATE: Sun Jan  8 17:20:00 +08 2023
      UPTIME: 28 days, 11:28:12
LOAD AVERAGE: 2.24, 2.17, 1.89
       TASKS: 28190
    NODENAME: hostname
     RELEASE: 3.10.0-1160.80.1.el7.x86_64
     VERSION: #1 SMP Sat Oct 8 18:13:21 UTC 2022
     MACHINE: x86_64  (3400 Mhz)
      MEMORY: 511.5 GB
       PANIC: "BUG: unable to handle kernel paging request at 000000820000019c"
         PID: 59237
     COMMAND: "vx_worklist_thr"
        TASK: ffff980135f25280  [THREAD_INFO: ffff980135f5c000]
         CPU: 19
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 59237    TASK: ffff980135f25280  CPU: 19   COMMAND: "vx_worklist_thr"
 #0 [ffff980135f5f9b0] machine_kexec at ffffffffa4069504
 #1 [ffff980135f5fa10] __crash_kexec at ffffffffa4129d32
 #2 [ffff980135f5fae0] crash_kexec at ffffffffa4129e28
 #3 [ffff980135f5faf8] oops_end at ffffffffa47bc818
 #4 [ffff980135f5fb20] no_context at ffffffffa407970c
 #5 [ffff980135f5fb70] __bad_area_nosemaphore at ffffffffa40799ea
 #6 [ffff980135f5fbc0] bad_area_nosemaphore at ffffffffa4079b14
 #7 [ffff980135f5fbd0] __do_page_fault at ffffffffa47bf8d0
 #8 [ffff980135f5fc40] do_page_fault at ffffffffa47bfb05
 #9 [ffff980135f5fc70] page_fault at ffffffffa47bb7b8
    [exception RIP: vx_iflush_list+155]                                      <<< this is where the system crashed
    RIP: ffffffffc113958b  RSP: ffff980135f5fd28  RFLAGS: 00010046
    RAX: 0000000004080000  RBX: 0000008200000100  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 0000000000000004  RDI: ffff97e1fab9ed00
    RBP: ffff980135f5fd88   R8: 0000000000000000   R9: 0000000000000001
    R10: 00000000e129f501  R11: ffff97f198b924a8  R12: 0000000000000001
    R13: 0000000000000000  R14: ffff9801ccac5e70  R15: ffff9801ccac5e40
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff980135f5fd90] vx_iflush at ffffffffc113a1d2 [vxfs]
#11 [ffff980135f5fe28] vx_workitem_process at ffffffffc1138fbc [vxfs]
#12 [ffff980135f5fe48] vx_worklist_process at ffffffffc1141fe5 [vxfs]
#13 [ffff980135f5fe78] vx_worklist_thread at ffffffffc1142098 [vxfs]
#14 [ffff980135f5fea8] vx_kthread_init at ffffffffc11ce5d7 [vxfs]
#15 [ffff980135f5fec8] kthread at ffffffffa40cb4d1
#16 [ffff980135f5ff50] ret_from_fork_nospec_begin at ffffffffa47c51dd
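
One observation that can be checked directly from the output above: the faulting address in the PANIC string (000000820000019c) lies 0x9c bytes past the value held in RBX (0000008200000100), and RBX does not look like the other kernel pointers in this dump, which all begin with ffff. The Python sketch below only verifies that arithmetic. That `vx_iflush_list+155` dereferenced RBX plus an offset is an assumption consistent with these values, not something confirmed by disassembly, and the helper `is_kernel_address` is a hypothetical name using the usual x86_64 4-level-paging kernel-half boundary.

```python
# Values copied from the crash output above.
FAULT_ADDR = 0x000000820000019C  # from the PANIC string
RBX = 0x0000008200000100         # from the exception frame
TASK = 0xFFFF980135F25280        # a known-good kernel pointer, for comparison

# Assumption: x86_64 with 4-level paging, where kernel addresses start here.
KERNEL_HALF_START = 0xFFFF800000000000


def is_kernel_address(addr):
    """Return True if addr falls in the kernel half of the address space."""
    return addr >= KERNEL_HALF_START


offset = FAULT_ADDR - RBX

if __name__ == "__main__":
    print(hex(offset))                  # 0x9c
    print(is_kernel_address(RBX))       # False
    print(is_kernel_address(TASK))      # True
```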

Environment

  • Red Hat Enterprise Linux 7.9
    • kernel-3.10.0-1160.80.1.el7.x86_64
