System crashes when paths to the FC storage array are flapping.
Issue
- System crashes when paths to the FC storage array are flapping.
Jan 8 19:08:17 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 0) return x2002
Jan 8 19:08:17 hostname kernel: sd 16:0:3:0: Power-on or device reset occurred
Jan 8 19:08:17 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 2) return x2002
Jan 8 19:08:17 hostname kernel: sd 16:0:3:2: Power-on or device reset occurred
Jan 8 19:08:17 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 4) return x2002
Jan 8 19:08:17 hostname kernel: scsi 16:0:3:4: Power-on or device reset occurred
Jan 8 19:08:17 hostname kernel: scsi 16:0:3:4: Parameters changed
Jan 8 19:08:18 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0723 SCSI layer issued Target Reset (1, 1) return x2002
Jan 8 19:08:18 hostname kernel: sd 16:0:1:1: Power-on or device reset occurred
Jan 8 19:08:28 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0714 SCSI layer issued Bus Reset Data: x2002
Jan 8 19:08:30 hostname systemd-udevd: worker [55154] /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host16/rport-16:0-3/target16:0:0/16:0:0:0/block/sdda is taking a long time
Jan 8 19:08:37 hostname kernel: VXFEN INFO V-11-1-36 VxFEN configured at protocol version 30
Jan 8 19:08:48 hostname kernel: lpfc 0000:3b:00.1: 1:(0):3172 SCSI layer issued Host Reset Data:
Jan 8 19:08:58 hostname kernel: rport-16:0-4: blocked FC remote port time out: removing target and saving binding
Jan 8 19:08:58 hostname kernel: sd 16:0:1:3: rejecting I/O to offline device
Jan 8 19:08:58 hostname kernel: sd 16:0:1:3: killing request
Jan 8 19:08:58 hostname kernel: sd 16:0:1:4: rejecting I/O to offline device
Jan 8 19:08:58 hostname kernel: sd 16:0:1:4: [sdcy] killing request
Jan 8 19:08:58 hostname kernel: rport-16:0-3: blocked FC remote port time out: removing target and saving binding
Jan 8 19:08:58 hostname kernel: sd 16:0:1:4: [sdcy] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=158s
Jan 8 19:08:58 hostname kernel: sd 16:0:0:1: rejecting I/O to offline device
[......]
Jan 8 19:09:18 hostname kernel: lpfc 0000:3b:00.1: 1:1303 Link Up Event x1 received Data: x1 x0 x10 x0 x0 x0 0
[......]
Jan 8 19:10:22 hostname kernel: lpfc 0000:3b:00.1: 1:(0):2753 PLOGI failure DID:32C980 Status:x3/x31420002
Jan 8 19:10:22 hostname kernel: lpfc 0000:3b:00.1: 1:(0):2753 PLOGI failure DID:323C40 Status:x3/x31420002
Jan 8 19:10:22 hostname kernel: lpfc 0000:3b:00.1: 1:(0):2753 PLOGI failure DID:18FB00 Status:x3/x31420002
Jan 8 19:10:22 hostname kernel: lpfc 0000:3b:00.1: 1:(0):2753 PLOGI failure DID:189F00 Status:x3/x31420002
[......]
Jan 8 19:17:16 hostname kcm[9460]: shutting down
Jan 8 19:17:16 hostname systemd-logind: Removed session c5.
Jan 8 19:17:16 hostname sshd[9665]: Received signal 15; terminating.
Jan 8 19:17:16 hostname mono: /opt/nnt/gen7agent/bin/Gen7Agent.Service.exe: Stopping service
Jan 8 19:17:16 hostname ntpd[9923]: ntpd exiting on signal 15
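The excerpt above shows the lpfc driver reporting progressively broader resets (Device, Target, Bus, then Host) before the remote ports time out. When reviewing a longer log, a small script can tally these messages. The following is a minimal sketch, not part of the original analysis: the function name `summarize_resets` and the `SAMPLE` list are hypothetical, and the regular expression only assumes the message wording seen in this excerpt.

```python
import re
from collections import Counter

# Matches the lpfc "SCSI layer issued <kind> Reset" messages as worded in the
# excerpt. Other driver versions may word these messages differently.
RESET_RE = re.compile(r"lpfc .*SCSI layer issued (Device|Target|Bus|Host) Reset")

# Reset kinds ordered from narrowest to broadest scope.
ESCALATION = ["Device", "Target", "Bus", "Host"]


def summarize_resets(lines):
    """Count reset messages by kind and return (counts, broadest kind seen)."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    seen = [kind for kind in ESCALATION if counts[kind]]
    broadest = seen[-1] if seen else None
    return counts, broadest


# A few lines copied from the excerpt above, used only as illustrative input.
SAMPLE = [
    "Jan 8 19:08:17 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0713 SCSI layer issued Device Reset (3, 0) return x2002",
    "Jan 8 19:08:17 hostname kernel: sd 16:0:3:0: Power-on or device reset occurred",
    "Jan 8 19:08:18 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0723 SCSI layer issued Target Reset (1, 1) return x2002",
    "Jan 8 19:08:28 hostname kernel: lpfc 0000:3b:00.1: 1:(0):0714 SCSI layer issued Bus Reset Data: x2002",
    "Jan 8 19:08:48 hostname kernel: lpfc 0000:3b:00.1: 1:(0):3172 SCSI layer issued Host Reset Data:",
]

if __name__ == "__main__":
    counts, broadest = summarize_resets(SAMPLE)
    for kind in ESCALATION:
        print(f"{kind} Reset: {counts[kind]}")
    print(f"Broadest reset seen: {broadest}")
```

To run it against a full log, pass the lines of the file instead of `SAMPLE`, for example `summarize_resets(open(path))` where `path` is the location of the messages file on your system.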
From the vmcore:
CPUS: 24
DATE: Sun Jan 8 17:20:00 +08 2023
UPTIME: 28 days, 11:28:12
LOAD AVERAGE: 2.24, 2.17, 1.89
TASKS: 28190
NODENAME: hostname
RELEASE: 3.10.0-1160.80.1.el7.x86_64
VERSION: #1 SMP Sat Oct 8 18:13:21 UTC 2022
MACHINE: x86_64 (3400 Mhz)
MEMORY: 511.5 GB
PANIC: "BUG: unable to handle kernel paging request at 000000820000019c"
PID: 59237
COMMAND: "vx_worklist_thr"
TASK: ffff980135f25280 [THREAD_INFO: ffff980135f5c000]
CPU: 19
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 59237 TASK: ffff980135f25280 CPU: 19 COMMAND: "vx_worklist_thr"
#0 [ffff980135f5f9b0] machine_kexec at ffffffffa4069504
#1 [ffff980135f5fa10] __crash_kexec at ffffffffa4129d32
#2 [ffff980135f5fae0] crash_kexec at ffffffffa4129e28
#3 [ffff980135f5faf8] oops_end at ffffffffa47bc818
#4 [ffff980135f5fb20] no_context at ffffffffa407970c
#5 [ffff980135f5fb70] __bad_area_nosemaphore at ffffffffa40799ea
#6 [ffff980135f5fbc0] bad_area_nosemaphore at ffffffffa4079b14
#7 [ffff980135f5fbd0] __do_page_fault at ffffffffa47bf8d0
#8 [ffff980135f5fc40] do_page_fault at ffffffffa47bfb05
#9 [ffff980135f5fc70] page_fault at ffffffffa47bb7b8
[exception RIP: vx_iflush_list+155] <<< this is where the system crashed
RIP: ffffffffc113958b RSP: ffff980135f5fd28 RFLAGS: 00010046
RAX: 0000000004080000 RBX: 0000008200000100 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff97e1fab9ed00
RBP: ffff980135f5fd88 R8: 0000000000000000 R9: 0000000000000001
R10: 00000000e129f501 R11: ffff97f198b924a8 R12: 0000000000000001
R13: 0000000000000000 R14: ffff9801ccac5e70 R15: ffff9801ccac5e40
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff980135f5fd90] vx_iflush at ffffffffc113a1d2 [vxfs]
#11 [ffff980135f5fe28] vx_workitem_process at ffffffffc1138fbc [vxfs]
#12 [ffff980135f5fe48] vx_worklist_process at ffffffffc1141fe5 [vxfs]
#13 [ffff980135f5fe78] vx_worklist_thread at ffffffffc1142098 [vxfs]
#14 [ffff980135f5fea8] vx_kthread_init at ffffffffc11ce5d7 [vxfs]
#15 [ffff980135f5fec8] kthread at ffffffffa40cb4d1
#16 [ffff980135f5ff50] ret_from_fork_nospec_begin at ffffffffa47c51dd
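One detail can be checked directly from the numbers above: the faulting address in the PANIC line differs from the value in RBX by a small offset. That would be consistent with vx_iflush_list reading a structure member through an invalid pointer held in RBX, but this is only an inference from the register dump. The disassembly is not shown here, so the role of RBX is an assumption. The sketch below does the arithmetic; the helper `is_kernel_address` is hypothetical and applies only a rough upper-half check for x86_64.

```python
# Values copied from the vmcore output above.
FAULT_ADDR = 0x000000820000019C  # address in the PANIC line
RBX = 0x0000008200000100         # RBX in the exception frame
TASK = 0xFFFF980135F25280        # TASK pointer, a known-good kernel address


def is_kernel_address(addr):
    """Rough check: x86_64 kernel virtual addresses sit in the upper half
    of the canonical address space (0xffff800000000000 and above)."""
    return addr >= 0xFFFF800000000000


# Distance between the faulting address and RBX.
offset = FAULT_ADDR - RBX

if __name__ == "__main__":
    print(f"fault address - RBX = {offset:#x}")
    print(f"RBX looks like a kernel address:  {is_kernel_address(RBX)}")
    print(f"TASK looks like a kernel address: {is_kernel_address(TASK)}")
```

The offset works out to 0x9c, and RBX does not fall in the kernel address range, whereas the TASK pointer does. Confirming which instruction used RBX would need the disassembly of vx_iflush_list+155 from the same vmcore.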
Environment
- Red Hat Enterprise Linux 7.9
- kernel-3.10.0-1160.80.1.el7.x86_64