After qla2xxx "Mailbox command timeout occured" error, adapter fails to recover

Solution Verified - Updated -

Issue

  • First indication of problem in messages: "qla2xxx 0000:0f:00.0: Mailbox command timeout occured. Scheduling ISP abort."
  • After above error, the system can no longer see SAN connections or reload qla2xxx drivers
  • qla2xxx recovery process fails to recover adapter at this point. Unload/reload of the driver works, but in XEN environment this fails due to memory allocation problem at load time (not enough contiguous memory available within kernel)
  • When able to unload/reload driver problems is fixed
Jun 21 19:50:16 node kernel: qla2xxx 0000:0f:00,0: Mailbox command timeout occured, Scheduling ISP abort, eeh_busy: 0x0
Jun 21 19:50:16 node kernel: qla2xxx 0000:0f:00,0: Performing ISP error recovery - ha= ffff88012a1584f8,
Jun 21 19:50:17 node kernel: qla2xxx 0000:0f:00,0: LIP reset occured (f7f7),
Jun 21 19:50:17 node kernel: qla2xxx 0000:0f:00,0: LIP occured (f7f7),
Jun 21 19:50:17 node kernel: qla2xxx 0000:0f:00,0: LIP reset occured (f7f7),
Jun 21 19:50:17 node kernel: qla2xxx 0000:0f:00,0: LOOP UP detected (4 Gbps),
Jun 21 19:50:53 node kernel: qla2xxx 0000:0f:00,0: Mailbox command timeout occured, Issuing ISP abort,
Jun 21 19:50:53 node kernel: qla2xxx 0000:0f:00,0: Performing ISP error recovery - ha= ffff88012a1584f8,
Jun 21 19:51:23 node kernel: qla2xxx 0000:0f:00,0: Failed mailbox send register test
Jun 21 19:51:53 node kernel:  rport-8:0-0: blocked FC remote port time out: saving binding
Jun 21 19:51:53 node kernel: qla2xxx 0000:0f:00,0: [ERROR] Failed to load segment 0 of firmware
Jun 21 19:52:23 node kernel: qla2xxx 0000:0f:00,0: [ERROR] Failed to load segment 0 of firmware
Jun 21 19:53:53 node kernel: qla2xxx 0000:0f:00,0: scsi(8:0:0): Abort command issued -- 0 1 2002,
Jun 21 19:53:53 node kernel: qla2xxx 0000:0f:00,0: scsi(8:0:0): DEVICE RESET ISSUED,
  • In the event of Qlogic firmware recovery process, CPU hard lockup could be observed and on X86_64 machines, NMI watchdog timer may kick in, causing system reset.
qla2xxx 0000:06:01.0: [ERROR] Failed to load segment 0 of firmware
qla2xxx 0000:06:01.0: qla2x00_abort_isp: **** FAILED ****
qla2xxx 0000:06:01.0: Performing ISP error recovery - ha= ffff8101fed3c4f8.
qla2xxx 0000:06:01.0: [ERROR] Failed to load segment 0 of firmware
qla2xxx 0000:06:01.0: [ERROR] Failed to load segment 0 of firmware
qla2xxx 0000:06:01.0: qla2x00_abort_isp: **** FAILED ****
qla2xxx 0000:06:01.0: Performing ISP error recovery - ha= ffff8101fed3c4f8.
qla2xxx 0000:06:01.0: scsi(2:0:3): DEVICE RESET ISSUED.
qla2xxx 0000:06:01.0: Failed mailbox send register test
qla2xxx 0000:06:01.0: qla2x00_abort_isp: **** FAILED ****
qla2xxx 0000:06:01.0: Performing ISP error recovery - ha= ffff8101fed3c4f8.
NMI Watchdog detected LOCKUP on CPU 1
PID: 602    TASK: ffff8103ffae9830  CPU: 1   COMMAND: "qla2xxx_2_dpc"
 #0 [ffff81020710cdc0] crash_kexec at ffffffff800b127a
 #1 [ffff81020710ce80] die_nmi at ffffffff80065285
 #2 [ffff81020710cea0] nmi_watchdog_tick at ffffffff80065a66
 #3 [ffff81020710cef0] default_do_nmi at ffffffff80065609
 #4 [ffff81020710cf40] do_nmi at ffffffff800658f1
 #5 [ffff81020710cf50] nmi at ffffffff80064ecf
    [exception RIP: __delay+10]
    RIP: ffffffff8000cb01  RSP: ffff8101feefddd8  RFLAGS: 00000002
    RAX: 0000000000002201  RBX: ffff8101fed3c4f8  RCX: 00000000b99134b9
    RDX: 0000000000164179  RSI: 0000000000000046  RDI: 000000000000329e
    RBP: ffffc2000018c000   R8: 0000000000000002   R9: ffff8101feefdda4
    R10: 0000000000000004  R11: ffffffff8022f8cf  R12: 0000000000009411
    R13: 0000000000000246  R14: ffff8101ffd81be8  R15: ffffffff800a3ab7
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff8101feefddd8] __delay at ffffffff8000cb01
 #7 [ffff8101feefddd8] qla24xx_reset_chip at ffffffff88153302 [qla2xxx]
 #8 [ffff8101feefde18] qla2x00_abort_isp_cleanup at ffffffff8814f6b4 [qla2xxx]
 #9 [ffff8101feefde38] qla2x00_abort_isp at ffffffff88152a34 [qla2xxx]
#10 [ffff8101feefde68] qla2x00_do_dpc at ffffffff8814c55a [qla2xxx]
#11 [ffff8101feefdee8] kthread at ffffffff80032c23
#12 [ffff8101feefdf48] kernel_thread at ffffffff8005dfc1

Environment

  • Red Hat Enterprise Linux
  • Qlogic driver

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content