After qla2xxx "Mailbox command timeout occured" error, adapter fails to recover

Solution Unverified - Updated -

Issue

  • First indication of problem in messages: "qla2xxx 0000:0f:00.0: Mailbox command timeout occured. Scheduling ISP abort."
  • After above error, the system can no longer see SAN connections or reload qla2xxx drivers
  • qla2xxx recovery process fails to recover adapter at this point. Unload/reload of the driver works, but in XEN environment this fails due to memory allocation problem at load time (not enough contiguous memory available within kernel)
  • When able to unload/reload driver problems is fixed
Jun 21 19:50:16 node kernel: qla2xxx 0000:0f:00,0: Mailbox command timeout occured, Scheduling ISP abort, eeh_busy: 0x0
Jun 21 19:50:16 node kernel: qla2xxx 0000:0f:00,0: Performing ISP error recovery - ha= ffff88012a1584f8,
Jun 21 19:50:17 node kernel: qla2xxx 0000:0f:00,0: LIP reset occured (f7f7),
Jun 21 19:50:17 node kernel: qla2xxx 0000:0f:00,0: LIP occured (f7f7),
Jun 21 19:50:17 node kernel: qla2xxx 0000:0f:00,0: LIP reset occured (f7f7),
Jun 21 19:50:17 node kernel: qla2xxx 0000:0f:00,0: LOOP UP detected (4 Gbps),
Jun 21 19:50:53 node kernel: qla2xxx 0000:0f:00,0: Mailbox command timeout occured, Issuing ISP abort,
Jun 21 19:50:53 node kernel: qla2xxx 0000:0f:00,0: Performing ISP error recovery - ha= ffff88012a1584f8,
Jun 21 19:51:23 node kernel: qla2xxx 0000:0f:00,0: Failed mailbox send register test
Jun 21 19:51:53 node kernel:  rport-8:0-0: blocked FC remote port time out: saving binding
Jun 21 19:51:53 node kernel: qla2xxx 0000:0f:00,0: [ERROR] Failed to load segment 0 of firmware
Jun 21 19:52:23 node kernel: qla2xxx 0000:0f:00,0: [ERROR] Failed to load segment 0 of firmware
Jun 21 19:53:53 node kernel: qla2xxx 0000:0f:00,0: scsi(8:0:0): Abort command issued -- 0 1 2002,
Jun 21 19:53:53 node kernel: qla2xxx 0000:0f:00,0: scsi(8:0:0): DEVICE RESET ISSUED,
  • In the event of Qlogic firmware recovery process, CPU hard lockup could be observed and on X86_64 machines, NMI watchdog timer may kick in, causing system reset.
qla2xxx 0000:06:01.0: [ERROR] Failed to load segment 0 of firmware
qla2xxx 0000:06:01.0: qla2x00_abort_isp: **** FAILED ****
qla2xxx 0000:06:01.0: Performing ISP error recovery - ha= ffff8101fed3c4f8.
qla2xxx 0000:06:01.0: [ERROR] Failed to load segment 0 of firmware
qla2xxx 0000:06:01.0: [ERROR] Failed to load segment 0 of firmware
qla2xxx 0000:06:01.0: qla2x00_abort_isp: **** FAILED ****
qla2xxx 0000:06:01.0: Performing ISP error recovery - ha= ffff8101fed3c4f8.
qla2xxx 0000:06:01.0: scsi(2:0:3): DEVICE RESET ISSUED.
qla2xxx 0000:06:01.0: Failed mailbox send register test
qla2xxx 0000:06:01.0: qla2x00_abort_isp: **** FAILED ****
qla2xxx 0000:06:01.0: Performing ISP error recovery - ha= ffff8101fed3c4f8.
NMI Watchdog detected LOCKUP on CPU 1
PID: 602    TASK: ffff8103ffae9830  CPU: 1   COMMAND: "qla2xxx_2_dpc"
 #0 [ffff81020710cdc0] crash_kexec at ffffffff800b127a
 #1 [ffff81020710ce80] die_nmi at ffffffff80065285
 #2 [ffff81020710cea0] nmi_watchdog_tick at ffffffff80065a66
 #3 [ffff81020710cef0] default_do_nmi at ffffffff80065609
 #4 [ffff81020710cf40] do_nmi at ffffffff800658f1
 #5 [ffff81020710cf50] nmi at ffffffff80064ecf
    [exception RIP: __delay+10]
    RIP: ffffffff8000cb01  RSP: ffff8101feefddd8  RFLAGS: 00000002
    RAX: 0000000000002201  RBX: ffff8101fed3c4f8  RCX: 00000000b99134b9
    RDX: 0000000000164179  RSI: 0000000000000046  RDI: 000000000000329e
    RBP: ffffc2000018c000   R8: 0000000000000002   R9: ffff8101feefdda4
    R10: 0000000000000004  R11: ffffffff8022f8cf  R12: 0000000000009411
    R13: 0000000000000246  R14: ffff8101ffd81be8  R15: ffffffff800a3ab7
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff8101feefddd8] __delay at ffffffff8000cb01
 #7 [ffff8101feefddd8] qla24xx_reset_chip at ffffffff88153302 [qla2xxx]
 #8 [ffff8101feefde18] qla2x00_abort_isp_cleanup at ffffffff8814f6b4 [qla2xxx]
 #9 [ffff8101feefde38] qla2x00_abort_isp at ffffffff88152a34 [qla2xxx]
#10 [ffff8101feefde68] qla2x00_do_dpc at ffffffff8814c55a [qla2xxx]
#11 [ffff8101feefdee8] kthread at ffffffff80032c23
#12 [ffff8101feefdf48] kernel_thread at ffffffff8005dfc1

Environment

  • Red Hat Enterprise Linux 5.4 XEN
  • Red Hat Enterprise Linux 5.1
  • Red Hat Enterprise Linux 5.9
  • Qlogic driver

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In