Emulex HBA unresponsive during error recovery

Solution Unverified - Updated -

Issue

  • The lpfc driver takes a very long time to handle errors.
  • I/O loss and lpfc events cause filesystems to become unusable, with messages such as the following logged:
Oct  7 21:29:20 hostname kernel: lpfc 0000:04:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c
Oct  7 21:29:20 hostname kernel: lpfc 0000:04:00.3: 1:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204
Oct  7 21:29:20 hostname kernel: lpfc 0000:04:00.3: 1:(0):0713 SCSI layer issued Device Reset (0, 0) return x2007
Oct  7 21:29:22 hostname kernel: lpfc 0000:04:00.2: 0:0338 IOCB wait timeout error - no wake response Data x3c
Oct  7 21:29:22 hostname kernel: lpfc 0000:04:00.2: 0:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204
Oct  7 21:29:22 hostname kernel: lpfc 0000:04:00.2: 0:(0):0713 SCSI layer issued Device Reset (0, 0) return x2007
Oct  7 21:30:30 hostname kernel: lpfc 0000:04:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c
Oct  7 21:30:30 hostname kernel: lpfc 0000:04:00.3: 1:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204
Oct  7 21:30:30 hostname kernel: lpfc 0000:04:00.3: 1:(0):0713 SCSI layer issued Device Reset (0, 0) return x2007
Oct  7 21:30:32 hostname kernel: lpfc 0000:04:00.2: 0:0338 IOCB wait timeout error - no wake response Data x3c
Oct  7 21:30:32 hostname kernel: lpfc 0000:04:00.2: 0:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204
Oct  7 21:30:32 hostname kernel: lpfc 0000:04:00.2: 0:(0):0713 SCSI layer issued Device Reset (0, 0) return x2007
Oct  7 21:31:30 hostname kernel: lpfc 0000:04:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c
Oct  7 21:31:30 hostname kernel: lpfc 0000:04:00.3: 1:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 17 failed (0, 0) iocb_flag x204
Oct  7 21:31:30 hostname kernel: lpfc 0000:04:00.3: 1:(0):0713 SCSI layer issued Device Reset (0, 17) return x2007
Oct  7 21:31:32 hostname kernel: lpfc 0000:04:00.2: 0:0338 IOCB wait timeout error - no wake response Data x3c

[.... ]
Oct  7 21:32:42 166-L-DB-1A kernel: lpfc 0000:04:00.2: 0:(0):0713 SCSI layer issued Device Reset (0, 0) return x2007
Oct  7 21:33:42 166-L-DB-1A multipathd: vg_oracle1_1: sda - tur checker reports path is down
Oct  7 21:33:42 166-L-DB-1A multipathd: checker failed path 8:0 in map vg_oracle1_1
Oct  7 21:33:42 166-L-DB-1A kernel: lpfc 0000:04:00.2: 0:0338 IOCB wait timeout error - no wake response Data x3c
Oct  7 21:33:42 166-L-DB-1A kernel: lpfc 0000:04:00.2: 0:(0):0727 TMF FCP_TARGET_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204
Oct  7 21:33:42 166-L-DB-1A kernel: lpfc 0000:04:00.2: 0:(0):0700 Bus Reset on target 0 failed
Oct  7 21:33:42 166-L-DB-1A kernel: lpfc 0000:04:00.2: 0:(0):0714 SCSI layer issued Bus Reset Data: x2003
Oct  7 21:33:42 166-L-DB-1A kernel: sd 0:0:0:0: Device offlined - not ready after error recovery   <<<<<<<<<
Oct  7 21:33:42 166-L-DB-1A kernel: sd 0:0:0:0: Device offlined - not ready after error recovery
Oct  7 21:33:42 166-L-DB-1A kernel: sd 0:0:0:0: Device offlined - not ready after error recovery
Oct  7 21:33:42 166-L-DB-1A kernel: sd 0:0:0:0: [sda] Unhandled error code
Oct  7 21:33:42 166-L-DB-1A kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_REQUEUE driverbyte=DRIVER_OK
Oct  7 21:33:42 166-L-DB-1A kernel: sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 12 c7 f1 60 00 00 08 00
Oct  7 21:33:42 166-L-DB-1A kernel: end_request: I/O error, dev sda, sector 315093344
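
When triaging an excerpt like the one above, it can help to count how often each lpfc message ID appears per PCI function, to see whether both HBA ports are affected. The following is a minimal sketch, not a Red Hat or Emulex tool: the function name `summarize_lpfc` and the regular expression are illustrative assumptions derived only from the message layout shown in this excerpt, and other driver versions may format these lines differently.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   "... kernel: lpfc <pci-address>: <brd>:[(<vport>):]<4-digit msg id> <text>"
LPFC_RE = re.compile(
    r"kernel: lpfc (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"\d+:(?:\(\d+\):)?(?P<msgid>\d{4}) (?P<text>.*)"
)


def summarize_lpfc(lines):
    """Return a Counter keyed by (pci_address, message_id)."""
    counts = Counter()
    for line in lines:
        match = LPFC_RE.search(line)
        if match:  # lines from other subsystems are ignored
            counts[(match.group("pci"), match.group("msgid"))] += 1
    return counts


# Sample lines copied from the excerpt above.
sample = [
    "Oct  7 21:29:20 hostname kernel: lpfc 0000:04:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c",
    "Oct  7 21:29:20 hostname kernel: lpfc 0000:04:00.3: 1:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204",
    "Oct  7 21:29:20 hostname kernel: lpfc 0000:04:00.3: 1:(0):0713 SCSI layer issued Device Reset (0, 0) return x2007",
    "Oct  7 21:29:22 hostname kernel: lpfc 0000:04:00.2: 0:0338 IOCB wait timeout error - no wake response Data x3c",
    "Oct  7 21:33:42 166-L-DB-1A multipathd: checker failed path 8:0 in map vg_oracle1_1",
]

if __name__ == "__main__":
    for (pci, msgid), count in sorted(summarize_lpfc(sample).items()):
        print(pci, msgid, count)
```

To run this against a live system, pass an open log file instead of `sample`, for example `summarize_lpfc(open("/var/log/messages"))`; the path is an assumption and depends on the local syslog configuration.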

Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 5.8
  • lpfc driver
  • FCoE, Fibre Channel
