Device mapper multipath path failover takes longer than expected for paths through a SAS HBA controller
Issue
- Device mapper multipath path failover takes longer than expected for paths through a SAS HBA controller.
- We have configured DM-Multipath for two SAS paths. When one of the SAS paths fails, device mapper multipath takes around 30 seconds to activate the other path, so 30 seconds of I/O to the LUNs is lost. Ideally, the alternate path should be activated and I/O resumed with no noticeable delay.
- Writes hang when one SAS cable is pulled out. They continue on the other path after 30 seconds, with the loss of 30 seconds of data.
Jan 6 11:41:49 node1 kernel: mpt2sas0: Discovery: (start)
Jan 6 11:41:49 node1 kernel: mpt2sas0: discovery event: (start)
Jan 6 11:41:49 node1 kernel: mpt2sas0: Device Status Change
Jan 6 11:41:49 node1 kernel: mpt2sas0: device status change: (internal device reset)
Jan 6 11:41:49 node1 kernel: handle(0x000a), sas address(0x5005076803689e39), tag(65535)
Jan 6 11:41:49 node1 kernel:
Jan 6 11:41:49 node1 kernel: mpt2sas0: SAS Topology Change List
Jan 6 11:41:49 node1 kernel: sd 3:0:4:0: <6>mpt2sas0: SDEV_BLOCK: handle(0x000a)
Jan 6 11:41:49 node1 kernel: sd 3:0:4:1: <6>mpt2sas0: SDEV_BLOCK: handle(0x000a)
Jan 6 11:41:49 node1 kernel: mpt2sas0: sas topology change: (responding)
Jan 6 11:41:49 node1 kernel: handle(0x0000), enclosure_handle(0x0001) start_phy(04), count(4)
Jan 6 11:41:49 node1 kernel: phy(04), attached_handle(0x000a): delay target remove: link rate: new(0x00), old(0x0a)
Jan 6 11:41:49 node1 kernel: phy(05), attached_handle(0x000a): link rate change: link rate: new(0x00), old(0x0a)
Jan 6 11:41:49 node1 kernel: phy(06), attached_handle(0x000a): link rate change: link rate: new(0x00), old(0x0a)
Jan 6 11:41:49 node1 kernel: phy(07), attached_handle(0x000a): link rate change: link rate: new(0x00), old(0x0a)
Jan 6 11:41:49 node1 kernel: mpt2sas0: updating handles for sas_host(0x500605b004867c00)
Jan 6 11:41:49 node1 kernel: mpt2sas0: Device Status Change
Jan 6 11:41:49 node1 kernel: mpt2sas0: device status change: (internal device reset complete)
Jan 6 11:41:49 node1 kernel: handle(0x000a), sas address(0x5005076803689e39), tag(65535)
Jan 6 11:41:49 node1 kernel:
Jan 6 11:41:49 node1 kernel: mpt2sas0: Discovery: (stop)
Jan 6 11:41:49 node1 kernel: mpt2sas0: discovery event: (stop)
Jan 6 11:42:19 node1 multipathd: 8:48: mark as failed
Jan 6 11:42:19 node1 multipathd: dsi_part2: remaining active paths: 1 <------failed path detected after 30 seconds by multipathd
Jan 6 11:42:19 node1 kernel: mpt2sas0: Discovery: (start)
Jan 6 11:42:19 node1 kernel: mpt2sas0: SAS Topology Change List
Jan 6 11:42:19 node1 kernel: mpt2sas0: setting delete flag: handle(0x000a), sas_addr(0x5005076803689e39)
Jan 6 11:42:19 node1 kernel: sd 3:0:4:0: <6>mpt2sas0: SDEV_RUNNING: sas address(0x5005076803689e39)
Jan 6 11:42:19 node1 kernel: sd 3:0:4:1: <6>mpt2sas0: SDEV_RUNNING: sas address(0x5005076803689e39)
Jan 6 11:42:19 node1 kernel: mpt2sas0: tr_send:handle(0x000a), (open), smid(7932), cb(7)
Jan 6 11:42:19 node1 kernel: sd 3:0:4:0: [sdc] Done: SUCCESS
Jan 6 11:42:19 node1 kernel: sd 3:0:4:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jan 6 11:42:19 node1 kernel: sd 3:0:4:0: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00
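The 30-second gap can be read directly off the timestamps in the messages above. As a minimal sketch, the Python snippet below computes the time between the first SDEV_BLOCK event and the moment multipathd marks the path as failed. The two log lines are copied from the excerpt. The helper names `syslog_time` and `failover_delay` are hypothetical, and the year is a placeholder because syslog does not record one.

```python
from datetime import datetime

# Two lines copied from the log excerpt above.
LOG_LINES = [
    "Jan 6 11:41:49 node1 kernel: sd 3:0:4:0: <6>mpt2sas0: SDEV_BLOCK: handle(0x000a)",
    "Jan 6 11:42:19 node1 multipathd: 8:48: mark as failed",
]


def syslog_time(line):
    """Parse the leading 'Mon D HH:MM:SS' syslog timestamp.

    Syslog omits the year, so a placeholder year (2000) is assumed here.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"2000 {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def failover_delay(lines):
    """Return seconds between the first SDEV_BLOCK and the first 'mark as failed'."""
    blocked = next(syslog_time(l) for l in lines if "SDEV_BLOCK" in l)
    failed = next(syslog_time(l) for l in lines if "mark as failed" in l)
    return (failed - blocked).total_seconds()


print(failover_delay(LOG_LINES))  # prints 30.0
```

The same function could be pointed at a full log file by passing its lines instead of `LOG_LINES`, assuming the excerpt covers a single failover event and does not span a year boundary.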
Environment
- Red Hat Enterprise Linux (RHEL) 6
- DM-Multipath
- LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2