Device mapper multipath path failovers take longer than expected for paths through a SAS HBA controller

Solution Verified - Updated -

Issue

  • Device mapper multipath path failovers take longer than expected for paths through a SAS HBA controller.
  • We have configured DM-Multipath for two SAS paths. When one of the SAS paths fails, device mapper multipath takes around 30 seconds to activate the other path, so 30 seconds of I/O to the LUNs is lost. Ideally, the alternate path should be activated and I/O resumed with no noticeable delay.
  • Writes hang when one SAS cable is pulled out and resume on the other path after 30 seconds, with 30 seconds of data lost:
    Jan  6 11:41:49 node1 kernel: mpt2sas0: Discovery: (start)
    Jan  6 11:41:49 node1 kernel: mpt2sas0: discovery event: (start)
    Jan  6 11:41:49 node1 kernel: mpt2sas0: Device Status Change
    Jan  6 11:41:49 node1 kernel: mpt2sas0: device status change: (internal device reset)
    Jan  6 11:41:49 node1 kernel:        handle(0x000a), sas address(0x5005076803689e39), tag(65535)
    Jan  6 11:41:49 node1 kernel: 
    Jan  6 11:41:49 node1 kernel: mpt2sas0: SAS Topology Change List
    Jan  6 11:41:49 node1 kernel: sd 3:0:4:0: <6>mpt2sas0: SDEV_BLOCK: handle(0x000a)
    Jan  6 11:41:49 node1 kernel: sd 3:0:4:1: <6>mpt2sas0: SDEV_BLOCK: handle(0x000a)
    Jan  6 11:41:49 node1 kernel: mpt2sas0: sas topology change: (responding)
    Jan  6 11:41:49 node1 kernel:        handle(0x0000), enclosure_handle(0x0001) start_phy(04), count(4)
    Jan  6 11:41:49 node1 kernel:        phy(04), attached_handle(0x000a): delay target remove: link rate: new(0x00), old(0x0a)
    Jan  6 11:41:49 node1 kernel:        phy(05), attached_handle(0x000a): link rate change: link rate: new(0x00), old(0x0a)
    Jan  6 11:41:49 node1 kernel:        phy(06), attached_handle(0x000a): link rate change: link rate: new(0x00), old(0x0a)
    Jan  6 11:41:49 node1 kernel:        phy(07), attached_handle(0x000a): link rate change: link rate: new(0x00), old(0x0a)
    Jan  6 11:41:49 node1 kernel: mpt2sas0: updating handles for sas_host(0x500605b004867c00)
    Jan  6 11:41:49 node1 kernel: mpt2sas0: Device Status Change
    Jan  6 11:41:49 node1 kernel: mpt2sas0: device status change: (internal device reset complete)
    Jan  6 11:41:49 node1 kernel:        handle(0x000a), sas address(0x5005076803689e39), tag(65535)
    Jan  6 11:41:49 node1 kernel: 
    Jan  6 11:41:49 node1 kernel: mpt2sas0: Discovery: (stop)
    Jan  6 11:41:49 node1 kernel: mpt2sas0: discovery event: (stop)
    Jan  6 11:42:19 node1 multipathd: 8:48: mark as failed
    Jan  6 11:42:19 node1 multipathd: dsi_part2: remaining active paths: 1  <------failed path detected after 30 seconds by multipathd
    Jan  6 11:42:19 node1 kernel: mpt2sas0: Discovery: (start)
    Jan  6 11:42:19 node1 kernel: mpt2sas0: SAS Topology Change List
    Jan  6 11:42:19 node1 kernel: mpt2sas0: setting delete flag: handle(0x000a), sas_addr(0x5005076803689e39)
    Jan  6 11:42:19 node1 kernel: sd 3:0:4:0: <6>mpt2sas0: SDEV_RUNNING: sas address(0x5005076803689e39)
    Jan  6 11:42:19 node1 kernel: sd 3:0:4:1: <6>mpt2sas0: SDEV_RUNNING: sas address(0x5005076803689e39)
    Jan  6 11:42:19 node1 kernel: mpt2sas0: tr_send:handle(0x000a), (open), smid(7932), cb(7)
    Jan  6 11:42:19 node1 kernel: sd 3:0:4:0: [sdc] Done: SUCCESS
    Jan  6 11:42:19 node1 kernel: sd 3:0:4:0: [sdc]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
    Jan  6 11:42:19 node1 kernel: sd 3:0:4:0: [sdc] CDB: Test Unit Ready: 00 00 00 00 00 00
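The gap between the driver blocking the device (SDEV_BLOCK) and multipathd marking the path as failed can be measured directly from the log. The following is a minimal sketch, not part of the original solution: the function names `parse_stamp` and `failover_delay` are hypothetical, and because syslog timestamps carry no year, the year is an assumed placeholder that only matters if the log crosses a year boundary.

```python
import re
from datetime import datetime

# Matches a classic syslog timestamp such as "Jan  6 11:41:49" at line start.
SYSLOG_STAMP = re.compile(r"^\s*([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def parse_stamp(line, year=2000):
    """Return the timestamp of a syslog line, or None if it has none.

    The year is an assumption: syslog lines in this format do not record it.
    """
    match = SYSLOG_STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def failover_delay(lines, year=2000):
    """Seconds between the first SDEV_BLOCK message and the first
    subsequent multipathd "mark as failed" message, or None if either
    message is missing."""
    blocked = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue
        if blocked is None and "SDEV_BLOCK" in line:
            blocked = stamp
        elif blocked is not None and "mark as failed" in line:
            return (stamp - blocked).total_seconds()
    return None


# Two lines taken from the log excerpt above.
sample = [
    "Jan  6 11:41:49 node1 kernel: sd 3:0:4:0: <6>mpt2sas0: SDEV_BLOCK: handle(0x000a)",
    "Jan  6 11:42:19 node1 multipathd: 8:48: mark as failed",
]
print(failover_delay(sample))  # 30.0
```

To run it against a full log, pass an open file object (for example `open("/var/log/messages")`) instead of `sample`. The sketch only reports the delay; it does not identify which timeout causes it.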

Environment

  • Red Hat Enterprise Linux (RHEL) 6
    • DM-Multipath
    • LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2
