DM-Multipath incorrectly grouped the paths from two different devices during a storage-side reconfiguration

Issue

  • After a reconfiguration on the IBM SVC storage array, paths from two different SAN devices were grouped under the same multipath device map, as shown in the output below.

    These were the two separate multipath devices before the reconfiguration:

    mpathb (3600bbbbbbbbbbbbbbbbbbbbbbbbbbbbb) dm-337 IBM,2145
    size=480G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=50 status=active
    | |- 1:0:10:108 sdbsa 67:1888  active ready running
    | |- 4:0:8:108  sdbsi 67:2016  active ready running
    | |- 1:0:9:108  sdbse 67:1952  active ready running
    | `- 4:0:11:108 sdbsk 68:1792  active ready running
    `-+- policy='round-robin 0' prio=10 status=enabled
      |- 1:0:17:108 sdbsx 68:2000  active ready running
      |- 4:0:15:108 sdbsm 68:1824  active ready running
      |- 1:0:16:108 sdbsc 67:1920  active ready running
      `- 4:0:16:108 sdbso 68:1856  active ready running
    
    mpathc (3600ccccccccccccccccccccccccccccc) dm-346 IBM,2145
    size=480G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=50 status=active
    | |- 1:0:11:2   sdbgj 8:1648   active ready running
    | |- 4:0:10:2   sdbqp 65:1808  active ready running
    | |- 1:0:8:2    sdbqg 8:1920   active ready running
    | `- 4:0:9:2    sdbrq 66:1984  active ready running
    `-+- policy='round-robin 0' prio=10 status=enabled
      |- 1:0:14:2   sdbgs 65:1536  active ready running
      |- 4:0:12:2   sdbqy 65:1952  active ready running
      |- 1:0:15:2   sdbpx 135:1776 active ready running
      `- 4:0:17:2   sdbrh 66:1840  active ready running
    
  • After the reconfiguration on the storage side, the multipath -ll output showed the paths from the two devices above grouped together under a single map. As a result, I/O was issued to the wrong device and the database crashed (a way to verify this kind of WWID mismatch is sketched below):

    mpathc (3600ccccccccccccccccccccccccccccc) dm-346 IBM,2145
    size=480G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=50 status=active
    | |- 1:0:11:2   sdbgj 8:1648   active ready running
    | |- 4:0:10:2   sdbqp 65:1808  active ready running
    | |- 1:0:8:2    sdbqg 8:1920   active ready running
    | |- 4:0:9:2    sdbrq 66:1984  active ready running
    | |- 1:0:10:108 sdbsa 67:1888  active ready running  <---problem
    | |- 4:0:8:108  sdbsi 67:2016  active ready running  <---problem
    | |- 1:0:9:108  sdbse 67:1952  active ready running  <---problem
    | `- 4:0:11:108 sdbsk 68:1792  active ready running  <---problem
    `-+- policy='round-robin 0' prio=10 status=enabled
      |- 1:0:14:2   sdbgs 65:1536  active ready running
      |- 4:0:12:2   sdbqy 65:1952  active ready running
      |- 1:0:15:2   sdbpx 135:1776 active ready running
      |- 4:0:17:2   sdbrh 66:1840  active ready running
      |- 1:0:16:108 sdbsc 67:1920  active ready running  <---problem
      |- 4:0:15:108 sdbsm 68:1824  active ready running  <---problem
      `- 4:0:16:108 sdbso 68:1856  active ready running  <---problem
    
  • The issue was resolved by flushing the affected multipath device and then re-scanning, using the following commands (a scripted version of this recovery is sketched below):

    # multipath -f <multipath-device-name>   <--- flush the stale map
    # multipath -v2                          <--- re-scan and recreate the multipath maps
    # multipath -ll                          <--- verify the resulting topology
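
  • A minimal sketch for spotting this kind of mismatch, assuming the multipathd format wildcards %d (device node), %w (WWID) and %m (map name); on RHEL 6 these queries may need to be issued through the interactive multipathd -k shell, the scsi_id binary lives under /usr/lib/udev/ on RHEL 7 rather than /lib/udev/, and sdbsa is only an example device taken from the output above:

    # Compare the WWID reported by each path with the WWID of the map it is
    # currently grouped under; any path whose WWID differs is stale.
    # (A header line, if printed, simply fails the map lookup and is skipped.)
    multipathd show paths format "%d %w %m" | while read -r dev wwid map; do
        map_wwid=$(multipathd show maps format "%n %w" | awk -v m="$map" '$1 == m {print $2}')
        if [ -n "$map_wwid" ] && [ "$wwid" != "$map_wwid" ]; then
            echo "MISMATCH: $dev reports $wwid but sits in $map ($map_wwid)"
        fi
    done

    # A single path device can also be cross-checked directly against the array:
    /lib/udev/scsi_id --whitelisted --device=/dev/sdbsa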
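
  • A minimal scripted version of the recovery described above, assuming all I/O to the affected map has already been stopped (filesystems unmounted, database shut down); mpathc is used here only as a placeholder for the affected map name:

    #!/bin/bash
    # Flush the stale map, then let multipath re-scan and rebuild its maps.
    BAD_MAP=mpathc                  # placeholder: the map holding paths from two WWIDs

    multipath -f "$BAD_MAP"         # remove the stale device-mapper map

    # Optional (not part of the original fix): ask the kernel to re-read the
    # identity of an individual path device before the maps are rebuilt.
    # echo 1 > /sys/block/sdbsa/device/rescan

    multipath -v2                   # re-scan and recreate the multipath maps
    multipath -ll                   # verify that each map again holds a single WWID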
    

Environment

  • Red Hat Enterprise Linux 6, 7
  • device-mapper-multipath
