DM-Multipath incorrectly grouped paths from two different devices during a storage-side reconfiguration

Solution Verified

Issue

  • After a reconfiguration on the IBM SVC storage, two different SAN devices were found grouped under the same multipath device map, as shown in the output below.

    The two multipath devices were reported separately before the reconfiguration:

    mpathb (3600bbbbbbbbbbbbbbbbbbbbbbbbbbbbb) dm-337 IBM,2145
    size=480G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=50 status=active
    | |- 1:0:10:108 sdbsa 67:1888  active ready running
    | |- 4:0:8:108  sdbsi 67:2016  active ready running
    | |- 1:0:9:108  sdbse 67:1952  active ready running
    | `- 4:0:11:108 sdbsk 68:1792  active ready running
    `-+- policy='round-robin 0' prio=10 status=enabled
      |- 1:0:17:108 sdbsx 68:2000  active ready running
      |- 4:0:15:108 sdbsm 68:1824  active ready running
      |- 1:0:16:108 sdbsc 67:1920  active ready running
      `- 4:0:16:108 sdbso 68:1856  active ready running
    
    mpathc (3600ccccccccccccccccccccccccccccc) dm-346 IBM,2145
    size=480G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=50 status=active
    | |- 1:0:11:2   sdbgj 8:1648   active ready running
    | |- 4:0:10:2   sdbqp 65:1808  active ready running
    | |- 1:0:8:2    sdbqg 8:1920   active ready running
    | `- 4:0:9:2    sdbrq 66:1984  active ready running
    `-+- policy='round-robin 0' prio=10 status=enabled
      |- 1:0:14:2   sdbgs 65:1536  active ready running
      |- 4:0:12:2   sdbqy 65:1952  active ready running
      |- 1:0:15:2   sdbpx 135:1776 active ready running
      `- 4:0:17:2   sdbrh 66:1840  active ready running
    
  • After the reconfiguration on the storage side, the multipath -ll command showed paths from the above two devices grouped together under a single map. This sent I/O to the wrong device and the database crashed (a sketch for verifying per-path WWIDs is included at the end of this list):

    mpathc (3600ccccccccccccccccccccccccccccc) dm-346 IBM,2145
    size=480G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=50 status=active
    | |- 1:0:11:2   sdbgj 8:1648   active ready running
    | |- 4:0:10:2   sdbqp 65:1808  active ready running
    | |- 1:0:8:2    sdbqg 8:1920   active ready running
    | |- 4:0:9:2    sdbrq 66:1984  active ready running
    | |- 1:0:10:108 sdbsa 67:1888  active ready running  <---problem
    | |- 4:0:8:108  sdbsi 67:2016  active ready running  <---problem
    | |- 1:0:9:108  sdbse 67:1952  active ready running  <---problem
    | `- 4:0:11:108 sdbsk 68:1792  active ready running  <---problem
    `-+- policy='round-robin 0' prio=10 status=enabled
      |- 1:0:14:2   sdbgs 65:1536  active ready running
      |- 4:0:12:2   sdbqy 65:1952  active ready running
      |- 1:0:15:2   sdbpx 135:1776 active ready running
      |- 4:0:17:2   sdbrh 66:1840  active ready running
      |- 1:0:16:108 sdbsc 67:1920  active ready running  <---problem
      |- 4:0:15:108 sdbsm 68:1824  active ready running  <---problem
      `- 4:0:16:108 sdbso 68:1856  active ready running  <---problem
    
  • The issue was fixed by flushing the affected multipath devices with the following commands and then re-scanning the paths (a worked recovery sequence is included at the end of this list):

    # multipath -f <multipath-device-name>
    # multipath -v2
    # multipath -ll
    
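  • To confirm which paths were grouped under the wrong map, the WWID reported by each path device can be compared against the WWID shown in parentheses in the map header. The sketch below is one way to do that, assuming the device names from the output above; the scsi_id location (/usr/lib/udev/scsi_id on newer releases) may differ, and on RHEL 6 the multipathd query has to be issued through "multipathd -k":

    # List every path with its HCTL, device name, WWID, and the map it is
    # currently grouped into, so a stray WWID inside a map stands out.
    multipathd show paths format "%i %d %w %m"

    # Cross-check one suspect path directly; the WWID printed here should
    # match the WWID of the map that the path is supposed to belong to.
    /lib/udev/scsi_id --whitelisted --device=/dev/sdbsa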

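  • A worked recovery sequence, written as a shell sketch: it assumes the map and HBA names from this example (mpathb, mpathc, host1, host4), that all I/O to the affected maps has been stopped, and that a SCSI host rescan is the appropriate way to re-discover the LUNs in your environment:

    # Flush each affected multipath map; this fails if the map is still in use.
    multipath -f mpathb
    multipath -f mpathc

    # Rescan the SCSI hosts that carry the paths so the LUNs are re-read.
    echo "- - -" > /sys/class/scsi_host/host1/scan
    echo "- - -" > /sys/class/scsi_host/host4/scan

    # Rebuild the multipath maps and verify that each map now contains
    # only paths reporting its own WWID.
    multipath -v2
    multipath -ll
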
Environment

  • Red Hat Enterprise Linux 6, 7, 8
  • device-mapper-multipath
