Why are underlying paths of a multipath device not coming online after the path is recovered?

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 5, 6
  • device-mapper-multipath

Issue

  • Why are the underlying paths of a multipath device not coming online after the path is recovered?
  • Why is there a lock on the disk, and why are the paths not recovered?
  • Why do the paths come online only after the multipath -v3 command is executed?

Resolution

  • Start the multipath daemon using:
    # service multipathd start
    
  • To make this change permanent and start multipathd at boot time, perform the following steps:
  1. Check whether the multipathd service is enabled at boot time:
    # chkconfig --list multipathd
    multipathd          0:Off     1:Off     2:Off     3:Off     4:Off     5:Off     6:Off
    
  2. Enable the multipathd service at boot time:
    # chkconfig multipathd on
    
  3. Verify that the multipathd service is now enabled to start at boot time:
    # chkconfig --list multipathd
    multipathd          0:Off     1:Off     2:On      3:On      4:On      5:On      6:Off
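
The check in steps 1 and 3 can also be scripted. The following is a minimal sketch, not part of the original solution: the function name boot_enabled and the default runlevels 3 and 5 are assumptions chosen for illustration. It reads one line of chkconfig --list output on standard input and succeeds only if the service is on in every requested runlevel.

```shell
#!/bin/sh
# Sketch only: boot_enabled is a hypothetical helper, not a system command.
# Reads a "chkconfig --list <service>" line on stdin and returns 0 when the
# service is "on" in every runlevel given as $1 (default: "3 5").
boot_enabled() {
    line=$(cat)
    for level in ${1:-3 5}; do
        # Match e.g. "3:on"; -i also accepts "3:On" as printed above.
        printf '%s\n' "$line" | grep -qi "${level}:on" || return 1
    done
    return 0
}
```

On an affected system this could be combined with the commands above, for example: chkconfig --list multipathd | boot_enabled || chkconfig multipathd on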
    

Root Cause

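The multipathd daemon is in charge of checking for failed paths. When a path fails or recovers, the daemon reconfigures the multipath map the path belongs to, so that the map regains its maximum performance and redundancy. If multipathd is not running, nothing performs these checks, so recovered paths stay marked as failed until the multipath command is run manually.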

The daemon executes the external multipath configuration tool when events occur. In turn, the multipath tool signals the multipathd daemon when it is done with the devmap reconfiguration, so that the daemon can refresh its failed path list.
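
Because path recovery depends on the daemon, a quick way to confirm this root cause is to check whether a multipathd process exists. The sketch below is illustrative only: ensure_multipathd is a hypothetical function name, and it assumes that pidof and the service command are available, as they normally are on RHEL 5 and 6.

```shell
#!/bin/sh
# Sketch only: ensure_multipathd is a hypothetical helper.
# Starts multipathd through the init script if no process is found.
ensure_multipathd() {
    if pidof multipathd >/dev/null 2>&1; then
        echo "multipathd is already running"
    else
        echo "multipathd is not running, starting it"
        service multipathd start
    fi
}
```

The function only defines the check. Calling ensure_multipathd as root performs it and starts the daemon when needed.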

Diagnostic Steps

An example of the multipath -ll output that can be seen when multipathd is not running, after failing and restoring the paths, looks like this:

mpath0 (3600508abcdef137e0001b09876543210) dm-7 HP,HSV100
[size=35G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:14 sdab 65:176 [failed][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:1:14 sdah 66:16  [failed][ghost]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:14 sdd  8:48   [failed][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:1:14 sdj  8:144  [failed][ghost]

Note that the path checker reports [failed] even though the hardware (link) is marked as [ready].
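
To pick such paths out of a longer listing, the output can be filtered. This is a minimal sketch that assumes the RHEL 5 output layout shown above, where the device name is the third field and the two status brackets form the fifth. The function name stale_paths is hypothetical.

```shell
#!/bin/sh
# Sketch only: stale_paths is a hypothetical helper.
# Reads RHEL 5 style "multipath -ll" output on stdin and prints the device
# name of every path line whose status reads [failed][ready] or
# [failed][ghost].
stale_paths() {
    awk '$1 == "\\_" && $5 ~ /^\[failed\]\[(ready|ghost)\]$/ { print $3 }'
}
```

For example, multipath -ll | stale_paths would list sdab, sdah, sdd and sdj for the output above, one device per line.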

On RHEL 6, if a path is not available, the SCSI device is normally removed.
If the multipath daemon is not running, you may see output like this:

LUN1 (36005076801900319f000000000000123) dm-17 IBM,2145
size=30G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 2:0:1:2  sdds 71:160  active ready running
  |- 1:0:1:2  sddr 71:144  active ready running
  |- #:#:#:#  -    #:#     failed faulty running
  `- #:#:#:#  -    #:#     failed faulty running
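
The number of failed entries can be counted in the same way. The sketch below assumes the RHEL 6 layout shown above, where the last three fields of a path line are the device-mapper state, the checker state and the device state. The function name count_failed_paths is hypothetical.

```shell
#!/bin/sh
# Sketch only: count_failed_paths is a hypothetical helper.
# Reads RHEL 6 style "multipath -ll" output on stdin and prints how many
# path lines have "failed" as the third field from the end.
count_failed_paths() {
    awk 'NF >= 3 && $(NF-2) == "failed" { n++ } END { print n + 0 }'
}
```

Run as multipath -ll | count_failed_paths, it prints 2 for the output above.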
