Device-mapper-multipath can incorrectly re-enable dead paths on certain EMC Clariion SANs with Red Hat Enterprise Linux
Issue
The multipathd daemon incorrectly re-enables paths to a LUN on an EMC Clariion array when the LUN itself has failed on the array (i.e., the physical paths between the host and the array are fine, but every I/O to the LUN fails because the LUN is offline due to backend disk failures).
While I/O is being performed to such a LUN, its backend disks are physically removed from the array to simulate a disk failure. Rather than failing back to the application, the I/O hangs indefinitely. Although the Red Hat kernel multipath driver correctly fails the paths, the multipathd daemon keeps re-enabling them; it appears unable to distinguish a passive path to a working LUN from a path to a failed LUN.
The expected behavior is that, when a disk fails, the I/O eventually fails at the application level so that the application (in this case, Oracle ASM) can detect the failure and take appropriate action.
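The hang can be illustrated with a toy model. This is a hypothetical simulation, not multipathd's code: it assumes that once all paths have failed, queued I/O is failed back after no_path_retry checker intervals, and that each path reinstatement by the checker restarts that countdown. The names ToyMultipathDevice and io_fails_within are illustrative only.

```python
"""Toy model (an illustrative assumption, NOT multipathd's implementation) of
how repeated path reinstatement can keep I/O queued forever."""
from dataclasses import dataclass


@dataclass
class ToyMultipathDevice:
    no_path_retry: int        # checker intervals to keep queueing once all paths are down
    retries_left: int = 0
    queueing: bool = True
    paths_up: bool = True

    def all_paths_failed(self) -> None:
        # Kernel fails every path; assumed to (re)start the no_path_retry countdown.
        if self.paths_up:
            self.paths_up = False
            self.retries_left = self.no_path_retry

    def path_reinstated(self) -> None:
        # Checker reports the path usable again (e.g. it looks like a passive path).
        self.paths_up = True
        self.queueing = True

    def checker_tick(self) -> None:
        # One checker interval elapses while no path is usable.
        if not self.paths_up and self.queueing:
            self.retries_left -= 1
            if self.retries_left <= 0:
                self.queueing = False  # queued I/O is failed back to the caller


def io_fails_within(ticks: int, checker_reinstates: bool, no_path_retry: int = 5) -> bool:
    """Return True if the application sees an I/O error within `ticks` intervals."""
    dev = ToyMultipathDevice(no_path_retry=no_path_retry)
    for _ in range(ticks):
        dev.all_paths_failed()      # every I/O to the failed LUN fails its path
        dev.checker_tick()
        if not dev.queueing:
            return True
        if checker_reinstates:
            dev.path_reinstated()   # checker cannot tell a dead LUN from a passive path
    return False
```

Under these assumptions, io_fails_within(5, checker_reinstates=False) returns True (the error reaches the application after five intervals), while io_fails_within(1000, checker_reinstates=True) returns False, because every reinstatement restarts the countdown and the I/O stays queued.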
Environment
- Red Hat Enterprise Linux (RHEL), including:
  - Red Hat Enterprise Linux 5 kernels before kernel-2.6.18-274.el5
  - Red Hat Enterprise Linux 6 kernels before kernel-2.6.32-220.el6
- Device-mapper-multipath (any version)
  - Configured with queue_if_no_path or no_path_retry > 1
  - Using the emc_clariion path checker
- EMC Clariion SAN
- Physically removing a disk from the SAN without preparing the SAN first (simulating a hard disk failure)
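The kernel and multipath conditions above can be checked with a small script. This is a rough sketch under stated assumptions: kernel_is_affected and config_matches are hypothetical helpers, the kernel check compares only the base version and major build number (so a z-stream kernel with a backported fix would still be flagged), and the configuration scan ignores section scoping and built-in device defaults, so an array that gets emc_clariion from the built-in hardware table will not be detected.

```python
import re
from typing import Optional

# Fixed kernels named in this article; earlier builds in the same stream are affected.
FIXED_KERNELS = {
    "el5": ((2, 6, 18), 274),
    "el6": ((2, 6, 32), 220),
}

# Matches e.g. "2.6.18-238.19.1.el5" or "2.6.32-220.el6.x86_64".
_KERNEL_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)(?:\.\d+)*\.(el[56])")


def kernel_is_affected(release: str) -> Optional[bool]:
    """True/False for RHEL 5/6 kernel release strings, None if unrecognized."""
    m = _KERNEL_RE.match(release)
    if not m:
        return None
    base = tuple(int(x) for x in m.group(1, 2, 3))
    build = int(m.group(4))
    fixed_base, fixed_build = FIXED_KERNELS[m.group(5)]
    if base != fixed_base:
        return None  # not the stock kernel series for this RHEL release
    return build < fixed_build


def config_matches(conf_text: str) -> bool:
    """Section-unaware scan of multipath.conf text for the settings listed above."""
    queueing = False
    emc_checker = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)          # key, then the rest (spaces or tabs)
        key = parts[0]
        value = parts[1].strip().strip('"') if len(parts) > 1 else ""
        if key == "features" and "queue_if_no_path" in value:
            queueing = True
        elif key == "no_path_retry":
            # "queue" means queue indefinitely, like queue_if_no_path
            if value == "queue" or (value.isdigit() and int(value) > 1):
                queueing = True
        elif key == "path_checker" and value == "emc_clariion":
            emc_checker = True
    return queueing and emc_checker
```

On a host, one might pass platform.release() to kernel_is_affected and the contents of /etc/multipath.conf to config_matches; a None result means the kernel string was not recognized and should be checked by hand.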