DM-multipath paths take a long time to fail when my Netapp Metrocluster SAN fails over on Red Hat Enterprise Linux 5
Issue
- I have observed that the multipath driver never checks paths in parallel; all paths are checked serially. Is there any way to change this behaviour? Ideally, all multipath paths would be checked simultaneously.
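The cost of serial checking can be illustrated with a toy model. This is a minimal Python sketch and not how multipathd is implemented. `check_path` is a hypothetical stand-in for a path checker against an unresponsive path that blocks until its timeout expires. The device names are taken from the active path group in the multipath output under Environment, and the 0.2-second timeout is an illustrative assumption scaled down from the 60-second SCSI timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_path(path, timeout):
    """Hypothetical checker for a hung path: block for the full timeout, then report failure."""
    time.sleep(timeout)
    return path, "failed"


def check_serially(paths, timeout):
    """Check one path at a time; each check must finish before the next one starts."""
    start = time.monotonic()
    results = [check_path(p, timeout) for p in paths]
    return results, time.monotonic() - start


def check_in_parallel(paths, timeout):
    """Start every check at once; total time is bounded by the slowest single check."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        # Executor.map preserves input order, so results line up with `paths`.
        results = list(pool.map(check_path, paths, [timeout] * len(paths)))
    return results, time.monotonic() - start


if __name__ == "__main__":
    paths = ["sdb", "sdf", "sdk", "sdp"]  # active paths from the example output
    timeout = 0.2  # seconds; a scaled-down stand-in for the SCSI timeout
    _, serial = check_serially(paths, timeout)
    _, parallel = check_in_parallel(paths, timeout)
    print(f"serial:   {serial:.2f} s (roughly {len(paths)} x {timeout} s)")
    print(f"parallel: {parallel:.2f} s (roughly {timeout} s)")
```

In this model, the serial run takes about one timeout per hung path, while the parallel run finishes after roughly a single timeout.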
Environment
- Red Hat Enterprise Linux Server 5 (with the High Availability or Resilient Storage Add-Ons)
- Red Hat High Availability cluster with two or more nodes
  - Stretch cluster with at least one cluster node at each of two different sites.
- rgmanager to manage cluster resources
  - The service has filesystem resources (fs.sh) that monitor whether filesystems are readable/writable.
  - self_fence="1" is configured.
- Netapp Metrocluster SAN located at two sites.
  - One controller is the primary controller, and the other acts as the backup controller.
  - Multiple paths can be seen from each controller simultaneously, but only paths from the active controller are "up" (in this example, there are 4 paths from each controller):
      mpath1 (360a9800044316b2d422b443962396a5a) dm-1 NETAPP,LUN
      [size=10G][features=0][hwhandler=1 alua][rw]
      \_ round-robin 0 [prio=0][active]
       \_ 0:0:0:1 sdb 8:16 [active][undef]
       \_ 0:0:2:1 sdf 8:80 [active][undef]
       \_ 1:0:0:1 sdk 8:160 [active][undef]
       \_ 1:0:2:1 sdp 8:240 [active][undef]
      \_ round-robin 0 [prio=0][enabled]
       \_ 0:0:1:1 sdd 8:48 [failed][undef]
       \_ 0:0:3:1 sdi 8:128 [failed][undef]
       \_ 1:0:1:1 sdm 8:192 [failed][undef]
       \_ 1:0:3:1 sds 65:32 [failed][undef]
- Device-mapper-multipath
  - Paths are configured with features="0" and no_path_retry=fail.
  - When the ISL (inter-switch link) between the Netapp Metrocluster sites fails, the multipathd path checkers do not return immediately; they do not fail a path until they time out.
  - The path checkers run sequentially, so the first check must time out before the second can start, the second must time out before the third can start, and so on.
  - As a result, it takes (number of paths × scsi_timeout) seconds for device-mapper-multipath to determine that all paths have failed and to fail the outstanding I/O, which stays hung in the meantime despite no_path_retry=fail.
  - checker_timeout is not configured in /etc/multipath.conf, and the default scsi_timeout is 60 seconds.
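The estimate above (number of paths multiplied by the SCSI timeout) can be worked through using the per-device timeouts that the kernel exposes in sysfs. The following Python sketch assumes that each path device's SCSI command timeout is readable from /sys/block/<device>/device/timeout (the usual Linux location; verify it on your system). Following the description above, it also assumes that with checker_timeout unset each check waits for the full SCSI timeout. The helper names `scsi_timeout` and `failover_estimate` are hypothetical, and the "parallel" figure describes a checker that does not exist in this setup; it is shown only for comparison.

```python
from pathlib import Path

DEFAULT_SCSI_TIMEOUT = 60  # seconds; the default stated in this article


def scsi_timeout(device, sysfs_root="/sys"):
    """Return the SCSI command timeout in seconds for a device such as 'sdb'.

    Falls back to DEFAULT_SCSI_TIMEOUT if the sysfs attribute is missing or unreadable.
    """
    attr = Path(sysfs_root) / "block" / device / "device" / "timeout"
    try:
        return int(attr.read_text().strip())
    except (OSError, ValueError):
        return DEFAULT_SCSI_TIMEOUT


def failover_estimate(timeouts):
    """Return (serial, parallel) worst-case seconds until every path is failed.

    Serial: each check waits out its own timeout before the next starts, so the waits add up.
    Parallel (hypothetical): all checks run at once, so the longest single wait dominates.
    Expects at least one timeout.
    """
    return sum(timeouts), max(timeouts)


if __name__ == "__main__":
    devices = ["sdb", "sdf", "sdk", "sdp"]  # active paths from the example output
    timeouts = [scsi_timeout(d) for d in devices]
    serial, parallel = failover_estimate(timeouts)
    print(f"{len(devices)} paths, timeouts {timeouts}")
    print(f"serial worst case:   {serial} s")
    print(f"parallel worst case: {parallel} s")
```

Using the formula above with four hung paths at the 60-second default, the serial estimate is 4 × 60 = 240 seconds. If the checks could run concurrently, the estimate would be 60 seconds.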
