Correct setup for multipath on RHEL 5.7 with EMC Clariion SAN / ALUA mode
Hi,
I'm building a set of RHEL 5.7 servers on Cisco UCS hardware with an EMC SAN (ALUA/type 4 mode).
With the default multipathd configuration I see I/O errors when the passive/low-priority paths are accessed directly.
However, if I disable the "healthy" paths, access to the multipath device works as expected.
I have tried various configuration settings based on the following documents:
- https://bugzilla.redhat.com/show_bug.cgi?id=482737
- https://access.redhat.com/kb/docs/DOC-47889
- https://access.redhat.com/kb/docs/DOC-48959
This is my current configuration:
device {
        vendor                  "DGC "
        product                 ".*"
        path_grouping_policy    group_by_prio
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            "/sbin/mpath_prio_emc /dev/%n"
        path_checker            emc_clariion
        path_selector           "round-robin 0"
        features                "1 queue_if_no_path"
        no_path_retry           300
        hardware_handler        "1 alua"
        failback                immediate
}
mpath2 (36006016085102d00be0d0344ebdde011) dm-2 DGC,VRAID
[size=20G][features=1 queue_if_no_path][hwhandler=1 emc][rw]
\_ round-robin 0 [prio=2][active]
 \_ 0:0:1:16 sdi 8:128 [active][ready]
 \_ 1:0:1:16 sdo 8:224 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 0:0:0:16 sdd 8:48  [active][ready]
 \_ 1:0:0:16 sdk 8:160 [active][ready]
Responses
CLARiiONs, even with ALUA, aren't true "active/active" arrays, which raises the question: why are you trying to access the passive paths directly? Any time you go to the passive paths on a CLARiiON, you create a trespass event. Because of the time it takes to trespass the LUNs to the other storage processor (and because the trespass typically makes the passive SP the active SP), you'll get errors.
There are any number of reasons to see "diagnostic messages" on boot. I tend to call them "diagnostic messages" rather than "errors" because plenty of messages that get logged are "normal" even if they show up as errors (e.g., bus resets associated with LIP events caused by new devices joining an active loop, link-up/link-down messages on heartbeat networks, etc.). Without seeing the full context of the messages, it's hard to tell whether they're informational or indicative of a real problem. What I would tend to do is look for errors once volumes are onlined and filesystems are started (i.e., once application data is being sent across the HBA ↔ SP link). The multipath daemon (and similar processes) is usually fairly good at declaring a link down when there are actual error conditions.
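For example, once the filesystems are mounted and doing real I/O, I'd keep an eye on something like the following (mpath2 is just the device from your output; substitute your own):

    multipath -ll mpath2          # confirm both path groups are still present and paths are [active][ready]
    tail -f /var/log/messages     # watch for SCSI or device-mapper errors while the load is running

If the paths stay healthy and nothing new shows up in the log under load, the boot-time messages were most likely just noise.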
Hi Martin,
I can't be 100% sure without first seeing more from your system, but it sounds to me like LVM is scanning the passive paths. By default LVM will scan all block devices, and that includes the /dev/sd* devices as well as /dev/mapper/mpath*. Each /dev/mapper/mpath* device is made up of one or more /dev/sd* devices, which should not be accessed directly. You know this and I know this, but by default LVM doesn't. I have a feeling setting up some LVM filters will fix this problem for you.
This article shows some examples of how to do this without filtering out your root device; there's also a quick sketch below. Hopefully this small change will make the issue go away.
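As a rough example, something along these lines in the devices section of /etc/lvm/lvm.conf (assuming your root volume lives on a local /dev/sda -- adjust the patterns to match your system):

    filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/sda[0-9]*$|", "r|.*|" ]

That accepts the multipath devices and the local root disk and rejects everything else, so LVM stops probing the underlying /dev/sd* paths directly. You can check with lvmdiskscan or pvs afterwards that only the expected devices are being picked up.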
Cheers,
Rohan
