iSCSI Multipathing and RHEV 3.0
I was pounding my head against the wall trying to figure out why my fully redundant iSCSI storage network was not showing multiple paths in the LUN/Volume assignment panel when creating a new storage domain.
I then had the pleasure of speaking with one of the RHEV experts at Red Hat. They hit me with the stupid stick on many conceptual fronts, and I've had lots of "AHA, Eureka!" moments that I thought I'd share, in case there are other people having the same challenges I did.
RHEV is, by nature, configured for redundancy on all fronts, because the Storage Pool Manager (SPM) role can migrate from one RHEV host to another. This protects you from a single link failure at any point in the SAN, up to and including full RHEV Host failure. In the case of such a failure, the SPM role is migrated to another host in the cluster, and as soon as the storage connection is verified, the virtual machines that were running on the failed host are restarted on a functional host by RHEV-M. That's pretty cool in itself, but wait, there's more!
If you have the infrastructure for fully cross-connected multipathing, you get even greater redundancy. The RHEV expert helped me implement Mode 1 (active-backup) port bonding on each of my RHEV hosts. This detects a link failure on either interface and fails over to the working interface...this protects me from multiple failures, end to end, front to back. My storage array has redundant active/passive controllers, so if one controller fails, the passive one takes over. If one of the iSCSI fabric switches fails, the port bond detects the link failure and fails over to the other half of the bond pair. Combine this with the already robust failover in the RHEV clustered host configuration, and you really have to have a major disaster to lose connectivity to your iSCSI SAN.
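For reference, here is a minimal sketch of what Mode 1 (active-backup) bonding looks like at the OS level on a RHEL 6 era host. The interface names, address and miimon value below are placeholders of mine rather than details from my actual hosts:

/etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.10.11
NETMASK=255.255.255.0
BONDING_OPTS="mode=1 miimon=100"

/etc/sysconfig/network-scripts/ifcfg-eth2 (and the same again for ifcfg-eth3)
DEVICE=eth2
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none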
I certainly don't want to steal Red Hat's thunder, so if you are interested in the details of how to configure this kind of setup, feel free to post a comment, but I have a feeling there will be some KB articles forthcoming from Red Hat on this that will be far more thorough than what I might write.
Best regards,
Kermit Short
OK I couldn't figure out how to add an image to a comment, so I'll add it here. Please refer to my comment below for the explanation of all this spaghetti.
Responses
What fencing device are you using? That by itself can be a single point of failure. For example, if you use iLO fencing and the iLO board malfunctions or loses power on a single server, the VMs of that server will not start on the other hypervisors and will require manual intervention. I'm just waiting for the day when oVirt/RHEV supports backup fencing devices and SCSI reservation fencing.
I'd recommend having a good recovery strategy for when your RHEV-M server fails as well. However, when it does fail, you'd eventually have to take (planned) downtime on all your VMs simultaneously:
https://access.redhat.com/knowledge/solutions/176493
If you cannot afford downtime on all your VMs, you'd have to spend the time and money (more add-ons, HBAs and an extra server) to make the manager highly available as well.
With these points in mind, I feel like RHEV may be the least highly available virtualization product. I wish they made the manager an appliance (thus having automatic failover), had the critical VM information on the SAN rather than kept locally, and made the hypervisors a bit smarter so that they could be controlled individually and handle HA by themselves rather than depend on one physical machine to give them directions. If they did all that, you'd only need 2 physical machines for a fully redundant virtualization solution, rather than the minimum of 4 (2 managers & 2 hypervisors) needed now.
RE: Rizvi -
> I wish they made the manager an appliance (thus having automatic failover), had the
> critical VM information on the SAN rather than kept locally, ... If they did all that, you'd
> only need 2 physical machines for a fully redundant virtualization solution, rather than the
> minimum of 4 (2 managers & 2 hypervisors) needed now.
Note you can get close to the goal of 2 physical machines by setting up a RHEL host, carving out some SAN storage, and setting up your RHEV-M server as a RHEL virtual machine using virt-manager. That gets you to 3 physical machines plus a SAN or other redundant shared storage.
If the RHEL host with the RHEV-M VM dies, yes, RHEV-M is offline. But all your VMs will continue to run and you can quickly build up a new RHEL host, connect to the shared storage, provision a new VM with your RHEV-M virtual disks, and fire up your RHEV-M VM again. You don't need to drop your VMs or anything disruptive like that to recover.
This seems like a good compromise between a fully redundant RHEV-M deployment with 2 RHEV-M clustered hosts, and a risky RHEV-M deployment with everything on local bare metal. And I can report first-hand that it works.
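For what it's worth, here is a rough sketch of how such a RHEV-M guest could be provisioned with virt-install on the standalone RHEL/KVM host. The VM name, sizing, bridge, multipath device and ISO path below are placeholders, not the exact details of my build:

# create the RHEV-M guest with its disk on a SAN LUN,
# so a replacement host can later import and start the same VM
virt-install \
  --name rhevm \
  --ram 4096 --vcpus 2 \
  --os-variant rhel6 \
  --disk path=/dev/mapper/mpath-rhevm-lun,device=disk,bus=virtio \
  --network bridge=br0 \
  --cdrom /var/lib/libvirt/images/rhel6-dvd.iso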
- Greg
Kermit, for iSCSI on RHEV, I would rather recommend the MPIO approach. You set up as many TCP/IP subnets as you have iSCSI controller ports, on the storage subsystem side and on the RHEV Hosts. Auto-discovery is then able to see all the paths when pointed at only one of them.
The multipath daemon takes care of failed paths seamlessly. Secondly, if you tinker with the multipath configuration on the RHEV Hosts, you can fine-tune the multipathing policy to your needs.
Let's say you have multiple ports per iSCSI controller: you would assign IPs from unique subnets to each port (where each subnet is exclusive to iSCSI use), and IPs in the same subnets to the corresponding ports on the RHEV Hosts. Persist the multipath.conf file on the RHEV Hosts (RHEV-H is otherwise a read-only image, and the contents of /config/ get superimposed on it after boot), and customise that file.
Example:
iSCSI Controller A, Port 1: 192.168.255.1/24
iSCSI Controller A, Port 2: 192.168.254.1/24
RHEV Host 1, iSCSI Port 1: 192.168.255.101/24
RHEV Host 1, iSCSI Port 2: 192.168.254.101/24
RHEV Host 2, iSCSI Port 1: 192.168.255.102/24
RHEV Host 2, iSCSI Port 2: 192.168.254.102/24
root@rhevh-01# persist /etc/multipath.conf
Then add...
path_grouping_policy multibus
rr_min_io_rq 2
...to that file. Put the Host in Maintenance Mode and reboot it. Same for all the Hosts.
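For reference, the relevant piece of /etc/multipath.conf might look roughly like the sketch below. Whether these directives belong in the defaults section or in an array-specific devices stanza depends on your storage, so treat the placement here as an assumption:

defaults {
    # put all paths into one path group and round-robin across them
    path_grouping_policy    multibus
    # send 2 requests down a path before switching to the next one
    rr_min_io_rq            2
}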
So long as the iSCSI traffic is multipathed to the _same_ controller when using the multibus path grouping policy, you will achieve aggregated throughput! For multiple controllers, I only have experience with IBM ALUA concepts, which may not apply generically. However, as long as you target multiple ports on the same controller, this approach will give you aggregated throughput with failover. On multiple controllers without Active-Active configuration possibilities or drivers in RHEL, this approach will still give you failover, and will aggregate as much as possible to the primary controller, depending on your topology.
Rizvi,
We have been running our own out-of-band fencing service, which uses a ping heartbeat (across many subnets) for all the Hosts. Upon detecting a failure, the script fences the Host using the RHEV-M API: https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Virtualization/3.0/html-single/REST_API_Guide/index.html#sect-REST_API_Guide-Hosts-Fencing_Action .
The solution was designed to accommodate machines without iLO, RSA, IPMI or any other sort of out-of-band management at all. Maybe this approach can complement the primary fencing capability supported in the RHEV Manager GUI.
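To illustrate, a fence call against the REST API might look roughly like the sketch below; the manager hostname, port, credentials and host UUID are placeholders, and the accepted fence_type values should be checked against the REST API Guide linked above:

# ask RHEV-M to power-cycle the failed host through its fencing device
curl -k -u "admin@internal:PASSWORD" \
     -H "Content-Type: application/xml" \
     -X POST \
     -d '<action><fence_type>restart</fence_type></action>' \
     "https://rhevm.example.com:8443/api/hosts/HOST-UUID/fence"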
Forgot to add that using Jumbo Frames helps. In fact, the rr_min_io_rq value of 2 is optimised for that.
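A quick sketch of what that looks like at the OS level; the interface name is a placeholder, and this assumes every switch port and storage port along the path is also set to an MTU of 9000:

# raise the MTU on the iSCSI interface (add MTU=9000 to its ifcfg file to persist it)
ip link set dev eth2 mtu 9000
# verify that a full-size frame reaches the controller port from the example above unfragmented
# (8972 = 9000 minus 20-byte IP header minus 8-byte ICMP header)
ping -M do -s 8972 -c 3 192.168.255.1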
Instead of that, can I set up MPIO directly from RHEV-M? We have 2 hypervisors with 2 Ethernet interfaces each assigned for storage, and we can't get them to connect to the storage domain at the same time via the manager. Does anyone have the recipe for that? And what about when the IP addresses must be in the same subnet for a storage array with active-passive controllers (like Dell EqualLogic)? Is there any way to configure the interfaces (hosts and storage controllers) in the same subnet for all storage, in order to have 4 paths from each host to the storage?
Thanks for your answer. What about setting up 2 physical interfaces in each hypervisor using MPIO directly from RHEV Manager? Is that possible? OK, it doesn't matter if the addresses are in different subnets. Does someone have the recipe, or a link to a doc? Thanks in advance.
Yes you can. Define two logical storage networks within RHEV-M and assign each one to a NIC within the same subnet. E.g. on one hypervisor with dm-multipath:
em1 192.168.1.1/24 storage1 (switch 1)
em2 192.168.1.21/24 storage2 (switch 2)
I have tested this configuration (no downtime); it works with Dell EqualLogic. On the public-side switches, I use a bond with em3 and em4.
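For anyone curious what this looks like underneath dm-multipath, here is a sketch of the open-iscsi interface binding that lets two same-subnet NICs each open a session to the EqualLogic group address. The iface names and the group IP are placeholders, and RHEV-M normally drives the discovery and login for you once the storage networks are attached:

# bind one iSCSI iface to each storage NIC
iscsiadm -m iface -I iface-em1 --op=new
iscsiadm -m iface -I iface-em1 --op=update -n iface.net_ifacename -v em1
iscsiadm -m iface -I iface-em2 --op=new
iscsiadm -m iface -I iface-em2 --op=update -n iface.net_ifacename -v em2
# discover the targets through both ifaces and log in, giving two sessions (paths) per LUN
iscsiadm -m discovery -t sendtargets -p 192.168.1.10:3260 -I iface-em1 -I iface-em2
iscsiadm -m node -L all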
By the way, I also have RHEV-M virtualized on a RHEL server with a vdisk on the SAN. I have one server on standby, so I can start up RHEV-M immediately on the other one if the KVM host dies. Disadvantage: I could start the RHEV-M VM on both machines by accident (need to find a locking possibility soon). Live migration of RHEV-M works too. It would be nice if the RHEV-M VM could run on a RHEV hypervisor, but that sounds like a chicken-and-egg situation.
Hello
I have one EqualLogic connected to 4 RHEV hosts, presenting 2 LUNs, but only 1 connection appears even though I set up 2 interfaces. Is it possible to configure multipath the way you would on a plain Red Hat host (multipath.conf, iface), etc.?
Regards
SAN
