This may explain some of the weirder problems we've seen in this forum

I am pretty much recovered from my jet-lag from my all nighter last Friday. It was a RHEV install and everything I touched turned to, well, smelly stuff.  We had a RHEV-M server and several freshly installed RHEL hosts with RHEV on top.  We eventually got everything working, but it was not pretty.

I raised support cases for all of the issues we uncovered - contact me privately for the case numbers if interested.  

rhevm-setup blows up with an ugly error
Setup RHEV on RHEL fails setting up the rhevm bridge device
rhevm-manage-domains blows up with bizarre errors

How to configure bonding for RHEV

In the first 3 cases, the built-in setup scripts just failed for no good reason, generally with file not found errors from deep dark routines buried in other routines that the setup scripts called.  Sometimes they were Java scripts, sometimes Python scripts, all blew up with ugly errors and stack traces.  It was as if we were debugging unfinished products. 

The bonding case is in a different category and I'll address that one in a reply to this post.

In the rhevm-setup case, the problem turned out to be an aliased ls command: alias ls='ls -AF'

In the rhevm bridge and rhevm-manage-domains cases, the problem turned out to be custom scripts this site installs in /usr/local.  Rebuilding the RHEV-M server with a completely virgin RHEL installation cured those problems.  

We also ran into trouble setting up our RHEV on RHEL hosts because this site has a policy to turn off ssh root access.  The RHEV documentation is pretty clear that root access via ssh should be turned on, but this violates their security policies.  If configured by the book, those RHEL hosts will fail the next security audit and this will trigger compliance issues.  The current workaround - set up ssh to support root access only if the key matches the RHEV-M server.  This turned out to be a pain when we had to rebuild the RHEV-M server from scratch and it had a different ssh key.
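
A rough sketch of one way to do that kind of key-restricted root access, not necessarily how this site did it: keep password logins for root off in sshd_config (on RHEL 6 that would be `PermitRootLogin without-password`) and tag the RHEV-M public key in root's authorized_keys with a `from=` option. The function names and the host name rhevm.example.com are made up for illustration. Replacing the key through a helper like this might also ease the rebuild pain, since the old manager key is dropped when the new one goes in.

```shell
#!/bin/sh
# Sketch only. Assumes sshd_config already allows key-based root logins,
# e.g. "PermitRootLogin without-password". All names are hypothetical.

# restricted_key_line HOST PUBKEY
# Print an authorized_keys line that is only accepted from HOST.
restricted_key_line() {
    printf 'from="%s" %s\n' "$1" "$2"
}

# install_rhevm_key FILE HOST PUBKEY
# Drop any earlier key tagged for HOST, then append the new one.
install_rhevm_key() {
    _file=$1; _host=$2; _key=$3
    _tmp="$_file.new.$$"
    if [ -f "$_file" ]; then
        # Keep every existing line that is not restricted to this host.
        grep -v -F "from=\"$_host\"" "$_file" > "$_tmp" || true
    else
        : > "$_tmp"
    fi
    restricted_key_line "$_host" "$_key" >> "$_tmp"
    chmod 600 "$_tmp"
    mv "$_tmp" "$_file"
}
```

After a RHEV-M rebuild, something like `install_rhevm_key /root/.ssh/authorized_keys rhevm.example.com "$(cat new_rhevm_key.pub)"` would swap the key on each host, where new_rhevm_key.pub is a placeholder for wherever the new public key was copied.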

The lesson from all this?  If in the field, at least for now, make sure your RHEL setups are virgin if you're putting RHEV-M or RHEV-H on top of RHEL.  Save yourself hours and hours and hours of frustration.

For the developers - these kinds of issues ***must*** be addressed more robustly.  Enterprise customers with hundreds of servers will have security and auditing policies in place and will expect their RHEL systems to conform to those policies.  After all, the reason for doing RHEV-H on top of RHEL, instead of RHEV-H appliances, is so they can be tailored.  This tailoring may mean certain agents installed in /usr/local, it may mean customized Java setups, it may mean aliased shell commands.  Forcing these systems to be virgin takes away a potentially significant advantage for Red Hat.

So that's my feature request this week - make both the RHEV-M and RHEV-H on top of RHEL installations more robust, so the installations can run to completion on tailored systems.  Or set up some scripting to check and make sure these systems meet the appropriate prerequisites.
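
To make the prerequisite-check idea concrete, here is a minimal sketch of what such a script might look at, based only on the problems described in this post: /usr/local entries in PATH and root ssh access being turned off. It is not part of any RHEV tooling; the function names are invented, and a real check would need to cover much more.

```shell
#!/bin/sh
# Sketch of a pre-install check. Function names are hypothetical.

# local_path_entries PATH_STRING
# Print every PATH entry that lives under /usr/local.
local_path_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -E '^/usr/local(/|$)'
}

# root_login_setting SSHD_CONFIG
# Print the first PermitRootLogin value in the file, or "unset" if there
# is none. Match blocks are not interpreted.
root_login_setting() {
    awk 'tolower($1) == "permitrootlogin" { print tolower($2); found = 1; exit }
         END { if (!found) print "unset" }' "$1"
}

# preflight PATH_STRING SSHD_CONFIG
# Print one warning per finding; return nonzero if anything was found.
preflight() {
    _rc=0
    if local_path_entries "$1" > /dev/null; then
        echo "WARN: PATH contains /usr/local entries: $(local_path_entries "$1" | tr '\n' ' ')"
        _rc=1
    fi
    if [ "$(root_login_setting "$2")" = "no" ]; then
        echo "WARN: PermitRootLogin is no; setup over ssh as root will fail"
        _rc=1
    fi
    return $_rc
}
```

It could be run as `preflight "$PATH" /etc/ssh/sshd_config` before starting the install. Shell aliases are not checked here because a non-interactive script normally does not see the aliases of the login shell that launched it.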

For example - if an installation or setup script depends on a shell command, make sure that shell command is not aliased so it does what you expect.  If your setup script depends on a particular version of Java or some other component or script, launch it using the full pathname so /usr/local does not get in the way.  And for the root access via ssh issue - surely the RHEV install could set up some other account besides root and use this other account for its communication, right?
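
As an illustration of the full-pathname suggestion, a setup script could resolve each tool from a fixed list of system directories and run it by absolute path, which sidesteps aliases, shell functions, and anything in /usr/local. This is only a sketch; the directory list and the function names are assumptions, not what the RHEV scripts actually do.

```shell
#!/bin/sh
# Sketch only: resolve tools from system directories and run them by full
# path, so aliases, shell functions and /usr/local cannot get in the way.
# The directory list and function names are assumptions.

# system_path_of NAME
# Print the full path of NAME from a fixed list of system directories.
system_path_of() {
    for _dir in /sbin /bin /usr/sbin /usr/bin; do
        if [ -f "$_dir/$1" ] && [ -x "$_dir/$1" ]; then
            printf '%s\n' "$_dir/$1"
            return 0
        fi
    done
    return 1
}

# run_system NAME [ARGS...]
# Run NAME by its absolute system path, passing ARGS through unchanged.
run_system() {
    _cmd=$(system_path_of "$1") || {
        echo "run_system: $1 not found in system directories" >&2
        return 127
    }
    shift
    "$_cmd" "$@"
}
```

With this in place, `run_system ls /some/dir` gives plain ls output even in a shell where ls has been redefined to add -AF.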


- Greg Scott


For the RHEV bonding issue -

All our hosts had 4 NICs.  NICs eth0 and eth1 were to be bonded and connect to the rhevm network.  NICs eth2 and eth3 were to bond with each other and connect to 40-50 VLAN tagged logical networks.  The idea is, the guest VMs will connect to various VLANs, so the bonded NICs needed to be part of all VLANs.  

In this case, the rhevm network is **not** VLAN tagged, but the other bond with the other 2 NICs **is** VLAN tagged.  

The interface to make all this work is quirky and took some trial and error to get it going.  Our hosts are RHEV on RHEL, so the details may be different using RHEV-H appliances.

First, on the RHEL hosts, edit all the /etc/sysconfig/network-scripts/ifcfg-eth{0,1,2,3} files and make them as simple as possible.  For example:


For eth0 or whichever NIC is already part of the rhevm bridge - make sure it also has a line, 


which should already be in place from when you set up RHEV-H on that host.
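
The example files from the original post did not survive here, so what follows is a guess at what "as simple as possible" could look like, not the author's actual files. It assumes the standard RHEL network-scripts keys DEVICE, ONBOOT and BRIDGE, leaves out anything the bonding error message complains about (boot protocol, IP address, netmask, gateway, VLAN), and uses a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print a stripped-down ifcfg file for one NIC.
# minimal_ifcfg DEVICE [BRIDGE]
# The key set is an assumption, not the original poster's exact file.
minimal_ifcfg() {
    printf 'DEVICE=%s\nONBOOT=yes\n' "$1"
    # Only the NIC that already belongs to a bridge gets a BRIDGE line.
    if [ -n "${2:-}" ]; then
        printf 'BRIDGE=%s\n' "$2"
    fi
}
```

For instance, `minimal_ifcfg eth1` prints just the DEVICE and ONBOOT lines, while `minimal_ifcfg eth0 rhevm` adds a BRIDGE=rhevm line for the NIC that is already attached to the rhevm bridge. Back up the existing files before overwriting anything.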

So now, with all the NICs minimally configured, the next step is setting up the bonds.  Unfortunately, if you just drag and drop the little boxes in the RHEV-M GUI, it will blow up with this error:

A slave interface is not properly configured.  Please verify slaves do not contain any of the following properties:  network name, boot protocol, IP Address, netmask, gateway, or VLAN ID notation (as part of the interface's name or explicitly).

Wonderful - the slave interfaces do not have any of those characteristics, yet RHEV-M says they do.  I think the reason is, the RHEV-M database still "thinks" they are configured the old way before editing the ifcfg- files by hand.

The workaround?  Create all your VLAN tagged logical networks.  Temporarily assign a random VLAN tagged network to each physical NIC and save network changes.  This apparently updates the RHEV-M database.  Now go back in, remove the VLAN tagged logical networks from your NICs and set up your bonds and save changes.  Now your bonds should be set.  Now add your VLAN tagged logical networks to your bonds and save changes.

- Greg Scott

I normally do the following things BEFORE I attach the RHEL host to RHEV-M (same setup as yours: 4 NICs, two bonds, one for data with VLAN tagging and the second for rhevm):

* configure one of the interfaces from the rhevm bond with the ip address from the rhevm network range

* delete all the other ifcfg files from /etc/sysconfig/network-scripts dir


Then you can attach the host and later configure the network through the GUI without any problem.
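
The cleanup step above could be scripted roughly like this. It is a cautious variation rather than exactly what is described: the other ifcfg files are moved to a backup directory instead of being deleted, and ifcfg-lo is left alone on the assumption that the loopback config should stay. The function name and backup location are made up.

```shell
#!/bin/sh
# Sketch only: keep one NIC's ifcfg file (plus loopback) and move the
# rest aside before attaching the host to RHEV-M.
# prune_ifcfg DIR KEEP_DEVICE BACKUP_DIR
prune_ifcfg() {
    _dir=$1; _keep=$2; _backup=$3
    mkdir -p "$_backup" || return 1
    for _f in "$_dir"/ifcfg-*; do
        [ -e "$_f" ] || continue
        case "${_f##*/}" in
            # Leave loopback and the chosen rhevm NIC in place.
            ifcfg-lo|"ifcfg-$_keep") ;;
            # Everything else is moved aside, not deleted.
            *) mv "$_f" "$_backup"/ || return 1 ;;
        esac
    done
    return 0
}
```

On a host this might be called as `prune_ifcfg /etc/sysconfig/network-scripts eth0 /root/ifcfg-backup`, after eth0 has been given its address from the rhevm network range.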