This may explain some of the weirder problems we've seen in this forum
I have pretty much recovered from the jet lag of my all-nighter last Friday. It was a RHEV install, and everything I touched turned to, well, smelly stuff. We had a RHEV-M server and several freshly installed RHEL hosts with RHEV on top. We eventually got everything working, but it was not pretty.
I raised support cases for all of the issues we uncovered - contact me privately for the case numbers if interested.
rhevm-setup blows up with an ugly error
Setting up RHEV on RHEL fails while creating the rhevm bridge device
rhevm-manage-domains blows up with bizarre errors
How to configure bonding for RHEV
In the first three cases, the built-in setup scripts failed for no good reason, usually with file-not-found errors from routines buried deep inside other routines that the setup scripts called. Sometimes they were Java programs, sometimes Python scripts, and all of them blew up with ugly errors and stack traces. It felt as if we were debugging unfinished products.
The bonding case is in a different category and I'll address that one in a reply to this post.
In the rhevm-setup case, the problem turned out to be an aliased ls command: alias ls='ls -AF'
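Why would an alias break a setup tool? One plausible mechanism (my assumption; the support case did not spell it out): anything that captures and parses `ls` output while the alias is active gets decorated names. `-F` appends `/` to directories, `*` to executables and `@` to symlinks, and `-A` adds dotfiles, so a script expecting `conf` sees `conf/`. This Python sketch roughly imitates that decoration and contrasts it with reading the directory directly, which no alias can touch. Both helper names are hypothetical.

```python
import os
import stat


def plain_names(path):
    """Entries as a script expects them: bare names, dotfiles excluded.
    Reads the directory directly, so no shell alias can change the result."""
    return sorted(e.name for e in os.scandir(path) if not e.name.startswith("."))


def ls_AF_names(path):
    """Rough imitation of what `ls -AF` prints: dotfiles included and
    ls -F type indicators appended to each name."""
    out = []
    for e in os.scandir(path):
        mode = e.stat(follow_symlinks=False).st_mode
        if stat.S_ISLNK(mode):
            suffix = "@"
        elif stat.S_ISDIR(mode):
            suffix = "/"
        elif stat.S_ISREG(mode) and mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
            suffix = "*"  # any execute bit set
        else:
            suffix = ""
        out.append(e.name + suffix)
    return sorted(out)
```

For a directory holding a subdirectory `conf`, a plain file `notes.txt` and a dotfile `.hidden`, the first helper yields `['conf', 'notes.txt']` and the second `['.hidden', 'conf/', 'notes.txt']`. Any code that string-matches on the second form will miss `conf`.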
In the rhevm bridge and rhevm-manage-domains cases, the problem turned out to be custom scripts this site installs in /usr/local. Rebuilding the RHEV-M server with a completely virgin RHEL installation cured those problems.
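A preflight check for that kind of collision is not hard to sketch. The Python below compares where each command resolves on the caller's PATH against where it resolves when the search is restricted to the stock system directories, and reports any command where a copy elsewhere (for example in /usr/local/bin) wins. The directory list, the command names and the function names are assumptions for illustration, not anything RHEV ships.

```python
import os
import shutil

# Assumption: stock RHEL system directories, deliberately excluding
# /usr/local/sbin and /usr/local/bin.
SYSTEM_DIRS = ["/usr/sbin", "/usr/bin", "/sbin", "/bin"]


def _same_file(a, b):
    """True if both lookups found the same file (or both found nothing)."""
    if a is None or b is None:
        return a == b
    return os.path.realpath(a) == os.path.realpath(b)


def shadowed_commands(cmds, dirs=SYSTEM_DIRS, env_path=None):
    """Map each command whose lookup on env_path (default: $PATH) differs
    from a lookup restricted to dirs onto (path via env_path, path via dirs)."""
    if env_path is None:
        env_path = os.environ.get("PATH", "")
    system_path = os.pathsep.join(dirs)
    report = {}
    for cmd in cmds:
        via_env = shutil.which(cmd, path=env_path)
        via_system = shutil.which(cmd, path=system_path)
        if not _same_file(via_env, via_system):
            report[cmd] = (via_env, via_system)
    return report


if __name__ == "__main__":
    # Example command names only; adjust to whatever the installer calls.
    for cmd, (found, expected) in shadowed_commands(["java", "python", "ls"]).items():
        print(f"{cmd}: PATH finds {found}, system dirs give {expected}")
```

Comparing resolved real paths keeps symlinked layouts (such as /bin pointing at /usr/bin) from showing up as false alarms.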
We also ran into trouble setting up our RHEV on RHEL hosts because this site has a policy of turning off ssh root access. The RHEV documentation is pretty clear that root access via ssh should be turned on, but that violates this site's security policies. Configured by the book, those RHEL hosts would fail the next security audit and trigger compliance issues. The current workaround: set up ssh to allow root access only if the key matches the RHEV-M server's key. That turned out to be a pain when we had to rebuild the RHEV-M server from scratch and it came back with a different ssh key.
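With that workaround, a rebuild leaves a stale entry in root's `authorized_keys` on every host. Below is a rough Python sketch of an audit helper that checks whether a given public key (for instance the rebuilt RHEV-M server's) is actually authorized. It matches on key type plus key blob, so leading options such as `from=` and trailing comments don't matter. It also computes the SHA256 fingerprint in the form newer OpenSSH releases print, which helps when comparing hosts. The function names and the choice to pass file contents as strings are assumptions. The parser is deliberately simple; for example, it would be confused by quoted option values that contain a key-type word.

```python
import base64
import hashlib

# Prefixes of OpenSSH key type names (ssh-rsa, ecdsa-sha2-nistp256, sk-...).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-sha2-", "sk-")


def parse_key(line):
    """Return (key_type, base64_blob) from a .pub or authorized_keys line,
    skipping leading options. Returns None for blanks, comments, junk."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    tokens = line.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok.startswith(KEY_TYPE_PREFIXES):
            return tok, tokens[i + 1]
    return None


def fingerprint(blob_b64):
    """OpenSSH-style SHA256 fingerprint: unpadded base64 of sha256(blob)."""
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def is_authorized(pubkey_line, authorized_keys_text):
    """True if the key in pubkey_line appears in authorized_keys_text."""
    wanted = parse_key(pubkey_line)
    if wanted is None:
        raise ValueError("no public key found in pubkey_line")
    return any(parse_key(l) == wanted for l in authorized_keys_text.splitlines())
```

On a host you might call `is_authorized(rhevm_pub, open("/root/.ssh/authorized_keys").read())`, where `rhevm_pub` is a placeholder for the manager's public key line.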
The lesson from all this? In the field, at least for now, make sure your RHEL installations are virgin before you put RHEV-M or RHEV-H on top of them. Save yourself hours and hours and hours of frustration.
For the developers - these kinds of issues ***must*** be addressed more robustly. Enterprise customers with hundreds of servers will have security and auditing policies in place and will expect their RHEL systems to conform to them. After all, the reason for doing RHEV-H on top of RHEL, instead of RHEV-H appliances, is so they can be tailored. This tailoring may mean certain agents installed in /usr/local, it may mean customized Java setups, it may mean aliased shell commands. Forcing these systems to be virgin throws away a potentially significant advantage for Red Hat.
So that's my feature request this week - make both the RHEV-M and RHEV-H on top of RHEL installations more robust, so the installations can run to completion on tailored systems. Or provide some scripting to check that these systems meet the appropriate prerequisites.
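As a sketch of what such a prerequisite check might look like, here is a small Python check runner with one example check: reading `sshd_config` to see whether root logins are allowed at all, since a host locked down with `PermitRootLogin no` would otherwise fail partway through the install. The runner, the check names and the messages are hypothetical. The parser only looks at global settings before the first `Match` block, honors the first occurrence of the keyword (as sshd does for most keywords), and does not follow `Include` directives.

```python
import re
import sys


def permit_root_login(sshd_config_text):
    """Global PermitRootLogin value, lower-cased, or None if not set.
    First occurrence wins; stops at the first Match block; no Include."""
    for raw in sshd_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        keyword = parts[0].lower()
        if keyword == "match":
            break
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip().lower()
    return None


def check_root_ssh(sshd_config_text):
    """Fail only when root SSH is switched off entirely; key-only settings
    such as without-password still pass."""
    value = permit_root_login(sshd_config_text)
    if value == "no":
        return False, "PermitRootLogin no: root SSH is disabled by site policy"
    return True, f"PermitRootLogin is {value or 'unset (sshd default applies)'}"


def run_checks(checks):
    """Run (name, func) pairs; return a list of (name, ok, detail)."""
    results = []
    for name, func in checks:
        ok, detail = func()
        results.append((name, ok, detail))
    return results


if __name__ == "__main__":
    with open("/etc/ssh/sshd_config") as f:  # usually readable only by root
        text = f.read()
    results = run_checks([("root ssh", lambda: check_root_ssh(text))])
    for name, ok, detail in results:
        print(f"[{'PASS' if ok else 'FAIL'}] {name}: {detail}")
    sys.exit(0 if all(ok for _, ok, _ in results) else 1)
```

Further checks (shadowed commands, required packages) would slot into the same `(name, func)` list.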
For example - if an installation or setup script depends on a shell command, make sure that shell command is not aliased, so it does what you expect. If your setup script depends on a particular version of Java or some other component or script, launch it using the full pathname so /usr/local does not get in the way. And for the root access via ssh issue - surely the RHEV install could set up some account other than root and use that account for its communication, right?
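The first two suggestions fit in a few lines. Here is a hedged Python sketch: launch each helper by absolute path, directly rather than through a shell, with a freshly built environment whose PATH holds only system directories. Because the environment is built from scratch, variables like `BASH_ENV` and `ENV` that could pull in startup files are simply absent. On aliases specifically: bash does not expand aliases in non-interactive shells unless `expand_aliases` is set, and running a program directly via `subprocess` involves no shell at all. `SAFE_PATH` and `run_pinned` are made-up names.

```python
import os
import subprocess

# Assumption: a stock system search path with /usr/local left out.
SAFE_PATH = "/usr/sbin:/usr/bin:/sbin:/bin"


def run_pinned(argv, extra_env=None):
    """Run argv[0] by absolute path with a minimal, predictable environment.
    No shell is involved, so shell aliases cannot change what runs."""
    if not os.path.isabs(argv[0]):
        raise ValueError(f"refusing relative command {argv[0]!r}; use a full path")
    # Fresh environment: nothing inherited, so BASH_ENV/ENV are not set.
    env = {"PATH": SAFE_PATH, "LANG": "C"}
    if extra_env:
        env.update(extra_env)
    return subprocess.run(argv, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # /bin/ls runs undecorated regardless of any ls alias in root's shell.
    print(run_pinned(["/bin/ls", "-1", "/etc"]).stdout)
```

Refusing relative commands outright is a design choice: it makes a missing full pathname fail loudly at development time instead of silently picking up whatever a tailored host has on its PATH.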
thanks
- Greg Scott