RHEVH ends up with network interface named breth0, not rhevm


I've been attempting to get a CLEAN install of rhevm and a rhevh host and have run into a problem that seems to contradict the nice little videos:

* Red Hat Enterprise Virtualization 3.1 - Installing RHEV Manager :: https://access.redhat.com/knowledge/videos/269993 [1]
* Red Hat Enterprise Virtualization 3.1 - Installing RHEV Hypervisor :: https://access.redhat.com/knowledge/videos/270043 [2]
* Red Hat Enterprise Virtualization 3.1 - Connecting to RHEV Manager via Web Admin Portal :: https://access.redhat.com/knowledge/videos/270093 [3]
* Red Hat Enterprise Virtualization 3.1 - Approving RHEV Hypervisors :: https://access.redhat.com/knowledge/videos/270113 [4]

I install rhevh just fine and in exactly the same way as in video 2.

When I get to logging into the rhevm host (video 3), my host does not appear. When I add it manually, it seems to work at first, and then the node gets set to Non-Operational. When I investigate further, I can see that something is different: my rhevh interface is set to breth0, not rhevm, the management network. If I then 'fiddle' with the rhevm GUI, I can change the interface from breth0 to rhevm, and eventually it will go green.

My questions are:

 

  1. What prevented me from adding this cleanly? What do I check? DNS works; rhevm and rhevh can resolve each other and ping each other.
  2. What logs do I look at? engine.log? ovirt.log? (Where I've been looking so far is listed just below these questions.)
  3. How do you CLEANLY recover from this? I am rather tired of running rhevm-cleanup/setup and reinstalling the host every time this happens.
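
For reference, this is where I have been looking so far (assuming these are still the standard 3.1 locations for these logs):

    # On the rhevm machine: the engine's own log
    tail -f /var/log/ovirt-engine/engine.log

    # On the rhevh host: vdsm, the agent the engine talks to
    tail -f /var/log/vdsm/vdsm.log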

Rick

Responses

In engine.log I can see:

2012-12-26 16:05:24,894 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-27) [4f519a0a] Host bl460g6-tux12 is set to Non-Operational, it is missing the following networks: rhevm
 

After trying to approve, this pops up:

Error while executing action: Cannot approve RHEV Hypervisor Host.
-Host must be in "Pending Approval" or "Install Failed" status in order to be approved.

Grr. 

After setting the machine into maintenance, going to Setup Host Networks, and changing the network to rhevm, I now get the rhevm UI 'spinning' and these messages in engine.log:

2012-12-26 16:08:00,082 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetupNetworksVDSCommand] (ajp-/127.0.0.1:8702-7) [3638457e] START, SetupNetworksVDSCommand(HostName = bl460g6-tux12, HostId = d90005b2-4f9f-11e2-b13b-525400d12530, force=false, checkConnectivity=true, conectivityTimeout=120,
        networks=[rhevm {id=00000000-0000-0000-0000-000000000009, description=Management Network, subnet=null, gateway=null, type=null, vlan_id=null, stp=false, storage_pool_id=58ce52b2-4f96-11e2-898e-525400d12530, mtu=0, vmNetwork=true, cluster=network_cluster {id={clusterId=null, networkId=null}, status=Operational, is_display=false, required=true}}],
        bonds=[],
        interfaces=[eth0 {id=2c8c3f7e-133a-446d-b0fa-3913785a874b, vdsId=d90005b2-4f9f-11e2-b13b-525400d12530, name=eth0, macAddress=18:a9:05:45:17:20, networkName=rhevm, bondName=null, bootProtocol=null, address=null, subnet=null, gateway=null, mtu=0, bridged=true, speed=10000, type=2, networkImplementationDetails=null},
                eth6 {id=dbbaaa4e-5044-4490-a681-2026e8821ae6, vdsId=d90005b2-4f9f-11e2-b13b-525400d12530, name=eth6, macAddress=18:a9:05:45:17:23, networkName=null, bondName=null, bootProtocol=None, address=, subnet=, gateway=null, mtu=1500, bridged=false, speed=0, type=0, networkImplementationDetails=null},
                eth5 {id=fa6f2632-0104-487e-be01-8beac6575cf0, vdsId=d90005b2-4f9f-11e2-b13b-525400d12530, name=eth5, macAddress=18:a9:05:45:17:26, networkName=null, bondName=null, bootProtocol=None, address=, subnet=, gateway=null, mtu=1500, bridged=false, speed=0, type=0, networkImplementationDetails=null},
                eth7 {id=2b5abf37-d566-4b62-9a6f-37fb5cfea9a1, vdsId=d90005b2-4f9f-11e2-b13b-525400d12530, name=eth7, macAddress=18:a9:05:45:17:27, networkName=null, bondName=null, bootProtocol=None, address=, subnet=, gateway=null, mtu=1500, bridged=false, speed=0, type=0, networkImplementationDetails=null},
                eth2 {id=e6ba3bdb-b7c4-422d-99dc-1cadae065014, vdsId=d90005b2-4f9f-11e2-b13b-525400d12530, name=eth2, macAddress=18:a9:05:45:17:21, networkName=null, bondName=null, bootProtocol=None, address=, subnet=, gateway=null, mtu=1500, bridged=false, speed=0, type=0, networkImplementationDetails=null},
                eth1 {id=546783dc-efaa-4551-a009-47c6c1578ace, vdsId=d90005b2-4f9f-11e2-b13b-525400d12530, name=eth1, macAddress=18:a9:05:45:17:24, networkName=null, bondName=null, bootProtocol=None, address=, subnet=, gateway=null, mtu=1500, bridged=false, speed=0, type=0, networkImplementationDetails=null},
                eth4 {id=5831f568-7070-419f-afc3-b45d5daca4a2, vdsId=d90005b2-4f9f-11e2-b13b-525400d12530, name=eth4, macAddress=18:a9:05:45:17:22, networkName=null, bondName=null, bootProtocol=None, address=, subnet=, gateway=null, mtu=1500, bridged=false, speed=0, type=0, networkImplementationDetails=null},
                eth3 {id=4c551391-7100-4ac0-aa73-94738c0beb9a, vdsId=d90005b2-4f9f-11e2-b13b-525400d12530, name=eth3, macAddress=18:a9:05:45:17:25, networkName=null, bondName=null, bootProtocol=None, address=, subnet=, gateway=null, mtu=1500, bridged=false, speed=0, type=0, networkImplementationDetails=null}],
        removedNetworks=[breth0],
        removedBonds=[]), log id: 8abd15c
2012-12-26 16:08:00,083 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetupNetworksVDSCommand] (ajp-/127.0.0.1:8702-7) [3638457e] FINISH, SetupNetworksVDSCommand, log id: 8abd15c
2012-12-26 16:08:03,643 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8702-7) [3638457e] Timeout waiting for VDSM response. java.util.concurrent.TimeoutException
 

That last error repeats over and over again... Finally rhevm puts up a dialog:

Error while executing action Setup Networks: Could not connect to peer host
 
Errors then appear:
 
2012-12-26 16:10:20,082 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8702-7) [3638457e] Error code noConPeer and error message VDSGenericException: VDSErrorException: Failed to SetupNetworksVDS, error = connectivity check failed
2012-12-26 16:10:20,082 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (ajp-/127.0.0.1:8702-7) [3638457e] org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SetupNetworksVDS, error = connectivity check failed
2012-12-26 16:10:20,082 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (ajp-/127.0.0.1:8702-7) [3638457e] Command SetupNetworksVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to SetupNetworksVDS, error = connectivity check failed
2012-12-26 16:10:20,082 ERROR [org.ovirt.engine.core.bll.SetupNetworksCommand] (ajp-/127.0.0.1:8702-7) [3638457e] Command org.ovirt.engine.core.bll.SetupNetworksCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SetupNetworksVDS, error = connectivity check failed
2012-12-26 16:10:20,089 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) [3638457e] No string for UNASSIGNED type. Use default Log
 
I can see on the rhevh host that we now have an interface named rhevm.  I can ping and resolve names.  What else is there???
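
For the record, this is about the extent of what I know to check on the host side: plain bridge and interface inspection plus watching vdsm (I'm assuming vdsm.log is in its usual /var/log/vdsm/ location on RHEV-H):

    # On the rhevh host: is the rhevm bridge there, with eth0 enslaved to it?
    brctl show
    ip addr show rhevm

    # Watch the host-side agent while the engine retries SetupNetworks
    tail -f /var/log/vdsm/vdsm.log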
 
This is rather horrible on several levels:
 
- the end user is given virtually no information about the nature of the failure, nor how to resolve it
 
- the logs are rather meaningless
 
Rick


Hi Rick,

 

In order to follow the flow through properly, we need to take it back to the top, i.e. find out why the host, after you followed the videos, did not appear in the RHEV-M admin UI. There might be several different causes, but they usually boil down to misconfigured networking.

RHEV-H must be able to resolve RHEV-M via DNS (both A and PTR records) and vice versa. Besides that, RHEV-H, in order to register itself with RHEV-M, must be able to reach its HTTP and HTTPS ports.

Also make sure their time/date are in sync (this can cause serious issues with SSL if not).
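
If you want to sanity-check those three points from the command line before reinstalling, something along these lines will do. The hostname and IP below are placeholders for your own, and on RHEV-H's minimal image you may need to fall back to plain ping if dig or curl are not available:

    # Forward and reverse DNS, run from both machines (substitute your own names/addresses)
    dig +short rhevm.example.com
    dig +short -x 192.0.2.10

    # From RHEV-H: can the manager's HTTP/HTTPS ports be reached?
    curl -o /dev/null -w '%{http_code}\n' http://rhevm.example.com
    curl -k -o /dev/null -w '%{http_code}\n' https://rhevm.example.com

    # Clock skew: compare the output on both machines, and keep NTP configured
    date -u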

 

So, in order to properly investigate this, I suggest you start from scratch - remove the host, wipe the disks, install a fresh RHEV-H, and follow the video through, step by step.

At the end, you should see the host in RHEV-M's UI in the "awaiting approval" status.


The 'breth0' issue is not really an issue, since the proper approve/register flow renames it to rhevm automatically.

You mention that the network not being named rhevm is not an issue, but this video:

* Red Hat Enterprise Virtualization 3.1 - Installing RHEV Hypervisor :: https://access.redhat.com/knowledge/videos/270043 [2]

Shows the interface name as rhevm immediately after installation.  That doesn't happen for me.

I'm starting over from scratch.  The RHEVH is going to be a VM on the same physical host as the RHEVM (which is a guest).  This is what was done in the video, since you can spot the use of /dev/vda.

This product seems remarkably unforgiving in the setup stage.  I don't recall the last time that I had to work with a product where installation errors required the deinstallation and cleanup of so many moving parts. 

Ok... it appears that the BETA rhev-h ISO I was using was the problem.  I had:

rhevh-6.4-20121126.0.iso

The one from the default 6.3 channel:

rhevh-6.3-20121212.0.iso

Got me past that point. Now the network comes up as rhevm upon registration and we go green...

 

However... I can't seem to do anything with storage.   When I attempt to configure local storage I get a popup that says:

Error while executing action New Local Storage Domain: Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).

 

... so I bring up the interface again and notice that the storage path is pre-populated with '/data/images/rhev' and is grayed-out and unselectable. 

Please tell me there is some way to correct this without reinstalling RHEV H ... again... :( It is about a 30min task because of my infrastructure.


...nevermind. 

I ssh'd into the RHEV-H host, cd'd into /data/images/rhev, and found a directory with a big long name.  Blew it away and a new one was recreated.
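
In case anyone else trips over this, the cleanup amounted to roughly the following. The directory name below is a placeholder for the leftover UUID-named directory from my failed attempt; obviously only remove it if you are sure nothing else references it:

    # On the RHEV-H host (however you normally get a shell on it)
    cd /data/images/rhev
    ls -l                              # shows one leftover UUID-named directory
    rm -rf ./LEFTOVER_UUID_DIRECTORY   # placeholder name, not a real path

After that, a new one was created when I retried the New Local Storage Domain action.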

What is the point of keeping RHEV from reinstalling over itself?  It seems that this needs some good override switches...

RHEV-H is basically a locked-down, installable live-CD image. Installation takes a few minutes, and it is usually easier to just reinstall than to troubleshoot (that's the point, after all).

As for local storage, RHEV tries not to blow existing storage domains away, and if it finds a UUID-like directory under the location you pointed to, it will abort, so as not to wipe another storage domain.
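
To illustrate the kind of check involved (just a conceptual sketch, not VDSM's actual implementation): if something like the following finds a UUID-looking directory under the path, the creation is refused:

    # Illustration only: roughly the condition that trips the safety mechanism
    find /data/images/rhev -mindepth 1 -maxdepth 1 -type d \
         -regextype posix-extended \
         -regex '.*/[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'

If that prints anything, the path is treated as already containing a storage domain.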

It might be that your initial SD creation failed, leaving an already-created directory structure behind (worth investigating why, so it won't happen again), and so when you tried to create a local SD again the process was aborted by the safety mechanism, which is quite normal.

I would suggest you really start off with RHEV on multiple hosts and centralised storage; otherwise, lots of the features RHEV provides will remain unavailable.

Yes, I understand that RHEVH should be able to be deployed quickly, but it still seems odd that there is no way from the RHEV-H host admin to reset the host to a 'known, clean state'.  An option that would clear out network settings, local disk storage and the like would seem to be very useful, for example when you want to register a RHEV-H to a new RHEVM.

Not all deployments are as quick as you think.  In some environments, like mine, a full install of RHEVH takes over 30 minutes from boot to finish.  Are there ways to improve this?  No doubt, but in my experience customers tend to have even more outlandish environments than my puny lab. ;)