Command GetCapabilitiesVDS execution failed. Error: VDSRecoveringException: Failed to initialize storage
I have a functioning RHEV 3.1 environment with RHEV-M on a RHEL 6.3 virtual machine and a RHEV-H 6.3 host. I initially had local storage but have since configured the environment to use iSCSI. I can deploy virtual machines to the one RHEV-H host.
I tried to bring a RHEL 6.3 Linux host into the environment, but it fails every time after the first boot. RHEV-M starts spewing the following into engine.log:
2013-01-14 12:33:02,128 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-58) Command GetCapabilitiesVDS execution failed. Error: VDSRecoveringException: Failed to initialize storage
2013-01-14 12:33:02,346 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-49) [195801e5] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: eb737b9a-5e7f-11e2-bee3-525400d12530 Type: VDS
At this point, the new host is placed in NON OPERATIONAL state and I can't do much with it until I stop vdsmd or reboot the node. Then my only options are to reinstall or remove the node.
The Linux host's vdsm log only appears to complain about being unable to run sudo without a tty in order to start ksmtuned, even though ksmtuned is already started automatically.
I even get this trying to add the node to the default DC.
Both nodes are in the same subnet - in fact both are nodes in the same blade enclosure.
I have not presented the iSCSI storage to the non-working node at this point... but why should it matter?
Firewall is completely off.
Any ideas? I'm stumped.
Rick
Responses
I had run into this issue as well, using the same method (attempting to add a RHEL 6.3 host to RHEV).
I would update /etc/sudoers
# grep vdsm /etc/sudoers
vdsm ALL=(ALL) NOPASSWD: ALL
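
If the vdsm log really is complaining about running sudo without a tty, the default requiretty setting may also be getting in vdsm's way. That's only a guess for your setup, but after adding a per-user exemption the relevant lines in /etc/sudoers would look roughly like this:

# grep requiretty /etc/sudoers
Defaults    requiretty
Defaults:vdsm    !requiretty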
Also - I had to adjust my multipath configuration manually, but I won't steer you down that path unless the vdsm workaround fails.
Leaving the sudoers config alone for now, to address the initial post: what you report is quite normal. You add a host to a datacenter/cluster where the host is supposed to be able to access certain storage LUNs. If the host cannot access that storage space, it cannot run the VMs that reside on it, and thus cannot operate in the cluster. This is why it is called "non operational".
Once you enable this host's access to the iSCSI SAN (add its initiator ID to the target ACLs, I suppose), you can try to activate the host again. RHEV will try to connect it to the SAN, and once that works, the host will change its status to "up".
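
If it helps, a rough way to find the host's initiator ID and to confirm from the RHEL host itself that the target is reachable (the portal address and target IQN below are just placeholders for your SAN):

# cat /etc/iscsi/initiatorname.iscsi
# iscsiadm -m discovery -t sendtargets -p 192.168.0.100
# iscsiadm -m node -T iqn.2013-01.com.example:storage.lun1 -p 192.168.0.100 --login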
There is no need to reinstall a host that goes non-operational; all you need to do is look in the events for why it has this status and address the cause. Once the reason is resolved, just activate the host again.
Now, about the sudoers config: RHEV does require some specific settings and UIDs/GIDs. If you are using customized RHEL machines as RHEL hosts, there might be conflicts. To see what settings are required, it's easiest to set up a clean RHEL 6 basic-server installation and add it to RHEV-M. That host will have all the right settings, and you will be able to compare the way it gets set up to what you already have.
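
As a starting point for that comparison, the usual suspects on a customized host are the vdsm user and kvm group IDs (RHEV expects UID 36 and GID 36) and the sudoers entries, so something like this on the existing host should show whether either has drifted:

# id vdsm
# getent group kvm
# grep -ri vdsm /etc/sudoers /etc/sudoers.d/ 2>/dev/null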
Hello,
I am having the exact same problem. I know my storage is presented to the server, as I have tested a manual mount of the NFS storage and it works.
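
For what it's worth, I understand RHEV wants the NFS export owned by 36:36 and writable by the vdsm user, so a bare mount test may not prove much. Something like the following, with the server and path as placeholders for my environment, would be a closer check:

# mount -t nfs nfs.example.com:/export/rhev /mnt/rhevtest
# su -s /bin/sh -c "touch /mnt/rhevtest/probe && rm /mnt/rhevtest/probe" vdsm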
We are having the mentioned sudoers issue as we maintain our sudo file with Puppet. I do take issue with the suggestion to do a base install and compare the differences; the configuration should be documented somewhere so we can tell what is required on a RHEL host. A customer should not need to do a full OS comparison to get a solution. We heavily customize our base installs for both functionality and security, so I need to understand the requirements. Are all the requirements/changes needed for RHEV listed somewhere?
-chrisl