RHEL host creation failure - SSL errors



I have a RHEV 3.0.3 installation with a 2.2 compatibility mode cluster. Recently, I tried adding a new host (both RHEL 5.8 and 6.2 were attempted), and the following happens:


1. Install a host with RHEL 6.2, update, register, etc.

2. Add new host via RHEVM web ui

3. After a while, in events tab, "Host XXXXX installation failed. Please refer to log files for further details"

4. Host status permanently hangs at "installing"


Additional info:


Vdsmd on the new host crashes immediately after start.  /var/log/vdsm/vdsm.log shows:


MainThread::ERROR::2012-06-13 14:29:56,123::vdsm::74::vds::(run) Traceback (most recent call last):
  File "/usr/share/vdsm//vdsm", line 72, in run
  File "/usr/share/vdsm//vdsm", line 40, in serve_clients
    cif = clientIF.clientIF(log)
  File "/usr/share/vdsm/clientIF.py", line 96, in __init__
    self.server = self._createXMLRPCServer()
  File "/usr/share/vdsm/clientIF.py", line 222, in _createXMLRPCServer
  File "/usr/share/vdsm/SecureXMLRPCServer.py", line 111, in __init__
    ctx.load_cert_chain(certFile, keyFile)
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Context.py", line 100, in load_cert_chain
    m2.ssl_ctx_use_cert_chain(self.ctx, certchainfile)
SSLError: No such file or directory

I noticed there is nothing in the certs directory of /etc/pki/vdsm:


[root@rhev-prod-node6 certs]# cd /etc/pki/vdsm && find .


...whereas my other RHEL hypervisors have a vdsmcert.pem and cacert.pem in their /var/vdsm/ts/certs directory. 
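For anyone comparing a failing host against a working one, here is a minimal sketch of the same check in Python. The file names `vdsmcert.pem` and `cacert.pem` are the ones mentioned above; the function name `missing_certs` is hypothetical, and the directory to pass in is an assumption you will need to adjust per host.

```python
import os

# File names are taken from the working hypervisors described above.
EXPECTED_CERTS = ("vdsmcert.pem", "cacert.pem")


def missing_certs(cert_dir, expected=EXPECTED_CERTS):
    """Return the expected certificate files that are absent or empty in cert_dir."""
    missing = []
    for name in expected:
        path = os.path.join(cert_dir, name)
        # A zero-byte file is treated as missing, since it cannot hold a certificate.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

Calling `missing_certs("/etc/pki/vdsm/certs")` on the new host and on a known-good host (with that host's own certificate directory) should show quickly whether the certificates were ever written.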


There are also errors in the rhevm log on the RHEV-M host:


2012-06-13 10:45:47,526 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (http- Cannot get vdsManager for vdsid=766000c0-b566-11e1-9114-5452001e1b9d
2012-06-13 10:45:47,527 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (http- Cannot get vdsManager for vdsid=766000c0-b566-11e1-9114-5452001e1b9d
2012-06-13 10:45:47,527 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (http- Cannot get vdsManager for vdsid=766000c0-b566-11e1-9114-5452001e1b9d


https://access.redhat.com/knowledge/solutions/127013 seems to be getting at the same thing, but it is RHEV-H specific and I wasn't able to translate it into something helpful.


Is there a certificate that is not being copied to the host?



Certificates are generated after host registration. This sounds like the process fails prior to the certificate generation step.


Let's make sure we start at the beginning - can the host and RHEV-M resolve each other both by IP and FQDN? Are their clocks in sync? Is there a firewall between the Management server and the host?
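If it helps, the forward and reverse lookup can be checked from either machine with a few lines of Python. This is only a sketch: `check_dns` is a hypothetical helper name, and the optional resolver arguments exist solely so the function can be tried without a live DNS server.

```python
import socket


def check_dns(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IP, resolve that IP back, and compare the names.

    Returns (ip, reverse_name, matches).
    """
    ip = forward(fqdn)
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist); only the name is needed.
    reverse_name = reverse(ip)[0]
    # Compare case-insensitively and ignore a trailing dot on either name.
    matches = reverse_name.rstrip(".").lower() == fqdn.rstrip(".").lower()
    return ip, reverse_name, matches
```

Run it on the manager with the host's FQDN and on the host with the manager's FQDN; a `False` in the last position means forward and reverse records disagree.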

Ok, I double-checked that forward and reverse DNS work on both the manager and host.


There are no host-based or network firewalls in place. 


Clocks were slightly off - set up and restarted ntpd to correct this.


I'm still getting the same behavior though. In the RHEV-M web GUI, I get "Internal RHEV Manager error (Error code: 5001)". Here's the tail end of the rhevm.log when this happens:



2012-06-14 12:01:22,815 INFO  [org.ovirt.engine.core.bll.VdsInstaller] (pool-10-thread-47) Installation of x.x.x.x. Executing installation stage. (Stage: Downloading certificate request from Host)
2012-06-14 12:01:22,815 INFO  [org.ovirt.engine.core.utils.hostinstall.MinaInstallWrapper] (pool-10-thread-47) Downloading file /tmp/cert_f3428bf4-f9e3-488b-9052-bd18829473de.req from x.x.x.x to /etc/pki/rhevm/requests/cert_f3428bf4-f9e3-488b-9052-bd18829473de.req
2012-06-14 12:01:23,118 INFO  [org.ovirt.engine.core.bll.VdsInstaller] (pool-10-thread-47) Installation of x.x.x.x. successfully done sftp operation ( Stage: Downloading certificate request from Host)
2012-06-14 12:01:23,118 INFO  [org.ovirt.engine.core.utils.hostinstall.MinaInstallWrapper] (pool-10-thread-47) return true
2012-06-14 12:01:23,118 INFO  [org.ovirt.engine.core.bll.VdsInstaller] (pool-10-thread-47)  DownloadCertificateRequest ended:true
2012-06-14 12:01:23,118 INFO  [org.ovirt.engine.core.bll.VdsInstaller] (pool-10-thread-47) Installation of x.x.x.x. Executing installation stage. (Stage: Sign certificate request and generate certificate)
2012-06-14 12:01:24,122 ERROR [org.ovirt.engine.core.bll.InstallVdsCommand] (pool-10-thread-47) Command org.ovirt.engine.core.bll.InstallVdsCommand throw exception
    at org.ovirt.engine.core.utils.hostinstall.OpenSslCAWrapper.readAllLines(OpenSslCAWrapper.java:113)
    at org.ovirt.engine.core.utils.hostinstall.OpenSslCAWrapper.runCommandArray(OpenSslCAWrapper.java:65)
    at org.ovirt.engine.core.utils.hostinstall.OpenSslCAWrapper.SignCertificateRequest(OpenSslCAWrapper.java:25)
    at org.ovirt.engine.core.bll.VdsInstaller.RunStage(VdsInstaller.java:261)
    at org.ovirt.engine.core.bll.VdsInstaller.Install(VdsInstaller.java:207)
    at org.ovirt.engine.core.bll.InstallVdsCommand.executeCommand(InstallVdsCommand.java:77)
    at org.ovirt.engine.core.bll.CommandBase.ExecuteWithoutTransaction(CommandBase.java:629)
    at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:718)
    at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1022)
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:168)
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:107)
    at org.ovirt.engine.core.bll.CommandBase.Execute(CommandBase.java:733)
    at org.ovirt.engine.core.bll.CommandBase.ExecuteAction(CommandBase.java:207)
    at org.ovirt.engine.core.bll.MultipleActionsRunner.RunCommands(MultipleActionsRunner.java:140)
    at org.ovirt.engine.core.bll.MultipleActionsRunner$1.run(MultipleActionsRunner.java:61)
    at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:52)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
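To pin down which installation stage was running when the error hit, a small Python sketch like the one below can scan the log. It assumes the log lines keep the "(Stage: ...)" format shown in the excerpt above; `last_stage_before_error` is a hypothetical helper name, not part of RHEV.

```python
import re

# Matches both "(Stage: ...)" and "( Stage: ...)" as they appear in the excerpt.
STAGE_RE = re.compile(r"\(\s*Stage:\s*([^)]+?)\s*\)")


def last_stage_before_error(lines):
    """Return the most recent stage name seen before the first ERROR line, or None."""
    stage = None
    for line in lines:
        if " ERROR " in line:
            return stage
        match = STAGE_RE.search(line)
        if match:
            stage = match.group(1)
    # No ERROR line was found in the input.
    return None
```

Passing it an open file handle for rhevm.log works, since the function only iterates over lines. On the excerpt above it points at the certificate signing stage, which runs on the RHEV-M side rather than on the host.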


Thanks for your help.

Looks familiar. Was this setup upgraded from 2.2, having originally been installed as 2.1?


Yes. This system dates all the way back to when RHEV 2 went GA and we built a RHEVM on Windows Server 2003. I believe that GA release was 2.0? This same server carried us through a couple of RHEV minor version software upgrades. Until we upgraded to RHEV 3.0 a month ago, we ran RHEV 2.2 on Win Server 2003 in an unsupported configuration. Some nice folks in Red Hat engineering were kind enough to test the 2.2 -> 3.0 migration scripts with our specific versions/setup prior to our upgrade. The 2.2 -> 3.0 upgrade went essentially without a hitch.


I recommend opening a support case with Red Hat if it's upgraded from 2.1 -> 2.2 -> 3.0.


There are a couple of known issues with certificates when registering RHEV-H 6.2 hosts to the above 3.0 setup.


You can refer to the articles below to see how to work around them.





The GA build was 2.1, so please open a support case, as Sadique suggested. There is an outstanding issue, relevant only to setups that started as the original 2009 build of RHEV 2.1, that takes some extra effort to resolve and is better dealt with in a proper support case.



Alternatively, if you have the time and the hardware, you can always export the VMs to an export domain, and import them into a clean newly installed RHEV 3.0 setup.


Sorry about the inconvenience.

Will do, thanks guys for the help.