VMs unknown state and RHEV-H's Non Responsive

I'm running the latest RHEV-H and the latest RHEV-M on RHEL 6.6.
I just did a yum update, which changed the following:

Transaction performed with:
Installed rpm-4.8.0-38.el6_6.x86_64 @rhel-x86_64-server-6
Updated subscription-manager-1.12.14-7.el6.x86_64 @rhel-x86_64-server-6
Installed yum-3.2.29-60.el6.noarch @rhel-x86_64-server-6
Installed yum-metadata-parser-1.1.2-16.el6.x86_64 @anaconda-RedHatEnterpriseLinux-201311111358.x86_64/6.5
Installed yum-plugin-versionlock-1.1.30-30.el6.noarch @rhel-x86_64-server-6
Packages Altered:
Updated java-1.7.0-openjdk-1:1.7.0.71-2.5.3.2.el6_6.x86_64 @rhel-x86_64-server-6
Update 1:1.7.0.75-2.5.4.0.el6_6.x86_64 @rhel-x86_64-server-6
Updated openssl-1.0.1e-30.el6_6.4.x86_64 @rhel-x86_64-server-6
Update 1.0.1e-30.el6_6.5.x86_64 @rhel-x86_64-server-6
Updated selinux-policy-3.7.19-260.el6_6.1.noarch @rhel-x86_64-server-6
Update 3.7.19-260.el6_6.2.noarch @rhel-x86_64-server-6
Updated selinux-policy-targeted-3.7.19-260.el6_6.1.noarch @rhel-x86_64-server-6
Update 3.7.19-260.el6_6.2.noarch @rhel-x86_64-server-6
Updated subscription-manager-1.12.14-7.el6.x86_64 @rhel-x86_64-server-6
Update 1.12.14-9.el6_6.x86_64 @rhel-x86_64-server-6

After that and a RHEV-M reboot, all VMs went into an unknown state, and all hosts (RHEV-H), storage domains, clusters and data centers went Non Responsive.
There are lots of errors in engine.log, like:

2015-01-21 21:55:25,414 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-31) Command GetCapabilitiesVDSCommand(HostName = rhevbohnw01.unix.regionh.top.local, HostId = 4256f307-fbab-4ae7-bdf0-7025d1ecf007, vds=Host[rhevbohnw01.unix.regionh.top.local,4256f307-fbab-4ae7-bdf0-7025d1ecf007]) execution failed. Exception: VDSNetworkException: javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)

One per Host.
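
To see at a glance which hosts are hitting the handshake failure, the matching log lines can be tallied per host. This is only a minimal sketch and not part of the original post: the function name count_handshake_failures is made up for illustration, and it assumes the log lines carry the same "HostName = <name>," layout as the error quoted above. On a RHEV-M machine the log is normally /var/log/ovirt-engine/engine.log.

```shell
# Hypothetical helper (name is an assumption, not a RHEV tool):
# count SSLHandshakeException lines per host in an engine.log-style file.
count_handshake_failures() {
    log_file="$1"
    # Keep only the handshake errors, pull out the value after "HostName = "
    # up to the next comma, then count occurrences per host name.
    grep 'SSLHandshakeException' "$log_file" \
        | sed -n 's/.*HostName = \([^,]*\),.*/\1/p' \
        | sort \
        | uniq -c
}

# Example (path assumed to be the default engine log location):
#   count_handshake_failures /var/log/ovirt-engine/engine.log
```

Each output line is a count followed by a host name. If every host shows up, that points at the manager side rather than at a single host.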

I decided to do a yum history undo, and without a reboot everything is fine again.

I did the same yum update on another environment with the same versions of RHEV-M and RHEV-H. Exactly the same issue appeared there, and the undo corrected the problem there too.

Has anybody experienced the same?

Stig Overvad Poulsen

Responses

Hi! I have the same issue and my whole platform is down!

Have you submitted a case about it?

Christophe, did you check which files were altered as part of the update? (find /etc/ -mtime -7 lists files modified in the last 7 days)
Also, did you try to back out the patches?

Hi Christophe

No, I haven't submitted a support case yet.
I did a yum history undo which solved my problem.
I have now collected logs using the log collector and plan to submit a support case first thing tomorrow.

I'll post any news when I know more.

Hi to everyone,

Same issue here. Please don't update OpenSSL to the faulty version:

openssl-1.0.1e-30.el6_6.5.x86_64

Adrian

This is not a problem in openssl. In the latest critical security update for openjdk 1.7, SSLv3 has been disabled by default as part of the fix for one of the critical vulnerabilities; see https://rhn.redhat.com/errata/RHSA-2015-0067.html

This is a good thing; however, the VDSM daemon seems to have the TLS protocol disabled (see http://www.ovirt.org/Features/PKI), hence the handshake fails. A temporary workaround, until vdsm is fixed to work with TLS, is to comment out

jdk.tls.disabledAlgorithms=SSLv3

from /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre/lib/security/java.security
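
For anyone who prefers to script that edit, here is a rough sketch of one way to do it. It is not an official procedure: the function name comment_out_sslv3 and the .orig backup suffix are made up for illustration, and it assumes the property starts at the beginning of its line, as shown above. Restarting ovirt-engine afterwards is presumably needed so the JVM re-reads the file.

```shell
# Hypothetical helper (name and ".orig" suffix are assumptions):
# comment out the jdk.tls.disabledAlgorithms=SSLv3 line in a java.security
# file, keeping an untouched copy next to it.
comment_out_sslv3() {
    security_file="$1"
    # Keep a backup so the change is easy to revert once vdsm works with TLS.
    cp -p "$security_file" "$security_file.orig" || return 1
    # Prefix the matching line with '#'; all other lines pass through unchanged.
    sed 's/^jdk\.tls\.disabledAlgorithms=SSLv3/#&/' "$security_file.orig" > "$security_file"
}

# Example, using the path given in this post:
#   comment_out_sslv3 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre/lib/security/java.security
```

Keep in mind that this re-enables SSLv3 for every Java program using that JRE, so it should be reverted (copy the .orig file back) as soon as a proper fix is available.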

Enrico Tagliavini

Hi Sven,
That is a good analysis! But I think Red Hat must be careful with this sort of update.
My whole virtual environment went down because of this issue.
I have opened case #01338370, and I will add your comment so that the Red Hat engineers can investigate it.
Thanks for your investigation!

I've opened a case with Red Hat, and it turned out I have to downgrade java-1.7.0-openjdk with the command:

yum downgrade java-1.7.0-openjdk

This removes java-1.7.0-openjdk.x86_64 1:1.7.0.75-2.5.4.0.el6_6 and installs java-1.7.0-openjdk.x86_64 1:1.7.0.71-2.5.3.2.el6_6.

Then I had to restart ovirt-engine.

Problem solved, but it would have been better if it had never happened...

Thank you. This tip helped resolve my issue. Downgraded openjdk and things came back up after reboot.

I've also opened a case with Red Hat to find out when this will be fixed or if it is a known issue.

On the RHEV-M machine:
yum downgrade java-1.7.0-openjdk
service ovirt-engine restart
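
Before or after the downgrade it can help to confirm which openjdk build is actually installed. Below is a small sketch for that, based only on the two builds named in this thread. The function name classify_openjdk and its three labels are invented for illustration, and it assumes the default rpm -q output format (name-version-release.arch).

```shell
# Hypothetical helper (name and labels are assumptions): classify an
# "rpm -q java-1.7.0-openjdk" result using the two builds reported in this thread.
classify_openjdk() {
    case "$1" in
        # Build that disables SSLv3 by default; reported here to break engine-to-host communication.
        java-1.7.0-openjdk-1.7.0.75-2.5.4.0.el6_6*) echo "reported-problem-build" ;;
        # Build that the downgrade puts back.
        java-1.7.0-openjdk-1.7.0.71-2.5.3.2.el6_6*) echo "reported-working-build" ;;
        # Anything else is not covered by the reports in this thread.
        *) echo "unknown" ;;
    esac
}

# Example on the RHEV-M machine:
#   classify_openjdk "$(rpm -q java-1.7.0-openjdk)"
```

Whether the newer build really causes trouble seems to depend on the manager version too, so treat the labels as a hint from this thread and nothing more.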

Thanks. That fixed my problem also.

Thanks for very useful info from all.
Seems now there's a "Solution" also.
Red Hat - Updated Solution: RHEV hosts are Unresponsive after upgrading Java on the RHEV Manager

We had the same problem between Red Hat Storage Console and its nodes.
Support didn't have a clue at all...
The support engineer mentioned old log files and network communication, while it was clear from the log I pasted in the bug report that this was an SSLException...
My colleague had the same problem with RHEV, and that was the only reason we found out about the RHS Console problem...

Fix your QA, RH!

Oh, we had this fun too today!

"On the RHEV-M machine:
yum downgrade java-1.7.0-openjdk
service ovirt-engine restart"

This resolved for us as well. RHEV QA is a nightmare...

I installed an ovirt 3.5.1 virtualization host and manager machine, on RHEL 6.6. I did not experience the java issue when upgrading to java-1.7.0-openjdk-1.7.0.75-2.5.4.0.el6_6.x86_64.

Paul

"On the RHEV-M machine:
yum downgrade java-1.7.0-openjdk
service ovirt-engine restart"

Did the trick! QA much?

I had this issue as well last night, followed the same to fix:
On the RHEV-M machine:
yum downgrade java-1.7.0-openjdk
service ovirt-engine restart

Fixed it immediately. Thanks guys.

You guys are lifesavers. I panicked and brought down all my cluster preemptively and had no way of bringing it back up. No way to activate any hosts. I tried reaching support, but I found this post before they contacted me. Downgraded and everything is working.

Solution: https://access.redhat.com/solutions/1326683?sc_cid=cp|emnt|sol|vw&

Tried the recent RHEV-M update which worked fine.
Solution: VMs unknown state and RHEV-H's Non Responsive

Confirmed: Red Hat Enterprise Virtualization Manager Version 3.4.5-0.3.el6ev works fine with SSLv3 turned off.

Hi, we are using RHEV-M 3.5 with RHEV-H 6.7. We see VMs with status unknown after a RHEV-H host goes Non Responsive or its power card is removed. Kindly help us.
