RHEV 3.0 to 3.1 upgrade experience


Just want to share my experience upgrading from v3.0 to v3.1.

Lab upgrade

Our RHEV lab consists of 3 hypervisors with Fibre Channel storage, no VLANs, no bonds, and 70 active VMs. The rhevm machine runs as a KVM virtual machine under RHEL 6.

  1. First we upgraded all hypervisors to v6.3-20121212, as we were uncertain about the hypervisor requirement for the upgrade.
  2. To have a good rollback option, we shut down the rhevm virtual machine and copied the disk image to another machine.
  3. Then we did a full yum update of the rhevm server, rebooted, added the jbappplatform-6-x86_64-server-6 and rhel-x86_64-server-6-rhevm-3.1 repos, ran "yum update rhevm-setup", and ran "rhevm-upgrade" to do the upgrade.
  4. Failed on "Error: The current system contains a block (iSCSI/Fibre Channel) Export Storage Domain which is no longer supported.". This was an inactive export domain that we didn't need anymore, so I deleted it and started over.
  5. Tried a new rhevm-upgrade, and it failed again. This time the failure was much worse. I had failed to remove the old rhel-x86_64-server-6-rhevm-3 and jbappplatform-5-x86_64-server-6-rpm repositories, so the wrong packages had been installed. I wasn't able to re-run the upgrade at this point, and was told to roll back to the old versions. This was non-trivial, so I ended up going back to my backup rhevm disk image and starting from scratch.
  6. Repeated steps 1, 2 and 3, removed the rhevm 3.0 and jboss5 repos, removed the no longer needed storage domain again, and started a new rhevm-upgrade.
  7. Success! 
  8. Then did the "yum install rhevm-dwh ; rhevm-dwh-setup" to upgrade the history service. This failed several times because it kept running out of space doing a huge database dump to /var/lib/ovirt-engine/backups/. But the failures were unproblematic and could be restarted after I had added more disk space to /var/.
  9. Then upgraded the reports package doing "yum install rhevm-reports ; rhevm-reports-setup".
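The repository mix-up in step 5 is the kind of thing a quick check before running rhevm-upgrade could catch. Here is a minimal sketch, assuming the subscribed channel labels are available one per line (for example from "rhn-channel --list" on an RHN Classic system, which is an assumption about the setup). The function name find_stale_channels is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads channel labels on stdin, one per line, and
# prints any that match the old RHEV-M 3.0 / JBoss 5 channels named above.
# Exit status is 0 if a stale channel was found, 1 if none were.
find_stale_channels() {
    # -F: fixed strings, -x: whole-line match, so the new
    # rhel-x86_64-server-6-rhevm-3.1 channel is not flagged by mistake.
    grep -Fx \
        -e 'rhel-x86_64-server-6-rhevm-3' \
        -e 'jbappplatform-5-x86_64-server-6-rpm'
}

# Example (assumes rhn-channel is installed; not run here):
#   if rhn-channel --list | find_stale_channels; then
#       echo "Remove the channels listed above before running rhevm-upgrade" >&2
#   fi
```

The whole-line match matters here because the 3.0 channel label is a prefix of the 3.1 one.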

And finally upgraded cluster and datacenter compatibility to v3.1 using the rhevm webui. No problem.

The fact that I could easily roll back to the previous disk image made me feel confident that the upgrade would also be safe to do in our production environment.

 

Production upgrade

The lab was running on v3.1 for about 2 weeks before we did the production upgrade. Our production environment is a bit more complex than the lab: 22 hypervisors, iSCSI storage from NetApp and Storwize, active/passive bonds for rhevm and production networks, lots of VLANs, and 2 hypervisors doing local storage.

Before doing the upgrade we made sure all hypervisors were running 6.3-something (mix of versions from 201206xx-20121212). The backup strategy was the same as in the lab: copy the rhevm disk image to a safe place.

Then we ran the same steps as for the lab upgrade, avoiding the rhevm v3.0 and jboss5 repository problems, so the main upgrade went without any issues. Unfortunately we were unable to avoid running out of disk space during the reports package upgrade. It needed quite insane amounts of disk (18GB for the dump) and ran out of disk several times.
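A free-space check before starting the setup scripts might avoid those retries. Here is a minimal sketch, assuming the dump lands under /var/lib/ovirt-engine/backups/ as it did in the lab. has_free_kb is a made-up helper name, and the threshold is just the 18GB figure from above, not a documented requirement.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the filesystem holding PATH has at
# least REQUIRED_KB kilobytes available.
# Usage: has_free_kb PATH REQUIRED_KB
has_free_kb() {
    # df -P gives one POSIX-format line per filesystem; -k reports 1K blocks.
    # Column 4 of the second line is the available space.
    avail=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Example: require roughly 18GB (18 * 1024 * 1024 KB) for the dump.
#   has_free_kb /var/lib/ovirt-engine/backups $((18 * 1024 * 1024)) \
#       || echo "Not enough free space for the database dump" >&2
```

The actual dump size depends on how much history the database holds, so the number to check against will differ per installation.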

Other than that, the upgrade was completely unproblematic. We noticed that we couldn't upgrade the cluster/datacenter compatibility level until all hypervisors were upgraded to 6.3-201212, so we did that upgrade, and we're now happily running v3.1 everywhere!
