Upgrade from RHEVM 3.0 to RHEVM 3.1 gone wrong
Hi All,
I have been trying to resolve an issue that has happened to me, and although I have a case opened, it might be worth opening it up for discussion to see if the collective effort can help resolve it quickly.
I have a physical server running RHEL 6.2 with RHEVM 3.0. I had one cluster of 1 host with FC storage and one cluster of 1 host running VMs in local storage, with everything working fine. Everything began when I attempted an upgrade to 3.1. It seemed easy enough, but the upgrade failed and the rollback plan did not work. Thankfully the upgrade process did a database backup for me, and I was left with an sql file and little else. After much research and opening a case, I ended up doing a fresh install of the server and the RHEVM 3.0 software. I now had a working server but no clusters/hosts/VMs from before.
The sql file had "engine" as the database name, not "rhevm". I am told "rhevm" was the 3.0 name of the database and "engine" the 3.1 name. I opted to rename the instances of the word "engine" to "rhevm". I went into the PostgreSQL environment, renamed the original "blank" rhevm db to something else, and loaded the sql file. It created the database just fine, but the GUI no longer worked. I realized that the newly created database did not have the same permissions as the blank rhevm database. I created these to match and then the GUI worked fine. I was now able to see the old clusters/hosts/VMs/networks, etc., but all were in an unknown state. I followed a guide I found in the knowledgebase called "How to restore rhevm from a database backup after a crash", which gave me some info on how to proceed. I had to set the SPM status to "null" via some sql commands in the guide. I now needed to put the hosts in maintenance mode and activate them, but RHEVM saw that VMs were running, so the hosts could not go into maintenance mode. I managed to reboot the host with the FC storage and that worked fine: the host was seen and put into maintenance, and then I had to remove and re-register it. Only then was the host seen in active mode. I saw all VMs and did not lose any data.
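For anyone attempting the same rename: a blanket search-and-replace of "engine" could also touch data rows or role names. Below is a minimal Python sketch that rewrites only the statements that name the database. It assumes the dump contains `CREATE DATABASE`, `ALTER DATABASE` and `\connect` lines, which is common for PostgreSQL dumps that recreate the database but is not confirmed for the file described above. The function name `rename_database` and the sample SQL are made up for illustration.

```python
import re


def rename_database(sql_text, old="engine", new="rhevm"):
    """Rename the database only in statements that name it.

    Assumption: the database name appears right after CREATE DATABASE,
    ALTER DATABASE or \\connect at the start of a line. Role names
    (for example in OWNER TO ...) and data rows are left untouched.
    """
    pattern = re.compile(
        r"^(CREATE DATABASE|ALTER DATABASE|\\connect)(\s+)"
        + re.escape(old)
        + r"\b",
        flags=re.MULTILINE,
    )
    # A function is used as the replacement so that backslashes in the
    # matched text are copied literally.
    return pattern.sub(lambda m: m.group(1) + m.group(2) + new, sql_text)


# Hypothetical fragment of a dump, for illustration only.
sample = (
    "CREATE DATABASE engine WITH TEMPLATE = template0;\n"
    "\\connect engine\n"
    "INSERT INTO some_table VALUES ('engine');\n"
)
print(rename_database(sample))
```

Work on a copy of the sql file and diff the result against the original before loading it, so you can see exactly which lines changed.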
Now my focus went to the other host, the one with locally stored VMs (one VM being a server in production). I am worried that if I do the same as for my other host, re-registering the host will erase all my locally stored VMs, thus removing the production server!
I would welcome any suggestions on this matter; I am sure this must have happened before to someone.
Thank you
Brandon Saccone
Responses
Oh wow...
With all RHEV deployments, my gut screams to me, do RHEV-M in a virtual machine. Now I know why. This advice won't help your current situation, but may be useful in the future.
Do RHEL on bare metal, then do a RHEL/libvirt guest VM. The standard RHEL subscription comes with at least one supported guest VM, so this doesn't cost one extra penny. You can put the virtual disk for that VM in either an NFS or iSCSI or FC store on your SAN if you want. Install RHEV-M inside your RHEL VM. Leave your RHEL on bare metal host as clean and simple as possible and put all your management stuff inside that VM. If you're running Windows Servers, it might also make sense to set up a libvirt VM as a Windows Domain Controller and DNS server. This way, you have a copy of your internal DNS and Active Directory outside the RHEV environment just in case.
So now, when it's time to upgrade RHEV, you can shut down your RHEV-M VM and just make a copy of its virtual disk file. Kind of a poor man's snapshot I guess. But now you have the complex RHEV-M system completely preserved. Upgrade RHEVM on it and if the upgrade goes south, just copy the .img file back and you're back where you started.
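A minimal Python sketch of that poor man's snapshot, assuming the RHEV-M guest is already shut down and its disk is a plain image file. The function names, the `.pre-upgrade` suffix and any paths you pass in are hypothetical. It copies the image and compares checksums, so you know the copy is good before you start the upgrade.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_disk_image(image, backup_dir):
    """Copy a VM disk image into backup_dir and verify the copy.

    Assumption: the VM that uses the image is powered off, so the file
    is not changing while it is being copied.
    """
    image = Path(image)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / (image.name + ".pre-upgrade")
    shutil.copy2(image, target)  # copies data plus timestamps and mode
    if sha256_of(image) != sha256_of(target):
        raise RuntimeError("backup copy does not match the original image")
    return target
```

Rolling back is the same copy in the other direction, again with the VM shut down. A sparse image may take up more space in the copy than the original does, so check free space on the backup location first.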
On your production VM - do you have a downtime window? Maybe you can back it up from inside the VM, rebuild your data center, and then provision it again and restore it. I know this is ugly.
- Greg Scott
In addition to a VM, I suggest using separate VGs, PVs and LVs for the RHEVM app, reports, etc. This allows you to migrate the application more cleanly and positions the app for an easier cluster integration, but most importantly it would also allow you to add some redundancy if you were to mirror the volumes and then split them prior to the upgrade. I'm not sure that the RHEVM installation documentation identifies the filesystems that are used, so you would have to figure that out.
Hey Brandon,
I'm sure some of us will follow this quite closely as you work to a resolution and I really wish I had advice to offer.
How much local storage are you working with?
I suspect (i.e. I do not know - so, please don't try this without validation)...
you could
- backup your local storage to an external device
- configure your RHEL host to be a Hypervisor
- determine if the local storage is still valid
- if storage is valid, but no VMs are found, SQL injection of the hosts from the backup database into your new RHEVM (engine) database would probably work
- if storage is not valid, there seem to be a number of utilities that can import VMs from different types of media/storage, although I believe this would be a bit of a serial process.
I have not worked with the local storage option. From what I have seen the VMs are all volume-based (i.e. a VM is a volume) and the names are mostly ambiguous, appearing like a UUID. Were your VMs in a separate VG, or part of the system's base VG?
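If it helps answer that question, here is a small Python sketch that separates UUID-looking volume names from ordinary ones. It assumes you already have the list of volume names from your usual LVM tooling. The names in the example are invented, and a UUID-style name is only a hint that a volume may belong to a VM, not proof.

```python
import uuid


def looks_like_uuid(name):
    """Return True if name is a canonical hyphenated UUID string."""
    try:
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


def split_volumes(names):
    """Split volume names into (uuid_named, other_named) lists."""
    uuid_named = [n for n in names if looks_like_uuid(n)]
    other_named = [n for n in names if not looks_like_uuid(n)]
    return uuid_named, other_named


# Invented example names, for illustration only.
names = ["lv_root", "lv_swap", "3f2b8c1e-7a4d-4e9b-9c0f-1d2e3a4b5c6d"]
vm_candidates, system_volumes = split_volumes(names)
print("possible VM volumes:", vm_candidates)
print("other volumes:", system_volumes)
```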
This issue will require a pretty smart resource, but I am confident that RH Support will get you through this. They may have already tested this in the lab and have a procedure all worked out!
Best of luck!!!
If I were in your shoes, I would assume worst case and that when you redo your RHEV environment, it's going to blow away everything that was in place. You'll probably end up rebuilding your host because it will need new certificates and whatever else it needs to connect to the new RHEV environment and you'll end up redoing your local storage. Maybe the support guys will say different, but I would cover my rear-end just in case.
On your Bare Metal Recovery, my suggestion is to forget the local USB drive. I don't think you need it. Do your Windows Bare Metal Recovery backup to somewhere - anywhere - where there's enough space to accommodate it. And do a full system backup too. If you're putting the stuff on a real, physical USB drive, maybe connect it to a workstation and share it from the workstation. Do your backups over the network; I think the Windows Recovery Environment is smart enough now to deal with networks. So you should be able to boot that VM into WinRE, repair your system, navigate to the network share where your backups sit, and restore everything from there.
***Before*** taking down your production VM, practice all this. In fact - what if you provision an old fashioned RHEL/Libvirt VM? Do your recovery using this VM and test it. So now you have two copies of your VM. If this RHEL VM works, then do it for real. Shut down your RHEV VM and run your restored VM in production for a while while you re-do your RHEV environment. When ready, set up a RHEV Export domain and import it into your new RHEV environment.
One detail - while you're practicing, you'll probably have both copies of that VM up and running simultaneously. So make sure your test VM is connected to a different virtual network. Windows hates seeing two systems with the same Computername and IP Address, especially if it's a domain controller. So make sure those copies of VMs are isolated from each other.
I know this is ugly and leaves copies of that VM lying around all over the place, but this is easy to clean up.
Also, do your new RHEVM in a VM this time. And push hard for a hat or tee-shirt or a free year of support or something from the support guys for going through all this agony. :)
- Greg
Here are a couple more links on that bare metal Windows restore. This first one doesn't give much useful info.
http://technet.microsoft.com/en-us/library/dd979562(v=ws.10).aspx
Another one with some more promise:
http://technet.microsoft.com/en-us/library/ee849849(v=ws.10).aspx
Paydirt. This one shows a step by step process to backup and recover. The process uses Hyper-V virtual machines but don't worry about that. You care about backup and recovery over the network. Pay attention to the network part of it and don't get hung up with the Hyper-V stuff.
- Greg
Oh yes, one more note on that Windows Backup and recovery. I'm not sure I completely agree with that blog you're following. I think you want to do a full system backup - which includes bare metal recovery and everything else. That blog says to only do a bare metal recovery.
This is why we practice I guess.
- Greg
And then some big picture stuff for the future that doesn't do Brandon any good today.
I really really really hope the developers follow this closely. The fundamental issue is, **everything** depends on RHEVM. If RHEVM breaks, the whole world breaks. We all know about the VMware world - if the vCenter server breaks, I can still manage the individual hosts. (Although VMware may be backing away from this with that new web client.)
With RHEV, I don't have any way to touch an individual host directly with any kind of usable interface, so if RHEV-M gets messed up, I'm screwed. Hopefully an architectural fix for this is on the development drawing board.
- Greg
