RHEL 5 server does not boot anymore after kernel update
Hi,
I have some Red Hat Enterprise Linux 5 servers running kernel version 2.6.18-164.el5. I need to upgrade to at least version 2.6.18-185.el5 because I want to extend an ext3 file system on an LVM logical volume by several gigabytes. I downloaded version 2.6.18-255.el5 and performed the upgrade on a virtual test server. On a XenServer platform it worked, but with Hyper-V as the host the virtual Linux machine no longer boots.
# rpm -U <kernel packages>   (the command used for the upgrade)
This seems to work, but when I reboot the server it does not start up anymore. It fails to find VolGroup00.
After putting back the .VHD of the virtual machine, I could get my server back up with the old kernel.
Is this a known issue? Is there a kernel more recent than 2.6.18-185 that could help?
Rgds,
Responses
Hello,
I am unclear as to why you need to update your kernel in order to resize your LVM2 logical volume and corresponding ext3 file system. All available RHEL 5 kernels, including 2.6.18-164.el5, are able to resize logical volumes. Could you clarify this please?
Regarding the system not finding VolGroup00: did you make any changes to your LVM layout before you rebooted? For instance, did you extend the volume group and logical volume? Did you make any changes to /boot/grub/grub.conf or /etc/lvm/lvm.conf? If you could post the full set of steps that you followed, it may help us understand what went wrong.
Can you also check your /etc/lvm/lvm.conf for the line 'filter = [ ... ]' and post it here?
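For reference, one quick way to pull that line out is shown below; on a default RHEL 5 install the active filter is typically the catch-all in this sketch, but your file may well differ:
# grep '^[[:space:]]*filter' /etc/lvm/lvm.conf
    filter = [ "a/.*/" ]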
Thanks,
John Ruemker, RHCA
Red Hat Technical Account Manager
Thank you for clearing that up. You are correct that it is best to be on the later kernel version before performing the resize2fs, to be sure you don't run into any issues such as what's described in the bugzilla.
Oftentimes system administrators will reconfigure the LVM filter while the system is up and running, and the fact that the change has caused a problem won't be discovered right away, since the volume group is already active. However, when you install a new kernel, it will remake the initrd, which in turn pulls in the latest copy of lvm.conf. If you had an invalid filter in there that would prevent init from finding the physical volume(s) for VolGroup00, then you would see the issue you have described.
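If you want to double-check which copy of lvm.conf actually ended up inside a given initrd, you can pull it straight out of the image. This is only a sketch, so substitute the filename of the initrd for the kernel you installed (the path inside the image may also vary slightly):
# zcat /boot/initrd-2.6.18-255.el5.img | cpio -i --to-stdout '*lvm.conf' | grep '^[[:space:]]*filter'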
Can you post the 'filter' line from your /etc/lvm/lvm.conf?
Also, can you let us know if booting from the old kernel entry in grub allows the system to boot normally?
Thanks,
John Ruemker, RHCA
Red Hat Technical Account Manager
Thank you for providing that data. Your filter line isn't to blame, then, since you are using the default, which causes LVM to scan all devices for physical volumes.
Unfortunately, with the data we have, I can't explain why you would be seeing this problem. With a look at your initrd's contents, we might be able to tell more. Would you be able to open a case with Red Hat Global Support Services so that we can collect some data from your system and do a bit more digging? Let me know if you need any information on how to open a case.
Regards,
John Ruemker, RHCA
Red Hat Technical Account Manager
I am having the same problem as in this thread. I actually just opened a ticket for this with Red Hat support, but then found this thread. All I did was install the base Linux OS, then added it as one of my supported servers, and then ran "yum update". There were about 600 updates, and one of them was the kernel.
After the updates were completed and I restarted the server, I ran into this issue. I can go to GRUB during initial startup, select the old kernel version, and boot the system just fine.
I have not made any changes to anything on the server. I did a fresh install on our Hyper-V server (i.e., as a VM). The lvm.conf file is the default one that was there, and no changes have been made to it.
Any insight on this would be greatly appreciated.
I am also hitting this problem but I am not aware of any recent updates I made to my machine. I am currently going through the machine and will let you know if I find a solution. I may have made some configuration changes between reboots. The error seems to indicate that the volume group that is specified in my initrd for root is not detected.
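A quick sanity check for that (just a sketch; the device names on your system will differ) is to compare the root= entry on the kernel lines in grub.conf against the volume groups LVM can actually see from the working kernel:
# grep 'root=' /boot/grub/grub.conf
# vgs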
Sadique mentioned this earlier, but I want to touch on it too.
It is possible that you need to rebuild your init ramdisk (initrd) with support for the Microsoft paravirtualized block driver. A new initrd is generated and used with your newly installed kernel.
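As a rough example, something along these lines rebuilds the initrd for the new kernel with the Hyper-V storage modules pulled in. The module names here are only placeholders, since the actual driver names depend on your Linux Integration Services release (older LIS versions shipped vmbus/storvsc/blkvsc, newer ones hv_vmbus/hv_storvsc):
# mkinitrd -f --with=hv_vmbus --with=hv_storvsc \
      /boot/initrd-<new-kernel-version>.img <new-kernel-version>   # substitute your new kernel version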
Please refer to your Linux Integration Services documentation for additional details.
Phil Jensen, RHCA
Sr. Operations Engineer
Proofpoint, Inc.
Hi, I've had similar issues with RHEL 5 servers not rebooting after patching to the latest kernel. The resolution in our case was to remove 'noapic' from the kernel boot parameters in grub.conf.
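In case it helps anyone, that parameter lives on the kernel line of the boot entry in /boot/grub/grub.conf. A purely hypothetical entry would change roughly like this (your kernel version and root device will differ):
    kernel /vmlinuz-2.6.18-348.1.1.el5 ro root=/dev/VolGroup00/LogVol00 noapic      (before)
    kernel /vmlinuz-2.6.18-348.1.1.el5 ro root=/dev/VolGroup00/LogVol00             (after)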
Hi, I've been running RHEL 5.5 for over 3 years without Linux Integration Services on my Hyper-V cluster and could apply all kernel updates without any problems until the latest update, 2.6.18-348.1.1 - after this update the OS does not boot anymore. My last working kernel was 2.6.18-308.16.1 - it ran without any problems. With the latest kernel I get the same message - Volume group "VolGroup00" not found - which results in a kernel panic. I am able to boot with the previous kernel 2.6.18-308.16.1, and the kernel boot parameters are identical for both kernel versions.
Hello,
In many of these situations where a newer kernel fails to boot while the old one still does, it's not so much the kernel that's at fault as the new initrd that was built for that kernel. If you have changed anything in /etc/lvm/lvm.conf (such as the filter) since the last kernel was installed, your system may have continued to boot fine until now, but once that lvm.conf is included in an initrd, your boot process and devices will depend on the values in that file. If there were a syntax error or a setting that would cause your root Volume Group not to be found, the act of installing a new kernel would pull it into the initrd, and booting off that new kernel would panic. It is similar to how, if you were booting from a multipath device and had modified multipath.conf, the change may not have had a noticeable impact until you were actually mapping the boot device in the initrd using that new configuration. Or, if you had modified the options for the SCSI controller's kernel module in /etc/modprobe.conf, they wouldn't actually have been applied until those options were included in the initrd.
So, the best way to solve these types of problems is to actually inspect the contents of the initrd to see what changed. This can be a complicated process, so Red Hat Global Support Services can certainly help you with it if you wish to open a case. But if you'd rather try it on your own, you can unpack each of the initrds (in RHEL 5) like so:
# mkdir initrd-old
# cd initrd-old
# zcat /boot/initrd-2.6.18-Y1.Z1.el5.img | cpio -idv
# mkdir ../initrd-new
# cd ../initrd-new
# zcat /boot/initrd-2.6.18-Y2.Z2.el5.img | cpio -idv
(filling in the correct filenames for the old and new initrds, respectively). Now that you have them unpacked, you can compare the files with diff or however is convenient for you. If you can spot a few differences, it's usually not too difficult to figure out which one is to blame. If you start with the files mentioned above (lvm.conf, multipath.conf, modprobe.conf, etc.), those are usually the common culprits in these scenarios.
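For example, something along these lines will show the differences (a rough sketch; the exact paths inside the initrd can vary between releases):
# diff -ru initrd-old initrd-new | less                             # full recursive comparison
# diff -u initrd-old/etc/lvm/lvm.conf initrd-new/etc/lvm/lvm.conf   # spot-check one common culprit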
Again, don't hesitate to open a case if you'd like some help with this.
Regards,
John Ruemker, RHCA
Senior Software Maintenance Engineer
Global Support Services
Red Hat, Inc.
One of the joys of virtualization environments - particularly if you run the virtualization environment's tools within your VMs - is that, if someone updates your VM management tools, the update can clobber your modprobe.conf. I had that happen recently, and it took doing the above to notice, "oh, crap, the VM management tools ate my disk drivers' entries".
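If you suspect that has happened, the storage alias lines in /etc/modprobe.conf are a quick thing to eyeball. On a typical RHEL 5 VMware guest they look something like the following, though this is only illustrative and the module name depends on your hypervisor and virtual controller:
# grep scsi_hostadapter /etc/modprobe.conf
alias scsi_hostadapter mptspi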
I've also run into this with systems where I was simulating a physical server by setting up bonded interfaces within the VM. Dunno about Hyper-V's management tools, but VMware's management tools tend to assume a *very* simple configuration and, if you do the automated install, will tend to punish you if you've done anything marginally "complicated" (VMware tools seem to pretty much want to completely own the modprobe.conf files).
I've been running into issues with LVM configurations becoming invalid (for a number of different reasons), more often in my virtualized environment.
Assuming you have a method to boot from media (network, CD, etc.), I would recommend booting off that media and then running:
# vgscan && pvscan && lvscan
# vgs && pvs && lvs
Then check the timestamp on your /etc/lvm/lvm.conf.
If you use Satellite for deployments you can find a backup copy of the lvm.conf in /var/lib/rhncfg/backups/
If not, I would recommend comparing it to a working system.
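A quick way to do both checks, assuming those paths exist on your system:
# stat /etc/lvm/lvm.conf                          # was it modified since the last good boot?
# find /var/lib/rhncfg/backups/ -name lvm.conf    # locate the Satellite backup copy, if any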
