RHEL 7.4 system does not boot after updating to any kernel newer than 3.10.0-514.21.1.el7.x86_64


RHEL 7.4 fails to boot after updating to any kernel newer than 3.10.0-514.21.1.el7.x86_64.

The system cannot find the root partition, which is installed on LVM2.
Boot message:
dracut-initqueue[279] Warning: Could not boot
dracut-initqueue[279] Warning: /dev/mapper/rhel_...-root does not exist

Responses

The system is clearly unable to locate the required root device '/dev/mapper/rhel-root' and therefore fails, so check which block devices and partitions are visible from the dracut shell. Run "parted /dev/sda -s p" to list the partition details (assuming the root file system is installed on the first drive, which is the default).

Also, activate the volume group and find out which logical volumes are present. Run "lvm vgscan" and then "lvm vgchange -ay" from the dracut shell to check this. Then run "blkid", which should show the block devices corresponding to your root and swap file systems.
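As a rough sketch of that sequence from the dracut emergency shell (device names such as /dev/sda are assumptions, adjust them to your layout):

  # list the disks and partitions the kernel can see
  cat /proc/partitions
  parted /dev/sda -s p
  # scan for and activate LVM volume groups, then confirm the block devices
  lvm pvscan
  lvm vgscan
  lvm vgchange -ay
  blkid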

Assuming your older kernel is intact, try booting into the older kernel and check whether the corresponding vmlinuz image was generated for the new kernel under /boot. You may also try rebuilding the initrd image and check again. Also check for free space in the /tmp file system, or under root if /tmp is not mounted separately.
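A minimal sketch of the rebuild, assuming the RHEL 7.4 kernel version 3.10.0-693.el7.x86_64 purely as an example (substitute the version that fails on your system):

  # rebuild the initramfs for the new kernel while booted into the old one
  dracut -f /boot/initramfs-3.10.0-693.el7.x86_64.img 3.10.0-693.el7.x86_64
  # confirm there is enough free space where dracut builds its temporary files
  df -h /tmp /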

Hello Sadashiva, after "lvm vgscan" and "lvm vgchange -ay" in the dracut shell the volume group for the root partition is there. lvdisplay shows that the logical volume for root is present, but the LV Status is "Not available" for the root and swap partitions. I don't understand why. blkid shows the LVM partition on /dev/sdg2 as type LVM2_member. Recreating the initrd for the new kernel doesn't help; the error message is the same.
The root file system has 40 GB of free space, and /tmp is not a separate mount point.

Best Regards

Ludger Köhler, were you able to fix it?

You can activate the logical volume using "lvchange -ay /dev/vgname/lvname".

Check out the steps listed in this KB: https://access.redhat.com/solutions/1282013 You would need to perform the steps under "VIEW AND REPAIR THE GRUB CONFIGURATION". Let us know what happens after this. All the best!
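Not a substitute for the KB steps, but on a BIOS-based RHEL 7 system the regeneration step typically looks like this sketch (the target path differs on UEFI systems):

  # rebuild the GRUB2 configuration from /etc/default/grub and the installed kernels
  grub2-mkconfig -o /boot/grub2/grub.cfg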

Sorry, this doesn't help. The filter in lvm.conf is: filter = [ "a/.*/" ]

In the dracut shell, the output of "lvm pvscan" and "lvm vgscan" shows that the root VG is there. "lvm lvscan" shows the root LV as inactive. "lvm vgchange -ay" reports 0 logical volume(s) in the volume group for root now active.

I assume that the system can still boot back into the previous kernel without any problem. If so, could you boot into the older working kernel configuration and check whether the required files for the new kernel are available under /boot, such as vmlinuz-* and initramfs-*? Also check the /boot/grub2/grub.cfg file for any errors on the line beginning with linux16 for the newer kernel.
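A quick sketch of those checks from the working kernel, again using the RHEL 7.4 kernel version only as an assumed example:

  # are the kernel and initramfs for the new version actually in /boot?
  ls -l /boot/vmlinuz-3.10.0-693.el7.x86_64 /boot/initramfs-3.10.0-693.el7.x86_64.img
  # show the kernel command lines GRUB2 will use and look for typos on the new entry
  grep -n 'linux16' /boot/grub2/grub.cfg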

Hi Ludger,

I've watched this discussion for a while and assume that the problem most probably occurs because you upgraded the system from RHEL 7.3 to RHEL 7.4 ... because you mention that all kernels after version 3.10.0-514.21.1.el7.x86_64 don't boot.

Kernel 3.10.0-514 was in use on RHEL 7.3, and the kernel included in RHEL 7.4 is 3.10.0-693. Upgrading an operating system often leads to various (more or less big) problems, and that is why I recommended performing a clean installation of the system after the final release was made available. You may want to consider following this advice; I think the issue should be solved afterwards.

Regards,
Christian

Christian, yes, at times even I feel that a clean installation is best, but it is not a feasible option in most cases. I've not seen many issues in this process (minor release upgrades) except in this case, when upgrading to RHEL 7.4. Certainly, there are many things to look into when you consider a complete upgrade (say, from RHEL 7.3 to RHEL 7.4): application compatibility, how easy it is to roll back if the upgrade fails, how much time we get to work on all these activities/troubleshooting, etc. However, if you look at the situation we are discussing here, it is just that the boot configs, files, and modules are not in sync, or not built properly, after the kernel upgrade.

Yes, I see that these upgrades were discussed in a couple of earlier threads: https://access.redhat.com/discussions/3133621 https://access.redhat.com/discussions/3135161

Not sure if this is an ongoing issue with the RHEL 7.4 upgrade; there is also no clarity on whether Red Hat is aware of it, or whether anyone has raised a case with Red Hat so far regarding this.

Ludger Köhler, you may raise a case with Red Hat if you have a valid support subscription.

Hi Sadashiva,

I don't think that it is a "special RHEL 7.4 upgrade issue" ... things like these happen - I just wanted to say that a clean installation most certainly avoids such issues, and that Ludger might spend less time reinstalling the system than it can take to fix the configs and modules you correctly mentioned. :)

Regards,
Christian

Thanks for all the answers. I don't think that this is a problem specific to RHEL 7.4. The problem also exists on RHEL 7.3 with kernels > vmlinuz-3.10.0-514.21.1.el7.x86_64, so kernels vmlinuz-3.10.0-514.26.1.el7.x86_64 and vmlinuz-3.10.0-514.26.2.el7.x86_64 also don't work. The files under /boot for all kernels (vmlinuz-*, initramfs-*) are available, and /boot/grub2/grub.cfg has the right strings for every kernel. Creating a new initrd also doesn't work. Yes, we have a valid support subscription and we can raise a case with Red Hat.

You're welcome Ludger,

Then first try to get it fixed together with the technical support team, and if they can't get it done in a reasonable time, just perform a clean installation. By the way, you may consider using the ext4 file system instead of LVM.

Cheers :)
Christian

Sorry Christian, use ext4 instead of lvm? ... you mean to say "use ext4 instead of xfs"? I don't think that is necessary here.

Oh, I am sorry Sadashiva, it seems I didn't express well enough what I wanted to say. I meant the partitioning: installing the system on a "good old" reliable ext4 partition (by selecting manual configuration in the Anaconda installer) instead of installing it to logical volumes, i.e. in "LVM mode". :)

Regards,
Christian

Good, please raise a case and keep us posted. Sharing knowledge is the greatest thing that we can give back to the community :)

Was a case raised with the Red Hat support team for this issue? I hope you don't mind updating this discussion thread with the outcome.

Hello Ludger,

Can you still boot with an old kernel?

If so, what is the output of the commands df -h /boot and df -h / ?

IMHO it might be that the /boot filesystem or / filesystem is too small, so each kernel update failed.

The boot filesystem would be the candidate.
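If it helps, a rough way to compare what the kernel files need against what is free (the file globs are examples, not taken from Ludger's system):

  # free space on the file systems mentioned above
  df -h /boot /
  # per-file sizes of the installed kernels and initramfs images, plus a total
  du -ch /boot/vmlinuz-* /boot/initramfs-*.img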

Another option could be that the initrd files are missing the LVM modules. To fix this I advise you, as Sadashiva Murthy M already did, to open a support case and upload a sosreport.

Regards,

Jan Gerrit Kootstra

Good idea Jan Gerrit, but the default setting in /etc/yum.conf is installonly_limit=3, so unless Ludger changed that to a much higher number ... these three kernels should not consume too much space - or am I wrong? :)
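For reference, on a stock RHEL 7 system that setting sits in the [main] section of /etc/yum.conf and looks roughly like this:

  [main]
  # yum keeps at most this many install-only packages (i.e. kernels) at a time
  installonly_limit=3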

Regards,
Christian

The /boot filesystem is 1 GB with 400 MB in use, so there is enough space, and the LVM module is present in every initrd.

Hello Christian,

I have seen setups with only 256 MB of space in /boot, because the kickstarts were still based on Red Hat 6.1 or Red Hat 9 (before RHEL) installations and the partition/LVM setup was never updated.

In my experience, ignoring the obvious is bad practice.

Regards,

Jan Gerrit

You are right Jan Gerrit, assuming the "worst case" is better than assuming the "normal or best case" ... by the way - I've never seen an advantage in creating a separate /boot partition ... and in case things break, I simply fire up Clonezilla. :)

Regards,
Christian

The "advantage" is that you have less restrictions on the filesystem type of the "root" filesystem.

Historically it was a restriction of bootloaders not being able to detect lvm physical volumes at boot stage 0. So you needed a boot stage 1 and stage 2 on a filesystem that is on a disk partition instead of a logical volume.

Thank you for this information Jan Gerrit, I also want to take the opportunity to thank you generally for your helpful support and your useful contributions here. We all can really learn a lot from you and your experience, Jan Gerrit ! :)

Regards,
Christian

Thanks Christian. After 20 years in the business one learns some things and gains some experience; it's just that learning new tricks gets a bit harder, so the RHCE 7 exam is a crime. I do not have day-to-day experience anymore since my promotion to Solution Designer.

So I built a demo/developer environment about 17 years ago, when I was still a Unix/Linux system administrator and a PhD student in applied mathematics.

So if I see a discussion that takes a long time or is about a subject I know, I replay the issue and, if needed, open a support case myself. That is an advantage of working for a Red Hat Partner for over a decade. In some situations I ask my contacts within Red Hat to look into a discussion, and they ask a colleague to have a look too.

So I am pleased to be able to teach and help out newbies and see them learn quickly.

Look at yourself and how many contributions you make to the forum: great.

Thank you for your kind words Jan Gerrit, it seems we have the same approach. I'm trying my best to help (new) users get their problems solved and so give something back to the Red Hat community. I am really glad that you appreciate my contributions. :)

Regards,
Christian

Hello Ludger,

You mentioned that the LVM module is in every initrd - did you extract the contents of the initrd files?

If so, did you also check the filter in the /etc/lvm/lvm.conf that is inside the initrd?

In the past my colleagues and I experienced that the old mkinitrd on RHEL 5 or RHEL 6 did not always create the same /etc/lvm/lvm.conf as the one on the root filesystem, whereas the one created by anaconda during installation was "perfect".
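A sketch of how to check this without fully unpacking the image, assuming the RHEL 7.4 kernel version as an example and that your dracut's lsinitrd supports the -f option:

  # confirm the lvm pieces are packed into the initramfs at all
  lsinitrd /boot/initramfs-3.10.0-693.el7.x86_64.img | grep -i lvm
  # print the lvm.conf embedded in the initramfs and look at its filter line
  lsinitrd -f etc/lvm/lvm.conf /boot/initramfs-3.10.0-693.el7.x86_64.img | grep filter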

Regards,

Jan Gerrit

Yes Jan, in fact I directed him to this KB https://access.redhat.com/solutions/1282013 to modify the filter in lvm.conf and rebuild the initrd image afterwards, which he has tried, but it didn't work. But you never know; even with my experience I would say it is sometimes better to re-verify (double check) settings before jumping to conclusions. I too have experienced cases where the initrd was not built, or was missing the modules required to detect a root file system on LVM, so recreating it and making sure the required modules are present is the right way.

I'm wondering why Red Hat doesn't come up with a tool/command interface that can perform some of these sanity checks after installing a new kernel: confirming that the required modules were added (comparing the new initrd image with the one currently in use) and that the required parameters were properly added to the grub file. Adding these checks to the kernel installation process itself would also be good. This would prevent a lot of such issues, or at least people would know whether the new kernel was successfully installed with all modules and configuration, so that they don't take the chance of rebooting and going through all this trouble, wasting time that an enterprise doesn't want to lose (keeping in mind the fallback of rebooting into the older kernel). I strongly urge Red Hat to come up with a tool for this; otherwise we need to verify all of this manually (config files, grub file, size constraints, initrd image check, etc.).
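Just to illustrate the idea, a hypothetical post-install check could be as small as this shell sketch (the script name and its argument handling are my own assumptions, not an existing Red Hat tool):

  #!/bin/bash
  # kernel-sanity-check.sh - illustrative only
  KVER="${1:?usage: kernel-sanity-check.sh KERNEL-VERSION}"

  # 1. the kernel and initramfs files must exist under /boot
  for f in /boot/vmlinuz-$KVER /boot/initramfs-$KVER.img; do
      [ -e "$f" ] || { echo "MISSING: $f"; exit 1; }
  done

  # 2. the initramfs should carry the lvm dracut module when root is on LVM
  lsinitrd "/boot/initramfs-$KVER.img" | grep -q lvm \
      || { echo "WARNING: no lvm module in initramfs-$KVER.img"; exit 1; }

  # 3. grub.cfg must reference the new kernel
  grep -q "vmlinuz-$KVER" /boot/grub2/grub.cfg \
      || { echo "WARNING: no grub.cfg entry for $KVER"; exit 1; }

  echo "Basic checks passed for kernel $KVER"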

Jan, it is nice to see your involvement here and helping others. Thanks!!!

Hi Sadashiva,

Having such a checking tool would indeed be great ... good idea to propose it, hopefully a technician reads your request.

Regards,
Christian

In the dracut shell, while executing "lvm vgscan", I get an "lvm command not found" error.

That is strange; the dracut shell supports the lvm command by default. You can get into interactive lvm mode by typing "lvm" and hitting Enter in the dracut shell. Afterwards you will see the prompt change to "lvm>", where you can run pvscan, vgscan and lvscan to find the logical volumes. Activate all the LVs, such as root and swap, using "lvchange -ay /dev/vgname/lvname", and then type exit to check if it boots up.
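A rough sketch of that interactive session from the dracut shell (the VG/LV names are placeholders, adjust them to your setup):

  # enter the embedded LVM shell
  lvm
  lvm> pvscan
  lvm> vgscan
  lvm> lvscan
  lvm> vgchange -ay      # activate all LVs in all visible VGs
  lvm> exit
  # leave the dracut shell so the boot can continue
  exit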

No wonder. You have to boot into rescue mode and check whether the initrd-*.img file contains the LVM files or not. If the initrd-*.img file does not contain the LVM files, you need to regenerate it and verify it, which will help you fix this issue.
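As a sketch of that check from rescue mode on the installation media, where the installed system is mounted under /mnt/sysimage (the kernel version is again only an assumed example):

  # switch into the installed system
  chroot /mnt/sysimage
  # does the initramfs contain the lvm pieces?
  lsinitrd /boot/initramfs-3.10.0-693.el7.x86_64.img | grep -i lvm
  # if not, regenerate it for that kernel version
  dracut -f /boot/initramfs-3.10.0-693.el7.x86_64.img 3.10.0-693.el7.x86_64
  exit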

I have seen "increased the nofile limit for this service to 16384 also changed TimeoutStartSec to 240secs" shouldn't let you go into rescue mode.

If you have HP hardware, the OS driver may not be compatible with the RAID controller. In my case, I had to remove all occurrences of the "hpsa.hpsa_allow_any=1 hpsa.hpsa_simple_mode=1" parameters from the /etc/default/grub file and the grub.cfg file. Once this was done, the OS could see the local drives and the system booted.
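In case it helps someone, the edit-and-regenerate step on a BIOS-based system looks roughly like this (the hpsa parameters shown are the ones from my case):

  # remove hpsa.hpsa_allow_any=1 hpsa.hpsa_simple_mode=1 from GRUB_CMDLINE_LINUX
  vi /etc/default/grub
  # rebuild grub.cfg so the change takes effect on the next boot
  grub2-mkconfig -o /boot/grub2/grub.cfg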