Kernel panic - not syncing: Fatal exception

Hi,
hope someone can help me :-)
I did a fresh install of RHEL 8.0 GM on our ESXi host.
SELinux is disabled.

After yum check-update and an upgrade to RHEL 8.2 I get:
Security: kernel-core-4.18.0-193.1.2.el8_2.x86_64 is an installed security update
Security: kernel-core-4.18.0-80.el8.x86_64 is the currently running version

After a reboot I get the fatal exception error (see attached file). Booting the old kernel works without problems.

  • I tried yum remove kernel-4.18.0-193....
  • yum install kernel-4.18.0-193.1.2.el8_2.x86_64
  • yum reinstall kernel...
    ... no success.

I also tried my luck with the rescue ISO and the GRUB repair suggestions found on the web, but that came to nothing.

Other RHEL 8 installations with an older kernel and the same install procedure have no problems,
e.g. kernel 4.18.0-147.5.1.el8_1.x86_64.

Suggestions welcome :-)
Best regards,
Helmut

Responses

Hi Richard,

I'd recommend checking the following (in case you have not checked it already):

  • Available free space on the root file system, and on /tmp if it is mounted separately.
  • Any errors or warnings in the messages log file.
  • Make sure that the corresponding boot files were created for the new kernel under /boot, such as 'initramfs-xxxxx', 'vmlinuz-xxxxx', etc.
  • Are there any noticeable differences when you compare the new initramfs with the old files under /boot? If you suspect so, remove the new initramfs file and re-create it manually using the dracut command. https://access.redhat.com/solutions/1958
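
The free-space and boot-file checks above could be scripted roughly as follows. This is only a minimal sketch, not a Red Hat tool: the function names `free_mb` and `check_boot_files` are made up for illustration, and the file names follow the 'vmlinuz-xxxxx' / 'initramfs-xxxxx' pattern mentioned in the list.

```shell
#!/bin/sh
# Hypothetical helpers for the pre-reboot checks; the names are illustrative only.

# Print the free space (in MiB) of the file system that holds the given path.
free_mb() {
    df -Pm "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if both the kernel image and its initramfs exist and are
# non-empty for the given kernel version.
check_boot_files() {
    kver="$1"
    bootdir="${2:-/boot}"
    for f in "vmlinuz-${kver}" "initramfs-${kver}.img"; do
        if [ ! -s "${bootdir}/${f}" ]; then
            echo "missing or empty: ${bootdir}/${f}" >&2
            return 1
        fi
    done
    return 0
}
```

Possible usage before rebooting into the new kernel: `free_mb /; free_mb /tmp; check_boot_files 4.18.0-193.1.2.el8_2.x86_64`.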

Hi! I did some checks after yum update and before the reboot:

  • enough free space is available
  • no noticeable difference between the initramfs files
  • the boot files for the new kernel are all present under /boot
  • some errors in the messages file:

May 14 10:39:31 HOSTNAME systemd[1]: rpc-statd.service: Failed with result 'timeout'.
May 14 10:40:08 HOSTNAME dracut[13720]:    microcode_ctl: kernel version "4.18.0-193.1.2.el8_2.x86_64" failed early load check for "intel-06-2d-07", skipping
May 14 10:40:08 HOSTNAME dracut[13720]:    microcode_ctl: kernel version "4.18.0-193.1.2.el8_2.x86_64" failed early load check for "intel-06-4f-01", skipping
May 14 10:40:08 HOSTNAME dracut[13720]:    microcode_ctl: kernel version "4.18.0-193.1.2.el8_2.x86_64" failed early load check for "intel-06-55-04", skipping
May 14 10:40:41 HOSTNAME dracut[22524]:    microcode_ctl: kernel version "4.18.0-193.1.2.el8_2.x86_64" failed early load check for "intel-06-2d-07", skipping
May 14 10:40:41 HOSTNAME dracut[22524]:    microcode_ctl: kernel version "4.18.0-193.1.2.el8_2.x86_64" failed early load check for "intel-06-4f-01", skipping
May 14 10:40:41 HOSTNAME dracut[22524]:    microcode_ctl: kernel version "4.18.0-193.1.2.el8_2.x86_64" failed early load check for "intel-06-55-04", skipping
May 14 10:40:59 HOSTNAME dracut[30968]:    microcode_ctl: kernel version "4.18.0-80.4.2.el8_0.x86_64" failed early load check for "intel-06-2d-07", skipping
May 14 10:40:59 HOSTNAME dracut[30968]:    microcode_ctl: kernel version "4.18.0-80.4.2.el8_0.x86_64" failed early load check for "intel-06-4f-01", skipping
May 14 10:40:59 HOSTNAME dracut[30968]:    microcode_ctl: kernel version "4.18.0-80.4.2.el8_0.x86_64" failed early load check for "intel-06-55-04", skipping
May 14 10:40:56 HOSTNAME dracut[30968]: dracut module 'busybox' will not be installed, because command 'busybox' could not be found!
May 14 10:40:56 HOSTNAME dracut[30968]: dracut module 'btrfs' will not be installed, because command 'btrfs' could not be found!
May 14 10:40:56 HOSTNAME dracut[30968]: dracut module 'dmraid' will not be installed, because command 'dmraid' could not be found!
May 14 10:40:57 HOSTNAME dracut[30968]: dracut module 'stratis' will not be installed, because command 'stratisd-init' could not be found!
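
To see at a glance which dracut modules were skipped, the "will not be installed" lines can be reduced to a short list. The following is a minimal sketch under assumptions: the function name `dracut_skipped_modules` is invented here, and the default path /var/log/messages is assumed to be the messages file referred to above.

```shell
#!/bin/sh
# Hypothetical helper: list the dracut modules that a log reports as skipped
# because the command they need could not be found.
dracut_skipped_modules() {
    logfile="${1:-/var/log/messages}"
    # Keep only the module name from each matching line, then de-duplicate.
    sed -n "s/.*dracut module '\([^']*\)' will not be installed.*/\1/p" "$logfile" | sort -u
}
```

Running `dracut_skipped_modules` once per log file makes it easy to compare the list between a working and a failing system.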

Now I'm a step further. I did not reboot after yum update and ran the following first:

dracut -f -v    # rebuilds the initramfs of the running kernel, probably not necessary
cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).bak.$(date +%m-%d-%H%M%S).img
dracut -f /boot/initramfs-4.18.0-193.1.2.el8_2.x86_64.img 4.18.0-193.1.2.el8_2.x86_64
grub2-mkconfig -o /boot/grub2/grub.cfg
  • reboot --> works, uname -r shows the new kernel
  • shutdown, power on --> kernel panic
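
The backup-and-rebuild steps above can be wrapped in a small function. This is a sketch under assumptions, not the procedure from the linked solution: the name `rebuild_initramfs` is hypothetical, and unlike the commands above it backs up the image it is about to overwrite rather than the image of the running kernel. The `dracut -f <image> <kernel version>` form is the same one used above.

```shell
#!/bin/sh
# Hypothetical helper: back up and rebuild the initramfs for one kernel version.
rebuild_initramfs() {
    kver="$1"
    bootdir="${2:-/boot}"
    img="${bootdir}/initramfs-${kver}.img"

    # Keep a timestamped copy of the existing image, if there is one.
    if [ -f "$img" ]; then
        cp -p "$img" "${img%.img}.bak.$(date +%m-%d-%H%M%S).img" || return 1
    fi

    # Rebuild for the named kernel version, not for the running kernel.
    dracut -f "$img" "$kver" || return 1

    # Report failure if the rebuilt image is missing or empty.
    [ -s "$img" ]
}
```

For example, `rebuild_initramfs 4.18.0-193.1.2.el8_2.x86_64` (as root) would rebuild the image for the new kernel while the old kernel is still running.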

OK, that is strange. If it works on reboot, it should ideally be fine on a cold boot as well.

I think you can drop the word "ideally", my friend! :)

Regards,
Christian

More than strange, Sadashiva ... eventually it might be a hardware problem - power button defect maybe ? Hope not, Richard ... :)

Regards,
Christian

I'm still not convinced, Christian, but we can't rule it out until we get proper evidence for such an issue. I'm sure that it is certainly not a power button issue :)

I chose the wrong word, Sadashiva: instead of "eventually" I should have said "possibly". :)

Regards,
Christian

Hi, I'm a colleague of Richard :-) It's a VM on ESXi.

I did a new install from the golden master and repeated the complete procedure:

uname -r
4.18.0-80.4.2.el8_0.x86_64

less /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.0 (Ootpa)"

yum check-update
yum update

Install   45 Packages
Upgrade  908 Packages

Total download size: 1.3 G
Is this ok [y/N]: 
--> y

yum check-update

Red Hat Enterprise Linux 8 for x86_64 - AppStream (RPMs)
Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)

Security: kernel-core-4.18.0-193.1.2.el8_2.x86_64 is an installed security update
Security: kernel-core-4.18.0-80.4.2.el8_0.x86_64 is the currently running version

ls -la --block-size=M initramfs-*
-rw-------. 1 root root 64M 25. Jun 2019  initramfs-0-rescue-d9155579c44d42d285a9e01fb082687f.img
-rw-------  1 root root 27M 14. May 15:36 initramfs-4.18.0-193.1.2.el8_2.x86_64.img
-rw-------  1 root root 27M 14. May 15:37 initramfs-4.18.0-80.4.2.el8_0.x86_64.img
-rw-------  1 root root 19M 14. May 15:34 initramfs-4.18.0-80.4.2.el8_0.x86_64kdump.img
-rw-------  1 root root 27M 14. May 15:34 initramfs-4.18.0-80.el8.x86_64.img
-rw-------. 1 root root 17M 29. Jun 2019  initramfs-4.18.0-80.el8.x86_64kdump.img


ls -la --block-size=M vmlinuz-*
-rwxr-xr-x. 1 root root 8M 25. Jun 2019  vmlinuz-0-rescue-d9155579c44d42d285a9e01fb082687f
-rwxr-xr-x  1 root root 9M  7. May 18:49 vmlinuz-4.18.0-193.1.2.el8_2.x86_64
-rwxr-xr-x  1 root root 8M 14. Jun 2019  vmlinuz-4.18.0-80.4.2.el8_0.x86_64
-rwxr-xr-x. 1 root root 8M 13. Mar 2019  vmlinuz-4.18.0-80.el8.x86_64

reboot now

----------->>>>>>>> WORKS

-> login
 uname -r
4.18.0-193.1.2.el8_2.x86_64
--> NEW KERNEL WORKS

shutdown now

------------>>>>>>>> KERNEL PANIC

Start with OLD KERNEL

uname -r
4.18.0-80.4.2.el8_0.x86_64

yum check-update

Security: kernel-core-4.18.0-193.1.2.el8_2.x86_64 is an installed security update
Security: kernel-core-4.18.0-80.4.2.el8_0.x86_64 is the currently running version

yum update

Dependencies resolved.
Nothing to do.
Complete!

- reboot now
- choosing new kernel
- WORKS!!!
- reboot now (again)
- WORKS!!!

- shutdown now --> start again
------------->>>>>>>> KERNEL PANIC

I suspect that this could be due to vm-tools not working properly, or some sort of issue at that level. One of the KB articles suggests manually running "vmware-config-tools.pl" after a kernel update; please refer to https://access.redhat.com/solutions/69122. It is worth checking. The article only covers RHEL 6/5 releases, so I'm not sure whether it is relevant to RHEL 8/7 as well.

I also thought about this ... but we're using open-vm-tools on our RHEL 8 servers.

Hi Helmut,

You are using the correct package: open-vm-tools is the replacement for vmware-tools. :)

Regards,
Christian

Hi Helmut/Richard,

Have you opened a support ticket with Red Hat? If not, please open one so that the Red Hat team gets involved in analyzing and fixing this issue. Also, please update this thread with the findings or fix from Red Hat. It is good for the community as well.

All the best!

Hi, yes, I already opened a ticket yesterday and will update this thread ... Regards, Helmut

Thank you, Helmut! I'm excited to see what the investigation finds as the root cause. :)

Regards,
Christian

Yes, me too, Christian. Let's hear from Helmut/Richard about the fixes.

Hi Helmut/Richard,

Have you guys found the fix? What did the Red Hat technical team say about this? Is the case still active with the support team?

I'm sure that your reply would help the community.

Hi!

Yes, the case is still active. The server now boots with the new kernel, but it is still very curious :-)

1) The support team saw an error in the serial output log: a panic while initializing the cpufreq driver.

2) This function was added in the new kernel:

  void refresh_frequency_limits(struct cpufreq_policy *policy)
  {
          struct cpufreq_policy new_policy;

          if (!policy_is_inactive(policy)) {
                  new_policy = *policy;
                  pr_debug("updating policy for CPU %u\n", policy->cpu);

                  cpufreq_set_policy(policy, &new_policy);
          }
  }
  EXPORT_SYMBOL(refresh_frequency_limits);

3) Got the instruction to disable the service "tuned.service" and then append "intel_pstate=disable" to the boot parameters when booting the new kernel.

4) -->> SUCCESSFUL BOOT

5) The support team checked on their own VM infrastructure --> no problems.

6) Did some investigation to inform the support team --> we use ESXi 6.5 with the guest OS version set to RHEL 7. I know that ESXi 6.5 Update 1 supports RHEL 8, but we didn't have any problems so far (with older kernels and RHEL 8.2).

7) The curious thing: while collecting information for the support team, I booted the machine with the new kernel, tuned.service enabled and started, and the standard boot parameters --> WORKS. I did a shutdown, a reboot, new kernel, old kernel, and so on: no more problems.

I think I have to set up a new machine and check whether the error always goes away with the instructions above, or whether this was just inexplicable luck. Now waiting for the support team; I sent them a new serial output debug log.
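
For anyone who wants to try the workaround from step 3, it might look like the sketch below. The exact commands given by support are not quoted in this thread, so this is an assumption: it uses `systemctl` and `grubby` as commonly available on RHEL 8, and the function names `apply_pstate_workaround` and `cmdline_has` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the workaround; run as root. Nothing executes until
# the functions are called.

# Disable tuned and add intel_pstate=disable to the boot entry of one kernel.
apply_pstate_workaround() {
    kver="$1"
    systemctl disable --now tuned.service || return 1
    grubby --update-kernel="/boot/vmlinuz-${kver}" --args="intel_pstate=disable"
}

# Succeed if the kernel command line contains the given parameter.
# The second argument exists only so the check can be pointed at a test file.
cmdline_has() {
    param="$1"
    file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}
```

After rebooting into the new kernel, `cmdline_has intel_pstate=disable && echo "workaround active"` confirms that the parameter was actually applied. `grubby --remove-args` with the same parameter should undo the change once it is no longer needed.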

Thank you for the reply, Helmut. I would appreciate it if the support team created a new knowledge base solution article about this, so that it helps others who come across the same problem.

Yesterday we upgraded one of our ESXi hosts to version 6.7 U2 (before it was 6.5). We moved the server to the new ESXi host, did a compatibility upgrade to VM version 15 (before it was 13) and set the guest operating system to RHEL 8 (before it was RHEL 7). Booted --> no errors. Installed another RHEL 8 server on the same ESXi --> no errors --> problem solved.

I also had a remote session with RHEL support, and the above was not their first suggestion for solving the problem. The case is still open, I suppose because this configuration (ESXi 6.5, VM version 13, guest OS set to RHEL 7) did work until 4.18.0-193.