Kernel Panic in functions like gup_pgd_range() after a corrected "Hardware Error" is reported
Red Hat Lightspeed can detect this issue
Environment
- Dell PowerEdge Hardware
- Red Hat Enterprise Linux 8
  - Specifically kernel versions below kernel-4.18.0-348.el8
- Red Hat Enterprise Linux 8.4 AUS
  - Specifically kernel versions below kernel-4.18.0-305.91.1.el8_4
- Hugepages must be configured
Issue
- The kernel panics after a corrected memory "Hardware Error".
Resolution
Red Hat Enterprise Linux 8
- The issue has been resolved with kernel-4.18.0-348.el8 via Errata.
- The issue was tracked at private Bugzilla 1984173.
Red Hat Enterprise Linux 8.4 AUS
- The issue has been resolved with kernel-4.18.0-305.91.1.el8_4 via Errata RHSA-2023:3461.
- The issue was tracked at private Bugzilla 2188306.
Workaround
A tested workaround is to boot the server with the ghes.disable=y kernel command line option to temporarily disable the hardware error reporting code causing the crashes.
Please reference the following article for more information on how to change the kernel command line options:
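As a sketch only (assuming a grub2-based RHEL 8 host; verify against the referenced article and your own boot configuration), the option can be added persistently with grubby:

```shell
# Append ghes.disable=y to the command line of all installed kernels.
grubby --update-kernel=ALL --args="ghes.disable=y"

# Confirm the argument is now present, then reboot for it to take effect.
grubby --info=ALL | grep ghes.disable
```

Note that disabling GHES also disables APEI hardware error reporting, so this should only be used until a fixed kernel can be installed.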
Root Cause
A race condition exists in the hardware error detection and handling code paths provided by the ghes kernel module. When a hardware error is detected for some unit of memory (a page), the ghes module migrates the contents of that page elsewhere to preserve them.
The race condition occurs when a hugepage is migrated as a result of that error handling. ghes hands the remaining migration work off to a kworker thread. While the kworker thread performs the migration, another process (e.g. KVM) that happens to be using the same memory attempts to access the "migrated" page. The migration requires modifications to the hierarchical page management structures (in this case, the Page Upper Directory, or PUD, and the Page Middle Directory, or PMD). The migration should take a lock to prevent concurrent access to the migrating pages, but does not, because the kernel did not consider the PUD entry for the page in question eligible for migration. As a result, the kworker thread migrates the page of memory out from under the second process, which hits a "General Protection Fault" and panics the kernel. The number and size of hugepages on the system and the frequency of hardware errors influence the likelihood of hitting the bug: more hugepages, larger hugepages, or more frequent hardware errors all increase the odds.
The fix changes the code to assume the PUD entry is eligible for migration, so that the entry is locked against concurrent access while the migration occurs.
Note: The issue may be triggered by hardware errors regardless of whether the error is corrected or uncorrected. The panic is triggered by a combination of how Dell hardware reports hardware errors and a bug in the Linux kernel; it is not directly caused by, nor does it indicate, a defect in Dell hardware. Similarly, while a single or infrequent corrected hardware error is generally considered safe to ignore, substantial quantities of corrected hardware errors in a very short period of time (hundreds within a single second, for example) should at least be reviewed by the hardware vendor; such activity can introduce substantial jitter to latency-sensitive workloads while the OS handles the errors.
Diagnostic Steps
Prerequisites
- Deploy kdump in Order to Collect a vmcore:
  - Vmcore analysis is required to determine if you are being impacted by this issue. This first requires that a vmcore is dumped successfully.
  - If the kexec-tools package is absent or the kdump service is inactive, please reference the following article to install, enable, start, and configure kdump:
    How to troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux
- Prepare crash Environment for vmcore Analysis:
  - Ensure that you have the crash package installed, and if necessary install the package:
    # yum install crash
  - Ensure the necessary debuginfo package is installed. See the following article for more information:
    How can I download or install debuginfo packages for RHEL systems?
Vmcore Analysis
- Here is the backtrace of the failing process. It is handling a userspace page fault for a guest virtual machine (VM):
  PID: 20191    TASK: ffff9842c2350000  CPU: 56   COMMAND: "CPU 3/KVM"
   #0 [ffffaaab26b3f768] machine_kexec at ffffffffb2c6090e
   #1 [ffffaaab26b3f7c0] __crash_kexec at ffffffffb2d8f0bd
   #2 [ffffaaab26b3f888] crash_kexec at ffffffffb2d8ffad
   #3 [ffffaaab26b3f8a0] oops_end at ffffffffb2c2435d
   #4 [ffffaaab26b3f8c0] general_protection at ffffffffb36010ce
      [exception RIP: gup_pgd_range+0x24c]
      RIP: ffffffffb2e95acc  RSP: ffffaaab26b3f970  RFLAGS: 00010086
      RAX: 00007f1148cdffff  RBX: 000f98136ffff230  RCX: ffff981580000230
      RDX: 000fffffffffffff  RSI: 00007f1148ce0000  RDI: effffffdeffffe02
      RBP: ffffaaab26b3fa4c   R8: ffffaaab26b3fa4c   R9: ffffaaab26b3fbbb
      R10: 00112a630d68d462  R11: 0000000000000000  R12: 0000000000000001
      R13: 0000000000000000  R14: 00007f1148cdf000  R15: 00007f1148cdf000
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #5 [ffffaaab26b3fa38] internal_get_user_pages_fast at ffffffffb2e97d1e
   #6 [ffffaaab26b3fa80] __get_user_pages_fast at ffffffffb2e97e38
   #7 [ffffaaab26b3fa88] __gfn_to_pfn_memslot at ffffffffc09efee4 [kvm]
   #8 [ffffaaab26b3faf0] try_async_pf at ffffffffc0a2e821 [kvm]
   #9 [ffffaaab26b3fb68] direct_page_fault at ffffffffc0a39b4a [kvm]
  #10 [ffffaaab26b3fc30] kvm_mmu_page_fault at ffffffffc0a3a3f9 [kvm]
  #11 [ffffaaab26b3fd18] vcpu_enter_guest at ffffffffc0a0d37c [kvm]
  #12 [ffffaaab26b3fdb8] kvm_arch_vcpu_ioctl_run at ffffffffc0a101ea [kvm]
  #13 [ffffaaab26b3fde8] kvm_vcpu_ioctl at ffffffffc09ed71a [kvm]
  #14 [ffffaaab26b3fe80] do_vfs_ioctl at ffffffffb2f2d234
  #15 [ffffaaab26b3fef8] ksys_ioctl at ffffffffb2f2d870
  #16 [ffffaaab26b3ff30] __x64_sys_ioctl at ffffffffb2f2d8b6
  #17 [ffffaaab26b3ff38] do_syscall_64 at ffffffffb2c0420b
  #18 [ffffaaab26b3ff50] entry_SYSCALL_64_after_hwframe at ffffffffb36000ad
      RIP: 00007f1334a0c62b  RSP: 00007f12537fd628  RFLAGS: 00000246
      RAX: ffffffffffffffda  RBX: 000055c92fb0b410  RCX: 00007f1334a0c62b
      RDX: 0000000000000000  RSI: 000000000000ae80  RDI: 0000000000000060
      RBP: 0000000000000000   R8: 000055c92c88fdd8   R9: 000000000000002c
      R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000001
      R13: 000055c92c8b2020  R14: 0000000000000000  R15: 00007f1338379000
      ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
- A corrected hardware error is seen in the logs just prior to the panic:
  crash> log
  [...cut...]
  [327450.877805] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
  [327450.877807] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
  [327450.877808] {1}[Hardware Error]: event severity: corrected
  [327450.877809] {1}[Hardware Error]: Error 0, type: corrected
  [327450.877810] {1}[Hardware Error]: fru_text: A1
  [327450.877810] {1}[Hardware Error]: section_type: memory error
  [327450.877811] {1}[Hardware Error]: error_status: 0x0000000000000400
  [327450.877812] {1}[Hardware Error]: physical_address: 0x00000010af886040
  [327450.877814] {1}[Hardware Error]: node: 0 card: 0 module: 0 rank: 1 bank: 2 device: 4 row: 44193 column: 928
  [327450.877815] {1}[Hardware Error]: error_type: 2, single-bit ECC
  [327450.877816] {1}[Hardware Error]: DIMM location: not present. DMI handle: 0x0000
  [327450.877828] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 65534
  [327450.877828] {2}[Hardware Error]: It has been corrected by h/w and requires no further action
  [327450.877828] {2}[Hardware Error]: event severity: corrected
  [327450.877829] {2}[Hardware Error]: Error 0, type: corrected
  [327450.877831] {2}[Hardware Error]: section type: unknown, 330f1140-72a5-11df-9690-0002a5d5c51b
  [327450.877832] {2}[Hardware Error]: section length: 0x38
  [327450.877834] {2}[Hardware Error]: 00000000: 01010001 00000000 af886000 00000010 .........`......
  [327450.877836] {2}[Hardware Error]: 00000010: 00001000 00000000 af886fff 00000010 .........o......
  [327450.877837] {2}[Hardware Error]: 00000020: 00000080 00000000 00000000 00000000 ................
  [327450.877838] {2}[Hardware Error]: 00000030: 00000000 00000000 ........
  [327450.886746] general protection fault: 0000 [#1] SMP NOPTI
  [327450.892235] CPU: 56 PID: 20191 Comm: CPU 3/KVM Kdump: loaded Tainted: G I --------- -  - 4.18.0-305.34.2.el8_4.x86_64 #1
  [327450.904392] Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.15.1 06/15/2022
  [327450.912055] RIP: 0010:gup_pgd_range+0x24c/0xc50
  [327450.916673] Code: 89 03 00 00 48 81 e3 00 00 00 c0 48 21 d8 48 03 0d a9 13 ee 00 4c 89 74 24 10 48 8d 1c 01 49 8d 46 ff 4d 89 fe 48 89 44 24 58 <4c> 8b 23 4d 8d ae 00 00 20 00 49 81 e5 00 00 e0 ff 49 8d 45 ff 4c
  [327450.935505] RSP: 0018:ffffaaab26b3f970 EFLAGS: 00010086
  [327450.940817] RAX: 00007f1148cdffff RBX: 000f98136ffff230 RCX: ffff981580000230
  [327450.948036] RDX: 000fffffffffffff RSI: 00007f1148ce0000 RDI: effffffdeffffe02
  [327450.955256] RBP: ffffaaab26b3fa4c R08: ffffaaab26b3fa4c R09: ffffaaab26b3fbbb
  [327450.962476] R10: 00112a630d68d462 R11: 0000000000000000 R12: 0000000000000001
  [327450.969696] R13: 0000000000000000 R14: 00007f1148cdf000 R15: 00007f1148cdf000
  [327450.976916] FS:  00007f12537fe700(0000) GS:ffff984500d00000(0000) knlGS:0000000000000000
  [327450.985087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [327450.990921] CR2: 000055e478b5dfc0 CR3: 0000002d48056001 CR4: 00000000007726e0
  [327450.998138] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [327451.005358] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [327451.012578] PKRU: 55555554
- The hardware error occurs on a 1 GB page starting at address 0x1080000000:

  crash> ptov 0x00000010af886040
  VIRTUAL           PHYSICAL
  ffff98262f886040  10af886040

  crash> vtop ffff98262f886040
  VIRTUAL           PHYSICAL
  ffff98262f886040  10af886040

  PGD DIRECTORY: ffffffffb4210000
  PAGE DIRECTORY: 5c73c01067
    PUD: 5c73c014c0 => 80000010800001e3
   PAGE: 1080000000  (1GB)

        PTE         PHYSICAL  FLAGS
  80000010800001e3  1080000000  (PRESENT|RW|ACCESSED|DIRTY|PSE|GLOBAL|NX)

        PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
  ffffd02382be2180  10af886000                0        0  1 17ffffc0400000 hwpoison
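The 1 GB page base can be double-checked with plain shell arithmetic: masking the low 30 bits off the reported physical address should land on the PAGE value that vtop printed.

```shell
# Mask the failing physical address (from the hardware error log) down to a
# 1 GiB boundary; the result is the base of the hugepage that took the error.
err_addr=0x00000010af886040
page_base=$(( err_addr & ~((1 << 30) - 1) ))
printf '0x%x\n' "$page_base"   # 0x1080000000
```

The result, 0x1080000000, matches the PAGE line in the vtop output above.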
- The address on which we panic is held in %rbx and placed on the stack:

  crash> bt -FFls
  ffffaaab26b3f978: ffff9842c9ef2228 00007f1148ce0000
  ffffaaab26b3f988: ffffaaab26b3fa4c 00007f1148ce0000
  ffffaaab26b3f998: 0000000000000007 00007f1148ce0000
  ffffaaab26b3f9a8: ffffaaab26b3f9f8 ffff9842c80567f0
  ffffaaab26b3f9b8: 00007f1148ce0000 00007f1100080005
  ffffaaab26b3f9c8: 00007f1148cdffff 00007f1148cdffff
  ffffaaab26b3f9d8: 84607eb05b91ba00 00007f1148cdffff
  ffffaaab26b3f9e8: 0000000126b3faa0 00007f1148cdffff
  ffffaaab26b3f9f8: 0000002d49ef2067 84607eb05b91ba00
  ffffaaab26b3fa08: 00007f1148cdf000 0000000000080005              <- %rbx %rbp addr
  ffffaaab26b3fa18: 0000000000000001 ffffaaab26b3fab0              <- %r12 %r13 page (struct)
  ffffaaab26b3fa28: 0000000000000206 [ffff9842edf34c10:kmalloc-2k] <- %r14 %r15 struct kvm_memory_slot
  ffffaaab26b3fa38: internal_get_user_pages_fast+0xce              <- We are in gup_pgd_range %rip
- The address maps back to the following physical page:

  crash> vtop 00007f1148cdf000
  VIRTUAL       PHYSICAL
  7f1148cdf000  f88cdf000

     PGD: 2d480567f0 => 2d49ef2067
     PUD: 2d49ef2228 => 8000000f80000887
    PAGE: f80000000  (1GB)

        PTE         PHYSICAL  FLAGS
  8000000f80000887  f80000000  (PRESENT|RW|USER|PSE|NX)

        VMA           START       END     FLAGS FILE
  ffff9842c2911878 7f0a40000000 7f1240000000 2c4600fb /dev/hugepages/libvirt/qemu/7-instance-00000630/qemu_back_mem._objects_ram-node0.CTZ2nX

        PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
  ffffd0237e2337c0  f88cdf000                0        0  0 17ffffc0000000
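The PGD and PUD entry addresses that vtop reports can be reproduced by hand: with x86-64 4-level paging each level uses a 9-bit index, so the entry offset inside a table is the index times 8 bytes. A small sketch, using the table bases taken from the output above (the CR3 page, 0x2d48056000, and the PUD table page, 0x2d49ef2000, from the PGD entry value 2d49ef2067):

```shell
vaddr=0x00007f1148cdf000
pgd_index=$(( (vaddr >> 39) & 0x1ff ))   # index into the page global directory
pud_index=$(( (vaddr >> 30) & 0x1ff ))   # index into the page upper directory

pgd_entry=$(( 0x2d48056000 + pgd_index * 8 ))
pud_entry=$(( 0x2d49ef2000 + pud_index * 8 ))
printf 'PGD entry: 0x%x\n' "$pgd_entry"   # 0x2d480567f0
printf 'PUD entry: 0x%x\n' "$pud_entry"   # 0x2d49ef2228
```

The results line up with the PGD: 2d480567f0 and PUD: 2d49ef2228 lines in the vtop output, confirming which PUD slot we need to inspect next.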
- Looking at the PUD page which holds the 1 GB page's page table entry (PTE), we see this entry is out of sequence and likely held the physical page on which the hardware error occurred, 1080000000:

  2d49ef2140: 0000002d41a92067 80000017800008e7  g .A-...........
  2d49ef2150: 80000017400008e7 80000017000008e7  ...@............
  2d49ef2160: 80000016c00008e7 80000016800008e7  ................
  2d49ef2170: 80000016400008e7 80000016000008e7  ...@............
  2d49ef2180: 80000015c00008e7 80000015800008e7  ................
  2d49ef2190: 80000015400008e7 80000015000008e7  ...@............
  2d49ef21a0: 80000014c00008e7 80000014800008e7  ................
  2d49ef21b0: 80000014400008e7 80000014000008e7  ...@............
  2d49ef21c0: 80000013c00008e7 80000013800008e7  ................
  2d49ef21d0: 80000013400008e7 80000013000008e7  ...@............
  2d49ef21e0: 80000012c00008e7 80000012800008e7  ................
  2d49ef21f0: 80000012400008e7 80000012000008e7  ...@............
  2d49ef2200: 80000011c00008e7 80000011800008e7  ................
  2d49ef2210: 80000011400008e7 80000011000008e7  ...@............
  2d49ef2220: 80000010c00008e7 8000000f80000887  ................ <<<-------
  2d49ef2230: 80000010400008e7 80000010000008e7  ...@............
  2d49ef2240: 8000000fc00008e7 0000002d725f2067  ........g _r-...
  2d49ef2250: 0000002d48174067 0000002d42820067  g@.H-...g..B-...
  2d49ef2260: 0000002d49e6d067 0000000000000000  g..I-...........
- The page which took a hardware error is being migrated without locking. Since it is a 1 GB page, there is a larger window for the page to be touched during migration, causing a panic, as the data in the page may be fluctuating at this time.
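The out-of-sequence slot can be confirmed arithmetically. The neighbouring PUD entries map 1 GiB pages whose bases descend by 0x40000000 per slot, so the flagged slot should have held the base of the hardware-error page. A sketch using the entry values read from the dump above (the mask keeps the physical-address bits, 30-51, of a 1 GiB PSE entry):

```shell
prev=0x80000010c00008e7   # healthy entry at 2d49ef2220
curr=0x8000000f80000887   # out-of-sequence entry at 2d49ef2228
mask=0x000fffffc0000000   # physical-address bits of a 1 GiB PSE entry

expected=$(( (prev & mask) - (1 << 30) ))   # next base in the descending run
actual=$(( curr & mask ))
printf 'expected 1GB page base: 0x%x\n' "$expected"   # 0x1080000000, the hwpoisoned page
printf 'actual   1GB page base: 0x%x\n' "$actual"     # 0xf80000000
```

The expected base is exactly the hwpoisoned page from the hardware error, while the entry actually present points at the page the faulting process was mapped to, which is consistent with the unlocked migration swapping the entry out from under the reader.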
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.