The kernel logs show "Page fault caused by firmware" error
Environment
- Red Hat Enterprise Linux 7
- Specifically between kernel versions
kernel-3.10.0-1160.31.1.el7
up to but not includingkernel-3.10.0-1160.61.1.el7
Issue
- The kernel logs show errors and warnings regarding EFI Runtime Services, Firmware Bugs, a
kworker
thread blocked for a long time, etc. - Assembling Intel's imsm RAID container fails with
mdadm
errors. - EFI related tasks might hang or fail, for example
efibootmgr
. - The system might panic after some time if the above errors are noted and if
kernel.hung_task_panic
sysctl is enabled.
Resolution
- Update to
kernel-3.10.0-1160.62.1.el7
or later. The patch was released in errata RHSA-2022:1198.
Root Cause
- An update to how the kernel switches addresses spaces when invoking UEFI runtime services introduced a bug wherein UEFI runtime services could be disabled and cause a
kworker
thread to hang indefinitely. - UEFIs provide a variety of functionality useful for any Operation System. The functionality is vital for properly booting an Operating System, but the same functions may be useful after a system fully boots (generally referred to as "runtime").
- The code path facilitating the use of the functions was streamlined in the upstream Linux kernel and backported to Red Hat Enterprise Linux.
- The streamlining introduced a bug in how the kernel manages Translation Lookaside Buffer (TLB) flushes documented in the article RHEL7: kernel crashes in leave_mm() with a message "kernel BUG at arch/x86/mm/tlb.c:48!".
- The above fix addressed the prior issue but did not address an edge case. The edge case occurred due to an inconsistency in the state of the current memory management structure used to map memory for the CPU.
- Effectively, the inconsistency allowed for the CPU to attempt to operate on regular memory within its TLB while the EFI memory was still currently mapped in. This triggers a page fault (we have an address for memory but do not actually "own" it) and the EFI recovers from this page fault. However, due to how EFIs handle page faults, EFI runtime services are then disabled and the
kworker
thread managing the work going to the UEFI is frozen in uninterruptible sleep (D) state. - The fix establishes a state for the CPU's TLB when doing these operations and takes action accordingly based on the state avoiding the inconsistency.
Diagnostic Steps
-
If the system is still operating, check the kernel ring buffer via
dmesg
orjournalctl -k
for any of the following errors/warnings:[ 34.313537] WARNING: CPU: 58 PID: 2597 at arch/x86/platform/efi/quirks.c:489 efi_recover_from_page_fault+0x40/0x120 [ 34.313541] [Firmware Bug]: Page fault caused by firmware at PA: 0xfffffffeecb081e7 [...] [ 34.313584] CPU: 58 PID: 2597 Comm: kworker/u672:4 Not tainted 3.10.0-1160.36.2.el7.x86_64 #1 [ 34.313603] Workqueue: efi_rts_wq efi_call_rts [...] [ 34.313787] efi: Froze efi_rts_wq and disabled EFI Runtime Services [...] [ 34.313854] efi: EFI Runtime Services are disabled!
-
The following is an example from
dmesg
where the issue occurred:[ 11.004945] ------------[ cut here ]------------ [ 11.004961] WARNING: CPU: 38 PID: 5 at arch/x86/platform/efi/quirks.c:489 efi_recover_from_page_fault+0x40/0x120 [ 11.004964] [Firmware Bug]: Page fault caused by firmware at PA: 0xfffffffeeca13c7f [ 11.004966] Modules linked in: wdat_wdt pcspkr ipmi_ssif joydev sg i2c_i801 lpc_ich mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_pad(+) acpi_power_meter(+) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ast igb drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel ptp syscopyarea pps_core sysfillrect sysimgblt dca fb_sys_fops nvme i2c_algo_bit ttm nvme_core drm ahci libahci megaraid_sas libata drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod fuse [ 11.005015] CPU: 38 PID: 5 Comm: kworker/u256:0 Not tainted 3.10.0-1160.36.2.el7.x86_64 #1 [ 11.005027] Workqueue: efi_rts_wq efi_call_rts [ 11.005029] Call Trace: [ 11.005039] [<ffffffff84383559>] dump_stack+0x19/0x1b [ 11.005047] [<ffffffff83c9b258>] __warn+0xd8/0x100 [ 11.005053] [<ffffffff83c9b2df>] warn_slowpath_fmt+0x5f/0x80 [ 11.005057] [<ffffffff83c8c830>] efi_recover_from_page_fault+0x40/0x120 [ 11.005066] [<ffffffff83c75c86>] no_context+0xb6/0x300 [ 11.005071] [<ffffffff83c75fe2>] __bad_area_nosemaphore+0x112/0x220 [ 11.005076] [<ffffffff83c76104>] bad_area_nosemaphore+0x14/0x20 [ 11.005083] [<ffffffff84390750>] __do_page_fault+0x310/0x500 [ 11.005092] [<ffffffff83ce8494>] ? find_busiest_group+0x144/0x990 [ 11.005096] [<ffffffff84390975>] do_page_fault+0x35/0x90 [ 11.005100] [<ffffffff8438c778>] page_fault+0x28/0x30 [ 11.005107] [<ffffffff83c7df01>] ? switch_mm_irqs_off+0x21/0x290 [ 11.005111] [<ffffffff83c7df01>] ? switch_mm_irqs_off+0x21/0x290 [ 11.005116] [<ffffffff83c7dfe9>] ? switch_mm_irqs_off+0x109/0x290 [ 11.005120] [<ffffffff83c7dff5>] ? switch_mm_irqs_off+0x115/0x290 [ 11.005123] [<ffffffff83c7dff5>] ? switch_mm_irqs_off+0x115/0x290 [ 11.005127] [<ffffffff83c7dfe9>] ? switch_mm_irqs_off+0x109/0x290 [ 11.005131] [<ffffffff83c7dff5>] ? switch_mm_irqs_off+0x115/0x290 [ 11.005135] [<ffffffff83c7dfe9>] ? switch_mm_irqs_off+0x109/0x290 [ 11.005139] [<ffffffff83c7dfe9>] ? switch_mm_irqs_off+0x109/0x290 [ 11.005142] [<ffffffff83c8daa8>] ? efi_call+0x58/0x90 [ 11.005147] [<ffffffff83c8ce51>] ? efi_switch_mm+0x31/0x50 [ 11.005152] [<ffffffff841d4139>] ? efi_call_rts+0x2f9/0x600 [ 11.005161] [<ffffffff83cbde6f>] ? process_one_work+0x17f/0x440 [ 11.005167] [<ffffffff83cbef86>] ? worker_thread+0x126/0x3c0 [ 11.005172] [<ffffffff83cbee60>] ? manage_workers.isra.26+0x2a0/0x2a0 [ 11.005177] [<ffffffff83cc5e41>] ? kthread+0xd1/0xe0 [ 11.005182] [<ffffffff83cc5d70>] ? insert_kthread_work+0x40/0x40 [ 11.005188] [<ffffffff84395df7>] ? ret_from_fork_nospec_begin+0x21/0x21 [ 11.005193] [<ffffffff83cc5d70>] ? insert_kthread_work+0x40/0x40 [ 11.005196] ---[ end trace 32d18cdf2277bf49 ]--- [ 11.005209] efi: Froze efi_rts_wq and disabled EFI Runtime Services <<---------- [ 11.006254] ------------[ cut here ]------------ [ 11.006265] WARNING: CPU: 38 PID: 1333 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0 [ 11.006267] list_del corruption, ffffddf083ff0c20->next is LIST_POISON1 (dead000000000100) [ 11.006295] efi: EFI Runtime Services are disabled! [ 11.006269] Modules linked in: wdat_wdt pcspkr ipmi_ssif joydev sg i2c_i801 lpc_ich mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_pad(+) acpi_power_meter(+) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ast igb drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel ptp syscopyarea pps_core sysfillrect sysimgblt dca fb_sys_fops nvme i2c_algo_bit ttm nvme_core drm ahci libahci megaraid_sas libata drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod fuse [ 11.006309] CPU: 38 PID: 1333 Comm: systemd-journal Tainted: G W ------------ 3.10.0-1160.36.2.el7.x86_64 #1 [ 11.006313] Call Trace: [ 11.006317] [<ffffffff84383559>] dump_stack+0x19/0x1b [ 11.006322] [<ffffffff83c9b258>] __warn+0xd8/0x100 [ 11.006328] [<ffffffff83c9b2df>] warn_slowpath_fmt+0x5f/0x80 [ 11.006332] [<ffffffff83ce4cfc>] ? dequeue_entity+0x11c/0x5c0 [ 11.006336] [<ffffffff83fa6753>] __list_del_entry+0x63/0xd0 [ 11.006341] [<ffffffff83fa67cd>] list_del+0xd/0x30 [ 11.006345] [<ffffffff83c7c0a8>] pgd_free+0x58/0x90 [ 11.006349] [<ffffffff83c97687>] __mmdrop+0x27/0xb0 [ 11.006355] [<ffffffff83cd4bbd>] finish_task_switch+0x16d/0x1c0 [ 11.006362] [<ffffffff84388e10>] __schedule+0x320/0x680 [ 11.006367] [<ffffffff84389199>] schedule+0x29/0x70 [ 11.006371] [<ffffffff843885dd>] schedule_hrtimeout_range_clock+0x12d/0x150 [ 11.006380] [<ffffffff83e9d4b9>] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0 [ 11.006385] [<ffffffff84388613>] schedule_hrtimeout_range+0x13/0x20 [ 11.006389] [<ffffffff83e9d74e>] ep_poll+0x23e/0x360 [ 11.006397] [<ffffffff83e4c288>] ? do_truncate+0x88/0xc0 [ 11.006402] [<ffffffff83e51498>] ? __sb_start_write+0x58/0x120 [ 11.006407] [<ffffffff83cdadd0>] ? wake_up_state+0x20/0x20 [ 11.006412] [<ffffffff83e9ec2d>] SyS_epoll_wait+0xed/0x120 [ 11.006420] [<ffffffff84395f92>] system_call_fastpath+0x25/0x2a [ 11.006423] ---[ end trace 32d18cdf2277bf4a ]--- [............] [ 241.982299] INFO: task kworker/u256:0:5 blocked for more than 120 seconds. [ 241.982307] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 241.982311] kworker/u256:0 D ffff99031956a6e0 0 5 2 0x00000000 [ 241.982332] Workqueue: efi_rts_wq efi_call_rts [ 241.982336] Call Trace: [ 241.982352] [<ffffffffb8389199>] schedule+0x29/0x70 [ 241.982363] [<ffffffffb7c8c8cc>] efi_recover_from_page_fault+0xdc/0x120 [ 241.982373] [<ffffffffb7c75c86>] no_context+0xb6/0x300 [ 241.982381] [<ffffffffb7c75fe2>] __bad_area_nosemaphore+0x112/0x220 [ 241.982389] [<ffffffffb7c76104>] bad_area_nosemaphore+0x14/0x20 [ 241.982397] [<ffffffffb8390750>] __do_page_fault+0x310/0x500 [ 241.982407] [<ffffffffb7ce8494>] ? find_busiest_group+0x144/0x990 [ 241.982413] [<ffffffffb8390975>] do_page_fault+0x35/0x90 [ 241.982419] [<ffffffffb838c778>] page_fault+0x28/0x30 [ 241.982427] [<ffffffffb7ce0c26>] ? select_task_rq_fair+0x5a6/0x760 [ 241.982434] [<ffffffffb7c7df01>] ? switch_mm_irqs_off+0x21/0x290 [ 241.982440] [<ffffffffb7c7df01>] ? switch_mm_irqs_off+0x21/0x290 [ 241.982446] [<ffffffffb7c7dfe9>] ? switch_mm_irqs_off+0x109/0x290 [ 241.982451] [<ffffffffb7c7dff5>] ? switch_mm_irqs_off+0x115/0x290 [ 241.982457] [<ffffffffb7c7dff5>] ? switch_mm_irqs_off+0x115/0x290 [ 241.982462] [<ffffffffb7c7dfe9>] ? switch_mm_irqs_off+0x109/0x290 [ 241.982467] [<ffffffffb7c7dff5>] ? switch_mm_irqs_off+0x115/0x290 [ 241.982473] [<ffffffffb7c7dfe9>] ? switch_mm_irqs_off+0x109/0x290 [ 241.982478] [<ffffffffb7c7dfe9>] ? switch_mm_irqs_off+0x109/0x290 [ 241.982483] [<ffffffffb7c8daa8>] ? efi_call+0x58/0x90 [ 241.982489] [<ffffffffb7c8ce51>] ? efi_switch_mm+0x31/0x50 [ 241.982495] [<ffffffffb81d4139>] ? efi_call_rts+0x2f9/0x600 [ 241.982507] [<ffffffffb7cbde6f>] ? process_one_work+0x17f/0x440 [ 241.982514] [<ffffffffb7cbef86>] ? worker_thread+0x126/0x3c0 [ 241.982522] [<ffffffffb7cbee60>] ? manage_workers.isra.26+0x2a0/0x2a0 [ 241.982529] [<ffffffffb7cc5e41>] ? kthread+0xd1/0xe0 [ 241.982535] [<ffffffffb7cc5d70>] ? insert_kthread_work+0x40/0x40 [ 241.982543] [<ffffffffb8395df7>] ? ret_from_fork_nospec_begin+0x21/0x21 [ 241.982550] [<ffffffffb7cc5d70>] ? insert_kthread_work+0x40/0x40
-
-
If the system is unresponsive, try to capture a vmcore via NMI if the system is configured to crash and create a vmcore when an NMI is sent to the system.
- For more information on setting up
kdump
to create a vmcore on a system crash, please refer to the article How to troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux. - For more information on configuring a system to trigger
kdump
when an NMI is sent to the system, please refer to the article How can I configure my system to crash when NMI switch is pushed? - For more information on reading a vmcore, please refer to the documentation 7.6. Analyzing a core dump
- For more information on setting up
-
Within the vmcore, the kernel ring buffer can be checked with
log
ordmesg
. Check herein for logs similar to what is noted above. -
Within the vmcore, check if any
kworker
threads are blocked for an extended period of time and blocking inefi_call
crash> foreach UN ps -m [0 00:05:50.998] [UN] PID: 2597 TASK: ffffa2bffa90b180 CPU: 58 COMMAND: "kworker/u672:4"
-
Below is the stack for the above
kworker
thread:efi_call+0x58 efi_call_rts+0x2f9 process_one_work+0x17f worker_thread+0x126 kthread+0xd1 ret_from_fork_nospec_begin+0x7
-
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
Comments