Red Hat Enterprise Linux system hangs with the error "Page fault caused by firmware"

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 7

    • Specifically between kernel versions kernel-3.10.0-1160.31.1.el7 up to but not including kernel-3.10.0-1160.61.1.el7

Issue

  • The kernel logs show errors and warnings regarding EFI Runtime Services, Firmware Bugs, at least one kworker threads blocked for a long time, etc.
  • Assembling Intel's imsm RAID container fails with mdadm errors.
  • The system might become unresponsive after some time if the above errors are noted.

Resolution

Root Cause

  • An update to how the kernel switches addresses spaces when invoking UEFI runtime services introduced a bug wherein UEFI runtime services could be disabled and cause a kworker thread to hang indefinitely.
  • UEFIs provide a variety of functionality useful for any Operation System. The functionality is vital for properly booting an Operating System, but the same functions may be useful after a system fully boots (generally referred to as "runtime").
  • The code path facilitating the use of the functions was streamlined in the upstream Linux kernel and backported to Red Hat Enterprise Linux.
  • The streamlining introduced a bug in how the kernel manages Translation Lookaside Buffer (TLB) flushes documented in the article RHEL7: kernel crashes in leave_mm() with a message "kernel BUG at arch/x86/mm/tlb.c:48!".
  • The above fix addressed the prior issue but did not address an edge case. The edge case occurred due to an inconsistency in the state of the current memory management structure used to map memory for the CPU.
  • Effectively, the inconsistency allowed for the CPU to attempt to operate on regular memory within its TLB while the EFI memory was still currently mapped in. This triggers a page fault (we have an address for memory but do not actually "own" it) and the EFI recovers from this page fault. However, due to how EFIs handle page faults, EFI runtime services are then disabled and the kworker thread managing the work going to the UEFI is frozen in uninterruptible sleep (D) state.
  • The fix establishes a state for the CPU's TLB when doing these operations and takes action accordingly based on the state avoiding the inconsistency.

Diagnostic Steps

  • If the system is still operating, check the kernel ring buffer via dmesg or journalctl -k for any of the following errors/warnings:

    [   34.313537] WARNING: CPU: 58 PID: 2597 at arch/x86/platform/efi/quirks.c:489 efi_recover_from_page_fault+0x40/0x120
    [   34.313541] [Firmware Bug]: Page fault caused by firmware at PA: 0xfffffffeecb081e7
    [...]
    [   34.313584] CPU: 58 PID: 2597 Comm: kworker/u672:4 Not tainted 3.10.0-1160.36.2.el7.x86_64 #1
    [   34.313603] Workqueue: efi_rts_wq efi_call_rts
    [...]
    [   34.313787] efi: Froze efi_rts_wq and disabled EFI Runtime Services
    [...]
    [   34.313854] efi: EFI Runtime Services are disabled!
    
    • The following is an example from dmesg where the issue occurred:

      [   11.004945] ------------[ cut here ]------------
      [   11.004961] WARNING: CPU: 38 PID: 5 at arch/x86/platform/efi/quirks.c:489 efi_recover_from_page_fault+0x40/0x120
      [   11.004964] [Firmware Bug]: Page fault caused by firmware at PA: 0xfffffffeeca13c7f
      [   11.004966] Modules linked in: wdat_wdt pcspkr ipmi_ssif joydev sg i2c_i801 lpc_ich mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_pad(+) acpi_power_meter(+) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ast igb drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel ptp syscopyarea pps_core sysfillrect sysimgblt dca fb_sys_fops nvme i2c_algo_bit ttm nvme_core drm ahci libahci megaraid_sas libata drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod fuse
      [   11.005015] CPU: 38 PID: 5 Comm: kworker/u256:0 Not tainted 3.10.0-1160.36.2.el7.x86_64 #1
      [   11.005027] Workqueue: efi_rts_wq efi_call_rts
      [   11.005029] Call Trace:
      [   11.005039]  [<ffffffff84383559>] dump_stack+0x19/0x1b
      [   11.005047]  [<ffffffff83c9b258>] __warn+0xd8/0x100
      [   11.005053]  [<ffffffff83c9b2df>] warn_slowpath_fmt+0x5f/0x80
      [   11.005057]  [<ffffffff83c8c830>] efi_recover_from_page_fault+0x40/0x120
      [   11.005066]  [<ffffffff83c75c86>] no_context+0xb6/0x300
      [   11.005071]  [<ffffffff83c75fe2>] __bad_area_nosemaphore+0x112/0x220
      [   11.005076]  [<ffffffff83c76104>] bad_area_nosemaphore+0x14/0x20
      [   11.005083]  [<ffffffff84390750>] __do_page_fault+0x310/0x500
      [   11.005092]  [<ffffffff83ce8494>] ? find_busiest_group+0x144/0x990
      [   11.005096]  [<ffffffff84390975>] do_page_fault+0x35/0x90
      [   11.005100]  [<ffffffff8438c778>] page_fault+0x28/0x30
      [   11.005107]  [<ffffffff83c7df01>] ? switch_mm_irqs_off+0x21/0x290
      [   11.005111]  [<ffffffff83c7df01>] ? switch_mm_irqs_off+0x21/0x290
      [   11.005116]  [<ffffffff83c7dfe9>] ? switch_mm_irqs_off+0x109/0x290
      [   11.005120]  [<ffffffff83c7dff5>] ? switch_mm_irqs_off+0x115/0x290
      [   11.005123]  [<ffffffff83c7dff5>] ? switch_mm_irqs_off+0x115/0x290
      [   11.005127]  [<ffffffff83c7dfe9>] ? switch_mm_irqs_off+0x109/0x290
      [   11.005131]  [<ffffffff83c7dff5>] ? switch_mm_irqs_off+0x115/0x290
      [   11.005135]  [<ffffffff83c7dfe9>] ? switch_mm_irqs_off+0x109/0x290
      [   11.005139]  [<ffffffff83c7dfe9>] ? switch_mm_irqs_off+0x109/0x290
      [   11.005142]  [<ffffffff83c8daa8>] ? efi_call+0x58/0x90
      [   11.005147]  [<ffffffff83c8ce51>] ? efi_switch_mm+0x31/0x50
      [   11.005152]  [<ffffffff841d4139>] ? efi_call_rts+0x2f9/0x600
      [   11.005161]  [<ffffffff83cbde6f>] ? process_one_work+0x17f/0x440
      [   11.005167]  [<ffffffff83cbef86>] ? worker_thread+0x126/0x3c0
      [   11.005172]  [<ffffffff83cbee60>] ? manage_workers.isra.26+0x2a0/0x2a0
      [   11.005177]  [<ffffffff83cc5e41>] ? kthread+0xd1/0xe0
      [   11.005182]  [<ffffffff83cc5d70>] ? insert_kthread_work+0x40/0x40
      [   11.005188]  [<ffffffff84395df7>] ? ret_from_fork_nospec_begin+0x21/0x21
      [   11.005193]  [<ffffffff83cc5d70>] ? insert_kthread_work+0x40/0x40
      [   11.005196] ---[ end trace 32d18cdf2277bf49 ]---
      [   11.005209] efi: Froze efi_rts_wq and disabled EFI Runtime Services          <<----------
      [   11.006254] ------------[ cut here ]------------
      [   11.006265] WARNING: CPU: 38 PID: 1333 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
      [   11.006267] list_del corruption, ffffddf083ff0c20->next is LIST_POISON1 (dead000000000100)
      [   11.006295] efi: EFI Runtime Services are disabled!
      [   11.006269] Modules linked in: wdat_wdt pcspkr ipmi_ssif joydev sg i2c_i801 lpc_ich mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_pad(+) acpi_power_meter(+) ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ast igb drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel ptp syscopyarea pps_core sysfillrect sysimgblt dca fb_sys_fops nvme i2c_algo_bit ttm nvme_core drm ahci libahci megaraid_sas libata drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod fuse
      [   11.006309] CPU: 38 PID: 1333 Comm: systemd-journal Tainted: G        W      ------------   3.10.0-1160.36.2.el7.x86_64 #1
      [   11.006313] Call Trace:
      [   11.006317]  [<ffffffff84383559>] dump_stack+0x19/0x1b
      [   11.006322]  [<ffffffff83c9b258>] __warn+0xd8/0x100
      [   11.006328]  [<ffffffff83c9b2df>] warn_slowpath_fmt+0x5f/0x80
      [   11.006332]  [<ffffffff83ce4cfc>] ? dequeue_entity+0x11c/0x5c0
      [   11.006336]  [<ffffffff83fa6753>] __list_del_entry+0x63/0xd0
      [   11.006341]  [<ffffffff83fa67cd>] list_del+0xd/0x30
      [   11.006345]  [<ffffffff83c7c0a8>] pgd_free+0x58/0x90
      [   11.006349]  [<ffffffff83c97687>] __mmdrop+0x27/0xb0
      [   11.006355]  [<ffffffff83cd4bbd>] finish_task_switch+0x16d/0x1c0
      [   11.006362]  [<ffffffff84388e10>] __schedule+0x320/0x680
      [   11.006367]  [<ffffffff84389199>] schedule+0x29/0x70
      [   11.006371]  [<ffffffff843885dd>] schedule_hrtimeout_range_clock+0x12d/0x150
      [   11.006380]  [<ffffffff83e9d4b9>] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0
      [   11.006385]  [<ffffffff84388613>] schedule_hrtimeout_range+0x13/0x20
      [   11.006389]  [<ffffffff83e9d74e>] ep_poll+0x23e/0x360
      [   11.006397]  [<ffffffff83e4c288>] ? do_truncate+0x88/0xc0
      [   11.006402]  [<ffffffff83e51498>] ? __sb_start_write+0x58/0x120
      [   11.006407]  [<ffffffff83cdadd0>] ? wake_up_state+0x20/0x20
      [   11.006412]  [<ffffffff83e9ec2d>] SyS_epoll_wait+0xed/0x120
      [   11.006420]  [<ffffffff84395f92>] system_call_fastpath+0x25/0x2a
      [   11.006423] ---[ end trace 32d18cdf2277bf4a ]---
      
  • If the system is unresponsive, try to capture a vmcore via NMI if the system is configured to crash and create a vmcore when an NMI is sent to the system.

  • Within the vmcore, the kernel ring buffer can be checked with log or dmesg. Check herein for logs similar to what is noted above.

  • Within the vmcore, check if any kworker threads are blocked for an extended period of time and blocking in efi_call

    crash> foreach UN ps -m
    [0 00:05:50.998] [UN]  PID: 2597   TASK: ffffa2bffa90b180  CPU: 58  COMMAND: "kworker/u672:4"
    
    • Below is the stack for the above kworker thread:

      efi_call+0x58
       efi_call_rts+0x2f9
        process_one_work+0x17f
         worker_thread+0x126
          kthread+0xd1
           ret_from_fork_nospec_begin+0x7
      

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.