Fatal MCEs occur on multiple compute nodes during execution of "REP; MOVS*" instructions in copy_page()

Solution Unverified - Updated -

Issue

  • Fatal MCEs occur on multiple compute nodes during execution of "REP; MOVS*" instructions in copy_page()
core: [Hardware Error]: CPU 85: Machine Check Exception: f Bank 1: bd80000000100134
core: [Hardware Error]: RIP 10:<ffffffff92939aa7> {copy_page+0x7/0x10}
core: [Hardware Error]: TSC 60e4bb1a677e90 ADDR 5fd9886480 MISC 86 
core: [Hardware Error]: PROCESSOR 0:50657 TIME 1744199090 SOCKET 1 APIC 63 microcode 5003102
core: [Hardware Error]: Run the above through 'mcelog --ascii'
core: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
Kernel panic - not syncing: Fatal local machine check
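
The status word on the "Machine Check Exception" line can be decoded by hand. The following is a minimal sketch, assuming the architectural IA32_MCi_STATUS layout described in the Intel SDM; the function and table names are hypothetical and are not part of mcelog or any Red Hat tool.

```python
# Flag bit positions in IA32_MCi_STATUS (architectural layout, assumed from the Intel SDM).
STATUS_FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60, "MISCV": 59,
                "ADDRV": 58, "PCC": 57, "S": 56, "AR": 55}

# Sub-fields of the compound "cache hierarchy" MCA error code: 000F 0001 RRRR TTLL.
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    mca_code = status & 0xFFFF
    decoded = {
        "flags": flags,
        "mca_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }
    # Bit 12 (F, filtering) is ignored when matching the cache hierarchy pattern.
    if (mca_code & 0xEF00) == 0x0100:
        decoded["cache_error"] = {
            "request": REQUEST.get((mca_code >> 4) & 0xF, "reserved"),
            "transaction": TRANSACTION.get((mca_code >> 2) & 0x3, "reserved"),
            "level": LEVEL[mca_code & 0x3],
        }
    # SRAR (software recoverable, action required): UC=1, EN=1, PCC=0, S=1, AR=1.
    decoded["srar"] = (flags["UC"] and flags["EN"] and not flags["PCC"]
                       and flags["S"] and flags["AR"])
    return decoded


info = decode_mci_status(0xBD80000000100134)  # value from the log above
print(sorted(name for name, is_set in info["flags"].items() if is_set))
print(hex(info["mca_code"]), info.get("cache_error"), "SRAR:", info["srar"])
```

For the value in this log, the sketch reports an uncorrected error with PCC clear and both S and AR set, and an MCA error code of 0x134 (a data read on the data cache). That matches the kernel's own summary, "Data load in unrecoverable area of kernel".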

     ......
 #0 [fffffe00010fcc60] machine_kexec at ffffffff9206156e
 #1 [fffffe00010fccb8] __crash_kexec at ffffffff9218f9ed
 #2 [fffffe00010fcd80] panic at ffffffff920e0df7
 #3 [fffffe00010fce10] mce_rdmsrl at ffffffff9203b6d3
 #4 [fffffe00010fce48] do_machine_check at ffffffff9203c95a
 #5 [fffffe00010fcf50] machine_check at ffffffff92a0112b
    [exception RIP: copy_page+7]
    RIP: ffffffff92939aa7  RSP: ffffa4481eb5b7b0  RFLAGS: 00010286
    RAX: 0000000000000000  RBX: ffff949dfe743d80  RCX: 0000000000000170
    RDX: 0000000000000086  RSI: ffff948b59886480  RDI: ffff9561ac486480
    RBP: fffff4503f660000   R8: 0000000000030688   R9: 0000000000030680
    R10: 0000000000000007  R11: 0000000000000000  R12: fffff45398b12180
    R13: fffff4503f662180  R14: fffff45398b10000  R15: 0000000000000200
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <MCE exception stack> ---
 #6 [ffffa4481eb5b7b0] copy_page at ffffffff92939aa7
     ......

crash> dis -lr ffffffff92939aa7
/usr/src/debug/kernel-4.18.0-305.12.1.el8_4/linux-4.18.0-305.12.1.el8_4.x86_64/arch/x86/lib/copy_page_64.S: 17
0xffffffff92939aa0 <copy_page>: xchg   %ax,%ax
/usr/src/debug/kernel-4.18.0-305.12.1.el8_4/linux-4.18.0-305.12.1.el8_4.x86_64/arch/x86/lib/copy_page_64.S: 18
0xffffffff92939aa2 <copy_page+2>:   mov    $0x200,%ecx
/usr/src/debug/kernel-4.18.0-305.12.1.el8_4/linux-4.18.0-305.12.1.el8_4.x86_64/arch/x86/lib/copy_page_64.S: 19
0xffffffff92939aa7 <copy_page+7>:   rep movsq %ds:(%rsi),%es:(%rdi) <<----- The trapping instruction
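
The register dump and the disassembly together show how far the copy had progressed when the exception was raised. The sketch below is illustrative arithmetic only. It assumes that RCX holds the quadwords still to be copied (it is loaded with 0x200 at copy_page+2), that RSI and RDI are the source and destination pointers of "rep movsq", and that ADDR in the MCE record is a physical address. The variable names are hypothetical.

```python
PAGE_SIZE = 4096
QWORD_BYTES = 8

initial_count = 0x200          # mov $0x200,%ecx at copy_page+2
rcx = 0x170                    # RCX at the time of the exception
rsi = 0xFFFF948B59886480       # source pointer (RSI)
rdi = 0xFFFF9561AC486480       # destination pointer (RDI)
mce_addr = 0x5FD9886480        # ADDR field of the MCE record

# 0x200 quadwords of 8 bytes each cover exactly one 4 KiB page.
assert initial_count * QWORD_BYTES == PAGE_SIZE

bytes_copied = (initial_count - rcx) * QWORD_BYTES
src_offset = rsi & (PAGE_SIZE - 1)
dst_offset = rdi & (PAGE_SIZE - 1)
addr_offset = mce_addr & (PAGE_SIZE - 1)

print(f"bytes copied before the MCE: {bytes_copied:#x}")
print(f"source page offset:          {src_offset:#x}")
print(f"destination page offset:     {dst_offset:#x}")
print(f"MCE ADDR page offset:        {addr_offset:#x}")
```

All four values are 0x480. The in-page offset of the reported ADDR equals the offset the source pointer had reached, which is consistent with the error being raised on the load from the source page rather than on the store to the destination. This is a consistency check, not proof of which physical page was involved.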

Environment

  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 9 with a kernel older than kernel-5.14.0-503.11.1.el9_5.x86_64 (9.5 GA)
  • Intel CPUs (Skylake / Cascade Lake / Cooper Lake)
  • "REP; MOVS*" instructions executed in copy_page()
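
To check whether a node falls into the CPU list above, the "PROCESSOR 0:50657" field of the MCE record can be decoded. The sketch below assumes the second number is the CPUID signature (leaf 1, EAX) printed in hexadecimal and applies the standard Intel family/model/stepping encoding; the function name is hypothetical.

```python
def decode_cpuid_signature(signature):
    """Return (family, model, stepping) from a CPUID leaf 1 EAX value."""
    stepping = signature & 0xF
    base_model = (signature >> 4) & 0xF
    base_family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is prepended for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x50657)  # from the MCE record
print(f"family {family}, model {model:#x}, stepping {stepping}")
```

The signature in this record decodes to family 6, model 0x55, stepping 7. Model 0x55 is the model number shared by the Skylake, Cascade Lake and Cooper Lake server parts; which of the three a given stepping corresponds to should be confirmed against Intel's documentation rather than inferred from this sketch.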
