RHEL 6 fails to boot with a microcode update

Solution Verified

Environment

  • Red Hat Enterprise Linux 6
  • Intel processors
  • Microcode matching the following:

    microcode: CPU0 sig=0x206f2, pf=0x4, revision=0x3b
    platform microcode: firmware: requesting intel-ucode/06-2f-02
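
To check whether a system matches the signature above, it can help to see how the `sig` value relates to the `intel-ucode/06-2f-02` file name. The following is a minimal sketch, assuming the usual CPUID family/model/stepping layout and the family-model-stepping naming of `intel-ucode` files. The function names `decode_signature` and `ucode_filename` are hypothetical and are not part of any Red Hat tool.

```python
def decode_signature(sig):
    """Split a CPUID signature into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model supplies the high nibble for families 0x6 and 0xF.
    model = base_model
    if base_family in (0x6, 0xF):
        model |= ext_model << 4

    return family, model, stepping


def ucode_filename(sig):
    """Build the family-model-stepping name used under intel-ucode/."""
    return "%02x-%02x-%02x" % decode_signature(sig)


if __name__ == "__main__":
    # Signature taken from the log line in this article.
    print(ucode_filename(0x206F2))
```

For the signature in this article, family 0x06, model 0x2f and stepping 0x02 correspond to the requested file `06-2f-02`.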
    

Issue

  • After a CPU microcode update, the system hits a kernel panic early in the boot process and fails to boot.

Resolution

  • Work with the hardware vendor to determine whether a firmware or BIOS upgrade is available that installs a newer microcode version.

  • Ensure that the microcode version is the same on every CPU, and that the log does not show the microcode upgrade failing for one or more CPUs.
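
The per-CPU comparison described above can be scripted. The following is a minimal sketch, assuming the kernel log uses the `microcode: CPUn sig=..., pf=..., revision=...` format shown in this article. The helper names `revisions_by_cpu` and `group_by_revision` are hypothetical, and the sample text is a three-line excerpt of the log in the Diagnostic Steps section.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   microcode: CPU0 sig=0x206f2, pf=0x4, revision=0x3b
LINE_RE = re.compile(
    r"microcode: CPU(\d+) sig=0x[0-9a-f]+, pf=0x[0-9a-f]+, revision=(0x[0-9a-f]+)"
)


def revisions_by_cpu(log_text):
    """Map each CPU number to the last microcode revision reported for it."""
    revisions = {}
    for match in LINE_RE.finditer(log_text):
        # A later line for the same CPU overrides an earlier one.
        revisions[int(match.group(1))] = int(match.group(2), 16)
    return revisions


def group_by_revision(revisions):
    """Map each revision to the sorted list of CPUs running it."""
    groups = defaultdict(list)
    for cpu, revision in sorted(revisions.items()):
        groups[revision].append(cpu)
    return dict(groups)


sample = """\
[  199.572207] microcode: CPU0 sig=0x206f2, pf=0x4, revision=0x3b
[  199.672553] microcode: CPU1 sig=0x206f2, pf=0x4, revision=0x39
[  200.330538] microcode: CPU8 sig=0x206f2, pf=0x4, revision=0x3b
"""

groups = group_by_revision(revisions_by_cpu(sample))
if len(groups) > 1:
    for revision, cpus in sorted(groups.items()):
        print("revision 0x%x: CPUs %s" % (revision, cpus))
```

More than one group in the output means the CPUs are not all running the same microcode revision. In practice the input would be the saved kernel log or the `log` output from a vmcore rather than an inline string.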

Root Cause

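  • The kernel executes a wrmsr instruction that sets a bit which the running microcode version may not implement. The same failure can occur when logical CPUs run mismatched microcode versions, so that the bit is implemented on some CPUs but not on others. The relevant bit definition from the kernel source:

    #define FEATURE_SET_IBPB                            (1<<0)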
    
  • The Intel manual lists the following conditions under which wrmsr raises a general protection fault (GPF) in protected mode:

    * If the current privilege level is not 0.
    * If the value in ECX specifies a reserved or unimplemented MSR address.
    * If the value in EDX:EAX sets bits that are reserved in the MSR specified by ECX.
    * If the source register contains a non-canonical address and ECX specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE, IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP.
    
  • Disclaimer: Please note that the link provided is to an external site that is not maintained or verified by Red Hat and is provided for convenience only.

Diagnostic Steps

  • Several vmcores have shown that a panic occurs when executing a wrmsr instruction.

  • Example: the faulting instruction in flush_old_exec is a wrmsr with ECX set to 0x49 and EDX:EAX set to 0x1:

    crash> log
    …
    RIP  [<ffffffff811ac118>] flush_old_exec+0x448/0x800
    …
    
    crash> dis -rl flush_old_exec+0x448 | tail
    0xffffffff811ac0f8 <flush_old_exec+0x428>:    cmp    $0x1,%edx
    0xffffffff811ac0fb <flush_old_exec+0x42b>:    jbe    0xffffffff811ac10c <flush_old_exec+0x43c>
    0xffffffff811ac0fd <flush_old_exec+0x42d>:    movabs $0x8000000000000,%rdx
    0xffffffff811ac107 <flush_old_exec+0x437>:    test   %rdx,%rax
    0xffffffff811ac10a <flush_old_exec+0x43a>:    je     0xffffffff811ac128 <flush_old_exec+0x458>
    /usr/src/debug/kernel-2.6.32-754.36.1.el6/linux-2.6.32-754.36.1.el6.x86_64/arch/x86/include/asm/msr.h: 95
    0xffffffff811ac10c <flush_old_exec+0x43c>:    xor    %edx,%edx
    0xffffffff811ac10e <flush_old_exec+0x43e>:    mov    $0x1,%eax    
    0xffffffff811ac113 <flush_old_exec+0x443>:    mov    $0x49,%ecx    
    0xffffffff811ac118 <flush_old_exec+0x448>:    wrmsr            <---- 
    
  • Example: a general protection fault on a wrmsr in schedule, with RCX set to 0x49 and RAX set to 0x1:

    crash> log
    …
    general protection fault: 0000 [#1] SMP
    …
    CPU 28
    …
    Pid: 0, comm: swapper Tainted: P           -- ------------    2.6.32-696.28.1.el6.x86_64 #1 Dell Inc. PowerEdge R910/0KYD3D
    RIP: 0010:[<ffffffff815531fa>]  [<ffffffff815531fa>] schedule+0x62a/0xd10
    RSP: 0018:ffff884026c3fe28  EFLAGS: 00010046
    RAX: 0000000000000001 RBX: ffff8801a1bd8c00 RCX: 0000000000000049
    …
    Process swapper (pid: 0, threadinfo ffff884026c3c000, task ffff884026c33520)
    …
    Call Trace:
     [<ffffffff8100a001>] cpu_idle+0xf1/0x110
     [<ffffffff8154c4a3>] start_secondary+0x30f/0x365
    …
    
    crash> dis -l schedule+0x62a
    /usr/src/debug/kernel-2.6.32-696.28.1.el6/linux-2.6.32-696.28.1.el6.x86_64/arch/x86/include/asm/msr.h: 95
    0xffffffff815531fa <schedule+0x62a>:    wrmsr      <----
    
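  • Example: a boot log in which CPUs 0, 8, 16 and 24 report revision 0x3b while the remaining CPUs report 0x39, followed by the general protection fault: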

    [  199.416301] iTCO_vendor_support: vendor-support=0
    [  199.456964] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11rh
    [  199.496356] iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS
    [  199.572207] microcode: CPU0 sig=0x206f2, pf=0x4, revision=0x3b
    [  199.612430] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  199.672553] microcode: CPU1 sig=0x206f2, pf=0x4, revision=0x39
    [  199.713226] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  199.767564] microcode: CPU2 sig=0x206f2, pf=0x4, revision=0x39
    [  199.808502] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  199.858188] microcode: CPU3 sig=0x206f2, pf=0x4, revision=0x39
    [  199.900492] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  199.951269] microcode: CPU4 sig=0x206f2, pf=0x4, revision=0x39
    [  199.993118] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.044474] microcode: CPU5 sig=0x206f2, pf=0x4, revision=0x39
    [  200.086567] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.139839] microcode: CPU6 sig=0x206f2, pf=0x4, revision=0x39
    [  200.182645] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.235125] microcode: CPU7 sig=0x206f2, pf=0x4, revision=0x39
    [  200.279348] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.330538] microcode: CPU8 sig=0x206f2, pf=0x4, revision=0x3b
    [  200.371983] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.423442] microcode: CPU9 sig=0x206f2, pf=0x4, revision=0x39
    [  200.464618] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.517251] microcode: CPU10 sig=0x206f2, pf=0x4, revision=0x39
    [  200.560472] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.615558] microcode: CPU11 sig=0x206f2, pf=0x4, revision=0x39
    [  200.656545] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.711065] microcode: CPU12 sig=0x206f2, pf=0x4, revision=0x39
    [  200.753104] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.804902] microcode: CPU13 sig=0x206f2, pf=0x4, revision=0x39
    [  200.849307] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  200.903818] microcode: CPU14 sig=0x206f2, pf=0x4, revision=0x39
    [  200.949790] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.004475] microcode: CPU15 sig=0x206f2, pf=0x4, revision=0x39
    [  201.057103] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.110344] microcode: CPU16 sig=0x206f2, pf=0x4, revision=0x3b
    [  201.154652] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.208535] microcode: CPU17 sig=0x206f2, pf=0x4, revision=0x39
    [  201.253043] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.305976] microcode: CPU18 sig=0x206f2, pf=0x4, revision=0x39
    [  201.348138] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.401162] microcode: CPU19 sig=0x206f2, pf=0x4, revision=0x39
    [  201.444918] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.497501] microcode: CPU20 sig=0x206f2, pf=0x4, revision=0x39
    [  201.541335] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.595048] microcode: CPU21 sig=0x206f2, pf=0x4, revision=0x39
    [  201.639749] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.692817] microcode: CPU22 sig=0x206f2, pf=0x4, revision=0x39
    [  201.735735] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.790203] microcode: CPU23 sig=0x206f2, pf=0x4, revision=0x39
    [  201.833558] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.892432] microcode: CPU24 sig=0x206f2, pf=0x4, revision=0x3b
    [  201.938387] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  201.993439] microcode: CPU25 sig=0x206f2, pf=0x4, revision=0x39
    [  202.037977] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  202.094508] microcode: CPU26 sig=0x206f2, pf=0x4, revision=0x39
    [  202.139837] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  202.191346] microcode: CPU27 sig=0x206f2, pf=0x4, revision=0x39
    [  202.235116] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  202.288199] microcode: CPU28 sig=0x206f2, pf=0x4, revision=0x39
    [  202.332740] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  202.384703] microcode: CPU29 sig=0x206f2, pf=0x4, revision=0x39
    [  202.429260] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  202.483504] microcode: CPU30 sig=0x206f2, pf=0x4, revision=0x39
    [  202.527395] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  202.579727] microcode: CPU31 sig=0x206f2, pf=0x4, revision=0x39
    [  202.624786] platform microcode: firmware: requesting intel-ucode/06-2f-02
    [  202.676760] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
    [  202.677556] general protection fault: 0000 [#1] SMP
    [  202.677559] last sysfs file: /sys/devices/platform/microcode/firmware/microcode/loading
    [  202.677561] CPU 26
    [  202.677563] Modules linked in: microcode(+) iTCO_wdt iTCO_vendor_support serio_raw osst st power_meter acpi_ipmi ipmi_si ipmi_msghandler joydev sg hpilo hpwdt lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif be2net qla2xxx(U) scsi_transport_fc scsi_tgt hpsa(U) radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
    [  202.677586]
    [  202.677589] Pid: 15690, comm: firmware.sh Not tainted 2.6.32-754.36.1.el6.x86_64 #1 HP ProLiant BL680c G7
    [  202.677592] RIP: 0010:[<ffffffff811ac118>]  [<ffffffff811ac118>] flush_old_exec+0x448/0x800
    [  202.677603] RSP: 0018:ffff88400d5f3bf8  EFLAGS: 00010246
    [  202.677605] RAX: 0000000000000001 RBX: ffff88200e224cc0 RCX: 0000000000000049
    [  202.677607] RDX: 0000000000000000 RSI: ffff88400e5059c0 RDI: ffff88400ea93300
    [  202.677609] RBP: ffff88400d5f3c68 R08: 0000000000000000 R09: 0000000000000000
    [  202.677611] R10: ffff88400d5f3fd8 R11: ffffffffa0486808 R12: ffff88400e5059c0
    [  202.677613] R13: ffff88400ea93300 R14: ffff88400e9a6040 R15: ffff88400ea93300
    [  202.677616] FS:  0000000000000000(0000) GS:ffff8820b0e80000(0000) knlGS:0000000000000000
    [  202.677618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  202.677620] CR2: 00007f4e312240a0 CR3: 0000004008eec000 CR4: 00000000000207e0
    [  202.677623] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  202.677626] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    [  202.677630] Process firmware.sh (pid: 15690, threadinfo ffff88400d5f0000, task ffff88400e9a6040)
    [  202.677632] Stack:
    [  202.677634]  ffff88400d5f3c38 ffffffff811a34c1 ffff88400d5f3c38 0000000000000080
    [  202.677637] <d> ffff88400e9a6838 ffff88400000001a 00007ffffffff000 000000000e589ac0
    [  202.677641] <d> ffff88400d5f3c68 ffff88200e224cc0 ffff88400eb02240 0000000000000080
    [  202.677645] Call Trace:
    [  202.677653]  [<ffffffff811a34c1>] ? vfs_read+0x131/0x1a0
    [  202.677662]  [<ffffffff8120027f>] load_elf_binary+0x34f/0x1c50
    [  202.677668]  [<ffffffff8115c415>] ? follow_page+0x545/0x670
    [  202.677671]  [<ffffffff8115c415>] ? follow_page+0x545/0x670
    [  202.677674]  [<ffffffff81161f70>] ? __get_user_pages+0x110/0x450
    [  202.677678]  [<ffffffff811fd7de>] ? load_misc_binary+0x9e/0x3f0
    [  202.677681]  [<ffffffff81162349>] ? get_user_pages+0x49/0x50
    [  202.677684]  [<ffffffff811aba9b>] search_binary_handler+0x13b/0x370
    [  202.677687]  [<ffffffff811ac807>] do_execve+0x217/0x2c0
    [  202.677696]  [<ffffffff8100968a>] sys_execve+0x4a/0x80
    [  202.677704]  [<ffffffff815669e9>] stub_execve+0x49/0xa0
    [  202.677706] Code: 74 39 8b 15 03 01 a7 00 83 ea 01 83 fa 01 76 0f 48 ba 00 00 00 00 00 00 08 00 48 85 d0 74 1c 31 d2 b8 01 00 00 00 b9 49 00 00 00 <0f> 30 b0 00 84 c0 0f 85 52 01 00 00 0f 1f 40 00 49 8b 7c 24 50
    [  202.677726] RIP  [<ffffffff811ac118>] flush_old_exec+0x448/0x800
    [  202.677730]  RSP <ffff88400d5f3bf8>
    [  202.677734] ---[ end trace a9a394b506571c00 ]---
    [  202.677737] general protection fault: 0000 [#2] SMP
    [  202.677739] Kernel panic - not syncing: Fatal exception
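
The register dump in the panic output holds the wrmsr operands at the time of the fault: ECX selects the MSR and EDX:EAX holds the value written. The following is a minimal sketch that extracts those registers from oops text and decodes the write. The MSR names in `KNOWN_MSRS` come from the Intel manual rather than from this article, the table is deliberately tiny, and the helper names `extract_registers` and `decode_wrmsr` are hypothetical.

```python
import re

# Architectural MSR addresses as named in the Intel manual
# (an assumption of this sketch, not stated in the article).
KNOWN_MSRS = {
    0x48: "IA32_SPEC_CTRL",
    0x49: "IA32_PRED_CMD",
}


def extract_registers(oops_text):
    """Return the RAX, RBX, RCX and RDX values found in an oops dump."""
    pairs = re.findall(r"\b(R[A-D]X): ([0-9a-f]{16})\b", oops_text)
    return {name: int(value, 16) for name, value in pairs}


def decode_wrmsr(regs):
    """Return (msr, name, value, set_bits) for a faulting wrmsr."""
    msr = regs["RCX"] & 0xFFFFFFFF
    # wrmsr writes EDX:EAX, so only the low 32 bits of each register count.
    value = ((regs["RDX"] & 0xFFFFFFFF) << 32) | (regs["RAX"] & 0xFFFFFFFF)
    set_bits = [bit for bit in range(64) if (value >> bit) & 1]
    return msr, KNOWN_MSRS.get(msr, "unknown"), value, set_bits


# Register lines copied from the panic output in this article.
sample = """\
RAX: 0000000000000001 RBX: ffff88200e224cc0 RCX: 0000000000000049
RDX: 0000000000000000 RSI: ffff88400e5059c0 RDI: ffff88400ea93300
"""

regs = extract_registers(sample)
msr, name, value, set_bits = decode_wrmsr(regs)
print("MSR 0x%x (%s), value 0x%x, bits set: %s" % (msr, name, value, set_bits))
```

For the dump above this reports a write of bit 0 to MSR 0x49, which is consistent with the `(1<<0)` definition shown under Root Cause.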
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
