The kernel booting with 4th Gen AMD EPYC™ Processors crashes due to a NULL pointer dereference

Solution Verified - Updated -

Issue

  • The kernel crashes with a NULL pointer dereference in x2apic_dead_cpu() while booting on 4th Gen AMD EPYC™ Processors, right after a secondary CPU fails to wake up.
[    0.067131] Freeing SMP alternatives memory: 36K 
[    0.171240] smpboot: CPU0: AMD EPYC 9754 128-Core Processor (family: 0x19, model: 0xa0, stepping: 0x2) 
[    0.172262] Performance Events: Fam17h+ 16-deep LBR, core perfctr, AMD PMU driver. 
[    0.173001] ... version:                2 
[    0.174000] ... bit width:              48 
[    0.175000] ... generic registers:      6 
[    0.176000] ... value mask:             0000ffffffffffff 
[    0.177000] ... max period:             00007fffffffffff 
[    0.178003] ... fixed-purpose events:   0 
[    0.179000] ... event mask:             000000000000003f 
[    0.180092] rcu: Hierarchical SRCU implementation. 
[    0.183958] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter. 
[    0.185017] smp: Bringing up secondary CPUs ... 
[    0.186099] x86: Booting SMP configuration: 
[    0.187002] .... node  #0, CPUs:          #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  #11  #12  #13  #14  #15  #16  #17  #18  #19  #20  #21  #22  #23  #24  #25  #26  #27  #28  #29  #30  #31 
[    0.238002] .... node  #1, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39  #40  #41  #42  #43  #44  #45  #46  #47  #48  #49  #50  #51  #52  #53  #54  #55  #56  #57  #58  #59  #60  #61  #62  #63 
[    0.287002] .... node  #2, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71  #72  #73  #74  #75  #76  #77  #78  #79  #80  #81  #82  #83  #84  #85  #86  #87  #88  #89  #90  #91  #92  #93  #94  #95 
[    0.336002] .... node  #3, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111 #112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126 #127 
[    0.386002] .... node  #0, CPUs:   #128 
[   10.386003] smpboot: do_boot_cpu failed(-1) to wakeup CPU#128 
[   10.388012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 
[   10.389000] PGD 0  
[   10.389000] Oops: 0002 [#1] SMP NOPTI 
[   10.389000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-513.1.1.el8_9.x86_64 #1 
[   10.389000] Hardware name: ASUSTeK COMPUTER INC. RS500A-E12-RS12U VR23005466/K14PA-U24 Series, BIOS 1101 07/18/2023 
[   10.389000] RIP: 0010:x2apic_dead_cpu+0x1a/0x3f 
[   10.389000] Code: 5b d9 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 89 ff 48 c7 c0 68 e0 01 00 48 8b 14 fd 40 d8 fb 9b 48 8b 04 02 <f0> 48 0f b3 78 08 48 8b 14 fd 40 d8 fb 9b 48 c7 c0 70 e0 01 00 48 
[   10.389000] RSP: 0018:ff7a5db5000dbdc8 EFLAGS: 00010286 
[   10.389000] RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000 
[   10.389000] RDX: ff458999cf800000 RSI: 0000000000000027 RDI: 0000000000000080 
[   10.389000] RBP: ff458999cf81e320 R08: 0000000000000000 R09: 0000000000000004 
[   10.389000] R10: 0000000000000008 R11: ff7a5db5000dbb78 R12: 0000000000000000 
[   10.389000] R13: ffffffff9ac6b8f0 R14: 0000000000000000 R15: 0000000000000055 
[   10.389000] FS:  0000000000000000(0000) GS:ff458999cf000000(0000) knlGS:0000000000000000 
[   10.389000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[   10.389000] CR2: 0000000000000018 CR3: 0000001bf8610001 CR4: 0000000000771ef0 
[   10.389000] PKRU: 55555554 
[   10.389000] Call Trace: 
[   10.389000]  ? __die_body+0x1a/0x60 
[   10.389000]  ? no_context+0x1ba/0x3f0 
[   10.389000]  ? __bad_area_nosemaphore+0x16c/0x1c0 
[   10.389000]  ? vprintk_emit+0x125/0x250 
[   10.389000]  ? do_page_fault+0x37/0x130 
[   10.389000]  ? page_fault+0x1e/0x30 
[   10.389000]  ? native_x2apic_icr_write+0x30/0x30 
[   10.389000]  ? x2apic_dead_cpu+0x1a/0x3f 
[   10.389000]  cpuhp_invoke_callback+0x8e/0x510 
[   10.389000]  _cpu_up+0x178/0x1b0 
[   10.389000]  ? do_early_param+0x95/0x95 
[   10.389000]  do_cpu_up+0x7f/0xd0 
[   10.389000]  smp_init+0x5c/0xb6 
[   10.389000]  kernel_init_freeable+0x117/0x232 
[   10.389000]  ? rest_init+0xaa/0xaa 
[   10.389000]  kernel_init+0xa/0xff 
[   10.389000]  ret_from_fork+0x1f/0x40 
[   10.389000] Modules linked in: 
[   10.389000] CR2: 0000000000000018 
[   10.389000] ---[ end trace ba91f860d0b04f3b ]--- 
[   10.389000] RIP: 0010:x2apic_dead_cpu+0x1a/0x3f 
[   10.389000] Code: 5b d9 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 89 ff 48 c7 c0 68 e0 01 00 48 8b 14 fd 40 d8 fb 9b 48 8b 04 02 <f0> 48 0f b3 78 08 48 8b 14 fd 40 d8 fb 9b 48 c7 c0 70 e0 01 00 48 
[   10.389000] RSP: 0018:ff7a5db5000dbdc8 EFLAGS: 00010286 
[   10.389000] RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000 
[   10.389000] RDX: ff458999cf800000 RSI: 0000000000000027 RDI: 0000000000000080 
[   10.389000] RBP: ff458999cf81e320 R08: 0000000000000000 R09: 0000000000000004 
[   10.389000] R10: 0000000000000008 R11: ff7a5db5000dbb78 R12: 0000000000000000 
[   10.389000] R13: ffffffff9ac6b8f0 R14: 0000000000000000 R15: 0000000000000055 
[   10.389000] FS:  0000000000000000(0000) GS:ff458999cf000000(0000) knlGS:0000000000000000 
[   10.389000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[   10.389000] CR2: 0000000000000018 CR3: 0000001bf8610001 CR4: 0000000000771ef0 
[   10.389000] PKRU: 55555554 
[   10.389000] Kernel panic - not syncing: Fatal exception 
[   10.389000] ---[ end Kernel panic - not syncing: Fatal exception ]--- 
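The faulting address in this trace can be cross-checked against the register dump. Decoding the bytes from <f0> onward on the Code: line by hand (this decoding is ours, not part of the original output) gives a lock prefix (f0) followed by 48 0f b3 78 08, which is btr %rdi,0x8(%rax): clear bit RDI in a bitmap that starts 8 bytes past the pointer held in RAX. For a 64-bit BTR with a register bit offset, the CPU accesses the quadword at base + 8 * (offset DIV 64). With RAX = 0 and RDI = 0x80 (CPU 128, the CPU that failed to wake up), the arithmetic works out as follows:

```latex
\begin{aligned}
\mathrm{CR2} &= \mathrm{RAX} + \mathtt{0x8} + 8\left\lfloor \mathrm{RDI}/64 \right\rfloor \\
             &= 0 + 8 + 8\left\lfloor 128/64 \right\rfloor = 24 = \mathtt{0x18}
\end{aligned}
```

The result matches CR2 = 0000000000000018. The same arithmetic for the HPE ProLiant DL325 Gen11 log (RDI = 0x60, CPU 96) gives 8 + 8 * 1 = 0x10, which matches its CR2 = 0000000000000010. This is consistent with x2apic_dead_cpu() dereferencing a NULL per-CPU pointer for the CPU that never came online; treat that as an interpretation of the registers, not as a root-cause statement.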

The same crash on an HPE ProLiant DL325 Gen11 running kernel 4.18.0-372.87.1.el8_6:

[   10.323003] smpboot: do_boot_cpu failed(-1) to wakeup CPU#96
[   10.325012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[   10.326000] PGD 0
[   10.326000] Oops: 0002 [#1] SMP NOPTI
[   10.326000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-372.87.1.el8_6.x86_64 #1
[   10.326000] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325 Gen11, BIOS 1.32 05/29/2023
[   10.326000] RIP: 0010:x2apic_dead_cpu+0x1a/0x3f
[   10.326000] Code: b2 b9 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 89 ff 48 c7 c0 68 61 01 00 48 8b 14 fd 20 48 5a 87 48 8b 04 02 <f0> 48 0f b3 78 08 48 8b 14 fd 20 48 5a 87 48 c7 c0 70 61 01 00 48
[   10.326000] RSP: 0018:ff76962a40073dc8 EFLAGS: 00010282
[   10.326000] RAX: 0000000000000000 RBX: 0000000000000060 RCX: 0000000000000000
[   10.326000] RDX: ff1f94607ba00000 RSI: 0000000000000027 RDI: 0000000000000060
[   10.326000] RBP: ff1f94607ba16420 R08: 0000000000000000 R09: 0000000000000003
[   10.326000] R10: 0000000000000008 R11: ff76962a40073b70 R12: 0000000000000000
[   10.326000] R13: ffffffff86466170 R14: 0000000000000000 R15: 0000000000000055
[   10.326000] FS:  0000000000000000(0000) GS:ff1f94607a200000(0000) knlGS:0000000000000000
[   10.326000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.326000] CR2: 0000000000000010 CR3: 00000075df010001 CR4: 0000000000771ef0
[   10.326000] PKRU: 55555554
[   10.326000] Call Trace:
[   10.326000]  cpuhp_invoke_callback+0x8d/0x500
[   10.326000]  _cpu_up+0x178/0x1b0
[   10.326000]  ? do_early_param+0x95/0x95
[   10.326000]  do_cpu_up+0x7f/0xd0
[   10.326000]  smp_init+0x5c/0xb6
[   10.326000]  kernel_init_freeable+0x117/0x22d
[   10.326000]  ? rest_init+0xaa/0xaa
[   10.326000]  kernel_init+0xa/0x100
[   10.326000]  ret_from_fork+0x35/0x40
[   10.326000] Modules linked in:
[   10.326000] CR2: 0000000000000010
[   10.326000] ---[ end trace 0008c0a32e72c9d3 ]---
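To check whether another console log shows the same signature, a minimal Python sketch can scan for the three markers that both traces above share: the do_boot_cpu wakeup failure, a RIP inside x2apic_dead_cpu, and the CR2 value. The names match_crash and CrashSignature are hypothetical, and the regular expressions are assumptions based only on the two logs in this article; other kernel builds may format these lines differently, so a match is a hint, not a diagnosis.

```python
import re
from typing import NamedTuple, Optional


class CrashSignature(NamedTuple):
    cpu: int  # CPU number from "do_boot_cpu failed(-1) to wakeup CPU#N"
    cr2: int  # faulting address from the first "CR2:" register line


# Patterns modelled on the two logs above (assumption: same message format).
WAKEUP_RE = re.compile(r"do_boot_cpu failed\(-?\d+\) to wakeup CPU#(\d+)")
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:x2apic_dead_cpu\+")
CR2_RE = re.compile(r"CR2: ([0-9a-f]{16})")


def match_crash(console_log: str) -> Optional[CrashSignature]:
    """Return the crash signature if all three markers are present, else None."""
    # Serial-console captures often carry carriage returns, shown as "^M".
    text = console_log.replace("\r", "").replace("^M", "")
    wake = WAKEUP_RE.search(text)
    cr2 = CR2_RE.search(text)
    if wake is None or cr2 is None or RIP_RE.search(text) is None:
        return None
    return CrashSignature(cpu=int(wake.group(1)), cr2=int(cr2.group(1), 16))


# Lines copied from the first log in this article.
sample = """[   10.386003] smpboot: do_boot_cpu failed(-1) to wakeup CPU#128
[   10.389000] RIP: 0010:x2apic_dead_cpu+0x1a/0x3f
[   10.389000] CR2: 0000000000000018 CR3: 0000001bf8610001 CR4: 0000000000771ef0"""

print(match_crash(sample))  # CrashSignature(cpu=128, cr2=24)
```

Requiring all three markers keeps unrelated oopses (for example a NULL dereference elsewhere after a normal boot) from matching.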

Environment

  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux CoreOS (RHCOS) shipped with RHOCP 4.12 is also affected
  • 4th Gen AMD EPYC™ Processors with the following microcode updates for CVE-2023-20569 (Return Address Predictor vulnerability):
    • Genoa B1 (Family=0x19 Model=0x11 Stepping=0x01) with microcode update 0x0A10113E
    • Genoa-X B2 (Family=0x19 Model=0x11 Stepping=0x02) with microcode update 0x0A10123E
    • Bergamo A2 (Family=0x19 Model=0xa0 Stepping=0x02) with microcode update 0x0AA00212
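To compare a host against the list above, note that /proc/cpuinfo reports cpu family, model and stepping in decimal and the microcode revision in hex (for example 0xa10113e). The Python sketch below parses those four fields per processor and checks them against the three listed combinations. The names AFFECTED, parse_cpuinfo and matches_listed_combination are hypothetical, and the sample text is illustrative, not captured from a real system. It reports only an exact microcode match: this article does not say whether other revisions are affected, and the sketch does not check the kernel version.

```python
from typing import Dict, Set, Tuple

# (family, model, stepping) -> microcode revision, copied from the list above.
AFFECTED: Dict[Tuple[int, int, int], int] = {
    (0x19, 0x11, 0x01): 0x0A10113E,  # Genoa B1
    (0x19, 0x11, 0x02): 0x0A10123E,  # Genoa-X B2
    (0x19, 0xA0, 0x02): 0x0AA00212,  # Bergamo A2
}


def parse_cpuinfo(text: str) -> Set[Tuple[int, int, int, int]]:
    """Collect (family, model, stepping, microcode) from each processor block."""
    found: Set[Tuple[int, int, int, int]] = set()
    for block in text.strip().split("\n\n"):
        fields: Dict[str, str] = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        try:
            found.add((
                int(fields["cpu family"]),     # decimal in /proc/cpuinfo
                int(fields["model"]),          # decimal
                int(fields["stepping"]),       # decimal
                int(fields["microcode"], 16),  # hex with 0x prefix
            ))
        except (KeyError, ValueError):
            continue  # block lacks one of the fields we need
    return found


def matches_listed_combination(text: str) -> bool:
    """True if any processor matches a listed CPU and its exact microcode."""
    return any(
        AFFECTED.get((fam, mod, step)) == ucode
        for fam, mod, step, ucode in parse_cpuinfo(text)
    )


# Illustrative sample for a Bergamo A2 part (0xa0 == 160 in decimal).
sample = """processor\t: 0
vendor_id\t: AuthenticAMD
cpu family\t: 25
model\t\t: 160
model name\t: AMD EPYC 9754 128-Core Processor
stepping\t: 2
microcode\t: 0xaa00212
"""

print(matches_listed_combination(sample))  # True
```

On a live system, pass the contents of /proc/cpuinfo instead of the sample, for example the result of open("/proc/cpuinfo").read().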
