Kernel crashes with a NULL pointer dereference when booting on 4th Gen AMD EPYC™ Processors
Issue
- The kernel crashes with a NULL pointer dereference in x2apic_dead_cpu while booting on systems with 4th Gen AMD EPYC™ Processors, after a secondary CPU fails to wake up.
[ 0.067131] Freeing SMP alternatives memory: 36K
[ 0.171240] smpboot: CPU0: AMD EPYC 9754 128-Core Processor (family: 0x19, model: 0xa0, stepping: 0x2)
[ 0.172262] Performance Events: Fam17h+ 16-deep LBR, core perfctr, AMD PMU driver.
[ 0.173001] ... version: 2
[ 0.174000] ... bit width: 48
[ 0.175000] ... generic registers: 6
[ 0.176000] ... value mask: 0000ffffffffffff
[ 0.177000] ... max period: 00007fffffffffff
[ 0.178003] ... fixed-purpose events: 0
[ 0.179000] ... event mask: 000000000000003f
[ 0.180092] rcu: Hierarchical SRCU implementation.
[ 0.183958] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[ 0.185017] smp: Bringing up secondary CPUs ...
[ 0.186099] x86: Booting SMP configuration:
[ 0.187002] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31
[ 0.238002] .... node #1, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63
[ 0.287002] .... node #2, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95
[ 0.336002] .... node #3, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111 #112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126 #127
[ 0.386002] .... node #0, CPUs: #128
[ 10.386003] smpboot: do_boot_cpu failed(-1) to wakeup CPU#128
[ 10.388012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 10.389000] PGD 0
[ 10.389000] Oops: 0002 [#1] SMP NOPTI
[ 10.389000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-513.1.1.el8_9.x86_64 #1
[ 10.389000] Hardware name: ASUSTeK COMPUTER INC. RS500A-E12-RS12U VR23005466/K14PA-U24 Series, BIOS 1101 07/18/2023
[ 10.389000] RIP: 0010:x2apic_dead_cpu+0x1a/0x3f
[ 10.389000] Code: 5b d9 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 89 ff 48 c7 c0 68 e0 01 00 48 8b 14 fd 40 d8 fb 9b 48 8b 04 02 <f0> 48 0f b3 78 08 48 8b 14 fd 40 d8 fb 9b 48 c7 c0 70 e0 01 00 48
[ 10.389000] RSP: 0018:ff7a5db5000dbdc8 EFLAGS: 00010286
[ 10.389000] RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
[ 10.389000] RDX: ff458999cf800000 RSI: 0000000000000027 RDI: 0000000000000080
[ 10.389000] RBP: ff458999cf81e320 R08: 0000000000000000 R09: 0000000000000004
[ 10.389000] R10: 0000000000000008 R11: ff7a5db5000dbb78 R12: 0000000000000000
[ 10.389000] R13: ffffffff9ac6b8f0 R14: 0000000000000000 R15: 0000000000000055
[ 10.389000] FS: 0000000000000000(0000) GS:ff458999cf000000(0000) knlGS:0000000000000000
[ 10.389000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.389000] CR2: 0000000000000018 CR3: 0000001bf8610001 CR4: 0000000000771ef0
[ 10.389000] PKRU: 55555554
[ 10.389000] Call Trace:
[ 10.389000] ? __die_body+0x1a/0x60
[ 10.389000] ? no_context+0x1ba/0x3f0
[ 10.389000] ? __bad_area_nosemaphore+0x16c/0x1c0
[ 10.389000] ? vprintk_emit+0x125/0x250
[ 10.389000] ? do_page_fault+0x37/0x130
[ 10.389000] ? page_fault+0x1e/0x30
[ 10.389000] ? native_x2apic_icr_write+0x30/0x30
[ 10.389000] ? x2apic_dead_cpu+0x1a/0x3f
[ 10.389000] cpuhp_invoke_callback+0x8e/0x510
[ 10.389000] _cpu_up+0x178/0x1b0
[ 10.389000] ? do_early_param+0x95/0x95
[ 10.389000] do_cpu_up+0x7f/0xd0
[ 10.389000] smp_init+0x5c/0xb6
[ 10.389000] kernel_init_freeable+0x117/0x232
[ 10.389000] ? rest_init+0xaa/0xaa
[ 10.389000] kernel_init+0xa/0xff
[ 10.389000] ret_from_fork+0x1f/0x40
[ 10.389000] Modules linked in:
[ 10.389000] CR2: 0000000000000018
[ 10.389000] ---[ end trace ba91f860d0b04f3b ]---
[ 10.389000] RIP: 0010:x2apic_dead_cpu+0x1a/0x3f
[ 10.389000] Code: 5b d9 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 89 ff 48 c7 c0 68 e0 01 00 48 8b 14 fd 40 d8 fb 9b 48 8b 04 02 <f0> 48 0f b3 78 08 48 8b 14 fd 40 d8 fb 9b 48 c7 c0 70 e0 01 00 48
[ 10.389000] RSP: 0018:ff7a5db5000dbdc8 EFLAGS: 00010286
[ 10.389000] RAX: 0000000000000000 RBX: 0000000000000080 RCX: 0000000000000000
[ 10.389000] RDX: ff458999cf800000 RSI: 0000000000000027 RDI: 0000000000000080
[ 10.389000] RBP: ff458999cf81e320 R08: 0000000000000000 R09: 0000000000000004
[ 10.389000] R10: 0000000000000008 R11: ff7a5db5000dbb78 R12: 0000000000000000
[ 10.389000] R13: ffffffff9ac6b8f0 R14: 0000000000000000 R15: 0000000000000055
[ 10.389000] FS: 0000000000000000(0000) GS:ff458999cf000000(0000) knlGS:0000000000000000
[ 10.389000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.389000] CR2: 0000000000000018 CR3: 0000001bf8610001 CR4: 0000000000771ef0
[ 10.389000] PKRU: 55555554
[ 10.389000] Kernel panic - not syncing: Fatal exception
[ 10.389000] ---[ end Kernel panic - not syncing: Fatal exception ]---
[ 10.323003] smpboot: do_boot_cpu failed(-1) to wakeup CPU#96
[ 10.325012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 10.326000] PGD 0
[ 10.326000] Oops: 0002 [#1] SMP NOPTI
[ 10.326000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-372.87.1.el8_6.x86_64 #1
[ 10.326000] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325 Gen11, BIOS 1.32 05/29/2023
[ 10.326000] RIP: 0010:x2apic_dead_cpu+0x1a/0x3f
[ 10.326000] Code: b2 b9 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 89 ff 48 c7 c0 68 61 01 00 48 8b 14 fd 20 48 5a 87 48 8b 04 02 <f0> 48 0f b3 78 08 48 8b 14 fd 20 48 5a 87 48 c7 c0 70 61 01 00 48
[ 10.326000] RSP: 0018:ff76962a40073dc8 EFLAGS: 00010282
[ 10.326000] RAX: 0000000000000000 RBX: 0000000000000060 RCX: 0000000000000000
[ 10.326000] RDX: ff1f94607ba00000 RSI: 0000000000000027 RDI: 0000000000000060
[ 10.326000] RBP: ff1f94607ba16420 R08: 0000000000000000 R09: 0000000000000003
[ 10.326000] R10: 0000000000000008 R11: ff76962a40073b70 R12: 0000000000000000
[ 10.326000] R13: ffffffff86466170 R14: 0000000000000000 R15: 0000000000000055
[ 10.326000] FS: 0000000000000000(0000) GS:ff1f94607a200000(0000) knlGS:0000000000000000
[ 10.326000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 10.326000] CR2: 0000000000000010 CR3: 00000075df010001 CR4: 0000000000771ef0
[ 10.326000] PKRU: 55555554
[ 10.326000] Call Trace:
[ 10.326000] cpuhp_invoke_callback+0x8d/0x500
[ 10.326000] _cpu_up+0x178/0x1b0
[ 10.326000] ? do_early_param+0x95/0x95
[ 10.326000] do_cpu_up+0x7f/0xd0
[ 10.326000] smp_init+0x5c/0xb6
[ 10.326000] kernel_init_freeable+0x117/0x22d
[ 10.326000] ? rest_init+0xaa/0xaa
[ 10.326000] kernel_init+0xa/0x100
[ 10.326000] ret_from_fork+0x35/0x40
[ 10.326000] Modules linked in:
[ 10.326000] CR2: 0000000000000010
[ 10.326000] ---[ end trace 0008c0a32e72c9d3 ]---
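Both console logs above share the same three markers: a secondary CPU fails to wake up, a NULL pointer dereference follows, and RIP points into x2apic_dead_cpu. The following is a minimal sketch, not an official diagnostic tool, for checking a captured serial console log against that pattern. The function name and the choice of markers are assumptions made for illustration; the regular expressions are taken from the log lines shown above.

```python
import re

# Patterns copied from the console output above.
# The helper name match_crash_signature is hypothetical.
WAKEUP_FAIL = re.compile(r"smpboot: do_boot_cpu failed\(-1\) to wakeup CPU#(\d+)")
NULL_DEREF = re.compile(
    r"BUG: unable to handle kernel NULL pointer dereference at ([0-9a-f]+)"
)
RIP_X2APIC = re.compile(r"RIP: 0010:x2apic_dead_cpu\+")


def match_crash_signature(log_text):
    """Return the number of the CPU that failed to wake up if the log shows
    the wakeup failure followed by a NULL pointer dereference with RIP in
    x2apic_dead_cpu; otherwise return None."""
    wakeup = WAKEUP_FAIL.search(log_text)
    if wakeup is None:
        return None
    # Only look at the part of the log after the wakeup failure.
    rest = log_text[wakeup.end():]
    if NULL_DEREF.search(rest) and RIP_X2APIC.search(rest):
        return int(wakeup.group(1))
    return None
```

Passing the text of a saved console log to `match_crash_signature` returns the failed CPU number when all three markers are present. A match only indicates that the log resembles the ones in this article; it does not by itself confirm the cause.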
Environment
- Red Hat Enterprise Linux 8
- Red Hat Enterprise Linux CoreOS (RHCOS) shipped with RHOCP 4.12 is also affected
- 4th Gen AMD EPYC™ Processors with the following microcode updates for CVE-2023-20569 (Return Address Predictor vulnerability):
- Genoa B1 (Family=0x19 Model=0x11 Stepping=0x01) with microcode update 0x0A10113E
- Genoa-X B2 (Family=0x19 Model=0x11 Stepping=0x02) with microcode update 0x0A10123E
- Bergamo A2 (Family=0x19 Model=0xa0 Stepping=0x02) with microcode update 0x0AA00212
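The processor and microcode combinations listed above can be compared against the contents of /proc/cpuinfo. The following is a minimal sketch under stated assumptions: it assumes the usual x86 /proc/cpuinfo layout, where `cpu family`, `model`, and `stepping` are decimal and `microcode` is hexadecimal, and it matches only the exact revisions listed here, since this article does not say whether other revisions are affected. The function names are hypothetical.

```python
# (family, model, stepping) -> microcode revision, as listed in the
# Environment section. Only these exact revisions are matched.
AFFECTED = {
    (0x19, 0x11, 0x01): 0x0A10113E,  # Genoa B1
    (0x19, 0x11, 0x02): 0x0A10123E,  # Genoa-X B2
    (0x19, 0xA0, 0x02): 0x0AA00212,  # Bergamo A2
}


def parse_first_cpu(cpuinfo_text):
    """Parse the first processor entry of /proc/cpuinfo-style text into a dict."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor entry
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def matches_listed_combination(cpuinfo_text):
    """Return True if the first CPU has a listed family/model/stepping
    together with exactly the listed microcode revision."""
    cpu = parse_first_cpu(cpuinfo_text)
    try:
        key = (int(cpu["cpu family"]), int(cpu["model"]), int(cpu["stepping"]))
        microcode = int(cpu["microcode"], 16)
    except (KeyError, ValueError):
        return False
    return AFFECTED.get(key) == microcode
```

On a system that is able to boot, the check could be run as `matches_listed_combination(open("/proc/cpuinfo").read())`; it can equally be applied to cpuinfo text saved from another host.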