Kernel panic in __xfrm_state_delete()
Environment
- Red Hat Enterprise Linux 8.6
- Red Hat Enterprise Linux 9
Issue
- A kernel panic occurs in __xfrm_state_delete() on RHEL 8.6:
[175887.177687] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[175887.181611] PGD 22ca6f067 P4D 0
[175887.183254] Oops: 0002 [#1] SMP NOPTI
[175887.185092] CPU: 1 PID: 3511882 Comm: pluto Kdump: loaded Tainted: P E --------- - - 4.18.0-372.26.1.el8_6.x86_64 #1
[175887.191499] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[175887.196258] RIP: 0010:__xfrm_state_delete+0xd2/0x1b0
[175887.198780] Code: 48 89 50 08 48 b8 00 02 00 00 00 00 ad de 8b 93 b4 00 00 00 48 89 43 20 85 d2 74 2b 48 8b 83 58 03 00 00 48 8b 93 60 03 00 00 <48> 89 02 48 85 c0 74 04 48 89 50 08 48 b8 00 02 00 00 00 00 ad de
[175887.207763] RSP: 0018:ff691fda48ac3a78 EFLAGS: 00010206
[175887.210463] RAX: 0000000000000000 RBX: ff27f9309cbb8700 RCX: dead000000000200
[175887.214266] RDX: 0000000000000000 RSI: ff27f9309cbb87a0 RDI: ff27f9309cbb87a0
[175887.217737] RBP: ffffffff94b31100 R08: 0000000000000001 R09: 0000000000000002
[175887.221124] R10: 0000000000000002 R11: 0000000000000032 R12: ffffffff94b2f8c0
[175887.224600] R13: ffffffff93d00710 R14: ff691fda48ac3c60 R15: 0000000000000000
[175887.228067] FS: 00007fa6082e8740(0000) GS:ff27f9379fa40000(0000) knlGS:0000000000000000
[175887.231992] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[175887.234586] CR2: 0000000000000000 CR3: 000000017c9e4004 CR4: 0000000000371ee0
[175887.237564] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[175887.240626] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[175887.243851] Call Trace:
[175887.244928] xfrm_state_delete+0x1e/0x30
[175887.246597] xfrm_del_sa+0xb8/0x100
[175887.248161] xfrm_user_rcv_msg+0x133/0x1e0
[175887.249902] ? hist_field_none+0x10/0x10
[175887.251556] ? event_hist_trigger+0x347/0x410
[175887.253423] ? bpf_lsm_socket_shutdown+0x10/0x10
[175887.255384] ? copy_to_user_state_extra+0x3f0/0x3f0
[175887.257459] netlink_rcv_skb+0x4c/0x120
[175887.259120] xfrm_netlink_rcv+0x30/0x40
[175887.260735] netlink_unicast+0x196/0x230
[175887.262393] netlink_sendmsg+0x204/0x3d0
[175887.264040] sock_sendmsg+0x4c/0x50
[175887.265521] sock_write_iter+0x97/0x100
[175887.267157] new_sync_write+0x112/0x160
[175887.268796] vfs_write+0xa5/0x1a0
[175887.270462] ksys_write+0x4f/0xb0
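The instruction marked with <48> in the Code: line is a store through RDX, and RDX (like CR2) is 0. Decoded by hand, the surrounding bytes (mov 0x358(%rbx),%rax; mov 0x360(%rbx),%rdx; mov %rax,(%rdx); test %rax,%rax; je; mov %rdx,0x8(%rax)) appear to match the kernel's __hlist_del() helper, which writes a node's next pointer through its pprev pointer. A node that was initialised but never added to a list has pprev == NULL, so unlinking it writes to address 0, which is consistent with "Oops: 0002" (a kernel-mode write fault). The following is a minimal user-space sketch of that state, assuming simplified copies of struct hlist_node, struct hlist_head and their helpers modelled on include/linux/list.h; it is an illustration, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space copies of the kernel hlist types. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* Like INIT_HLIST_NODE(): a fresh node is on no list, pprev == NULL. */
static void init_hlist_node(struct hlist_node *n)
{
	n->next = NULL;
	n->pprev = NULL;
}

/* Like hlist_unhashed(): true when the node is not on any list. */
static bool hlist_unhashed(const struct hlist_node *n)
{
	return n->pprev == NULL;
}

/* Like hlist_add_head(): link n as the first node of h. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Mirrors __hlist_del(): *pprev = next; if (next) next->pprev = pprev.
 * The kernel has no NULL check here; for an unhashed node pprev is
 * NULL and the store goes to address 0 (RDX = 0, CR2 = 0 in the oops).
 * The assert stands in for that fault in this user-space model.
 */
static void hlist_del_raw(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	assert(pprev != NULL);
	*pprev = next;
	if (next)
		next->pprev = pprev;
}
```

A node only becomes safe to pass to hlist_del_raw() after hlist_add_head() has set its pprev; calling it on a node fresh from init_hlist_node() is exactly the NULL store seen above.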
Resolution
- The issue is tracked in the following private Red Hat bugs:
  - RHEL 8: #2156048
  - RHEL 9: #2157579
- Please open a support case to get more information on the bug.
Root Cause
- The issue was introduced in RHEL 8.6 by the backport of upstream commit fe9f1d8779cb ("xfrm: add state hashtable keyed by seq").
- This kernel panic is fixed upstream by the following commit:
commit b97df039a68b2f3e848e238df5d5d06343ea497b
Author: Thomas Jarosch
Date: Wed Nov 2 11:18:48 2022 +0100
xfrm: Fix oops in __xfrm_state_delete()
Kernel 5.14 added a new "byseq" index to speed
up xfrm_state lookups by sequence number in commit
fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
While the patch was thorough, the function pfkey_send_new_mapping()
in net/af_key.c also modifies x->km.seq and never added
the current xfrm_state to the "byseq" index.
This leads to the following kernel Ooops:
BUG: kernel NULL pointer dereference, address: 0000000000000000
..
RIP: 0010:__xfrm_state_delete+0xc9/0x1c0
..
Call Trace:
<TASK>
xfrm_state_delete+0x1e/0x40
xfrm_del_sa+0xb0/0x110 [xfrm_user]
xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user]
? remove_entity_load_avg+0x8a/0xa0
? copy_to_user_state_extra+0x580/0x580 [xfrm_user]
netlink_rcv_skb+0x51/0x100
xfrm_netlink_rcv+0x30/0x50 [xfrm_user]
netlink_unicast+0x1a6/0x270
netlink_sendmsg+0x22a/0x480
__sys_sendto+0x1a6/0x1c0
? __audit_syscall_entry+0xd8/0x130
? __audit_syscall_exit+0x249/0x2b0
__x64_sys_sendto+0x23/0x30
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x61/0xcb
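According to the commit message, pfkey_send_new_mapping() assigns a new x->km.seq without ever inserting the state into the "byseq" table. The Code: bytes in the Issue section show the faulting unlink being preceded by a test of a 32-bit field at offset 0xb4 and a conditional jump, which suggests __xfrm_state_delete() decides whether to unlink x->byseq by checking x->km.seq. The sketch below models both that check and a membership-based check in user space. struct model_state, model_send_new_mapping() and the two byseq_delete_* functions are hypothetical names for illustration only, not kernel symbols. Guarding on hlist_unhashed() is one way to make the unlink safe and is our understanding of the upstream approach; refer to commit b97df039a68b for the exact change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space copies of the kernel hlist types/helpers. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

static bool hlist_unhashed(const struct hlist_node *n)
{
	return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Mirrors __hlist_del(); in the kernel this faults if pprev == NULL. */
static void hlist_del_raw(struct hlist_node *n)
{
	struct hlist_node *next = n->next;
	struct hlist_node **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;
}

/* Hypothetical, heavily reduced stand-in for struct xfrm_state. */
struct model_state {
	uint32_t km_seq;          /* models x->km.seq */
	struct hlist_node byseq;  /* models x->byseq  */
};

/* Hypothetical model of pfkey_send_new_mapping(): a new sequence
 * number is assigned, but the state is NOT added to "byseq". */
static void model_send_new_mapping(struct model_state *x, uint32_t seq)
{
	x->km_seq = seq;
}

/* Models the km.seq-based check the faulting code appears to use.
 * Returns false where the kernel would instead store through a NULL
 * pprev and oops. */
static bool byseq_delete_seq_guard(struct model_state *x)
{
	if (x->km_seq) {
		if (hlist_unhashed(&x->byseq))
			return false;
		hlist_del_raw(&x->byseq);
	}
	return true;
}

/* Membership-based check: only unlink a node that is actually linked. */
static void byseq_delete_hashed_guard(struct model_state *x)
{
	if (!hlist_unhashed(&x->byseq))
		hlist_del_raw(&x->byseq);
}
```

For a state that was really inserted into "byseq" both checks behave the same; they only differ for a state whose km.seq was set outside the code path that maintains the index, which is the situation the commit message describes.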
Diagnostic Steps
- The following snippet is from the vmcore captured for this issue:
crash> bt
PID: 3511882 TASK: ff27f9303a9b8000 CPU: 1 COMMAND: "pluto"
#0 [ff691fda48ac3730] machine_kexec at ffffffff92e6564e
#1 [ff691fda48ac3788] __crash_kexec at ffffffff92fa576d
#2 [ff691fda48ac3850] panic at ffffffff92eed98b
#3 [ff691fda48ac38d0] oops_end.cold.10 at ffffffff92e2660f
#4 [ff691fda48ac38f0] no_context at ffffffff92e763bf
#5 [ff691fda48ac3948] __bad_area_nosemaphore at ffffffff92e7671c
#6 [ff691fda48ac3990] do_page_fault at ffffffff92e76fb7
#7 [ff691fda48ac39c0] page_fault at ffffffff9380111e
[exception RIP: __xfrm_state_delete+210]
RIP: ffffffff936edbd2 RSP: ff691fda48ac3a78 RFLAGS: 00010206
RAX: 0000000000000000 RBX: ff27f9309cbb8700 RCX: dead000000000200
RDX: 0000000000000000 RSI: ff27f9309cbb87a0 RDI: ff27f9309cbb87a0
RBP: ffffffff94b31100 R8: 0000000000000001 R9: 0000000000000002
R10: 0000000000000002 R11: 0000000000000032 R12: ffffffff94b2f8c0
R13: ffffffff93d00710 R14: ff691fda48ac3c60 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ff691fda48ac3a90] xfrm_state_delete at ffffffff936edcce
#9 [ff691fda48ac3aa8] xfrm_del_sa at ffffffff936fb298
#10 [ff691fda48ac3af0] xfrm_user_rcv_msg at ffffffff936f8943
#11 [ff691fda48ac3c58] netlink_rcv_skb at ffffffff9365592c
#12 [ff691fda48ac3ca8] xfrm_netlink_rcv at ffffffff936f7370
#13 [ff691fda48ac3cc0] netlink_unicast at ffffffff936550f6
#14 [ff691fda48ac3d00] netlink_sendmsg at ffffffff93655394
#15 [ff691fda48ac3d70] sock_sendmsg at ffffffff935b853c
#16 [ff691fda48ac3d88] sock_write_iter at ffffffff935b85d7
#17 [ff691fda48ac3df8] new_sync_write at ffffffff9313d4f2
#18 [ff691fda48ac3e80] vfs_write at ffffffff93140be5
#19 [ff691fda48ac3eb0] ksys_write at ffffffff93140e5f
#20 [ff691fda48ac3ee8] unload_network_ops_symbols at ffffffffc0a48f78 [falcon_lsm_pinned_14306]
#21 [ff691fda48ac3f38] do_syscall_64 at ffffffff92e0430b
#22 [ff691fda48ac3f50] entry_SYSCALL_64_after_hwframe at ffffffff938000ad
RIP: 00007fa6069b2a07 RSP: 00007ffed0b0d5f0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fa6069b2a07
RDX: 0000000000000028 RSI: 00007ffed0b0dac0 RDI: 000000000000000d
RBP: 00007ffed0b0dac0 R8: 0000000000000000 R9: 00007ffed0b0dac0
R10: 00007ffed0b0dd77 R11: 0000000000000293 R12: 0000000000000028
R13: 00007ffed0b0dfd0 R14: 00007ffed0b0dfa0 R15: 0000559033bc4398
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash> sys|grep PANIC
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000000"
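The register values in the bt output can be tied back to the fault. Decoding the Code: bytes by hand suggests that RBX holds the struct xfrm_state pointer, that x->km.seq is read from offset 0xb4, and that x->byseq.next and x->byseq.pprev are loaded from offsets 0x358 and 0x360 into RAX and RDX, both of which are 0 at the time of the crash. The small helper below computes the addresses to inspect in this vmcore. The offsets are assumptions derived from this one build (4.18.0-372.26.1.el8_6.x86_64) and should be re-checked, for example with the crash command "struct xfrm_state -o", before being applied to any other kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offsets decoded by hand from the oops "Code:" bytes of
 * 4.18.0-372.26.1.el8_6.x86_64 (assumption; not portable):
 *   8b 93 b4 00 00 00     mov 0xb4(%rbx),%edx   -> x->km.seq
 *   48 8b 83 58 03 00 00  mov 0x358(%rbx),%rax  -> x->byseq.next
 *   48 8b 93 60 03 00 00  mov 0x360(%rbx),%rdx  -> x->byseq.pprev
 */
#define KM_SEQ_OFF       0xb4u
#define BYSEQ_NEXT_OFF   0x358u
#define BYSEQ_PPREV_OFF  0x360u

/* Address of the field 'off' bytes into the structure at 'base'. */
static uint64_t field_addr(uint64_t base, uint32_t off)
{
	return base + off;
}

/* Start address of a function, given a RIP and its "+offset". */
static uint64_t symbol_start(uint64_t rip, uint32_t off)
{
	return rip - off;
}
```

With RBX = ff27f9309cbb8700, this gives ff27f9309cbb87b4 for km.seq and ff27f9309cbb8a58 / ff27f9309cbb8a60 for byseq.next / byseq.pprev, so "rd ff27f9309cbb8a58 2" in crash should display both byseq pointers. Likewise, the exception RIP ffffffff936edbd2 minus 210 (0xd2, the same offset shown in the oops) gives ffffffff936edb00 as the start of __xfrm_state_delete(), and "dis -r ffffffff936edbd2" disassembles the function up to the faulting instruction.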
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.