System crash due to use-after-free in NVMe-over-TCP request handling

Issue

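  • The system crashes with a general protection fault or a NULL pointer dereference in the nvme_tcp receive path, caused by a use-after-free in NVMe-over-TCP request handling. Example 1 (general protection fault):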
nvme nvme2: creating 32 I/O queues.
nvme nvme2: mapped 32/0/0 default/read/poll queues.
nvme nvme2: Successfully reconnected (1 attempt)
....
nvme nvme1: receive failed:  -22
nvme nvme1: starting error recovery
nvme_ns_head_submit_bio: 55 callbacks suppressed
block nvme0n16: no usable path - requeuing I/O
block nvme0n16: no usable path - requeuing I/O
....
general protection fault, probably for non-canonical address 0xffefe20244550000: 0000 [#1] PREEMPT SMP PTI
CPU: 15 PID: 48185 Comm: kworker/15:1H Kdump: loaded Tainted: P          IOE    --------- ---  5.14.0-70.30.1.el9_0.x86_64 #1
Hardware name: Dell Inc. R640 ScaleIO Ready Node/008R9M, BIOS 2.11.2 004/21/2021
Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
RIP: 0010:_copy_to_iter+0x4bc/0x710
....
Call Trace:
 ? sk_stream_alloc_skb+0x11d/0x210
 ? tcp_skb_entail+0x117/0x120
 ? __virt_addr_valid+0x45/0x70
 ? __check_object_size.part.0+0x45/0x140
block nvme0n16: no usable path - requeuing I/O
 __skb_datagram_iter+0x78/0x2d0
 ? do_tcp_sendpages+0x320/0x330
 ? zerocopy_sg_from_iter+0x50/0x50
 skb_copy_datagram_iter+0x33/0x90
 nvme_tcp_recv_data+0x16a/0x2a0 [nvme_tcp]
block nvme0n25: no usable path - requeuing I/O
block nvme0n26: no usable path - requeuing I/O
 nvme_tcp_recv_skb+0x8e/0x290 [nvme_tcp]
 ? nvme_tcp_recv_pdu+0x480/0x480 [nvme_tcp]
 tcp_read_sock+0x9e/0x1c0
 nvme_tcp_try_recv+0x68/0xa0 [nvme_tcp]
 ? __update_idle_core+0x1b/0xb0
 nvme_tcp_io_work+0x4d/0x90 [nvme_tcp]
 process_one_work+0x1e8/0x3c0
....
  • Example 2 (NULL pointer dereference):
nvme nvme1: unsupported pdu type (3)
nvme nvme1: receive failed:  -22
nvme nvme1: starting error recovery
BUG: kernel NULL pointer dereference, address: 0000000000000188
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000001d586d1067 P4D 8000001d586d1067 PUD 1d74678067 PMD 0 
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 5 PID: 1250 Comm: kworker/5:1H Kdump: loaded Tainted: P        W IOE    --------- ---  5.14.0-70.30.1.el9_0.x86_64 #1
Hardware name: Dell Inc. R640 ScaleIO Ready Node/008R9M, BIOS 2.11.2 004/21/2021
Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
RIP: 0010:nvme_tcp_recv_data+0x10c/0x2a0 [nvme_tcp]
....
Call Trace:
 nvme_tcp_recv_skb+0x8e/0x290 [nvme_tcp]
 ? nvme_tcp_recv_pdu+0x480/0x480 [nvme_tcp]
 tcp_read_sock+0x9e/0x1c0
 nvme_tcp_try_recv+0x68/0xa0 [nvme_tcp]
 ? __update_idle_core+0x1b/0xb0
 nvme_tcp_io_work+0x4d/0x90 [nvme_tcp]
 process_one_work+0x1e8/0x3c0
 worker_thread+0x50/0x3b0
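
Both examples share the same textual pattern: an "starting error recovery" message from an nvme controller, followed by a crash banner, with the nvme_tcp_wq workqueue running nvme_tcp_io_work and nvme_tcp_recv_data appearing in the RIP or call trace. The following is a minimal sketch, not a Red Hat tool, for checking whether a saved kernel log resembles these examples. The function name matches_signature and the choice of patterns are assumptions made for illustration; a match only indicates textual similarity and does not confirm the root cause.

```python
import re

# Hypothetical signature patterns, taken from the two example logs above.
ERROR_RECOVERY = re.compile(r"nvme nvme\d+: starting error recovery")
CRASH_BANNER = re.compile(
    r"general protection fault|BUG: kernel NULL pointer dereference"
)
IO_WORKQUEUE = re.compile(
    r"Workqueue: nvme_tcp_wq nvme_tcp_io_work \[nvme_tcp\]"
)
RECV_DATA = re.compile(
    r"nvme_tcp_recv_data\+0x[0-9a-f]+/0x[0-9a-f]+ \[nvme_tcp\]"
)


def matches_signature(log_text: str) -> bool:
    """Return True if log_text resembles the crashes shown in this article.

    Assumption: error recovery is logged before the first crash banner,
    and the workqueue and nvme_tcp_recv_data lines follow that banner.
    """
    lines = log_text.splitlines()
    # Locate the first crash banner (GPF or NULL pointer dereference).
    crash_idx = next(
        (i for i, line in enumerate(lines) if CRASH_BANNER.search(line)),
        None,
    )
    if crash_idx is None:
        return False
    before = "\n".join(lines[:crash_idx])
    after = "\n".join(lines[crash_idx:])
    return bool(
        ERROR_RECOVERY.search(before)
        and IO_WORKQUEUE.search(after)
        and RECV_DATA.search(after)
    )
```

For example, the text of a dmesg capture or a vmcore-dmesg.txt file can be read into a string and passed to matches_signature. Interleaved messages such as "no usable path - requeuing I/O" do not affect the result because each pattern is searched independently.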

Environment

  • Red Hat Enterprise Linux 9
