RHEL 7 crash in process_one_work() when dereferencing a bad pool_workqueue pointer, race in md driver

Solution Verified - Updated 2024-06-13T23:35:24+00:00

Issue

The system crashes in process_one_work() when dereferencing a bad pool_workqueue pointer picked up from a linked list in the worker struct. The kernel panic stack trace is:

 #0 [ffff9b56a67ebb78] panic at ffffffffa1572a10
 #1 [ffff9b56a67ebbf8] oops_end at ffffffffa1583795
 #2 [ffff9b56a67ebc20] no_context at ffffffffa0e74ad4
 #3 [ffff9b56a67ebc70] __bad_area_nosemaphore at ffffffffa0e74da2
 #4 [ffff9b56a67ebcc0] bad_area_nosemaphore at ffffffffa0e74ec4
 #5 [ffff9b56a67ebcd0] __do_page_fault at ffffffffa1586730
 #6 [ffff9b56a67ebd40] do_page_fault at ffffffffa1586955
 #7 [ffff9b56a67ebd70] page_fault at ffffffffa1582768
    [exception RIP: process_one_work+49]
    RIP: ffffffffa0ebcfb1  RSP: ffff9b56a67ebe28  RFLAGS: 00010046
    RAX: 0000000000000140  RBX: ffff9bb1a0b96c20  RCX: ffff9b56a67ebfd8
    RDX: 0000000000000100  RSI: ffff9bb1a0b96c20  RDI: ffff9b10060aaf80
    RBP: ffff9b56a67ebe60   R8: ffff9b0f617aaa80   R9: 0000000180190013
    R10: 00000000617a9101  R11: ffff9b0f617aaa80  R12: ffff9b10060aaf80
    R13: ffff9bb1a209a4c0  R14: 0000000000000000  R15: ffff9b10060aaf80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   0xffffffffa0ebcf80:  nopl   0x0(%rax,%rax,1)
   0xffffffffa0ebcf85:  push   %rbp
   0xffffffffa0ebcf86:  mov    %rsp,%rbp
   0xffffffffa0ebcf89:  push   %r15
   0xffffffffa0ebcf8b:  push   %r14
   0xffffffffa0ebcf8d:  xor    %r14d,%r14d
   0xffffffffa0ebcf90:  push   %r13
   0xffffffffa0ebcf92:  push   %r12
   0xffffffffa0ebcf94:  mov    %rdi,%r12
   0xffffffffa0ebcf97:  push   %rbx
   0xffffffffa0ebcf98:  mov    %rsi,%rbx
   0xffffffffa0ebcf9b:  sub    $0x10,%rsp
   0xffffffffa0ebcf9f:  mov    (%rsi),%rax
   0xffffffffa0ebcfa2:  mov    0x48(%rdi),%r13
   0xffffffffa0ebcfa6:  mov    %rax,%rdx
   0xffffffffa0ebcfa9:  xor    %dl,%dl
   0xffffffffa0ebcfab:  test   $0x4,%al
   0xffffffffa0ebcfad:  cmovne %rdx,%r14
   0xffffffffa0ebcfb1:  mov    0x8(%r14),%rax          <-- exception hit here
   0xffffffffa0ebcfb5:  mov    0x100(%rax),%r15d
 #8 [ffff9b56a67ebe68] worker_thread at ffffffffa0ebe216
 #9 [ffff9b56a67ebec8] kthread at ffffffffa0ec50d1
#10 [ffff9b56a67ebf50] ret_from_fork_nospec_begin at ffffffffa158bd1d
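
The disassembly and register dump above can be read as a decode of the work item's data word: the code tests bit 0x4, clears the low byte to form a pointer if the bit is set, and otherwise leaves the pointer at zero before loading from offset 0x8. The following Python sketch replays that arithmetic with the RAX value from the dump. The constant names and their meanings are assumptions taken from the upstream 3.10 workqueue code, not from this article, and should be checked against the kernel source in use; the helper names are hypothetical.

```python
# Constants inferred from the disassembly; the names are assumed from the
# upstream 3.10 workqueue code and are not confirmed by this article.
WORK_STRUCT_PWQ = 0x4          # flag tested by "test $0x4,%al"
WORK_STRUCT_FLAG_MASK = 0xFF   # low byte cleared by "xor %dl,%dl"
PWQ_FIELD_OFFSET = 0x8         # displacement in "mov 0x8(%r14),%rax"


def decode_work_data(data):
    """Return the pointer encoded in the data word, or 0 if the flag is clear."""
    if data & WORK_STRUCT_PWQ:
        return data & ~WORK_STRUCT_FLAG_MASK
    return 0


def faulting_address(data):
    """Address read by the faulting instruction for a given data word."""
    return decode_work_data(data) + PWQ_FIELD_OFFSET


if __name__ == "__main__":
    work_data = 0x140  # RAX in the register dump
    print(f"pwq pointer   = {decode_work_data(work_data):#x}")
    print(f"fault address = {faulting_address(work_data):#018x}")
```

With RAX = 0x140 the flag bit is clear, so the decoded pointer stays at zero (R14 in the dump) and the load touches address 0x8, which agrees with the NULL pointer dereference address reported by the kernel.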

The following events occur in quick succession and lead to the crash:

WARNING: CPU: <cpu_id> PID: <pid_id> at drivers/md/md.c:513 md_flush_request+0x1f4/0x200

followed by

WARNING: CPU: <cpu_id> PID: <pid_id> at lib/list_debug.c:33 __list_add+0xac/0xc0

and finally

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

after which the kernel panics and the system crashes.
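
To check whether a kernel log shows the same sequence, the three messages above can be searched for in order. The sketch below is one possible way to do that, assuming the log is available as a list of text lines (for example from dmesg or from the crash utility's log command). The function name and the sample lines are hypothetical, and the source line numbers are matched loosely on the assumption that they may differ between kernel builds.

```python
import re

# Signatures taken from the messages above. CPU, PID, and source line
# numbers are matched loosely because they vary between systems and builds.
SIGNATURES = [
    re.compile(r"WARNING: .* at drivers/md/md\.c:\d+ md_flush_request\+"),
    re.compile(r"WARNING: .* at lib/list_debug\.c:\d+ __list_add\+"),
    re.compile(r"BUG: unable to handle kernel NULL pointer dereference at 0+8\b"),
]


def matches_crash_sequence(log_lines):
    """Return True if all three signatures appear in log_lines in order."""
    stage = 0
    for line in log_lines:
        if stage < len(SIGNATURES) and SIGNATURES[stage].search(line):
            stage += 1
    return stage == len(SIGNATURES)


if __name__ == "__main__":
    # Made-up sample lines; the CPU and PID values are placeholders.
    sample = [
        "WARNING: CPU: 3 PID: 1234 at drivers/md/md.c:513 md_flush_request+0x1f4/0x200",
        "WARNING: CPU: 3 PID: 1234 at lib/list_debug.c:33 __list_add+0xac/0xc0",
        "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008",
    ]
    print(matches_crash_sequence(sample))
```

A match only indicates that the log has the same shape as the one described here; confirming the same root cause still requires examining the vmcore.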

Environment

Red Hat Enterprise Linux 7.7
Kernel 3.10.0-1062.el7.x86_64
