RHEL 7 crash in process_one_work() when dereferencing a bad pool_workqueue pointer, race in md driver
Issue
The system crashes in process_one_work() when dereferencing a bad pool_workqueue pointer picked up from a linked list in the worker struct. The kernel stack trace at the time of the panic:
#0 [ffff9b56a67ebb78] panic at ffffffffa1572a10
#1 [ffff9b56a67ebbf8] oops_end at ffffffffa1583795
#2 [ffff9b56a67ebc20] no_context at ffffffffa0e74ad4
#3 [ffff9b56a67ebc70] __bad_area_nosemaphore at ffffffffa0e74da2
#4 [ffff9b56a67ebcc0] bad_area_nosemaphore at ffffffffa0e74ec4
#5 [ffff9b56a67ebcd0] __do_page_fault at ffffffffa1586730
#6 [ffff9b56a67ebd40] do_page_fault at ffffffffa1586955
#7 [ffff9b56a67ebd70] page_fault at ffffffffa1582768
[exception RIP: process_one_work+49]
RIP: ffffffffa0ebcfb1 RSP: ffff9b56a67ebe28 RFLAGS: 00010046
RAX: 0000000000000140 RBX: ffff9bb1a0b96c20 RCX: ffff9b56a67ebfd8
RDX: 0000000000000100 RSI: ffff9bb1a0b96c20 RDI: ffff9b10060aaf80
RBP: ffff9b56a67ebe60 R8: ffff9b0f617aaa80 R9: 0000000180190013
R10: 00000000617a9101 R11: ffff9b0f617aaa80 R12: ffff9b10060aaf80
R13: ffff9bb1a209a4c0 R14: 0000000000000000 R15: ffff9b10060aaf80
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
0xffffffffa0ebcf80: nopl 0x0(%rax,%rax,1)
0xffffffffa0ebcf85: push %rbp
0xffffffffa0ebcf86: mov %rsp,%rbp
0xffffffffa0ebcf89: push %r15
0xffffffffa0ebcf8b: push %r14
0xffffffffa0ebcf8d: xor %r14d,%r14d
0xffffffffa0ebcf90: push %r13
0xffffffffa0ebcf92: push %r12
0xffffffffa0ebcf94: mov %rdi,%r12
0xffffffffa0ebcf97: push %rbx
0xffffffffa0ebcf98: mov %rsi,%rbx
0xffffffffa0ebcf9b: sub $0x10,%rsp
0xffffffffa0ebcf9f: mov (%rsi),%rax
0xffffffffa0ebcfa2: mov 0x48(%rdi),%r13
0xffffffffa0ebcfa6: mov %rax,%rdx
0xffffffffa0ebcfa9: xor %dl,%dl
0xffffffffa0ebcfab: test $0x4,%al
0xffffffffa0ebcfad: cmovne %rdx,%r14
0xffffffffa0ebcfb1: mov 0x8(%r14),%rax <-- exception hit here
0xffffffffa0ebcfb5: mov 0x100(%rax),%r15d
#8 [ffff9b56a67ebe68] worker_thread at ffffffffa0ebe216
#9 [ffff9b56a67ebec8] kthread at ffffffffa0ec50d1
#10 [ffff9b56a67ebf50] ret_from_fork_nospec_begin at ffffffffa158bd1d
The following events occur in quick succession and lead to the crash:
WARNING: CPU: <cpu_id> PID: <pid_id> at drivers/md/md.c:513 md_flush_request+0x1f4/0x200
followed by
WARNING: CPU: <cpu_id> PID: <pid_id> at lib/list_debug.c:33 __list_add+0xac/0xc0
and finally
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
after which the kernel panics and the system crashes.
Environment
- Red Hat Enterprise Linux 7.7
- Kernel 3.10.0-1062.el7.x86_64