Kernel panic at __mutex_lock+190 due to unsigned module "wanec".
Environment
- Red Hat Enterprise Linux 8
- wanec (third-party kernel module)
Issue
- Kernel panic with "BUG: unable to handle kernel paging request at 00000a20a8570038".
- Kernel panic at __mutex_lock+190, which is called by the wanec_event_ctrl() function of the third-party module wanec.
[5079239.202237] BUG: unable to handle kernel paging request at 0000a3a17c098038
[5079239.202605] PGD 2b73d9b067 P4D 0
[5079239.202841] Oops: 0000 [#1] SMP NOPTI
[5079239.203079] CPU: 58 PID: 3445653 Comm: asterisk Kdump: loaded Tainted: P OE --------- - - 4.18.0-477.10.1.el8_8.x86_64 #1
[5079239.203626] Hardware name: Lenovo ThinkSystem SR650 V2/7Z73CTO1WW, BIOS AFE122G-1.50 02/07/2023
[5079239.203943] RIP: 0010:__mutex_lock.isra.7+0xbe/0x420
..
[5079239.209997] Call Trace:
[5079239.210457] ? page_counter_uncharge+0x1d/0x40
[5079239.210924] wanec_event_ctrl+0x55/0x1e0 [wanec]
[5079239.211371] wp_tdmv_ioctl+0x174/0x270 [wanpipe]
[5079239.211720] dahdi_chanandpseudo_ioctl+0x59e/0x16a0 [dahdi]
[5079239.212077] ? reuse_swap_page+0x50/0x180
[5079239.212406] ? wp_page_reuse+0x4d/0x60
[5079239.212737] ? do_wp_page+0x247/0x350
[5079239.213072] dahdi_chan_ioctl+0x141/0xc10 [dahdi]
[5079239.213419] dahdi_unlocked_ioctl+0x50/0x110 [dahdi]
[5079239.213773] do_vfs_ioctl+0xa4/0x690
[5079239.214118] ? syscall_trace_enter+0x1ff/0x2d0
[5079239.214475] ksys_ioctl+0x64/0xa0
[5079239.214823] ? probe_fini+0x1212/0x1230 [traps]
[5079239.215188] __x64_sys_ioctl+0x16/0x20
[5079239.215546] do_syscall_64+0x5b/0x1b0
[5079239.215910] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[5079239.216283] RIP: 0033:0x7fc56ce717cb
Resolution
- Engage the provider of the wanec third-party kernel module for further investigation.
- Check whether any patches or updates are available for wanec that resolve this issue.
Possible Workaround
- Blacklist the third-party kernel module wanec and its dependencies.
Root Cause
- wanec_event_ctrl() in the third-party wanec module calls mutex_lock() on a mutex whose owner field contained an invalid, non-NULL address when __mutex_lock() read it; dereferencing that address caused the panic.
Diagnostic Steps
- Backtrace of the panicking task:
crash> bt
PID: 3486355 TASK: ff400a1a4af00000 CPU: 79 COMMAND: "asterisk"
#0 [ff81cd7364037980] machine_kexec at ffffffffbb06bec3
#1 [ff81cd73640379d8] __crash_kexec at ffffffffbb1b564a
#2 [ff81cd7364037a98] crash_kexec at ffffffffbb1b6581
#3 [ff81cd7364037ab0] oops_end at ffffffffbb02a9b1
#4 [ff81cd7364037ad0] no_context at ffffffffbb07e913
#5 [ff81cd7364037b28] __bad_area_nosemaphore at ffffffffbb07ec8c
#6 [ff81cd7364037b70] do_page_fault at ffffffffbb07f887
#7 [ff81cd7364037ba0] page_fault at ffffffffbbc0116e
[exception RIP: __mutex_lock+0xbe]
RIP: ffffffffbb9fc92e RSP: ff81cd7364037c50 RFLAGS: 00010206
RAX: 00000a20a8570000 RBX: ff4009d4b149243a RCX: ff400a20a8570000
RDX: ff400a1a4af00000 RSI: 0000000000000002 RDI: ff400a1a4af00000
RBP: ff81cd7364037cb0 R8: ff81cd7364037cb8 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: ff4009d4bdfc0600
R13: ff4009d7cceaa7d8 R14: 0000000000000002 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ff81cd7364037cb8] wanec_event_ctrl at ffffffffc10c6725 [wanec]
#9 [ff81cd7364037cf8] wp_tdmv_ioctl at ffffffffc15ce624 [wanpipe]
#10 [ff81cd7364037d40] dahdi_chanandpseudo_ioctl at ffffffffc0bf725e [dahdi]
#11 [ff81cd7364037df8] dahdi_chan_ioctl at ffffffffc0bf84a1 [dahdi]
#12 [ff81cd7364037e70] dahdi_unlocked_ioctl at ffffffffc0bfa640 [dahdi]
#13 [ff81cd7364037e80] do_vfs_ioctl at ffffffffbb379d84
#14 [ff81cd7364037ef8] ksys_ioctl at ffffffffbb37a3d4
#15 [ff81cd7364037f30] __x64_sys_ioctl at ffffffffbb37a426
#16 [ff81cd7364037f38] do_syscall_64 at ffffffffbb0052fb
#17 [ff81cd7364037f50] entry_SYSCALL_64_after_hwframe at ffffffffbbc000a9
RIP: 00007f01c77267cb RSP: 00007eff5417e558 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00000000022fee80 RCX: 00007f01c77267cb
RDX: 00007eff5417e5a8 RSI: 000000004004da5b RDI: 000000000000006a
RBP: 00007efd28001650 R8: 0000000000000327 R9: 00007f0162a19bc8
R10: 00007f01629ca8a9 R11: 0000000000000246 R12: 00007eff5417e5b0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
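- Source code and disassembly. mutex_can_spin_on_owner() is reached from __mutex_lock() through __mutex_lock_common() and mutex_optimistic_spin() and is inlined, which is why the disassembly of __mutex_lock below maps to mutex.c line 414.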
738 static int __sched
739 __mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
740              struct lockdep_map *nest_lock, unsigned long ip)
741 {
742         return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false);

398 static inline int mutex_can_spin_on_owner(struct mutex *lock)
399 {
400         struct task_struct *owner;
...
413         owner = __mutex_owner(lock);
414         if (owner)
415                 retval = owner_on_cpu(owner);

75 static inline struct task_struct *__mutex_owner(struct mutex *lock)
76 {
77         return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
78 }

68 #define MUTEX_FLAGS 0x07
0xffffffffc10c6719 <wanec_event_ctrl+73>: lea 0x3a(%rbx),%r14
0xffffffffc10c671d <wanec_event_ctrl+77>: mov %r14,%rdi <-- 0xff4009d4b149243a
0xffffffffc10c6720 <wanec_event_ctrl+80>: call 0xffffffffbb9fcca0 <mutex_lock>
//address of the mutex passed to mutex_lock()
0xff4009d4b149243a
- Check the owner task.
//(struct task_struct *)(atomic_long_read(&((struct mutex *)0xff4009d4b149243a)->owner) & ~MUTEX_FLAGS);
//&lock->owner
crash> p &((struct mutex *)0xff4009d4b149243a)->owner
$5 = (atomic_long_t *) 0xff4009d4b149243a
//atomic_long_read(&((struct mutex *)0xff4009d4b149243a)->owner)
crash> rd 0xff4009d4b149243a
ff4009d4b149243a: 0000000000000000 ........
//atomic_long_read(&lock->owner) & ~MUTEX_FLAGS
crash> px 0x0000000000000000&~0x07
$12 = 0x0 <<----
- The task_struct pointer computed from the vmcore is 0x0, so the if (owner) check at line 414 should prevent the call to owner_on_cpu().
2085 static inline bool owner_on_cpu(struct task_struct *owner)
...
2091 return READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner)); <-- panic here
2092 }
/usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/./include/linux/compiler.h: 278
278 __READ_ONCE_SIZE;
0xffffffffbb9fc921 <__mutex_lock+177>: mov (%rbx),%rax
/usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/locking/mutex.c: 414
414 if (owner)
415 retval = owner_on_cpu(owner);
0xffffffffbb9fc924 <__mutex_lock+180>: and $0xfffffffffffffff8,%rax
0xffffffffbb9fc928 <__mutex_lock+184>: je 0xffffffffbb9fc9be <__mutex_lock+334>
/usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/./include/linux/compiler.h: 278
278 __READ_ONCE_SIZE;
0xffffffffbb9fc92e <__mutex_lock+190>: mov 0x38(%rax),%edx
- Check the registers.
//mov (%rbx),%rax
RBX: ff4009d4b149243a
crash> rd ff4009d4b149243a
ff4009d4b149243a: 0000000000000000 ........
//and $0xfffffffffffffff8,%rax
crash> px 0x0000000000000000&0xfffffffffffffff8
$13 = 0x0
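- Based on the memory contents in the vmcore, RAX should be 0x0 after the and instruction, in which case the je at __mutex_lock+184 branches to __mutex_lock+334 and the load at __mutex_lock+190 is never executed. Even if that load were executed with RAX = 0x0, the panic would be reported as a NULL pointer dereference, e.g. "unable to handle kernel NULL pointer dereference at 0000000000000038".
- However, when the system crashed, RAX contained 00000a20a8570000, so the load from 0x38(%rax) faulted at 00000a20a8570038. The value __mutex_lock() loaded from lock->owner was therefore non-zero and invalid, even though the same location reads as 0 in the vmcore.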
- Because the mutex is passed in by wanec_event_ctrl() in the wanec module, the module provider should be engaged for further investigation.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.