Kernel panic at __mutex_lock+190 due to unsigned module "wanec".

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 8
  • wanec

Issue

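  • Kernel panic with "BUG: unable to handle kernel paging request at 00000a20a8570038".
  • Kernel panic at __mutex_lock+190, reached from the wanec_event_ctrl() function of the third-party module wanec. Console log from a separate occurrence (the PID, CPU and faulting address differ from the vmcore analyzed under Diagnostic Steps):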
[5079239.202237] BUG: unable to handle kernel paging request at 0000a3a17c098038
[5079239.202605] PGD 2b73d9b067 P4D 0 
[5079239.202841] Oops: 0000 [#1] SMP NOPTI
[5079239.203079] CPU: 58 PID: 3445653 Comm: asterisk Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-477.10.1.el8_8.x86_64 #1
[5079239.203626] Hardware name: Lenovo ThinkSystem SR650 V2/7Z73CTO1WW, BIOS AFE122G-1.50 02/07/2023
[5079239.203943] RIP: 0010:__mutex_lock.isra.7+0xbe/0x420
..
[5079239.209997] Call Trace:
[5079239.210457]  ? page_counter_uncharge+0x1d/0x40
[5079239.210924]  wanec_event_ctrl+0x55/0x1e0 [wanec]
[5079239.211371]  wp_tdmv_ioctl+0x174/0x270 [wanpipe]
[5079239.211720]  dahdi_chanandpseudo_ioctl+0x59e/0x16a0 [dahdi]
[5079239.212077]  ? reuse_swap_page+0x50/0x180
[5079239.212406]  ? wp_page_reuse+0x4d/0x60
[5079239.212737]  ? do_wp_page+0x247/0x350
[5079239.213072]  dahdi_chan_ioctl+0x141/0xc10 [dahdi]
[5079239.213419]  dahdi_unlocked_ioctl+0x50/0x110 [dahdi]
[5079239.213773]  do_vfs_ioctl+0xa4/0x690
[5079239.214118]  ? syscall_trace_enter+0x1ff/0x2d0
[5079239.214475]  ksys_ioctl+0x64/0xa0
[5079239.214823]  ? probe_fini+0x1212/0x1230 [traps]
[5079239.215188]  __x64_sys_ioctl+0x16/0x20
[5079239.215546]  do_syscall_64+0x5b/0x1b0
[5079239.215910]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[5079239.216283] RIP: 0033:0x7fc56ce717cb

Resolution

  • Engage the provider of the third-party wanec kernel module for further investigation.
  • Check whether patches or updates are available for wanec that resolve this issue.

Possible Workaround

  • Blacklist the third-party kernel module wanec and its dependencies.

Root Cause

  • wanec_event_ctrl() in the third-party wanec module passes mutex_lock() a mutex whose owner field yields an invalid task_struct pointer. __mutex_lock() then faults when it dereferences that pointer.

Diagnostic Steps

  • Backtrace of the panicking task:
crash> bt
PID: 3486355  TASK: ff400a1a4af00000  CPU: 79   COMMAND: "asterisk"
 #0 [ff81cd7364037980] machine_kexec at ffffffffbb06bec3
 #1 [ff81cd73640379d8] __crash_kexec at ffffffffbb1b564a
 #2 [ff81cd7364037a98] crash_kexec at ffffffffbb1b6581
 #3 [ff81cd7364037ab0] oops_end at ffffffffbb02a9b1
 #4 [ff81cd7364037ad0] no_context at ffffffffbb07e913
 #5 [ff81cd7364037b28] __bad_area_nosemaphore at ffffffffbb07ec8c
 #6 [ff81cd7364037b70] do_page_fault at ffffffffbb07f887
 #7 [ff81cd7364037ba0] page_fault at ffffffffbbc0116e
    [exception RIP: __mutex_lock+0xbe]
    RIP: ffffffffbb9fc92e  RSP: ff81cd7364037c50  RFLAGS: 00010206
    RAX: 00000a20a8570000  RBX: ff4009d4b149243a  RCX: ff400a20a8570000
    RDX: ff400a1a4af00000  RSI: 0000000000000002  RDI: ff400a1a4af00000
    RBP: ff81cd7364037cb0   R8: ff81cd7364037cb8   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000001  R12: ff4009d4bdfc0600
    R13: ff4009d7cceaa7d8  R14: 0000000000000002  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ff81cd7364037cb8] wanec_event_ctrl at ffffffffc10c6725 [wanec]
 #9 [ff81cd7364037cf8] wp_tdmv_ioctl at ffffffffc15ce624 [wanpipe]
#10 [ff81cd7364037d40] dahdi_chanandpseudo_ioctl at ffffffffc0bf725e [dahdi]
#11 [ff81cd7364037df8] dahdi_chan_ioctl at ffffffffc0bf84a1 [dahdi]
#12 [ff81cd7364037e70] dahdi_unlocked_ioctl at ffffffffc0bfa640 [dahdi]
#13 [ff81cd7364037e80] do_vfs_ioctl at ffffffffbb379d84
#14 [ff81cd7364037ef8] ksys_ioctl at ffffffffbb37a3d4
#15 [ff81cd7364037f30] __x64_sys_ioctl at ffffffffbb37a426
#16 [ff81cd7364037f38] do_syscall_64 at ffffffffbb0052fb
#17 [ff81cd7364037f50] entry_SYSCALL_64_after_hwframe at ffffffffbbc000a9
    RIP: 00007f01c77267cb  RSP: 00007eff5417e558  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000022fee80  RCX: 00007f01c77267cb
    RDX: 00007eff5417e5a8  RSI: 000000004004da5b  RDI: 000000000000006a
    RBP: 00007efd28001650   R8: 0000000000000327   R9: 00007f0162a19bc8
    R10: 00007f01629ca8a9  R11: 0000000000000246  R12: 00007eff5417e5b0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Source code and disassembly


738 static int __sched
739 __mutex_lock(struct mutex *lock, unsigned int state, unsigned int subclass,
740              struct lockdep_map *nest_lock, unsigned long ip)
741 {
742         return __mutex_lock_common(lock, state, subclass, nest_lock, ip, NULL, false);

398 static inline int mutex_can_spin_on_owner(struct mutex *lock)
399 {
400         struct task_struct *owner;
...
413         owner = __mutex_owner(lock);
414         if (owner)
415                 retval = owner_on_cpu(owner);

75 static inline struct task_struct *__mutex_owner(struct mutex *lock)
76 {
77         return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
78 }

68 #define MUTEX_FLAGS 0x07

0xffffffffc10c6719 <wanec_event_ctrl+73>:  lea    0x3a(%rbx),%r14
0xffffffffc10c671d <wanec_event_ctrl+77>:  mov    %r14,%rdi        <-- 0xff4009d4b149243a
0xffffffffc10c6720 <wanec_event_ctrl+80>:  call   0xffffffffbb9fcca0 <mutex_lock>

//mutex 0xff4009d4b149243a

  • Check the owner task.
//(struct task_struct *)(atomic_long_read(&((struct mutex *)0xff4009d4b149243a)->owner) & ~MUTEX_FLAGS);

//&lock->owner
crash> p &((struct mutex *)0xff4009d4b149243a)->owner
$5 = (atomic_long_t *) 0xff4009d4b149243a

//atomic_long_read(&((struct mutex *)0xff4009d4b149243a)->owner)
crash> rd 0xff4009d4b149243a
ff4009d4b149243a:  0000000000000000                    ........

//atomic_long_read(&lock->owner) & ~MUTEX_FLAGS
crash> px 0x0000000000000000&~0x07
$12 = 0x0   <<----
  • The owner task_struct pointer computed from the vmcore contents is 0x0, so __mutex_lock() should not have entered owner_on_cpu().

2085 static inline bool owner_on_cpu(struct task_struct *owner)
...
2091         return READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner));   <-- panic here
2092 }

/usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/./include/linux/compiler.h: 278
278             __READ_ONCE_SIZE;
0xffffffffbb9fc921 <__mutex_lock+177>:  mov    (%rbx),%rax
/usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/locking/mutex.c: 414
414             if (owner)
415                     retval = owner_on_cpu(owner);
0xffffffffbb9fc924 <__mutex_lock+180>:  and    $0xfffffffffffffff8,%rax
0xffffffffbb9fc928 <__mutex_lock+184>:  je     0xffffffffbb9fc9be <__mutex_lock+334>
/usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/./include/linux/compiler.h: 278
278             __READ_ONCE_SIZE;
0xffffffffbb9fc92e <__mutex_lock+190>:  mov    0x38(%rax),%edx
  • Check the registers.
//mov    (%rbx),%rax
RBX: ff4009d4b149243a

crash> rd ff4009d4b149243a
ff4009d4b149243a:  0000000000000000                    ........

//and    $0xfffffffffffffff8,%rax
crash> px 0x0000000000000000&0xfffffffffffffff8
$13 = 0x0
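  • Based on the vmcore contents, RAX should have been 0x0 after the and at __mutex_lock+180. In that case the je at __mutex_lock+184 is taken and the load at __mutex_lock+190 never runs. Even if 0x38(%rax) were dereferenced with RAX = 0x0, the panic would read "BUG: unable to handle kernel NULL pointer dereference at 0000000000000038".
  • However, when the system crashed, RAX contained 00000a20a8570000, so the load from 0x38(%rax) faulted with a paging request at 00000a20a8570038. The owner value loaded at __mutex_lock+177 therefore did not match the 0x0 found in the vmcore, and it was not a valid task_struct pointer.
  • Because the mutex address is supplied by wanec_event_ctrl() in the wanec module, the module provider should be engaged for further investigation.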

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
