Deadlock condition between CPUs running 3rd party code from morphisec_protector
Environment
- Red Hat Enterprise Linux
Issue
- A deadlock occurs between CPUs running 3rd party code from morphisec_protector. The module disables interrupts on a CPU and then hangs waiting on a spinlock.
Resolution
- Red Hat neither ships nor supports this module. Engage the vendor of the morphisec_protector module for further investigation.
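Before engaging the vendor, it can help to confirm that the module is actually loaded on the affected system. The following is a minimal sketch, assuming the standard /proc/modules layout in which the first field of each line is the module name. The helper names loaded_modules and is_module_loaded are illustrative and not part of any Red Hat or vendor tooling.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by
    whitespace-separated fields that are ignored here.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name, proc_modules_text):
    """True if the named module appears in the given /proc/modules text."""
    return name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        loaded = is_module_loaded("morphisec_protector", path.read_text())
        print("morphisec_protector loaded:", loaded)
```

The parsing is kept separate from the file read so the same check can be run against a copy of /proc/modules taken from a sosreport or similar collection.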
Diagnostic Steps
Pre-requisites
- Deploy kdump in order to collect a vmcore:
  - Vmcore analysis is required to determine whether you are impacted by this issue. This first requires that a vmcore is dumped successfully.
  - If the kexec-tools package is absent or the kdump service is inactive, refer to the following article to install, enable, start, and configure kdump:
    How to troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux
- Prepare a crash environment for vmcore analysis:
  - Refer to the following article to set up a vmcore analysis environment:
    How to set up a vmcore analysis environment?
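The kdump prerequisite above (kexec-tools installed, kdump service active) can be checked with a short script. This is a sketch only: it assumes the rpm and systemctl commands are available, and the function name kdump_ready and its injectable run parameter are hypothetical helpers introduced for illustration.

```python
import subprocess


def _default_run(cmd):
    """Run a command quietly and return its exit status (0 means success)."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return result.returncode
    except FileNotFoundError:
        return 127  # the command itself is not available on this system


def kdump_ready(run=_default_run):
    """Return (package_installed, service_active) for kdump.

    `rpm -q` exits 0 when the package is installed, and
    `systemctl is-active --quiet` exits 0 when the unit is active.
    """
    package_installed = run(["rpm", "-q", "kexec-tools"]) == 0
    service_active = run(["systemctl", "is-active", "--quiet", "kdump"]) == 0
    return package_installed, service_active
```

Calling kdump_ready() with no arguments runs the real commands. If either value in the returned pair is False, follow the kdump article referenced above before attempting to reproduce the hang.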
Vmcore Analysis
- CPU 0 is running 3rd party module code from morphisec_protector. add_patch_to_list() calls the kernel routine _raw_spin_lock_irqsave() on the 3rd party variable patcher_ctx+0x58. This disables interrupts on CPU 0 and then attempts to take the spinlock. The lock is already held by another CPU, so CPU 0 spins with interrupts disabled, waiting for the lock to be released:

    crash> bt -c 0
    PID: 4718   TASK: ffff8e7996d1e180  CPU: 0   COMMAND: "runc"
        [exception RIP: native_queued_spin_lock_slowpath+0x1d]
        RIP: ffffffffa1d11ffd  RSP: ffff8e790c86b728  RFLAGS: 00000093
        RAX: 0000000000000001  RBX: 0000000000000206  RCX: 0000000000000001
        RDX: 0000000000000001  RSI: 0000000000000001  RDI: ffffffffc0cf79d8
        RBP: ffff8e790c86b728   R8: ffff8e79b82a9580   R9: ffffea0744499e80
        R10: 0000000000003464  R11: ffffffffa1dfc1b8  R12: ffffaf2a8a580008
        R13: ffff8e7b1d5e518e  R14: ffffaf2a8a57e008  R15: 0000000000000000
     #0 [ffff8e790c86b730] queued_spin_lock_slowpath at ffffffffa235bf4b
     #1 [ffff8e790c86b740] _raw_spin_lock_irqsave at ffffffffa236a487
     #2 [ffff8e790c86b758] add_patch_to_list at ffffffffc0c438ab [morphisec_protector]
     #3 [ffff8e790c86b778] create_and_add_patch_to_list at ffffffffc0c4393e [morphisec_protector]
     #4 [ffff8e790c86b7b0] AddWhitelistPatch at ffffffffc0c439f4 [morphisec_protector]
     #5 [ffff8e790c86b810] GenerateWhitelistPatch at ffffffffc0c49b8e [morphisec_protector]
     #6 [ffff8e790c86b868] ApplyWhitelistProcessPatches at ffffffffc0c4b221 [morphisec_protector]
     #7 [ffff8e790c86b960] patch_whitelist_process at ffffffffc0c4c26d [morphisec_protector]
     #8 [ffff8e790c86ba08] handle_suspicious_process_ex at ffffffffc0c44625 [morphisec_protector]
     #9 [ffff8e790c86bef8] handle_suspicious_process at ffffffffc0c44820 [morphisec_protector]
    #10 [ffff8e790c86bf08] not_trusted_or_not_seeded at ffffffffc0c591aa [morphisec_protector]
    #11 [ffff8e790c86bf50] system_call_fastpath at ffffffffa2374ddb
        RIP: 000055ef6acc4217  RSP: 00007ffdf748fc08  RFLAGS: 00010246
        RAX: 00000000000000ba  RBX: 000055ef6b7492e0  RCX: 000055ef6b7731c0
        RDX: 0000000000000000  RSI: 00007ffdf748fbd0  RDI: 0000000000000002
        RBP: 00007ffdf748fc00   R8: 00000000000001de   R9: 0000000000000000
        R10: 0000000000000008  R11: 0000000000000216  R12: ffffffffffffffff
        R13: 0000000000000002  R14: 0000000000000001  R15: 0000000000000400
        ORIG_RAX: 00000000000000ba  CS: 0033  SS: 002b

    crash> dis -r ffffffffc0c4389f
    [..]
    0xffffffffc0c4389f <add_patch_to_list+0xf>:     mov    $0xffffffffc0cf79d8,%rdi    <-- patcher_ctx+0x58 [morphisec_protector]

    crash> raw_spinlock_t ffffffffc0cf79d8
    struct raw_spinlock_t {
      raw_lock = {
        val = {
          counter = 0x1    <-- locked
        }
      }
    }

    crash> dis -r ffffffffc0c438ab
    [..]
    0xffffffffc0c438a6 <add_patch_to_list+0x16>:    call   0xffffffffa236a450 <_raw_spin_lock_irqsave>    <----
    0xffffffffc0c438ab <add_patch_to_list+0x1b>:    mov    0xb4106(%rip),%rdx        # 0xffffffffc0cf79b8 <patcher_ctx+0x38>

- CPU 2 is the current owner of the spinlock at patcher_ctx+0x58 [morphisec_protector] and is attempting to free memory while holding it. Freeing the memory requires the other CPUs to flush any TLB entries that may reference it, so CPU 2 sends an IPI to each CPU and waits for them to respond. Because the 3rd party code has disabled interrupts on CPU 0, CPU 0 never responds, and CPU 2 never releases the spinlock that CPU 0 is waiting for. The result is a deadlock. The call_single_queue output shows a pending request queued only for CPU 0:

    crash> bt -c 2
    PID: 4712   TASK: ffff8e791ffc9040  CPU: 2   COMMAND: "runc"
        [exception RIP: smp_call_function_many+0x20a]
        RIP: ffffffffa1d1173a  RSP: ffff8e790110b998  RFLAGS: 00000002
        RAX: 0000000000000000  RBX: 0000000000000080  RCX: ffff8e7b356204b8
        RDX: 0000000000000000  RSI: 0000000000000080  RDI: 0000000000000000
        RBP: ffff8e790110b960   R8: ffff8e7a7b45b800   R9: ffffffffa1f755c9
        R10: ffff8e7b3569f160  R11: ffffea0744009000  R12: 000000000001b700
        R13: ffffffffa1c7a480  R14: 0000000000000000  R15: ffff8e7b3569b740
     #0 [ffff8e790110b9a0] on_each_cpu at ffffffffa1d117fd
     #1 [ffff8e790110b9c8] flush_tlb_kernel_range at ffffffffa1c7a7f9
     #2 [ffff8e790110b9f8] __purge_vmap_area_lazy at ffffffffa1dfa020
     #3 [ffff8e790110ba58] free_vmap_area_noflush at ffffffffa1dfa2bc
     #4 [ffff8e790110ba88] remove_vm_area at ffffffffa1dfbd90
     #5 [ffff8e790110baa8] __vunmap at ffffffffa1dfbdda
     #6 [ffff8e790110bad0] vfree at ffffffffa1dfbf26
     #7 [ffff8e790110bae8] memory_list_free_entry at ffffffffc0c4e439 [morphisec_protector]
     #8 [ffff8e790110bb08] MLP_free at ffffffffc0c4e750 [morphisec_protector]
     #9 [ffff8e790110bb18] FreePatch at ffffffffc0c426b3 [morphisec_protector]
    #10 [ffff8e790110bb38] apply_all_patches at ffffffffc0c427e6 [morphisec_protector]
    #11 [ffff8e790110bb90] patcher_handle_whitelist_patches at ffffffffc0c434d1 [morphisec_protector]
    #12 [ffff8e790110bbb8] mlp_generic_file_release at ffffffffc0c52967 [morphisec_protector]
    #13 [ffff8e790110bc38] mlp_generic_file_release_xfs at ffffffffc0c4109b [morphisec_protector]
    #14 [ffff8e790110bc48] __fput at ffffffffa1e433dc
    #15 [ffff8e790110bc90] ____fput at ffffffffa1e4363e
    #16 [ffff8e790110bca0] task_work_run at ffffffffa1cbe79b
    #17 [ffff8e790110bce0] do_exit at ffffffffa1c9dc61
    #18 [ffff8e790110bd78] do_group_exit at ffffffffa1c9e44f
    #19 [ffff8e790110bda8] get_signal_to_deliver at ffffffffa1caf24e
    #20 [ffff8e790110be40] do_signal at ffffffffa1c2b527
    #21 [ffff8e790110bf30] do_notify_resume at ffffffffa1c2bc32
    #22 [ffff8e790110bf50] int_signal at ffffffffa2375124
        RIP: 00005631173cd203  RSP: 00007f490c85edd0  RFLAGS: 00000206
        RAX: fffffffffffffdfe  RBX: 0000000000000000  RCX: ffffffffffffffff
        RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
        RBP: 00007f490c85ede0   R8: 00007f490c85edd0   R9: 0000000000000000
        R10: 0000000000000000  R11: 0000000000000206  R12: 0000000000000000
        R13: 0000000000801000  R14: 0000000000000000  R15: 00007f490c85f700
        ORIG_RAX: 000000000000010e  CS: 0033  SS: 002b

    crash> p call_single_queue:a
    per_cpu(call_single_queue, 0) = $1 = {
      first = 0xffff8e7b356204b8
    }
    per_cpu(call_single_queue, 1) = $2 = {
      first = 0x0
    }
    per_cpu(call_single_queue, 2) = $3 = {
      first = 0x0
    }
    per_cpu(call_single_queue, 3) = $4 = {
      first = 0x0
    }

    crash> dis -r ffffffffc0c427e6
    [..]
    0xffffffffc0c42798 <apply_all_patches+0xc8>:    test   %bl,%bl
    0xffffffffc0c4279a <apply_all_patches+0xca>:    mov    $0xffffffffc0cf79d8,%rdi    <------ patcher_ctx+0x58 [morphisec_protector]
    0xffffffffc0c427a1 <apply_all_patches+0xd1>:    cmovne %r12,%r15
    0xffffffffc0c427a5 <apply_all_patches+0xd5>:    call   0xffffffffa236a450 <_raw_spin_lock_irqsave>    <---------- locked the spinlock here
    0xffffffffc0c427aa <apply_all_patches+0xda>:    mov    (%r15),%r12
    [..]
    0xffffffffc0c427de <apply_all_patches+0x10e>:   mov    %r12,%rdi
    0xffffffffc0c427e1 <apply_all_patches+0x111>:   call   0xffffffffc0c42650 <FreePatch>    <------------- current path
    0xffffffffc0c427e6 <apply_all_patches+0x116>:   cmp    %rbx,%r15
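The reasoning in the analysis above can be modelled in a few lines of Python. This is an illustrative sketch, not a diagnostic tool: it decodes the x86 interrupt-enable flag (IF, bit 9 of RFLAGS) from the exception-frame RFLAGS values shown in the backtraces, then builds a simplified "waits for" relationship between the two CPUs and looks for a cycle. The function names and the one-edge-per-CPU model are assumptions made for the example.

```python
IF_BIT = 1 << 9  # x86 RFLAGS interrupt-enable flag (IF)


def interrupts_enabled(rflags):
    """True if the IF bit is set in the given RFLAGS value."""
    return bool(rflags & IF_BIT)


def build_waits_for(cpu0_irqs_enabled):
    """Model who waits for whom (simplified to one edge per CPU)."""
    # CPU 0 spins on the spinlock currently held by CPU 2.
    edges = {"CPU 0": "CPU 2"}
    if not cpu0_irqs_enabled:
        # CPU 2 waits for CPU 0 to acknowledge the TLB-flush IPI, which
        # CPU 0 cannot service while its interrupts are disabled.
        edges["CPU 2"] = "CPU 0"
    return edges


def find_cycle(waits_for):
    """Follow waits-for edges from each node; return the first cycle, else []."""
    for start in waits_for:
        path = []
        node = start
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return []


# Exception-frame RFLAGS values copied from the `bt` output for each CPU.
rflags = {"CPU 0": 0x00000093, "CPU 2": 0x00000002}

for cpu, value in rflags.items():
    print(cpu, "interrupts enabled:", interrupts_enabled(value))

edges = build_waits_for(interrupts_enabled(rflags["CPU 0"]))
print("deadlock cycle:", find_cycle(edges))
```

In this model the cycle exists only while CPU 0 has interrupts disabled. If CPU 0 could service the IPI, CPU 2 would finish the flush and release the lock, which is why taking the lock with interrupts disabled while another holder performs a cross-CPU flush is the critical combination.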