Deadlock condition between CPUs running 3rd party code from morphisec_protector

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux

Issue

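  • A deadlock occurs between CPUs when third-party code from the morphisec_protector module disables interrupts on one CPU and then spins on a spinlock that another CPU holds. The lock owner is itself waiting for the first CPU to answer an inter-processor interrupt (IPI), which that CPU cannot do with interrupts disabled, so neither CPU can make progress.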
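The situation described above is a wait-for cycle between two CPUs. The following Python sketch models it under simplifying assumptions: the resource labels and the WAITS_ON, build_holders() and find_cycle() names are hypothetical and are not kernel or crash code. It also illustrates why disabling interrupts is the key ingredient: if CPU 0 could still take interrupts while spinning, it would answer the IPI and no cycle would form.

```python
# Hypothetical model of the wait-for cycle; not kernel code.
WAITS_ON = {
    0: "spinlock patcher_ctx+0x58",  # CPU 0 spins in _raw_spin_lock_irqsave()
    2: "IPI ack from CPU 0",         # CPU 2 waits in smp_call_function_many()
}


def build_holders(irqs_disabled_on_cpu0):
    """Map each resource to the CPU that must act before it is released."""
    holders = {"spinlock patcher_ctx+0x58": 2}  # taken in apply_all_patches()
    # The IPI only blocks CPU 2 if CPU 0 cannot take interrupts; with
    # interrupts enabled, CPU 0 would run the IPI handler while spinning.
    if irqs_disabled_on_cpu0:
        holders["IPI ack from CPU 0"] = 0
    return holders


def find_cycle(start, waits_on, holders):
    """Follow 'waits on -> held by' edges; return the CPUs in a cycle, or None."""
    path = [start]
    cpu = start
    while cpu in waits_on:
        resource = waits_on[cpu]
        if resource not in holders:
            return None  # this wait will eventually be satisfied
        cpu = holders[resource]
        if cpu in path:
            return path[path.index(cpu):]
        path.append(cpu)
    return None


print(find_cycle(0, WAITS_ON, build_holders(True)))   # [0, 2]
print(find_cycle(0, WAITS_ON, build_holders(False)))  # None
```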

Resolution

  • Red Hat neither ships nor supports this module. Engage the vendor of the morphisec_protector module for further investigation.

Diagnostic Steps

Pre-requisites

  1. Deploy kdump in order to collect a vmcore.

  2. Prepare a crash environment for vmcore analysis.

Vmcore Analysis

  • CPU 0 is running third-party code from morphisec_protector: add_patch_to_list() calls the kernel routine _raw_spin_lock_irqsave() on the spinlock at patcher_ctx+0x58, a variable that belongs to the third-party module. _raw_spin_lock_irqsave() first disables interrupts on this CPU (the RFLAGS value 00000093 has the interrupt flag, bit 9, clear) and then tries to take the lock. The lock is already held by another CPU, so CPU 0 spins, waiting for it to be released:

    crash> bt -c 0
    PID: 4718     TASK: ffff8e7996d1e180  CPU: 0    COMMAND: "runc"
        [exception RIP: native_queued_spin_lock_slowpath+0x1d]
         RIP: ffffffffa1d11ffd  RSP: ffff8e790c86b728  RFLAGS: 00000093
         RAX: 0000000000000001  RBX: 0000000000000206  RCX: 0000000000000001
         RDX: 0000000000000001  RSI: 0000000000000001  RDI: ffffffffc0cf79d8
         RBP: ffff8e790c86b728   R8: ffff8e79b82a9580   R9: ffffea0744499e80
         R10: 0000000000003464  R11: ffffffffa1dfc1b8  R12: ffffaf2a8a580008
         R13: ffff8e7b1d5e518e  R14: ffffaf2a8a57e008  R15: 0000000000000000
     #0 [ffff8e790c86b730] queued_spin_lock_slowpath at ffffffffa235bf4b
     #1 [ffff8e790c86b740] _raw_spin_lock_irqsave at ffffffffa236a487
     #2 [ffff8e790c86b758] add_patch_to_list at ffffffffc0c438ab [morphisec_protector]
     #3 [ffff8e790c86b778] create_and_add_patch_to_list at ffffffffc0c4393e [morphisec_protector]
     #4 [ffff8e790c86b7b0] AddWhitelistPatch at ffffffffc0c439f4 [morphisec_protector]
     #5 [ffff8e790c86b810] GenerateWhitelistPatch at ffffffffc0c49b8e [morphisec_protector]
     #6 [ffff8e790c86b868] ApplyWhitelistProcessPatches at ffffffffc0c4b221 [morphisec_protector]
     #7 [ffff8e790c86b960] patch_whitelist_process at ffffffffc0c4c26d [morphisec_protector]
     #8 [ffff8e790c86ba08] handle_suspicious_process_ex at ffffffffc0c44625 [morphisec_protector]
     #9 [ffff8e790c86bef8] handle_suspicious_process at ffffffffc0c44820 [morphisec_protector]
    #10 [ffff8e790c86bf08] not_trusted_or_not_seeded at ffffffffc0c591aa [morphisec_protector]
    #11 [ffff8e790c86bf50] system_call_fastpath at ffffffffa2374ddb
        RIP: 000055ef6acc4217  RSP: 00007ffdf748fc08  RFLAGS: 00010246
        RAX: 00000000000000ba  RBX: 000055ef6b7492e0  RCX: 000055ef6b7731c0
        RDX: 0000000000000000  RSI: 00007ffdf748fbd0  RDI: 0000000000000002
        RBP: 00007ffdf748fc00   R8: 00000000000001de   R9: 0000000000000000
        R10: 0000000000000008  R11: 0000000000000216  R12: ffffffffffffffff
        R13: 0000000000000002  R14: 0000000000000001  R15: 0000000000000400
        ORIG_RAX: 00000000000000ba  CS: 0033  SS: 002b
    
    crash> dis -r ffffffffc0c4389f
    [..]
    0xffffffffc0c4389f <add_patch_to_list+0xf>: mov    $0xffffffffc0cf79d8,%rdi   <-- patcher_ctx+0x58 [morphisec_protector]
    
    crash> raw_spinlock_t ffffffffc0cf79d8
    struct raw_spinlock_t {
      raw_lock = {
        val = {
          counter = 0x1    <-- locked
        }
      }
    }
    
    crash> dis -r ffffffffc0c438ab
    [..]
    0xffffffffc0c438a6 <add_patch_to_list+0x16>:    call   0xffffffffa236a450 <_raw_spin_lock_irqsave>   <----
    0xffffffffc0c438ab <add_patch_to_list+0x1b>:    mov    0xb4106(%rip),%rdx        # 0xffffffffc0cf79b8 <patcher_ctx+0x38>             
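The two raw values above can be decoded to confirm the state of CPU 0. This Python sketch assumes the architectural x86-64 RFLAGS layout (IF is bit 9) and the generic qspinlock word layout for kernels built with NR_CPUS below 16K; the helper names are hypothetical. Under these assumptions, RFLAGS 00000093 shows interrupts disabled, and counter = 0x1 shows the lock held, with no pending bit and no queued waiter recorded in the tail field.

```python
# Decode raw values from the CPU 0 output (illustrative sketch).
# Assumptions: x86-64 RFLAGS bit layout, and the generic qspinlock word
# layout used when NR_CPUS < 16K (locked byte in bits 0-7, pending in
# bit 8, tail index in bits 16-17, tail CPU + 1 in bits 18-31).

X86_EFLAGS_IF = 1 << 9  # interrupt-enable flag


def interrupts_enabled(rflags):
    """True if the IF bit is set in a saved RFLAGS value."""
    return bool(rflags & X86_EFLAGS_IF)


def decode_qspinlock(val):
    """Split a qspinlock 'val.counter' into its fields."""
    tail_cpu_plus_one = (val >> 18) & 0x3FFF
    return {
        "locked": val & 0xFF,
        "pending": (val >> 8) & 0x1,
        "tail_idx": (val >> 16) & 0x3,
        # A tail CPU field of 0 means no CPU is queued behind the owner.
        "tail_cpu": tail_cpu_plus_one - 1 if tail_cpu_plus_one else None,
    }


print(interrupts_enabled(0x93))  # False: interrupts are disabled on CPU 0
print(decode_qspinlock(0x1))     # {'locked': 1, 'pending': 0, 'tail_idx': 0, 'tail_cpu': None}
```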
    
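  • CPU 2 is the current owner of the spinlock at patcher_ctx+0x58 [morphisec_protector]: apply_all_patches() acquired it with _raw_spin_lock_irqsave() and, while still holding it, frees memory through FreePatch() and vfree(). Freeing the vmalloc area requires the other CPUs to flush any TLB entries that may still reference that memory, so the kernel sends an IPI to the other CPUs (flush_tlb_kernel_range() → on_each_cpu() → smp_call_function_many()) and waits for them to respond. Because the third-party code has disabled interrupts on CPU 0, CPU 0 never handles the IPI; CPU 2 therefore never finishes the flush and never releases the spinlock that CPU 0 is waiting for. Each CPU waits on the other, which is a deadlock. The unanswered request is still queued for CPU 0: its call_single_queue has a pending entry (0xffff8e7b356204b8, the same address that appears in RCX on CPU 2), while the queues of the other CPUs are empty: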

    crash>  bt -c 2
    PID: 4712     TASK: ffff8e791ffc9040  CPU: 2    COMMAND: "runc"
        [exception RIP: smp_call_function_many+0x20a]
         RIP: ffffffffa1d1173a  RSP: ffff8e790110b998  RFLAGS: 00000002
         RAX: 0000000000000000  RBX: 0000000000000080  RCX: ffff8e7b356204b8
         RDX: 0000000000000000  RSI: 0000000000000080  RDI: 0000000000000000
         RBP: ffff8e790110b960   R8: ffff8e7a7b45b800   R9: ffffffffa1f755c9
         R10: ffff8e7b3569f160  R11: ffffea0744009000  R12: 000000000001b700
         R13: ffffffffa1c7a480  R14: 0000000000000000  R15: ffff8e7b3569b740
     #0 [ffff8e790110b9a0] on_each_cpu at ffffffffa1d117fd
     #1 [ffff8e790110b9c8] flush_tlb_kernel_range at ffffffffa1c7a7f9
     #2 [ffff8e790110b9f8] __purge_vmap_area_lazy at ffffffffa1dfa020
     #3 [ffff8e790110ba58] free_vmap_area_noflush at ffffffffa1dfa2bc
     #4 [ffff8e790110ba88] remove_vm_area at ffffffffa1dfbd90
     #5 [ffff8e790110baa8] __vunmap at ffffffffa1dfbdda
     #6 [ffff8e790110bad0] vfree at ffffffffa1dfbf26
     #7 [ffff8e790110bae8] memory_list_free_entry at ffffffffc0c4e439 [morphisec_protector]
     #8 [ffff8e790110bb08] MLP_free at ffffffffc0c4e750 [morphisec_protector]
     #9 [ffff8e790110bb18] FreePatch at ffffffffc0c426b3 [morphisec_protector]
    #10 [ffff8e790110bb38] apply_all_patches at ffffffffc0c427e6 [morphisec_protector]
    #11 [ffff8e790110bb90] patcher_handle_whitelist_patches at ffffffffc0c434d1 [morphisec_protector]
    #12 [ffff8e790110bbb8] mlp_generic_file_release at ffffffffc0c52967 [morphisec_protector]
    #13 [ffff8e790110bc38] mlp_generic_file_release_xfs at ffffffffc0c4109b [morphisec_protector]
    #14 [ffff8e790110bc48] __fput at ffffffffa1e433dc
    #15 [ffff8e790110bc90] ____fput at ffffffffa1e4363e
    #16 [ffff8e790110bca0] task_work_run at ffffffffa1cbe79b
    #17 [ffff8e790110bce0] do_exit at ffffffffa1c9dc61
    #18 [ffff8e790110bd78] do_group_exit at ffffffffa1c9e44f
    #19 [ffff8e790110bda8] get_signal_to_deliver at ffffffffa1caf24e
    #20 [ffff8e790110be40] do_signal at ffffffffa1c2b527
    #21 [ffff8e790110bf30] do_notify_resume at ffffffffa1c2bc32
    #22 [ffff8e790110bf50] int_signal at ffffffffa2375124
        RIP: 00005631173cd203  RSP: 00007f490c85edd0  RFLAGS: 00000206
        RAX: fffffffffffffdfe  RBX: 0000000000000000  RCX: ffffffffffffffff
        RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
        RBP: 00007f490c85ede0   R8: 00007f490c85edd0   R9: 0000000000000000
        R10: 0000000000000000  R11: 0000000000000206  R12: 0000000000000000
        R13: 0000000000801000  R14: 0000000000000000  R15: 00007f490c85f700
        ORIG_RAX: 000000000000010e  CS: 0033  SS: 002b
    
    crash>  p call_single_queue:a
    per_cpu(call_single_queue, 0) = $1 = {
      first = 0xffff8e7b356204b8
    }
    per_cpu(call_single_queue, 1) = $2 = {
      first = 0x0
    }
    per_cpu(call_single_queue, 2) = $3 = {
      first = 0x0
    }
    per_cpu(call_single_queue, 3) = $4 = {
      first = 0x0
    }
    
    crash> dis -r ffffffffc0c427e6
    [..]
    0xffffffffc0c42798 <apply_all_patches+0xc8>:    test   %bl,%bl
    0xffffffffc0c4279a <apply_all_patches+0xca>:    mov    $0xffffffffc0cf79d8,%rdi   <------ patcher_ctx+0x58 [morphisec_protector] 
    0xffffffffc0c427a1 <apply_all_patches+0xd1>:    cmovne %r12,%r15
    0xffffffffc0c427a5 <apply_all_patches+0xd5>:    call   0xffffffffa236a450 <_raw_spin_lock_irqsave>   <---------- locked the spinlock here
    0xffffffffc0c427aa <apply_all_patches+0xda>:    mov    (%r15),%r12
    [..]
    0xffffffffc0c427de <apply_all_patches+0x10e>:   mov    %r12,%rdi
    0xffffffffc0c427e1 <apply_all_patches+0x111>:   call   0xffffffffc0c42650 <FreePatch>     <------------- current path
    0xffffffffc0c427e6 <apply_all_patches+0x116>:   cmp    %rbx,%r15
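The pending entry in CPU 0's call_single_queue can be cross-checked against CPU 2's registers. This Python sketch parses the `p call_single_queue:a` text copied from the session above; pending_queues() and RCX_CPU2 are hypothetical names, and reading RCX as the address of the request CPU 2 is waiting on is an inference from the matching value, not something the backtrace states.

```python
import re

# Output of "p call_single_queue:a", copied verbatim from the session above.
CSQ_OUTPUT = """\
per_cpu(call_single_queue, 0) = $1 = {
  first = 0xffff8e7b356204b8
}
per_cpu(call_single_queue, 1) = $2 = {
  first = 0x0
}
per_cpu(call_single_queue, 2) = $3 = {
  first = 0x0
}
per_cpu(call_single_queue, 3) = $4 = {
  first = 0x0
}
"""

# Matches "per_cpu(call_single_queue, N) ... first = 0x..." across lines.
ENTRY_RE = re.compile(
    r"per_cpu\(call_single_queue, (\d+)\).*?first = (0x[0-9a-f]+)", re.S
)


def pending_queues(text):
    """Map CPU number -> address of the first queued cross-call, if any."""
    return {
        int(cpu): int(first, 16)
        for cpu, first in ENTRY_RE.findall(text)
        if int(first, 16) != 0
    }


RCX_CPU2 = 0xFFFF8E7B356204B8  # RCX from "bt -c 2"

pending = pending_queues(CSQ_OUTPUT)
print({cpu: hex(addr) for cpu, addr in pending.items()})  # {0: '0xffff8e7b356204b8'}
print(pending.get(0) == RCX_CPU2)  # True
```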
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
