Server crashed due to deadlock in 3rd party McAfee module

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux
  • McAfee Endpoint Security

Issue

  • Server rebooted unexpectedly/crashed in the presence of third party McAfee kernel module.

Resolution

  • Engage the respective module vendor with regards to McAfee's Endpoint Security.

Workaround

Root Cause

  • McAfee kernel module caused a deadlock, leading to a hang and subsequent crash.

Diagnostic Steps

Pre-requisites

  1. Deploy kdump in Order to Collect a vmcore:

  2. Prepare crash Environment for vmcore Analysis:

Vmcore Analysis

  1. The vmcore was created due to the cssdmonitor task intentionally panicking the system with a sysrq. This is a cluster eviction.

    crash> bt
    PID: 3151310  TASK: ffff89a901f6c000  CPU: 35   COMMAND: "cssdmonitor"
     #0 [ffff9f4ab5af3cd8] machine_kexec at ffffffffacc6bec3
     #1 [ffff9f4ab5af3d30] __crash_kexec at ffffffffacdb564a
     #2 [ffff9f4ab5af3df0] panic at ffffffffaccf70bf
     #3 [ffff9f4ab5af3e70] sysrq_handle_crash at ffffffffad1fad91
     #4 [ffff9f4ab5af3e78] __handle_sysrq.cold.13 at ffffffffad1fb6b4
     #5 [ffff9f4ab5af3ea8] write_sysrq_trigger at ffffffffad1fb55b
     #6 [ffff9f4ab5af3eb8] proc_reg_write at ffffffffacfea7a9
     #7 [ffff9f4ab5af3ed0] vfs_write at ffffffffacf64be5
     #8 [ffff9f4ab5af3f00] ksys_write at ffffffffacf64e6f
     #9 [ffff9f4ab5af3f38] do_syscall_64 at ffffffffacc052fb
    #10 [ffff9f4ab5af3f50] entry_SYSCALL_64_after_hwframe at ffffffffad8000a9
    
  2. At the time of the panic, the load averages were elevated due to the number of blocked uninterruptible state tasks.

    LOAD AVERAGE: 1207.03, 1156.46, 890.58
    
    crash> ps -S
      RU: 56
      IN: 3884
      ID: 472
      UN: 1198
      ZO: 1
    
  3. The longest blocked task was PID 5718.

    crash> ps -m | grep UN | tail -n 3
    [  0 00:20:50.614] [UN]  PID: 2131812  TASK: ffff89c68fab0000  CPU: 17   COMMAND: "java"
    [  0 00:23:37.869] [UN]  PID: 1571647  TASK: ffff89a2d3790000  CPU: 10   COMMAND: "mfetpd"
    [  0 00:23:37.870] [UN]  PID: 5718     TASK: ffff896649458000  CPU: 30   COMMAND: "ClMgrS"
    
  4. It was waiting for a user space task to be created on behalf of the NFS subsystem. This notification is through a completion structure.

    crash> bt 5718
    PID: 5718     TASK: ffff896649458000  CPU: 30   COMMAND: "ClMgrS"
     #0 [ffff9f4a9dba32a8] __schedule at ffffffffad5fa5d1
     #1 [ffff9f4a9dba3300] schedule at ffffffffad5fabc5
     #2 [ffff9f4a9dba3318] schedule_timeout at ffffffffad5ff6b8
     #3 [ffff9f4a9dba33b0] wait_for_completion at ffffffffad5fb5f6
     #4 [ffff9f4a9dba33f0] call_usermodehelper_exec at ffffffffacd12702
     #5 [ffff9f4a9dba3430] call_sbin_request_key at ffffffffad028024
     #6 [ffff9f4a9dba3538] request_key_and_link at ffffffffad028414
     #7 [ffff9f4a9dba3600] request_key at ffffffffad0287ac
     #8 [ffff9f4a9dba3630] nfs_idmap_get_key at ffffffffc07c9acf [nfsv4]
     #9 [ffff9f4a9dba3688] nfs_idmap_lookup_id at ffffffffc07c9bb0 [nfsv4]
    #10 [ffff9f4a9dba36b8] nfs_map_name_to_uid at ffffffffc07ca1be [nfsv4]
    #11 [ffff9f4a9dba36f8] decode_getfattr_attrs at ffffffffc07c0bf5 [nfsv4]
    #12 [ffff9f4a9dba3778] decode_getfattr_generic.constprop.127 at ffffffffc07c10a2 [nfsv4]
    #13 [ffff9f4a9dba37d8] nfs4_xdr_dec_getattr at ffffffffc07c19ba [nfsv4]
    #14 [ffff9f4a9dba3828] call_decode at ffffffffc0716dfb [sunrpc]
    #15 [ffff9f4a9dba38a0] __rpc_execute at ffffffffc0726118 [sunrpc]
    #16 [ffff9f4a9dba38f8] rpc_execute at ffffffffc072e907 [sunrpc]
    #17 [ffff9f4a9dba3918] rpc_run_task at ffffffffc071897c [sunrpc]
    #18 [ffff9f4a9dba3958] nfs4_call_sync_custom at ffffffffc07a31bb [nfsv4]
    #19 [ffff9f4a9dba3968] nfs4_do_call_sync at ffffffffc07a3249 [nfsv4]
    #20 [ffff9f4a9dba39d8] _nfs4_proc_getattr at ffffffffc07a5fd8 [nfsv4]
    #21 [ffff9f4a9dba3a98] nfs4_proc_getattr at ffffffffc07aecf1 [nfsv4]
    #22 [ffff9f4a9dba3b00] __nfs_revalidate_inode at ffffffffc0aebb52 [nfs]
    #23 [ffff9f4a9dba3b28] nfs_access_get_cached at ffffffffc0ae3fb1 [nfs]
    #24 [ffff9f4a9dba3b78] nfs_do_access at ffffffffc0ae40d2 [nfs]
    #25 [ffff9f4a9dba3c10] nfs_permission at ffffffffc0ae439e [nfs]
    #26 [ffff9f4a9dba3c30] inode_permission at ffffffffacf71a5a
    #27 [ffff9f4a9dba3c58] link_path_walk at ffffffffacf72101
    #28 [ffff9f4a9dba3cb8] path_lookupat at ffffffffacf723bb
    #29 [ffff9f4a9dba3d18] filename_lookup at ffffffffacf76a50
    #30 [ffff9f4a9dba3e40] vfs_statx at ffffffffacf69784
    #31 [ffff9f4a9dba3e98] __do_sys_newstat at ffffffffacf69dd9
    #32 [ffff9f4a9dba3f38] do_syscall_64 at ffffffffacc052fb
    #33 [ffff9f4a9dba3f50] entry_SYSCALL_64_after_hwframe at ffffffffad8000a9
        RIP: 00007fe702268f89  RSP: 00007fe6f30f0688  RFLAGS: 00000246
        RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007fe702268f89
        RDX: 00007fe6f30f0950  RSI: 00007fe6f30f0950  RDI: 00007fe6d40064d0
        RBP: 00007fe6f30f06a0   R8: 000000000000000c   R9: 00007fe6fdd2329f
        R10: 0000000000000000  R11: 0000000000000246  R12: 00007fe6fdc7f212
        R13: 00007fe6f31f15ef  R14: 00007fe6f31f16b0  R15: 00007fe6f30f0f40
        ORIG_RAX: 0000000000000004  CS: 0033  SS: 002b
    
    
  5. We can find the completion structure by referencing the assembly. Per x86 convention, we know that the first parameter of a routine is passed via register RDI.

    Before calling wait_for_completion(), we see that the contents of register RSP is used to fill RDI, containing the completion structure.

    crash> whatis wait_for_completion
    void wait_for_completion(struct completion *);
    
    crash> dis -rl ffffffffacd12702 | tail 
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/umh.c: 442
    0xffffffffacd126ef <call_usermodehelper_exec+287>:  xor    %eax,%eax
    0xffffffffacd126f1 <call_usermodehelper_exec+289>:  xchg   %rax,0x40(%rbx)
    0xffffffffacd126f5 <call_usermodehelper_exec+293>:  test   %rax,%rax
    0xffffffffacd126f8 <call_usermodehelper_exec+296>:  jne    0xffffffffacd1267d <call_usermodehelper_exec+173>
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/umh.c: 447
    0xffffffffacd126fa <call_usermodehelper_exec+298>:  mov    %rsp,%rdi
    0xffffffffacd126fd <call_usermodehelper_exec+301>:  call   0xffffffffad5fb560 <wait_for_completion>
    
  6. Earlier in this routine, we can reference the assembly to build the stack and annotate it.

    crash> dis -rl ffffffffacd12702 | head -n 20
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/umh.c: 402
    0xffffffffacd125d0 <call_usermodehelper_exec>:  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
    0xffffffffacd125d5 <call_usermodehelper_exec+0x5>:  push   %rbp
    0xffffffffacd125d6 <call_usermodehelper_exec+0x6>:  push   %rbx
    0xffffffffacd125d7 <call_usermodehelper_exec+0x7>:  mov    %rdi,%rbx
    0xffffffffacd125da <call_usermodehelper_exec+0xa>:  sub    $0x28,%rsp
    
     #4 [ffff9f4a9dba33f0] call_usermodehelper_exec+0x132 at ffffffffacd12702
        /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/umh.c: 449
        ffff9f4a9dba33f8: ffffffff00000000 ffff89a500000000        ---.
        RSP^                                                          | sub $0x28,%rsp
        ffff9f4a9dba3408: ffff9f4a9dba33c0 ffff9f4a9dba33c0           |
        ffff9f4a9dba3418: d1b5ac3f88cc4b00                         ---' 
                                           [ffff89a579d57be0:kmalloc-16] 
                                           RBX^
        ffff9f4a9dba3428: [ffff89add5035d00:key_jar] call_sbin_request_key+0x264 
                          RBP^                       RIP^ 
     #5 [ffff9f4a9dba3430] call_sbin_request_key+0x264 at ffffffffad028024
    
  7. Register RSP, contained 0xffff9f4a9dba33f8. The completion is not marked as done yet.

    crash> struct completion ffff9f4a9dba33f8
    struct completion {
      done = 0,
    
  8. Per the source, we know that the subprocess_info structure used to create the userspace task is passed into call_usermodehelper_exec() as the first parameter.

    crash> whatis call_usermodehelper_exec
    int call_usermodehelper_exec(struct subprocess_info *, int);
    
  9. This was passed via RDI into call_usermodehelper_exec(). The contents of RDI were then copied to register RBX. Register RBX was not modified for the remainder of the routine.

    crash> dis -rl ffffffffacd12702 | grep bx
    0xffffffffacd125d6 <call_usermodehelper_exec+0x6>:  push   %rbx
    0xffffffffacd125d7 <call_usermodehelper_exec+0x7>:  mov    %rdi,%rbx
    0xffffffffacd12664 <call_usermodehelper_exec+0x94>: mov    0x70(%rbx),%rax
    0xffffffffacd1266d <call_usermodehelper_exec+0x9d>: mov    %rbx,%rdi
    0xffffffffacd12675 <call_usermodehelper_exec+0xa5>: mov    %rbx,%rdi
    0xffffffffacd126a0 <call_usermodehelper_exec+0xd0>: pop    %rbx
    0xffffffffacd126f1 <call_usermodehelper_exec+0x121>:    xchg   %rax,0x40(%rbx)
    0xffffffffacd12702 <call_usermodehelper_exec+0x132>:    mov    0x64(%rbx),%ebp
    
  10. Register RBX was then pushed to the stack on the third push of the following routine.

    crash> dis -rl ffffffffad5fb5f6 | head
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/sched/completion.c: 137
    0xffffffffad5fb560 <wait_for_completion>:   nopl   0x0(%rax,%rax,1) [FTRACE NOP]
    0xffffffffad5fb565 <wait_for_completion+0x5>:   push   %r12
    0xffffffffad5fb567 <wait_for_completion+0x7>:   push   %rbp
    0xffffffffad5fb568 <wait_for_completion+0x8>:   mov    %rdi,%rbp
    0xffffffffad5fb56b <wait_for_completion+0xb>:   push   %rbx
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/sched/completion.c: 105
    0xffffffffad5fb56c <wait_for_completion+0xc>:   lea    0x8(%rbp),%rbx
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/sched/completion.c: 137
    0xffffffffad5fb570 <wait_for_completion+0x10>:  sub    $0x20,%rsp
    
  11. Annotating the stack, we see that register RBX held address 0xffff89a7509dbd80, the subprocess_info structure.

     #3 [ffff9f4a9dba33b0] wait_for_completion+0x96 at ffffffffad5fb5f6
        /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/sched/completion.c: 86
        ffff9f4a9dba33b8: [ffff896649458000:task_struct] ffff9f4a9dba3408  ---.
        ffff9f4a9dba33c8: ffff9f4a9dba3408 d1b5ac3f88cc4b00                ---'-- sub $0x20,%rsp
        ffff9f4a9dba33d8: [ffff89a7509dbd80:kmalloc-128] 0000000000000000 
                           RBX^                          RBP^ 
        ffff9f4a9dba33e8: [ffff89add5034000:key_jar] call_usermodehelper_exec+0x132 
                          R12^                       RIP^
     #4 [ffff9f4a9dba33f0] call_usermodehelper_exec+0x132 at ffffffffacd12702
    
    
    crash> struct subprocess_info ffff89a7509dbd80
    struct subprocess_info {
      work = {
        data = {
          counter = 0xc40
        },
        entry = {
          next = 0xffff89a7509dbd88,
          prev = 0xffff89a7509dbd88
        },
        func = 0xffffffffacd12770 <call_usermodehelper_exec_work>,
        {
          bdi_wb_backptr = 0x0,
          rh_kabi_hidden_111 = {
            rh_reserved1 = 0x0
          },
          {<No data fields>}
        },
        rh_reserved2 = 0x0,
        rh_reserved3 = 0x0,
        rh_reserved4 = 0x0
      },
      complete = 0xffff9f4a9dba33f8,
      path = 0xffffffffadc576b0 <request_key> "/sbin/request-key",
      argv = 0xffff9f4a9dba3480,
      envp = 0xffff9f4a9dba3440,
      wait = 0x2,
      retval = 0x0,
      init = 0xffffffffad027d30 <umh_keys_init>,
      cleanup = 0xffffffffad027cd0 <umh_keys_cleanup>,
      data = 0xffff89add5035d00
    }
    
  12. The subprocess_info structure is found in the long blocked task, and also a kworker PID 1036030.

    crash> search -t ffff89a7509dbd80
    PID: 5718     TASK: ffff896649458000  CPU: 30   COMMAND: "ClMgrS"
    ffff9f4a9dba33d8: ffff89a7509dbd80 
    
    PID: 1036030  TASK: ffff89accebf8000  CPU: 26   COMMAND: "kworker/u98:1"
    ffff9f4ab7527de0: ffff89a7509dbd80 
    ffff9f4ab7527e78: ffff89a7509dbd80 
    ffff9f4ab7527e90: ffff89a7509dbd80 
    
  13. The kworker PID 1036030 has been in interruptible state for almost 24 minutes.

    crash> ps -m  1036030
    [  0 00:23:37.870] [IN]  PID: 1036030  TASK: ffff89accebf8000  CPU: 26   COMMAND: "kworker/u98:1"
    
  14. This is it's call stack. It's waiting to be notified that a child PID has been created.

    crash> bt 1036030
    PID: 1036030  TASK: ffff89accebf8000  CPU: 26   COMMAND: "kworker/u98:1"
     #0 [ffff9f4ab7527d68] __schedule at ffffffffad5fa5d1
     #1 [ffff9f4ab7527dc0] schedule at ffffffffad5fabc5
     #2 [ffff9f4ab7527dd8] do_wait at ffffffffaccfb3e5
     #3 [ffff9f4ab7527e10] kernel_wait at ffffffffaccfcbe7
     #4 [ffff9f4ab7527e88] call_usermodehelper_exec_work at ffffffffacd127c9
     #5 [ffff9f4ab7527e98] process_one_work at ffffffffacd15ba7
     #6 [ffff9f4ab7527ed8] worker_thread at ffffffffacd16260
     #7 [ffff9f4ab7527f10] kthread at ffffffffacd1d3d4
     #8 [ffff9f4ab7527f50] ret_from_fork at ffffffffad80024f
    
  15. We can verify that it's trying to create the user space PID from the same subprocess_info structure by referencing the assembly. The function process_one_work() passes a work_struct structure to routine call_usermodehelper_exec_work(), which is then used as a subprocess_info structure.

    157 static void call_usermodehelper_exec_work(struct work_struct *work)
    158 {
    159         struct subprocess_info *sub_info =
    160                 container_of(work, struct subprocess_info, work);
    
  16. The assembly shows that the work_struct is passed from RBX to RDI before calling call_usermodehelper_exec_work().

    crash> dis -rl ffffffffacd15ba7 | tail 
    0xffffffffacd15b8f <process_one_work+0x18f>:    sti    
    0xffffffffacd15b90 <process_one_work+0x190>:    nopw   0x0(%rax,%rax,1)
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/./arch/x86/include/asm/jump_label.h: 38
    0xffffffffacd15b96 <process_one_work+0x196>:    nopl   0x0(%rax,%rax,1)
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/workqueue.c: 2287
    0xffffffffacd15b9b <process_one_work+0x19b>:    mov    0x18(%r14),%rax
    0xffffffffacd15b9f <process_one_work+0x19f>:    mov    %rbx,%rdi
    0xffffffffacd15ba2 <process_one_work+0x1a2>:    call   0xffffffffada01200 <__x86_indirect_thunk_rax>
    
  17. RBX is then pushed to the stack on the first push of call_usermodehelper_exec_work().

    crash> dis -rl ffffffffacd127c9 | head
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/umh.c: 158
    0xffffffffacd12770 <call_usermodehelper_exec_work>: nopl   0x0(%rax,%rax,1) [FTRACE NOP]
    /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/umh.c: 162
    0xffffffffacd12775 <call_usermodehelper_exec_work+0x5>: push   %rbx
    
  18. Annotating the stack, we see it's the same address as the subprocess_info structure 0xffff89a7509dbd80.

     #4 [ffff9f4ab7527e88] call_usermodehelper_exec_work+0x59 at ffffffffacd127c9
        /usr/src/debug/kernel-4.18.0-477.10.1.el8_8/linux-4.18.0-477.10.1.el8_8.x86_64/kernel/umh.c: 136
        ffff9f4ab7527e90: [ffff89a7509dbd80:kmalloc-128] process_one_work+0x1a7 
                           RBX^                           RIP^    
     #5 [ffff9f4ab7527e98] process_one_work+0x1a7 at ffffffffacd15ba7
    
  19. The child PID that the kworker is waiting to be created is nfsidmap, which is in uninterruptible blocked state.

    crash> ps -c 1036030
    PID: 1036030  TASK: ffff89accebf8000  CPU: 26   COMMAND: "kworker/u98:1"
      PID: 1087887  TASK: ffff89a589cf8000  CPU: 36   COMMAND: "nfsidmap"
    
    crash> ps -m 1087887
    [  0 00:00:42.304] [UN]  PID: 1087887  TASK: ffff89a589cf8000  CPU: 36   COMMAND: "nfsidmap"
    
  20. And is blocked after operating in the third party code provided by the mfe_fileaccess_1007161049 third party kernel, which isn't provided by Red Hat.

    crash> bt 1087887
    PID: 1087887  TASK: ffff89a589cf8000  CPU: 36   COMMAND: "nfsidmap"
     #0 [ffff9f4ab3d37d38] __schedule at ffffffffad5fa5d1
     #1 [ffff9f4ab3d37d90] schedule at ffffffffad5fabc5
     #2 [ffff9f4ab3d37da8] schedule_timeout at ffffffffad5ff5d7
     #3 [ffff9f4ab3d37e40] mfe_fileaccess_get_orig_sys_close_addr_32_bit at ffffffffc0349d53 [mfe_fileaccess_1007161049]
     #4 [ffff9f4ab3d37ec8] mfe_fileaccess_sc_do_scan at ffffffffc034a41c [mfe_fileaccess_1007161049]
     #5 [ffff9f4ab3d37f00] mfe_fileaccess_sys_openat_64_bit at ffffffffc0348549 [mfe_fileaccess_1007161049]
     #6 [ffff9f4ab3d37f38] do_syscall_64 at ffffffffacc052fb
     #7 [ffff9f4ab3d37f50] entry_SYSCALL_64_after_hwframe at ffffffffad8000a9
        RIP: 00007f8a3d49e6d2  RSP: 00007ffc39e7e840  RFLAGS: 00000246
        RAX: ffffffffffffffda  RBX: 000055ed229dbc50  RCX: 00007f8a3d49e6d2
        RDX: 0000000000000000  RSI: 00007ffc39e82ac0  RDI: 00000000ffffff9c
        RBP: 0000000000000008   R8: 0000000000000008   R9: 0000000000000001
        R10: 0000000000000000  R11: 0000000000000246  R12: 00007f8a3c3329d0
        R13: 00007f8a3c3329d0  R14: 0000000000000001  R15: 0000000000000000
        ORIG_RAX: 0000000000000101  CS: 0033  SS: 002b
    
  21. These third party modules are not supported by Red Hat.

    crash> mod -t 
    NAME                       TAINTS
    dsa_filter_hook            OE
    oracleasm                  O
    mfe_fileaccess_1007161049  OE
    mfe_aac_1007161049         OE
    dsa_filter                 POE
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments