Kernel panic with "general protection fault" in function __list_del_entry()

Solution Unverified

Environment

  • Red Hat Enterprise Linux 7

Issue

  • The server crashes with "general protection fault: 0000 [#1] SMP" in the function __list_del_entry(), which is reached through list_del() from drm_property_free_blob().
crash> sys | grep PANIC
       PANIC: "general protection fault: 0000 [#1] SMP "

crash> set -p
    PID: 17886
COMMAND: "kworker/2:1"
   TASK: ffff9dad9d721080  [THREAD_INFO: ffff9daf5c6a4000]
    CPU: 2
  STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 17886    TASK: ffff9dad9d721080  CPU: 2    COMMAND: "kworker/2:1"
 #0 [ffff9daf5c6a7938] machine_kexec at ffffffff8e8662f4
 #1 [ffff9daf5c6a7998] __crash_kexec at ffffffff8e922b82
 #2 [ffff9daf5c6a7a68] crash_kexec at ffffffff8e922c70
 #3 [ffff9daf5c6a7a80] oops_end at ffffffff8ef91798
 #4 [ffff9daf5c6a7aa8] die at ffffffff8e830a7b
 #5 [ffff9daf5c6a7ad8] do_general_protection at ffffffff8ef91092
 #6 [ffff9daf5c6a7b10] general_protection at ffffffff8ef90718
    [exception RIP: __list_del_entry+41]                                  <<-----
    RIP: ffffffff8eba6549  RSP: ffff9daf5c6a7bc8  RFLAGS: 00010203
    RAX: 0038615acd3c2b91  RBX: ffff9dad7edaed28  RCX: dead000000000200
    RDX: ffff9daac3d5b428  RSI: ffffffffc047e59d  RDI: ffff9dad7edaed28
    RBP: ffff9daf5c6a7bc8   R8: 000000000001f100   R9: ffffffffc044f462
    R10: ffff9dafffd1f100  R11: ffffed800a5bd0c0  R12: ffff9dad7edaed00
    R13: ffff9dad7edaed10  R14: ffff9dafe32df800  R15: ffff9dafd7b22760
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff9daf5c6a7bd0] list_del at ffffffff8eba65fd
 #8 [ffff9daf5c6a7be8] drm_property_free_blob at ffffffffc0456b1c [drm]
 #9 [ffff9daf5c6a7c08] drm_mode_object_put at ffffffffc04562b8 [drm]
#10 [ffff9daf5c6a7c30] drm_property_blob_put at ffffffffc0456b63 [drm]
#11 [ffff9daf5c6a7c40] __drm_atomic_helper_plane_destroy_state at ffffffffc04eaafe [drm_kms_helper]
#12 [ffff9daf5c6a7c58] drm_atomic_helper_plane_destroy_state at ffffffffc04eaba5 [drm_kms_helper]
#13 [ffff9daf5c6a7c70] vmw_du_plane_destroy_state at ffffffffc0536d08 [vmwgfx]
#14 [ffff9daf5c6a7ca0] drm_atomic_state_default_clear at ffffffffc0450f0a [drm]
#15 [ffff9daf5c6a7ce0] drm_atomic_state_clear at ffffffffc0451035 [drm]
#16 [ffff9daf5c6a7cf0] __drm_atomic_state_free at ffffffffc0451058 [drm]
#17 [ffff9daf5c6a7d10] drm_atomic_helper_dirtyfb at ffffffffc04eb139 [drm_kms_helper]
#18 [ffff9daf5c6a7d98] vmw_fb_dirty_flush at ffffffffc053c9d7 [vmwgfx]
#19 [ffff9daf5c6a7e20] process_one_work at ffffffff8e8bdfdf
#20 [ffff9daf5c6a7e68] worker_thread at ffffffff8e8bf0f6
#21 [ffff9daf5c6a7ec8] kthread at ffffffff8e8c5fb1

Resolution

  • The resolution has not yet been determined because the available data is insufficient.

Workaround

  • Engage the hardware vendor to rule out faulty hardware.
  • Blacklist any third-party kernel modules and check whether the issue is still reproducible.
  • Upgrade the system to the latest kernel and check whether the issue persists.

Root Cause

  • The panic is caused by memory corruption: a list pointer inside a drm_property_blob structure was overwritten with an invalid value. The exact source of the corruption is unknown because the available data is insufficient.
  • Possible causes of such memory corruption are faulty hardware, third-party kernel modules, or a kernel bug.

Diagnostic Steps

  • The panic string shows that the kernel panicked because of a general protection fault.
crash> sys | grep PANIC
       PANIC: "general protection fault: 0000 [#1] SMP "            <<-----
  • The backtrace of the panic task shows the exception RIP in __list_del_entry().
crash> bt
PID: 17886    TASK: ffff9dad9d721080  CPU: 2    COMMAND: "kworker/2:1"
 #0 [ffff9daf5c6a7938] machine_kexec at ffffffff8e8662f4
...
 #6 [ffff9daf5c6a7b10] general_protection at ffffffff8ef90718
    [exception RIP: __list_del_entry+0x29]                        <<-----
    RIP: ffffffff8eba6549  RSP: ffff9daf5c6a7bc8  RFLAGS: 00010203
    RAX: 0038615acd3c2b91  RBX: ffff9dad7edaed28  RCX: dead000000000200
    RDX: ffff9daac3d5b428  RSI: ffffffffc047e59d  RDI: ffff9dad7edaed28
    RBP: ffff9daf5c6a7bc8   R8: 000000000001f100   R9: ffffffffc044f462
    R10: ffff9dafffd1f100  R11: ffffed800a5bd0c0  R12: ffff9dad7edaed00
    R13: ffff9dad7edaed10  R14: ffff9dafe32df800  R15: ffff9dafd7b22760
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff9daf5c6a7bd0] list_del at ffffffff8eba65fd
 #8 [ffff9daf5c6a7be8] drm_property_free_blob at ffffffffc0456b1c [drm]  <<-----
 #9 [ffff9daf5c6a7c08] drm_mode_object_put at ffffffffc04562b8 [drm]
#10 [ffff9daf5c6a7c30] drm_property_blob_put at ffffffffc0456b63 [drm]
#11 [ffff9daf5c6a7c40] __drm_atomic_helper_plane_destroy_state at ffffffffc04eaafe [drm_kms_helper]
#12 [ffff9daf5c6a7c58] drm_atomic_helper_plane_destroy_state at ffffffffc04eaba5 [drm_kms_helper]
#13 [ffff9daf5c6a7c70] vmw_du_plane_destroy_state at ffffffffc0536d08 [vmwgfx]
#14 [ffff9daf5c6a7ca0] drm_atomic_state_default_clear at ffffffffc0450f0a [drm]
#15 [ffff9daf5c6a7ce0] drm_atomic_state_clear at ffffffffc0451035 [drm]
#16 [ffff9daf5c6a7cf0] __drm_atomic_state_free at ffffffffc0451058 [drm]
#17 [ffff9daf5c6a7d10] drm_atomic_helper_dirtyfb at ffffffffc04eb139 [drm_kms_helper]
#18 [ffff9daf5c6a7d98] vmw_fb_dirty_flush at ffffffffc053c9d7 [vmwgfx]
#19 [ffff9daf5c6a7e20] process_one_work at ffffffff8e8bdfdf
#20 [ffff9daf5c6a7e68] worker_thread at ffffffff8e8bf0f6
#21 [ffff9daf5c6a7ec8] kthread at ffffffff8e8c5fb1
  • The kernel panicked in __list_del_entry(), which was called through list_del() from drm_property_free_blob().
drm_property_free_blob() source:
static void drm_property_free_blob(struct kref *kref)
{
        struct drm_property_blob *blob =
                container_of(kref, struct drm_property_blob, base.refcount);

        mutex_lock(&blob->dev->mode_config.blob_lock);
        list_del(&blob->head_global);                       /* <<<< PANIC HERE */
        mutex_unlock(&blob->dev->mode_config.blob_lock);

        drm_mode_object_unregister(blob->dev, &blob->base);

        kvfree(blob);
}
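
To illustrate why an overwritten list pointer is fatal at this point, the following is a simplified userspace sketch of a checked list deletion. It is not the RHEL 7 kernel source: struct list_head is re-declared locally, and list_init(), list_add_tail() and list_del_checked() are hypothetical names chosen for this example. The sketch assumes that the kernel's deletion path reads through both entry->prev and entry->next before unlinking, which is consistent with the fault being raised inside __list_del_entry().

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal local stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Make a list head point at itself, i.e. an empty circular list. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Both checks dereference the neighbour pointers, so a garbage value
 * in entry->prev or entry->next is read through before anything is
 * unlinked. Returns false if the list looks inconsistent.
 */
static bool list_del_checked(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    if (prev->next != entry || next->prev != entry)
        return false;

    next->prev = prev;
    prev->next = next;
    entry->next = NULL;
    entry->prev = NULL;
    return true;
}
```

In this sketch an inconsistent list is reported by a false return value because every pointer still refers to valid memory. In the vmcore the overwritten prev value is not a usable address at all, so the read through it faults before any comparison can be made.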
  • Disassembling drm_property_free_blob() up to the return address ffffffffc0456b1c shows that the instruction just before it is the call to list_del().
crash> dis -r ffffffffc0456b1c
0xffffffffc0456af0 <drm_property_free_blob>:    data16 data16 data16 xchg %ax,%ax
0xffffffffc0456af5 <drm_property_free_blob+0x5>:        push   %rbp
0xffffffffc0456af6 <drm_property_free_blob+0x6>:        mov    %rsp,%rbp
0xffffffffc0456af9 <drm_property_free_blob+0x9>:        push   %r12
0xffffffffc0456afb <drm_property_free_blob+0xb>:        push   %rbx
0xffffffffc0456afc <drm_property_free_blob+0xc>:        mov    0x10(%rdi),%rax
0xffffffffc0456b00 <drm_property_free_blob+0x10>:       mov    %rdi,%rbx
...
0xffffffffc0456b17 <drm_property_free_blob+0x27>:       call   0xffffffff8eba65f0 <list_del>   <<-----
0xffffffffc0456b1c <drm_property_free_blob+0x2c>:       mov    0x10(%rbx),%rax

crash> dis 0xffffffff8eba65f0
0xffffffff8eba65f0 <list_del>:  push   %rbp
0xffffffff8eba65f1 <list_del+0x1>:      mov    %rsp,%rbp
0xffffffff8eba65f4 <list_del+0x4>:      push   %rbx

 #7 [ffff9daf5c6a7bd0] list_del at ffffffff8eba65fd
    ffff9daf5c6a7bd8: ffff9dad7edaed10 ffff9daf5c6a7c00 
    ffff9daf5c6a7be8: ffffffffc0456b1c 
 #8 [ffff9daf5c6a7be8] drm_property_free_blob at ffffffffc0456b1c [drm]
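  • list_del() pushes the caller's %rbx, which holds the kref pointer ffff9dad7edaed10, onto its stack frame. Subtracting 0x10, the offset of base.refcount, gives the address of the drm_property_blob structure: ffff9dad7edaed00.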
crash> eval ffff9dad7edaed10 - 0x10
hexadecimal: ffff9dad7edaed00 

crash> drm_property_blob.head_global ffff9dad7edaed00
  head_global = {
    next = 0xffff9daac3d5b428,
    prev = 0x38615acd3c2b91
  },
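
The eval step above is the same arithmetic that container_of() performs in drm_property_free_blob(): the address of the embedded member minus its offset inside the enclosing structure. The following userspace sketch shows that calculation. The types mock_kref and mock_blob and the helper blob_from_kref() are hypothetical; only the 0x10 offset is taken from the vmcore, and the real drm_property_blob layout is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the reference counter embedded in the blob. */
struct mock_kref {
    int refcount;
};

/* Hypothetical stand-in for the enclosing structure. */
struct mock_blob {
    uint64_t pad[2];            /* 16 bytes, so refcount sits at offset 0x10 */
    struct mock_kref refcount;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
static struct mock_blob *blob_from_kref(struct mock_kref *kref)
{
    return container_of(kref, struct mock_blob, refcount);
}
```

Applied to the values in the vmcore, the same subtraction turns ffff9dad7edaed10 into ffff9dad7edaed00.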
  • The prev pointer is 0x38615acd3c2b91, the same value that RAX held at the time of the fault. This is not a valid kernel address, which indicates that the structure was corrupted in memory.
crash> kmem 0x38615acd3c2b91
kmem: cannot determine page for 38615acd3c2b91
38615acd3c2b91: physical address not found in mem map
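
A likely reason the bad pointer produced a general protection fault rather than a page fault is that the value is not a canonical x86_64 address, and a memory access through a non-canonical address raises #GP. The following sketch checks this, assuming 48-bit virtual addresses (4-level paging); the function name is_canonical_48() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With 48-bit virtual addresses, an address is canonical only if
 * bits 63..47 are all zeros or all ones.
 * Assumption: 4-level paging; 5-level paging would use bit 56 instead.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits from 63 down to 47 */

    return top == 0 || top == 0x1ffff;
}
```

Under that assumption, the next pointer 0xffff9daac3d5b428 passes the check while the prev pointer 0x0038615acd3c2b91 fails it.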

