Kernel panic with "general protection fault" in function __list_del_entry()
Environment
- Red Hat Enterprise Linux 7
Issue
- The server crashes with a general protection fault: 0000 [#1] SMP error in the function __list_del_entry(), which is called (via list_del()) by drm_property_free_blob().
crash> sys | grep PANIC
PANIC: "general protection fault: 0000 [#1] SMP "
crash> set -p
PID: 17886
COMMAND: "kworker/2:1"
TASK: ffff9dad9d721080 [THREAD_INFO: ffff9daf5c6a4000]
CPU: 2
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 17886 TASK: ffff9dad9d721080 CPU: 2 COMMAND: "kworker/2:1"
#0 [ffff9daf5c6a7938] machine_kexec at ffffffff8e8662f4
#1 [ffff9daf5c6a7998] __crash_kexec at ffffffff8e922b82
#2 [ffff9daf5c6a7a68] crash_kexec at ffffffff8e922c70
#3 [ffff9daf5c6a7a80] oops_end at ffffffff8ef91798
#4 [ffff9daf5c6a7aa8] die at ffffffff8e830a7b
#5 [ffff9daf5c6a7ad8] do_general_protection at ffffffff8ef91092
#6 [ffff9daf5c6a7b10] general_protection at ffffffff8ef90718
[exception RIP: __list_del_entry+41] <<-----
RIP: ffffffff8eba6549 RSP: ffff9daf5c6a7bc8 RFLAGS: 00010203
RAX: 0038615acd3c2b91 RBX: ffff9dad7edaed28 RCX: dead000000000200
RDX: ffff9daac3d5b428 RSI: ffffffffc047e59d RDI: ffff9dad7edaed28
RBP: ffff9daf5c6a7bc8 R8: 000000000001f100 R9: ffffffffc044f462
R10: ffff9dafffd1f100 R11: ffffed800a5bd0c0 R12: ffff9dad7edaed00
R13: ffff9dad7edaed10 R14: ffff9dafe32df800 R15: ffff9dafd7b22760
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff9daf5c6a7bd0] list_del at ffffffff8eba65fd
#8 [ffff9daf5c6a7be8] drm_property_free_blob at ffffffffc0456b1c [drm]
#9 [ffff9daf5c6a7c08] drm_mode_object_put at ffffffffc04562b8 [drm]
#10 [ffff9daf5c6a7c30] drm_property_blob_put at ffffffffc0456b63 [drm]
#11 [ffff9daf5c6a7c40] __drm_atomic_helper_plane_destroy_state at ffffffffc04eaafe [drm_kms_helper]
#12 [ffff9daf5c6a7c58] drm_atomic_helper_plane_destroy_state at ffffffffc04eaba5 [drm_kms_helper]
#13 [ffff9daf5c6a7c70] vmw_du_plane_destroy_state at ffffffffc0536d08 [vmwgfx]
#14 [ffff9daf5c6a7ca0] drm_atomic_state_default_clear at ffffffffc0450f0a [drm]
#15 [ffff9daf5c6a7ce0] drm_atomic_state_clear at ffffffffc0451035 [drm]
#16 [ffff9daf5c6a7cf0] __drm_atomic_state_free at ffffffffc0451058 [drm]
#17 [ffff9daf5c6a7d10] drm_atomic_helper_dirtyfb at ffffffffc04eb139 [drm_kms_helper]
#18 [ffff9daf5c6a7d98] vmw_fb_dirty_flush at ffffffffc053c9d7 [vmwgfx]
#19 [ffff9daf5c6a7e20] process_one_work at ffffffff8e8bdfdf
#20 [ffff9daf5c6a7e68] worker_thread at ffffffff8e8bf0f6
#21 [ffff9daf5c6a7ec8] kthread at ffffffff8e8c5fb1
Resolution
- The resolution has not yet been determined because the available data is insufficient.
Workaround
- Engage the hardware vendor to investigate any hardware-side issues.
- Blacklist any third-party modules and check whether the issue is still reproducible.
- Upgrade the system to the latest available kernel and check whether the issue persists.
Root Cause
- The issue occurs due to memory corruption. However, the exact root cause of this memory corruption is unknown due to insufficient data.
- Possible causes of such memory corruption include faulty hardware, third-party kernel modules, or a kernel bug.
Diagnostic Steps
- The panic string shows that the kernel panicked due to a general protection fault:
crash> sys | grep PANIC
PANIC: "general protection fault: 0000 [#1] SMP " <<-----
- The backtrace of the panic task shows the exception RIP in __list_del_entry():
crash> bt
PID: 17886 TASK: ffff9dad9d721080 CPU: 2 COMMAND: "kworker/2:1"
#0 [ffff9daf5c6a7938] machine_kexec at ffffffff8e8662f4
...
#6 [ffff9daf5c6a7b10] general_protection at ffffffff8ef90718
[exception RIP: __list_del_entry+41] <<-----
RIP: ffffffff8eba6549 RSP: ffff9daf5c6a7bc8 RFLAGS: 00010203
RAX: 0038615acd3c2b91 RBX: ffff9dad7edaed28 RCX: dead000000000200
RDX: ffff9daac3d5b428 RSI: ffffffffc047e59d RDI: ffff9dad7edaed28
RBP: ffff9daf5c6a7bc8 R8: 000000000001f100 R9: ffffffffc044f462
R10: ffff9dafffd1f100 R11: ffffed800a5bd0c0 R12: ffff9dad7edaed00
R13: ffff9dad7edaed10 R14: ffff9dafe32df800 R15: ffff9dafd7b22760
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff9daf5c6a7bd0] list_del at ffffffff8eba65fd
#8 [ffff9daf5c6a7be8] drm_property_free_blob at ffffffffc0456b1c [drm] <<-----
#9 [ffff9daf5c6a7c08] drm_mode_object_put at ffffffffc04562b8 [drm]
#10 [ffff9daf5c6a7c30] drm_property_blob_put at ffffffffc0456b63 [drm]
#11 [ffff9daf5c6a7c40] __drm_atomic_helper_plane_destroy_state at ffffffffc04eaafe [drm_kms_helper]
#12 [ffff9daf5c6a7c58] drm_atomic_helper_plane_destroy_state at ffffffffc04eaba5 [drm_kms_helper]
#13 [ffff9daf5c6a7c70] vmw_du_plane_destroy_state at ffffffffc0536d08 [vmwgfx]
#14 [ffff9daf5c6a7ca0] drm_atomic_state_default_clear at ffffffffc0450f0a [drm]
#15 [ffff9daf5c6a7ce0] drm_atomic_state_clear at ffffffffc0451035 [drm]
#16 [ffff9daf5c6a7cf0] __drm_atomic_state_free at ffffffffc0451058 [drm]
#17 [ffff9daf5c6a7d10] drm_atomic_helper_dirtyfb at ffffffffc04eb139 [drm_kms_helper]
#18 [ffff9daf5c6a7d98] vmw_fb_dirty_flush at ffffffffc053c9d7 [vmwgfx]
#19 [ffff9daf5c6a7e20] process_one_work at ffffffff8e8bdfdf
#20 [ffff9daf5c6a7e68] worker_thread at ffffffff8e8bf0f6
#21 [ffff9daf5c6a7ec8] kthread at ffffffff8e8c5fb1
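In the original bt output the exception RIP offset appears as decimal +41, which is the same as +0x29. As a small sketch (symbol_start() is a hypothetical helper, not a crash command), subtracting that offset from the RIP value gives the start address of __list_del_entry:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given a faulting RIP and the symbol offset shown in
 * the "exception RIP: symbol+offset" line, return the symbol's start address. */
static uint64_t symbol_start(uint64_t rip, uint64_t offset)
{
    return rip - offset;
}
```

With RIP ffffffff8eba6549 and offset 41 (0x29), symbol_start() returns ffffffff8eba6520.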
- The kernel panicked in the function __list_del_entry(), called via list_del() from drm_property_free_blob().
- Source of drm_property_free_blob():
static void drm_property_free_blob(struct kref *kref)
{
    struct drm_property_blob *blob =
        container_of(kref, struct drm_property_blob, base.refcount);

    mutex_lock(&blob->dev->mode_config.blob_lock);
    list_del(&blob->head_global);    /* <<<< PANIC HERE */
    mutex_unlock(&blob->dev->mode_config.blob_lock);

    drm_mode_object_unregister(blob->dev, &blob->base);

    kvfree(blob);
}
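list_del() unlinks an entry by reading entry->prev and entry->next and rewriting the neighbours' pointers. Assuming list debugging is enabled (the out-of-line list_del and __list_del_entry frames suggest so), __list_del_entry() first checks that prev->next and next->prev still point back at the entry. The userspace model below is a sketch only: struct node and the node_* functions are hypothetical, not kernel APIs. It shows that both the check and the unlink must dereference prev, so a garbage prev pointer faults on that first read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a circular doubly linked list (like struct list_head). */
struct node {
    struct node *next;
    struct node *prev;
};

static void node_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void node_add_tail(struct node *entry, struct node *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Debug-style delete: verify that both neighbours still point back at the
 * entry before unlinking. The check itself reads prev->next, so a corrupted,
 * unmapped prev pointer would fault right here. */
static bool node_del_checked(struct node *entry)
{
    struct node *prev = entry->prev;
    struct node *next = entry->next;

    if (prev->next != entry || next->prev != entry)
        return false;           /* corruption detected, list left untouched */

    next->prev = prev;
    prev->next = next;
    entry->next = NULL;         /* the kernel writes LIST_POISON1/2 instead */
    entry->prev = NULL;
    return true;
}
```

In the model, corrupting an entry's prev to point at the wrong (but valid) node makes node_del_checked() return false. A non-canonical value like the one in this vmcore cannot be modelled safely in userspace, because reading through it would crash the test program itself.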
- Disassembly of the function drm_property_free_blob() up to the return address ffffffffc0456b1c:
crash> dis -r ffffffffc0456b1c
0xffffffffc0456af0 <drm_property_free_blob>: data16 data16 data16 xchg %ax,%ax
0xffffffffc0456af5 <drm_property_free_blob+0x5>: push %rbp
0xffffffffc0456af6 <drm_property_free_blob+0x6>: mov %rsp,%rbp
0xffffffffc0456af9 <drm_property_free_blob+0x9>: push %r12
0xffffffffc0456afb <drm_property_free_blob+0xb>: push %rbx
0xffffffffc0456afc <drm_property_free_blob+0xc>: mov 0x10(%rdi),%rax
0xffffffffc0456b00 <drm_property_free_blob+0x10>: mov %rdi,%rbx
...
0xffffffffc0456b17 <drm_property_free_blob+0x27>: call 0xffffffff8eba65f0 <list_del> <<-----
0xffffffffc0456b1c <drm_property_free_blob+0x2c>: mov 0x10(%rbx),%rax
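The address ffffffffc0456b1c in frame #8 is a return address: the instruction immediately after the call to list_del at drm_property_free_blob+0x27. A near call with a 32-bit relative displacement (opcode 0xE8 plus 4 displacement bytes) is 5 bytes long, which matches the gap between the two addresses. The sketch below makes that arithmetic explicit; return_address() is a hypothetical helper and only covers the rel32 call form.

```c
#include <assert.h>
#include <stdint.h>

/* Length of an x86-64 near call with a 32-bit relative displacement:
 * one opcode byte (0xE8) plus a 4-byte displacement. */
#define CALL_REL32_LEN 5u

/* Hypothetical helper: the return address pushed by a rel32 call is the
 * address of the instruction that follows it. */
static uint64_t return_address(uint64_t call_addr)
{
    return call_addr + CALL_REL32_LEN;
}
```

return_address(0xffffffffc0456b17) yields 0xffffffffc0456b1c, which is drm_property_free_blob+0x2c, the address shown in frame #8.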
crash> dis 0xffffffff8eba65f0
0xffffffff8eba65f0 <list_del>: push %rbp
0xffffffff8eba65f1 <list_del+0x1>: mov %rsp,%rbp
0xffffffff8eba65f4 <list_del+0x4>: push %rbx
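- Raw stack contents of the list_del() frame: ffff9dad7edaed10 is the %rbx value list_del() pushed on entry, and ffffffffc0456b1c is the return address into drm_property_free_blob():
#7 [ffff9daf5c6a7bd0] list_del at ffffffff8eba65fd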
ffff9daf5c6a7bd8: ffff9dad7edaed10 ffff9daf5c6a7c00
ffff9daf5c6a7be8: ffffffffc0456b1c
#8 [ffff9daf5c6a7be8] drm_property_free_blob at ffffffffc0456b1c [drm]
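- The saved %rbx was loaded by drm_property_free_blob() with its kref argument (mov %rdi,%rbx), so the kref pointer is ffff9dad7edaed10. Since the kref is embedded at offset 0x10, the enclosing drm_property_blob structure is at ffff9dad7edaed00: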
crash> eval ffff9dad7edaed10 - 0x10
hexadecimal: ffff9dad7edaed00
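The eval step above is what container_of() does: step back from a pointer to an embedded member to the start of the enclosing structure. The sketch below reproduces it in userspace. struct demo_blob and its members are a hypothetical layout, NOT the real struct drm_property_blob; they only place the refcount at offset 0x10 on a 64-bit build to match the "- 0x10" used with eval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical layout for illustration only. */
struct demo_kref { int refcount; };

struct demo_mode_object {
    uint32_t id;                 /* offset 0x0 */
    uint32_t type;               /* offset 0x4 */
    void *properties;            /* offset 0x8 on LP64 */
    struct demo_kref refcount;   /* offset 0x10 on LP64 */
};

struct demo_blob {
    struct demo_mode_object base;
};

/* Raw addresses read from a vmcore cannot be dereferenced, so work on them
 * as plain integers, exactly as the crash "eval" command does. */
static uint64_t blob_addr_from_kref(uint64_t kref_addr)
{
    return kref_addr - offsetof(struct demo_blob, base.refcount);
}
```

On an x86-64 build, blob_addr_from_kref(0xffff9dad7edaed10) returns 0xffff9dad7edaed00, matching the eval output.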
crash> drm_property_blob.head_global ffff9dad7edaed00
head_global = {
next = 0xffff9daac3d5b428,
prev = 0x38615acd3c2b91
},
- The prev pointer is 0x38615acd3c2b91, which is not a valid kernel address; the same value is in RAX at the time of the fault. This indicates memory corruption, and kmem cannot resolve the address either:
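A quick way to see why 0x38615acd3c2b91 cannot be a kernel pointer: with 4-level paging (48-bit virtual addresses, assumed here for this RHEL 7 system), an x86-64 address is canonical only if bits 63..47 are all copies of bit 47. Dereferencing a non-canonical address raises a general protection fault rather than a page fault, which is consistent with the panic string. addr_is_canonical48() is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes 48-bit virtual addresses (4-level paging). With 5-level paging
 * the check would cover bits 63..56 instead. */
static bool addr_is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* bits 63..47: 17 bits */
    return top == 0 || top == 0x1ffff;  /* all zeros or all ones */
}
```

For the values in this vmcore, addr_is_canonical48() returns true for next (0xffff9daac3d5b428) and false for prev (0x0038615acd3c2b91).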
crash> kmem 0x38615acd3c2b91
kmem: cannot determine page for 38615acd3c2b91
38615acd3c2b91: physical address not found in mem map
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.