kernel panic with task 'gfs_controld' and RIP in 'kref_put+0x4c/0x68'

Solution Verified

Environment

  • Red Hat Enterprise Linux Server 5 (with the High Availability and Resilient Storage Add-Ons)

Issue

  • The system panics while the task gfs_controld is running, with the RIP in kref_put+0x4c/0x68.

Resolution

The issue is resolved by the following errata for Red Hat Enterprise Linux (RHEL):

  • RHEL 5.9, kmod-gfs (GFS): RHBA-2013-0082
  • RHEL 5.9, kernel (GFS2): RHBA-2013-0006
  • RHEL 6.4, kernel (GFS2): RHSA-2013-0496

Root Cause

The errata add a kobject release function that properly maintains the kobject use count. Without it, a sysfs file that is still open after an unmount can cause an access to kernel memory that has already been freed.
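The effect of the use count can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code from the errata: the names Kref, ToyMount, unmount_without_release and unmount_with_release are hypothetical and exist only for this illustration.

```python
class Kref:
    """Toy reference counter, loosely modelled on the kernel's kref."""

    def __init__(self):
        self.count = 1  # the creator holds the first reference

    def get(self):
        self.count += 1

    def put(self, release):
        """Drop one reference; call release() when the count reaches zero."""
        self.count -= 1
        if self.count == 0:
            release()
            return True
        return False


class ToyMount:
    """Hypothetical stand-in for the per-mount structure behind the sysfs files."""

    def __init__(self):
        self.kref = Kref()
        self.freed = False

    def free(self):
        self.freed = True


def unmount_without_release(mount):
    # Assumed buggy behaviour: the structure is freed at unmount
    # even though another holder still has a reference.
    mount.kref.put(lambda: None)
    mount.free()


def unmount_with_release(mount):
    # Assumed fixed behaviour: unmount only drops its own reference;
    # the release callback frees the structure on the last put.
    mount.kref.put(mount.free)


def close_sysfs_file(mount):
    # Models sys_close -> sysfs_release -> kref_put from the daemon.
    if mount.freed:
        raise RuntimeError("use after free: kref_put on freed memory")
    mount.kref.put(mount.free)


def scenario(unmount):
    mount = ToyMount()
    mount.kref.get()   # the daemon opens a sysfs file and takes a reference
    unmount(mount)     # the filesystem is unmounted while the file is open
    try:
        close_sysfs_file(mount)
    except RuntimeError as exc:
        return str(exc)
    return "ok, freed=%s" % mount.freed


if __name__ == "__main__":
    print(scenario(unmount_without_release))
    print(scenario(unmount_with_release))
```

In the first scenario the close touches memory that unmount already freed, which corresponds to the page fault in kref_put. In the second, the structure stays valid until the last reference is dropped by the close.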

Diagnostic Steps

When the panic occurs, capture a vmcore so that the back traces can be analyzed. The vmcore for this issue contains a back trace similar to the following, in which a page fault occurs in kref_put while gfs_controld is closing a file descriptor. The panic usually occurs on shutdown or when unmounting a GFS or GFS2 filesystem:

PID: 18804  TASK: ffff810439f22100  CPU: 4   COMMAND: "gfs_controld"
 #0 [ffff8103fe849c00] crash_kexec at ffffffff800af805
 #1 [ffff8103fe849cc0] __die at ffffffff80065117
 #2 [ffff8103fe849d00] do_page_fault at ffffffff8006748d
 #3 [ffff8103fe849df0] error_exit at ffffffff8005dde9
    [exception RIP: kref_put+76]
    RIP: ffffffff80035725  RSP: ffff8103fe849ea8  RFLAGS: 00010212
    RAX: 0000000000000000  RBX: ffffc200110c9cf4  RCX: ffffffff802993d3
    RDX: 0000000000000034  RSI: ffffffff80154634  RDI: ffffc200110c9cf4
    RBP: ffffffff80154634   R8: 0000000000000000   R9: 0a0a0a0a0a0a0a0a
    R10: 0000000000000000  R11: 0000000000000246  R12: ffff8104263c3d70
    R13: ffff8104263c3d70  R14: ffff81043f2b60c0  R15: ffff8103fc5e1150
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff8103fe849ec0] kref_put at ffffffff80035736
 #5 [ffff8103fe849ee0] sysfs_release at ffffffff801108e7
 #6 [ffff8103fe849f00] __fput at ffffffff80012bf3
 #7 [ffff8103fe849f40] filp_close at ffffffff80023d54
 #8 [ffff8103fe849f60] sys_close at ffffffff8001e1c7
 #9 [ffff8103fe849f80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 0000003125ec6330  RSP: 00007fff318edd08  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 0a0a0a0a0a0a0a0a  RSI: 000000000000000a  RDI: 000000000000000a
    RBP: 000000000000000a   R8: fefefefefefefeff   R9: 0a0a0a0a0a0a0a0a
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000007
    R13: 0000000000000100  R14: ffffffff8001e1c7  R15: ffff810098b68ac0
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
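A saved back trace can be compared against this signature with a short script. The following is a minimal sketch that assumes the trace was saved as plain text in the layout shown above; the function names are hypothetical. It deliberately ignores the offset inside kref_put, since that may differ between kernel builds. A match is only a hint that the vmcore shows this issue, not a confirmation.

```python
import re

# Frames expected, innermost first, below the exception frame.
EXPECTED_FRAMES = ["kref_put", "sysfs_release", "__fput", "filp_close", "sys_close"]


def contains_in_order(frames, expected):
    """True if every name in expected appears in frames, in the same order."""
    remaining = iter(frames)
    return all(name in remaining for name in expected)


def matches_signature(bt_text):
    """Check the task name, the faulting function and the call chain."""
    command = re.search(r'COMMAND:\s+"([^"]+)"', bt_text)
    rip = re.search(r"\[exception RIP: (\w+)\+(\d+)\]", bt_text)
    frames = re.findall(
        r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\w+)\s+at\s", bt_text, re.MULTILINE
    )
    return (
        command is not None
        and command.group(1) == "gfs_controld"
        and rip is not None
        and rip.group(1) == "kref_put"
        and contains_in_order(frames, EXPECTED_FRAMES)
    )


# Abbreviated copy of the back trace from this article.
SAMPLE = """\
PID: 18804  TASK: ffff810439f22100  CPU: 4   COMMAND: "gfs_controld"
 #3 [ffff8103fe849df0] error_exit at ffffffff8005dde9
    [exception RIP: kref_put+76]
 #4 [ffff8103fe849ec0] kref_put at ffffffff80035736
 #5 [ffff8103fe849ee0] sysfs_release at ffffffff801108e7
 #6 [ffff8103fe849f00] __fput at ffffffff80012bf3
 #7 [ffff8103fe849f40] filp_close at ffffffff80023d54
 #8 [ffff8103fe849f60] sys_close at ffffffff8001e1c7
"""

if __name__ == "__main__":
    print(matches_signature(SAMPLE))
    # The decimal offset kref_put+76 in the trace is the same location
    # as the hexadecimal kref_put+0x4c in the panic message.
    print(hex(76))
```

The order check accepts extra frames between the expected ones, so a trace with additional intermediate calls would still match.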

