In MRG, when a node was fencing another node, its rgmanager dumped core.

Environment

  • Red Hat Messaging Realtime Grid (MRG)
  • rgmanager-3.0.12.1-12

Issue

  • When a node was fencing another node, rgmanager on the fencing node dumped core.
  • When running rgmanager without the -q flag (the default), rgmanager can crash inside dbus library functions as a result of unlocked access to internal dbus data structures from different rgmanager threads.

Resolution

Any dbus-related core dump should be resolved by upgrading to rgmanager-3.0.12.1-17.

Root Cause

When rgmanager runs without the -q flag (the default), it can crash inside dbus library functions because several rgmanager threads access internal dbus data structures without locking. The erratum above fixes a bug in which rgmanager never called dbus_threads_init_default(), so locking inside the dbus library was not set up properly, which led to many different crash scenarios.

Diagnostic Steps

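When a node was fencing another node, its rgmanager dumped core. The core shows an abort raised from the dbus function _dbus_warn_check_failed, which fired because rgmanager called unref() too many times and dropped the last reference on a connection that had not been closed.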

This is a backtrace of the relevant thread:

Thread 1 (Thread 0x7f3527e52700 (LWP 31263)):
#0  0x0000003e70632885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = 31169
        selftid = 31263
#1  0x0000003e70634065 in abort () at abort.c:92
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0, sa_sigaction = 0}, sa_mask = {__val = {268177094752, 500, 139866279324096, 0, 3, 1, 268174347451, 206158430232, 139866279320848, 
              139866279320640, 139866279320864, 139866279320656, 135200, 4096, 1, 139866279323392}}, sa_flags = 1768453152, sa_restorer = 0xa90}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x0000003a4e82a975 in _dbus_abort () at dbus-sysdeps.c:88
        s = <value optimized out>
#3  0x0000003a4e826845 in _dbus_warn_check_failed (
    format=0x3a4e82f388 "The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.\n%s") at dbus-internals.c:283
        args = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7f3527e51e10, reg_save_area = 0x7f3527e51d40}}
#4  0x0000003a4e810c62 in _dbus_connection_read_write_dispatch (connection=0x16d5390, timeout_milliseconds=500, dispatch=<value optimized out>) at dbus-connection.c:3512
        dstatus = <value optimized out>
        progress_possible = 1
#5  0x000000000041b261 in _dbus_auto_flush (arg=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/update-dbus.c:169
        set = {__val = {18446744067267099543, 18446744073709551615 <repeats 15 times>}}
#6  0x0000003e70a077f1 in start_thread (arg=0x7f3527e52700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7f3527e52700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139866279323392, -1227365437401539918, 268177576352, 139866279324096, 0, 3, 1197335628465232562, -1257800741152171342}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#7  0x0000003e706e570d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
