In MRG, when a node was fencing another node, its rgmanager coredumped.
Environment
- Red Hat Messaging Realtime Grid (MRG)
- rgmanager-3.0.12.1-12
Issue
- When a node was fencing another node, its rgmanager dumped core.
- When running rgmanager without the -q flag (the default), rgmanager can crash inside dbus library functions as a result of unlocked access to internal dbus data structures from different rgmanager threads.
Resolution
Any dbus-related core dump should be resolved by upgrading to rgmanager-3.0.12.1-17.
Root Cause
When running rgmanager without the -q flag (the default), rgmanager can crash inside dbus library functions as a result of unlocked access to internal dbus data structures from different rgmanager threads. The updated package named above resolves a bug in which a missing dbus_threads_init_default call left dbus without proper locking, which resulted in many different crash scenarios.
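The effect of the missing locking can be illustrated with a simplified model. The sketch below is not rgmanager or libdbus code: shared_conn, conn_ref, conn_unref and run_workers are hypothetical names, and the mutex stands in for the locking that dbus is described above as lacking when dbus_threads_init_default is never called. Several threads take and drop references on one shared object, and the lock keeps the count consistent. Without the lock, concurrent updates could be lost and a later unref could trip the sanity check, which is the same kind of failure the backtrace in the Diagnostic Steps shows.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted connection shared by threads. */
typedef struct {
    pthread_mutex_t lock;
    int refs;
} shared_conn;

static void conn_ref(shared_conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->refs++;
    pthread_mutex_unlock(&c->lock);
}

static void conn_unref(shared_conn *c)
{
    pthread_mutex_lock(&c->lock);
    /* Sanity check: dropping a reference that is not held is a bug. */
    assert(c->refs > 0);
    c->refs--;
    pthread_mutex_unlock(&c->lock);
}

typedef struct {
    shared_conn *conn;
    int iterations;
} worker_args;

/* Each worker repeatedly takes and releases one reference. */
static void *worker(void *p)
{
    worker_args *a = p;
    for (int i = 0; i < a->iterations; i++) {
        conn_ref(a->conn);
        conn_unref(a->conn);
    }
    return NULL;
}

/* Runs nthreads workers against one shared object that starts with a single
 * reference. Returns the final reference count, or -1 on invalid input or
 * thread creation failure. */
int run_workers(int nthreads, int iterations)
{
    enum { MAX_THREADS = 16 };
    pthread_t tids[MAX_THREADS];
    shared_conn conn;
    worker_args args;
    int started = 0;
    int final_refs;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    pthread_mutex_init(&conn.lock, NULL);
    conn.refs = 1; /* the owner's reference */
    args.conn = &conn;
    args.iterations = iterations;

    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, worker, &args) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    final_refs = conn.refs;
    pthread_mutex_destroy(&conn.lock);
    return started == nthreads ? final_refs : -1;
}
```

Calling run_workers(4, 100000) from a test program built with -pthread should leave only the owner's single reference in place, because every increment and decrement happens under the lock.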
Diagnostic Steps
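When a node was fencing another node, its rgmanager dumped core. The core dump shows an abort raised from the dbus function _dbus_warn_check_failed, triggered because rgmanager called unref() on the connection too many times (the warning text in frame #3 refers to dbus_connection_unref()).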
This is a backtrace of the relevant thread:
Thread 1 (Thread 0x7f3527e52700 (LWP 31263)):
#0 0x0000003e70632885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
resultvar = 0
pid = 31169
selftid = 31263
#1 0x0000003e70634065 in abort () at abort.c:92
save_stage = 2
act = {__sigaction_handler = {sa_handler = 0, sa_sigaction = 0}, sa_mask = {__val = {268177094752, 500, 139866279324096, 0, 3, 1, 268174347451, 206158430232, 139866279320848,
139866279320640, 139866279320864, 139866279320656, 135200, 4096, 1, 139866279323392}}, sa_flags = 1768453152, sa_restorer = 0xa90}
sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x0000003a4e82a975 in _dbus_abort () at dbus-sysdeps.c:88
s = <value optimized out>
#3 0x0000003a4e826845 in _dbus_warn_check_failed (
format=0x3a4e82f388 "The last reference on a connection was dropped without closing the connection. This is a bug in an application. See dbus_connection_unref() documentation for details.\n%s") at dbus-internals.c:283
args = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7f3527e51e10, reg_save_area = 0x7f3527e51d40}}
#4 0x0000003a4e810c62 in _dbus_connection_read_write_dispatch (connection=0x16d5390, timeout_milliseconds=500, dispatch=<value optimized out>) at dbus-connection.c:3512
dstatus = <value optimized out>
progress_possible = 1
#5 0x000000000041b261 in _dbus_auto_flush (arg=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/update-dbus.c:169
set = {__val = {18446744067267099543, 18446744073709551615 <repeats 15 times>}}
#6 0x0000003e70a077f1 in start_thread (arg=0x7f3527e52700) at pthread_create.c:301
__res = <value optimized out>
pd = 0x7f3527e52700
now = <value optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139866279323392, -1227365437401539918, 268177576352, 139866279324096, 0, 3, 1197335628465232562, -1257800741152171342},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <value optimized out>
pagesize_m1 = <value optimized out>
sp = <value optimized out>
freesize = <value optimized out>
#7 0x0000003e706e570d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115