rgmanager segfaults in a RHEL 6 High Availability cluster using RRP and cpglockd when stopping cman
Issue
- When stopping cman on a node, rgmanager crashes
- If I run "service cman stop", the node reboots itself.
Jun 4 14:40:37 node1 corosync[42343]: [QUORUM] Members[5]: 1 3
Jun 4 14:40:37 node1 corosync[42343]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 4 14:40:37 node1 kernel: dlm: closing connection to node 2
Jun 4 14:40:37 node1 corosync[42343]: [CPG ] chosen downlist: sender r(0) ip(192.168.10.11) r(1) ip(192.168.11.11) ; members(old:6 left:1)
Jun 4 14:40:37 node1 corosync[42343]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 4 14:43:17 node1 kernel: dlm: closing connection to node 3
Jun 4 14:43:17 node1 kernel: dlm: closing connection to node 1
Jun 4 14:43:20 node1 cpglockd[43118]: cman requested shutdown. Exiting.
Jun 4 14:43:20 node1 abrtd: Directory 'ccpp-2016-06-04-14:43:20-43197' creation detected
Jun 4 14:43:20 node1 abrt[40893]: Saved core dump of pid 43197 (/usr/sbin/rgmanager) to /var/spool/abrt/ccpp-2016-06-04-14:43:20-43197 (54935552 bytes)
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Unloading all Corosync service engines.
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Service engine unloaded: corosync configuration service
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Service engine unloaded: corosync profile loading service
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
Jun 4 14:43:20 node1 corosync[42343]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Jun 4 14:43:20 node1 corosync[42343]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1947.
rgmanager segfaults with a backtrace showing a SIGSEGV in _cpg_lock:
Core was generated by `rgmanager'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000309f80f5db in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
42 sig);
#0 0x000000309f80f5db in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1 0x00000000004235f2 in _cpg_lock (mode=<value optimized out>, lksb=0x6323b0, options=<value optimized out>, resource=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/clulib/cpg_lock.c:78
#2 0x00000000004117d6 in event_master () at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/rg_event.c:339
#3 0x0000000000411a75 in _event_thread_f (arg=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/rg_event.c:419
#4 0x000000309f8079d1 in start_thread (arg=0x7f3fddcb4700) at pthread_create.c:301
#5 0x000000309f4e8b6d in signalfd (fd=-573880576, mask=0x7f3fddcb3b20, flags=16843009) at ../sysdeps/unix/sysv/linux/signalfd.c:30
#6 0x0000000000000000 in ?? ()
Core was generated by `rgmanager'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000309f80f5db in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
42 sig);
#0 0x000000309f80f5db in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1 0x00000000004235f2 in _cpg_lock (mode=<value optimized out>, lksb=0x6323b0, options=<value optimized out>, resource=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/clulib/cpg_lock.c:78
#2 0x00000000004117d6 in event_master () at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/rg_event.c:339
#3 0x0000000000411a75 in _event_thread_f (arg=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/rg_event.c:419
#4 0x000000309f8079d1 in start_thread (arg=0x7f3225963700) at pthread_create.c:301
#5 0x000000309f4e8b6d in signalfd (fd=630601472, mask=0x7f3225962b20, flags=16843009) at ../sysdeps/unix/sysv/linux/signalfd.c:30
#6 0x0000000000000000 in ?? ()
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
- Cluster configured to use RRP: the clusternodes in /etc/cluster/cluster.conf contain <altname/>
- rgmanager
- Use of RRP causes cpglockd to be running
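A minimal sketch of the kind of cluster.conf that matches this environment (the cluster name, node names, and altname values below are illustrative placeholders, not taken from the affected cluster):

```xml
<cluster name="example" config_version="1">
  <clusternodes>
    <!-- An <altname/> on each clusternode enables the second (redundant) ring -->
    <clusternode name="node1.example.com" nodeid="1">
      <altname name="node1-alt.example.com"/>
      <fence/>
    </clusternode>
    <clusternode name="node2.example.com" nodeid="2">
      <altname name="node2-alt.example.com"/>
      <fence/>
    </clusternode>
  </clusternodes>
  <rm/>
</cluster>
```

It is the presence of <altname/> entries like these that puts the cluster into RRP mode, which in turn requires cpglockd to run alongside rgmanager on RHEL 6.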