rgmanager segfaults in a RHEL 6 High Availability cluster using RRP and cpglockd when stopping cman

Solution Unverified

Issue

  • When stopping cman on a node, rgmanager crashes.
  • Running service cman stop causes the node to reboot itself. The following messages are logged:
Jun  4 14:40:37 node1 corosync[42343]:   [QUORUM] Members[5]: 1 3
Jun  4 14:40:37 node1 corosync[42343]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun  4 14:40:37 node1 kernel: dlm: closing connection to node 2
Jun  4 14:40:37 node1 corosync[42343]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.10.11) r(1) ip(192.168.11.11) ; members(old:6 left:1)
Jun  4 14:40:37 node1 corosync[42343]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun  4 14:43:17 node1 kernel: dlm: closing connection to node 3
Jun  4 14:43:17 node1 kernel: dlm: closing connection to node 1
Jun  4 14:43:20 node1 cpglockd[43118]: cman requested shutdown. Exiting.
Jun  4 14:43:20 node1 abrtd: Directory 'ccpp-2016-06-04-14:43:20-43197' creation detected
Jun  4 14:43:20 node1 abrt[40893]: Saved core dump of pid 43197 (/usr/sbin/rgmanager) to /var/spool/abrt/ccpp-2016-06-04-14:43:20-43197 (54935552 bytes)
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Unloading all Corosync service engines.
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Service engine unloaded: corosync configuration service
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Service engine unloaded: corosync profile loading service
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Service engine unloaded: openais checkpoint service B.01.01
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Service engine unloaded: corosync CMAN membership service 2.90
Jun  4 14:43:20 node1 corosync[42343]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Jun  4 14:43:20 node1 corosync[42343]:   [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:1947.
  • rgmanager segfaults with a backtrace showing a SIGSEGV in _cpg_lock (a sketch of how to examine the saved core follows the backtraces below):
Core was generated by `rgmanager'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000309f80f5db in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
42               sig);
#0  0x000000309f80f5db in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x00000000004235f2 in _cpg_lock (mode=<value optimized out>, lksb=0x6323b0, options=<value optimized out>, resource=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/clulib/cpg_lock.c:78
#2  0x00000000004117d6 in event_master () at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/rg_event.c:339
#3  0x0000000000411a75 in _event_thread_f (arg=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/rg_event.c:419
#4  0x000000309f8079d1 in start_thread (arg=0x7f3fddcb4700) at pthread_create.c:301
#5  0x000000309f4e8b6d in signalfd (fd=-573880576, mask=0x7f3fddcb3b20, flags=16843009) at ../sysdeps/unix/sysv/linux/signalfd.c:30
#6  0x0000000000000000 in ?? ()
Core was generated by `rgmanager'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000309f80f5db in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
42               sig);
#0  0x000000309f80f5db in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x00000000004235f2 in _cpg_lock (mode=<value optimized out>, lksb=0x6323b0, options=<value optimized out>, resource=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/clulib/cpg_lock.c:78
#2  0x00000000004117d6 in event_master () at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/rg_event.c:339
#3  0x0000000000411a75 in _event_thread_f (arg=<value optimized out>) at /usr/src/debug/rgmanager-3.0.12.1/rgmanager/src/daemons/rg_event.c:419
#4  0x000000309f8079d1 in start_thread (arg=0x7f3225963700) at pthread_create.c:301
#5  0x000000309f4e8b6d in signalfd (fd=630601472, mask=0x7f3225962b20, flags=16843009) at ../sysdeps/unix/sysv/linux/signalfd.c:30
#6  0x0000000000000000 in ?? ()
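
For reference, a backtrace like the ones above can be regenerated from the core dump that abrt saved (the /var/spool/abrt/ccpp-2016-06-04-14:43:20-43197 directory shown in the log). The commands below are a sketch only; they assume the rgmanager debuginfo package is installable on the node and that abrt placed the core in a file named coredump inside that directory.

# Install debugging symbols for rgmanager (assumes debuginfo repositories are enabled)
debuginfo-install rgmanager
# Open the core saved by abrt; the directory name is taken from the log excerpt above
gdb /usr/sbin/rgmanager /var/spool/abrt/ccpp-2016-06-04-14:43:20-43197/coredump
# At the gdb prompt, print the backtrace of the crashing thread
(gdb) bt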

Environment

  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
  • Cluster configured to use the redundant ring protocol (RRP): the <clusternode> entries contain an <altname/> element in /etc/cluster/cluster.conf (see the example snippet after this list)
  • rgmanager
    • Use of RRP causes cpglockd to be running
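
As an illustration, an RRP configuration of this kind typically looks like the fragment of /etc/cluster/cluster.conf below. The node names and altname values are hypothetical placeholders, not taken from the affected cluster, and fencing configuration is omitted for brevity.

<clusternodes>
    <clusternode name="node1.example.com" nodeid="1">
        <!-- altname defines the address used for the second (redundant) ring -->
        <altname name="node1-alt.example.com"/>
    </clusternode>
    <clusternode name="node2.example.com" nodeid="2">
        <altname name="node2-alt.example.com"/>
    </clusternode>
</clusternodes>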
