Corosync crashes after "[TOTEM] FAILED TO RECEIVE" in RHEL 6 cluster

Issue

A node is removed from the cluster after corosync crashes following a "FAILED TO RECEIVE" condition.
corosync aborts with SIGABRT and dumps core after "[TOTEM ] FAILED TO RECEIVE" appears in the logs:

  Oct  2 09:13:56 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98
  Oct  2 09:13:56 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98
  [...]
  Oct  2 09:31:31 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98 99
  Oct  2 09:31:32 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98 99
  [...]
  Oct  2 09:49:35 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98 99 9b 9c
  Oct  2 09:49:35 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98 99 9b 9c
  [...]
  Oct  2 10:24:57 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98 99 9b 9c
  Oct  2 10:24:59 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98 99 9b 9c
  Oct  2 10:24:59 node1 corosync[5670]:   [TOTEM ] FAILED TO RECEIVE
  Oct  2 10:25:01 node1 abrtd: Directory 'ccpp-2012-10-02-10:25:01-5670' creation detected
  Oct  2 10:25:01 node1 abrt[14835]: Saved core dump of pid 5670 (/usr/sbin/corosync) to /var/spool/abrt/ccpp-2012-10-02-10:25:01-5670 (65900544 bytes)
  Oct  2 10:25:01 node1 dlm_controld[5743]: cluster is down, exiting
  Oct  2 10:25:01 node1 gfs_controld[5792]: cluster is down, exiting
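
The pattern in the log excerpt above (repeated "Retransmit List" entries, then "FAILED TO RECEIVE") can be picked out of a syslog file with a small script. The following is a minimal sketch, not a supported tool: the function name summarize_totem and the regular expressions are assumptions derived only from the lines quoted in this article, and other corosync versions may format these messages differently.

```python
import re

# Patterns inferred from the syslog lines quoted above (assumption: the
# "[TOTEM ]" tag and message text match this excerpt's format).
RETRANSMIT = re.compile(r"\[TOTEM\s*\] Retransmit List:\s*(.*)$")
FAILED = re.compile(r"\[TOTEM\s*\] FAILED TO RECEIVE")


def summarize_totem(lines):
    """Return (retransmit_line_count, sorted_sequence_ids, saw_failed_to_receive)."""
    count = 0
    seq_ids = set()
    failed = False
    for line in lines:
        match = RETRANSMIT.search(line)
        if match:
            count += 1
            # Each retransmit line lists the sequence numbers still missing.
            seq_ids.update(match.group(1).split())
        elif FAILED.search(line):
            failed = True
    return count, sorted(seq_ids), failed


# Sample input copied from the log excerpt in this article.
sample = [
    "Oct  2 09:13:56 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98",
    "Oct  2 09:31:31 node1 corosync[5670]:   [TOTEM ] Retransmit List: 98 99",
    "Oct  2 10:24:59 node1 corosync[5670]:   [TOTEM ] FAILED TO RECEIVE",
    "Oct  2 10:25:01 node1 dlm_controld[5743]: cluster is down, exiting",
]

count, seq_ids, failed = summarize_totem(sample)
print(count, seq_ids, failed)
```

To scan a real log, the same function can be given an open file object, for example `summarize_totem(open("/var/log/messages"))`, assuming that is where syslog writes on the node.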

The core dumped by corosync after FAILED TO RECEIVE shows a failed assertion ("token_memb_entries >= 1") in memb_consensus_agreed:

#0  0x0000003416e32885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64    return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x0000003416e32885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003416e34065 in abort () at abort.c:92
#2  0x0000003416e2b9fe in __assert_fail_base (fmt=<value optimized out>, assertion=0x3462e23ef5 "token_memb_entries >= 1", file=0x3462e23e8d "totemsrp.c", 
    line=<value optimized out>, function=<value optimized out>) at assert.c:96
#3  0x0000003416e2bac0 in __assert_fail (assertion=0x3462e23ef5 "token_memb_entries >= 1", file=0x3462e23e8d "totemsrp.c", line=1211, function=0x3462e25150 "memb_consensus_agreed")
    at assert.c:105
#4  0x0000003462e12e86 in memb_consensus_agreed (instance=0x7f1852e24010) at totemsrp.c:1211
#5  0x0000003462e17513 in memb_join_process (instance=0x7f1852e24010, memb_join=0xf344fc) at totemsrp.c:4007
#6  0x0000003462e17839 in message_handler_memb_join (instance=0x7f1852e24010, msg=<value optimized out>, msg_len=<value optimized out>, 
    endian_conversion_needed=<value optimized out>) at totemsrp.c:4250
#7  0x0000003462e10d18 in rrp_deliver_fn (context=0xef19c0, msg=0xf344fc, msg_len=245) at totemrrp.c:1747
#8  0x0000003462e0b9a8 in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=0xf33e30) at totemudp.c:1252
#9  0x0000003462e07132 in poll_run (handle=2111858625151500288) at coropoll.c:513
#10 0x0000000000406eb9 in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at main.c:1852

After we started the cman service on each node, node1 and node3 were rebooted, and corosync on node2 generated a core file.

Environment

Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
Multicast communication between nodes (i.e., not UDPU or broadcast)
corosync releases prior to 1.4.1-17.el6
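
One way to check whether an installed package falls in the range listed above is to compare its version-release string against 1.4.1-17.el6. The sketch below is a simplification and an assumption on our part: it compares only the dotted version and the leading integer of the release, not the full RPM version-comparison algorithm, and the names parse and is_affected are hypothetical. The installed string can be obtained with `rpm -q --qf '%{VERSION}-%{RELEASE}\n' corosync`.

```python
import re

# Boundary taken from the Environment list above: releases prior to this
# one are described as affected.
BOUNDARY = "1.4.1-17.el6"


def parse(version_release):
    """Split 'VERSION-RELEASE' into (version_tuple, leading_release_number).

    Simplified: ignores dist tags and minor release suffixes such as
    '.el6_3.1', so it is not a full RPM version comparison.
    """
    version, _, release = version_release.partition("-")
    version_tuple = tuple(int(part) for part in version.split("."))
    match = re.match(r"\d+", release)
    release_number = int(match.group()) if match else 0
    return version_tuple, release_number


def is_affected(version_release):
    """True if the given corosync version-release sorts before BOUNDARY."""
    return parse(version_release) < parse(BOUNDARY)


# Example strings are illustrative, not taken from the article.
for candidate in ("1.4.1-15.el6_3.1", "1.4.1-17.el6", "1.4.7-1.el6"):
    print(candidate, is_affected(candidate))
```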
