How to prevent `FAILED TO RECEIVE` on overloaded network used by `corosync`?


Issue

Corosync reports a message-delivery failure (`FAILED TO RECEIVE`) after repeated retransmits. This results in fencing and failover, which interrupts business services:

2020-12-07T16:03:09.105053+00:00 node01 corosync[9759]: [TOTEM ] Retransmit List: 6724 
2020-12-07T16:03:09.422888+00:00 node01 corosync[9759]: [TOTEM ] Retransmit List: 6724
2020-12-07T16:03:09.422923+00:00 node01 corosync[9759]: [TOTEM ] FAILED TO RECEIVE
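To check whether a node has been hitting this condition, the system log can be scanned for the two TOTEM messages shown above. The following is a minimal sketch, assuming every line follows the `<timestamp> <host> corosync[<pid>]: [TOTEM ] <message>` shape of the excerpt. The function name `scan_totem_log` is hypothetical, and the entries in a `Retransmit List` are treated as opaque identifiers rather than parsed as numbers.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt above:
# <ISO timestamp> <host> corosync[<pid>]: [TOTEM ] <message>
TOTEM_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+\[TOTEM \]\s+(?P<msg>.*?)\s*$"
)


def scan_totem_log(lines):
    """Summarise TOTEM retransmits and receive failures per host (hypothetical helper).

    Returns {host: {"retransmits": Counter(id -> times listed),
                    "failed_to_receive": [timestamps]}}.
    """
    summary = {}
    for line in lines:
        m = TOTEM_RE.match(line)
        if not m:
            continue  # not a corosync TOTEM message; ignore it
        entry = summary.setdefault(
            m.group("host"), {"retransmits": Counter(), "failed_to_receive": []}
        )
        msg = m.group("msg")
        if msg.startswith("Retransmit List:"):
            # Count how often each listed message identifier had to be retransmitted
            for msg_id in msg[len("Retransmit List:"):].split():
                entry["retransmits"][msg_id] += 1
        elif msg == "FAILED TO RECEIVE":
            # Record when corosync gave up waiting for the retransmitted messages
            entry["failed_to_receive"].append(m.group("ts"))
    return summary


if __name__ == "__main__":
    sample = [
        "2020-12-07T16:03:09.105053+00:00 node01 corosync[9759]: [TOTEM ] Retransmit List: 6724 ",
        "2020-12-07T16:03:09.422888+00:00 node01 corosync[9759]: [TOTEM ] Retransmit List: 6724",
        "2020-12-07T16:03:09.422923+00:00 node01 corosync[9759]: [TOTEM ] FAILED TO RECEIVE",
    ]
    result = scan_totem_log(sample)
    print(result["node01"]["retransmits"])        # Counter({'6724': 2})
    print(result["node01"]["failed_to_receive"])  # ['2020-12-07T16:03:09.422923+00:00']
```

The same identifier appearing in several consecutive `Retransmit List` lines, followed by `FAILED TO RECEIVE`, is the pattern from the excerpt; a rising retransmit count on a host is a hint that its cluster network is losing or delaying packets.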

Environment

  • Red Hat Enterprise Linux (RHEL) 7 and 8 with the High Availability Add-On
