How to prevent `FAILED TO RECEIVE` on an overloaded network used by `corosync`?
Issue
A Corosync transmit error results in fencing and failover, which causes a business interruption. The following messages are logged:
2020-12-07T16:03:09.105053+00:00 node01 corosync[9759]: [TOTEM ] Retransmit List: 6724
2020-12-07T16:03:09.422888+00:00 node01 corosync[9759]: [TOTEM ] Retransmit List: 6724
2020-12-07T16:03:09.422923+00:00 node01 corosync[9759]: [TOTEM ] FAILED TO RECEIVE
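To confirm whether a node hit this condition, the log can be scanned for repeated `Retransmit List` entries followed by `FAILED TO RECEIVE`. The following is a minimal sketch, assuming the syslog line format shown above. The helper name `summarize_totem` and the regular expression are illustrative assumptions and are not part of corosync.

```python
import re
from collections import Counter

# Assumed line format (taken from the sample above):
#   <timestamp> <host> corosync[<pid>]: [TOTEM ] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+corosync\[(?P<pid>\d+)\]:\s+"
    r"\[TOTEM\s*\]\s+(?P<msg>.*)$"
)


def summarize_totem(lines):
    """Count retransmitted sequence numbers and collect FAILED TO RECEIVE events.

    Hypothetical helper for illustration only.
    Returns (Counter of sequence id -> occurrences, list of (timestamp, host)).
    """
    retransmits = Counter()
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a corosync TOTEM line
        msg = match.group("msg")
        if msg.startswith("Retransmit List:"):
            # Sequence ids are kept as strings exactly as logged.
            for seq in msg.split(":", 1)[1].split():
                retransmits[seq] += 1
        elif msg.startswith("FAILED TO RECEIVE"):
            failures.append((match.group("ts"), match.group("host")))
    return retransmits, failures


SAMPLE = [
    "2020-12-07T16:03:09.105053+00:00 node01 corosync[9759]: [TOTEM ] Retransmit List: 6724",
    "2020-12-07T16:03:09.422888+00:00 node01 corosync[9759]: [TOTEM ] Retransmit List: 6724",
    "2020-12-07T16:03:09.422923+00:00 node01 corosync[9759]: [TOTEM ] FAILED TO RECEIVE",
]

if __name__ == "__main__":
    retransmits, failures = summarize_totem(SAMPLE)
    for seq, count in retransmits.items():
        print(f"sequence {seq} retransmitted {count} time(s)")
    for ts, host in failures:
        print(f"{host}: FAILED TO RECEIVE at {ts}")
```

In practice the same function could be fed lines read from the system log file on each cluster node; the sample list above only reuses the three lines from this article.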
Environment
- Red Hat Enterprise Linux Server 7, 8, 9 (with the High Availability Add-On)