JBoss errors when multicast stops working
Environment
- JBoss Enterprise Application Platform (EAP)
- JBoss Data Grid (JDG)
Issue
- If any node goes down abruptly, the entire cluster becomes unstable and starts throwing replication exceptions
- WARN messages such as the following suddenly start appearing in the logs and persist for more than a few minutes:
WARN [GMS] failed to collect all ACKs (3) for view [node1:1234|4] [node1:1234, node2:2345, node3:3456] after 2000ms,
missing ACKs from [node2:2345] (received=[node1:1234, node3:3456]), local_addr=node1:1234
- netstat -g shows that the multicast group membership used by JBoss disappears after several minutes
Resolution
- Disable "IGMP Snooping" in the network switches
- Use a different network switch
Root Cause
- Multicast messages are not delivered between some nodes
- This can be caused by a firmware issue in some Cisco Catalyst switches when the "IGMP Snooping" option is enabled
Diagnostic Steps
- Enable JGroups TRACE logging
- Run netstat -g to see the multicast group membership on each node
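When going through the logs, it can help to pull out which members keep missing ACKs, because those are the nodes whose multicast path is worth checking first. The following is a minimal sketch, not part of JGroups or JBoss: the class name GmsAckParser and the method missingAcks are hypothetical, and it assumes the WARN message appears on a single line in the format shown in the Issue section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a JGroups API): extracts the members listed
// after "missing ACKs from" in a GMS warning line.
class GmsAckParser {
    // Captures the comma-separated member list inside the square brackets.
    private static final Pattern MISSING =
        Pattern.compile("missing ACKs from \\[([^\\]]*)\\]");

    // Returns the members that did not acknowledge the view,
    // or an empty list if the line is not a matching GMS warning.
    static List<String> missingAcks(String logLine) {
        List<String> members = new ArrayList<>();
        Matcher m = MISSING.matcher(logLine);
        if (m.find()) {
            for (String member : m.group(1).split(",")) {
                String trimmed = member.trim();
                if (!trimmed.isEmpty()) {
                    members.add(trimmed);
                }
            }
        }
        return members;
    }

    public static void main(String[] args) {
        // Sample line taken from the Issue section, joined onto one line.
        String line = "WARN [GMS] failed to collect all ACKs (3) for view [node1:1234|4] "
            + "[node1:1234, node2:2345, node3:3456] after 2000ms, "
            + "missing ACKs from [node2:2345] (received=[node1:1234, node3:3456]), "
            + "local_addr=node1:1234";
        System.out.println(missingAcks(line)); // prints [node2:2345]
    }
}
```

If the same member shows up repeatedly across warnings, compare the netstat -g output on that node with the output on a node that does acknowledge views.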
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
