JBoss errors when multicast stops working

Solution Unverified - Updated -

Environment

  • JBoss Enterprise Application Platform (EAP)
  • JBoss Data Grid (JDG)

Issue

  • If any node goes down abruptly, the entire cluster becomes unstable and starts throwing replication exceptions
  • WARN messages such as the following suddenly start appearing in the logs and persist for more than a few minutes:
WARN  [GMS] failed to collect all ACKs (3) for view [node1:1234|4] [node1:1234, node2:2345, node3:3456] after 2000ms,
missing ACKs from [node2:2345] (received=[node1:1234, node3:3456]), local_addr=node1:1234
  • netstat -g shows that the multicast group membership used by JBoss disappears after several minutes
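
To see which members are most often reported in "missing ACKs from [...]", the GMS warnings can be tallied from the server log. The following is a minimal sketch, not a supported tool: it assumes the warning text has the shape shown above (possibly wrapped across lines), and the function name missing_ack_members is hypothetical.

```python
import re
from collections import Counter

# Matches the "missing ACKs from [...]" part of a JGroups GMS warning.
MISSING_ACKS = re.compile(r"missing ACKs from \[([^\]]*)\]")


def missing_ack_members(log_text):
    """Count how often each member is listed in 'missing ACKs from [...]'."""
    # The warning may be wrapped across lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for match in MISSING_ACKS.finditer(flat):
        for member in match.group(1).split(","):
            member = member.strip()
            if member:
                counts[member] += 1
    return counts


if __name__ == "__main__":
    # Sample taken from the warning quoted in this article.
    sample = (
        "WARN  [GMS] failed to collect all ACKs (3) for view [node1:1234|4] "
        "[node1:1234, node2:2345, node3:3456] after 2000ms,\n"
        "missing ACKs from [node2:2345] (received=[node1:1234, node3:3456]), "
        "local_addr=node1:1234\n"
    )
    print(missing_ack_members(sample).most_common())
```

A member that dominates the tally on every node is a likely candidate for the host whose multicast traffic is being dropped.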

Resolution

  • Disable "IGMP Snooping" in the network switches
  • Use a different network switch

Root Cause

  • Multicast messages are not delivered between some nodes
  • This can be caused by a firmware issue in some Cisco Catalyst switches when the "IGMP Snooping" option is enabled

Diagnostic Steps

  • Enable JGroups TRACE logging
  • Run netstat -g to check multicast group membership
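
The netstat check can be scripted so that it is easy to repeat every few minutes and spot the moment the membership disappears. This is a rough sketch under stated assumptions: it expects the Linux net-tools column layout (interface, reference count, group), uses numeric output (-n), and the group address 230.0.0.4 is only a placeholder for whatever multicast address the cluster is configured with. The function names are hypothetical.

```python
import subprocess


def joined_groups(netstat_output):
    """Parse `netstat -gn` text into {interface: set of group addresses}."""
    groups = {}
    for line in netstat_output.splitlines():
        parts = line.split()
        # Data rows are "<interface> <refcnt> <group>"; skip titles and rulers.
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        iface, _refcnt, group = parts
        groups.setdefault(iface, set()).add(group)
    return groups


def is_member(netstat_output, group, iface=None):
    """Return True if `group` is joined (on `iface`, or on any interface)."""
    groups = joined_groups(netstat_output)
    if iface is not None:
        return group in groups.get(iface, set())
    return any(group in members for members in groups.values())


if __name__ == "__main__":
    # Assumption: replace with the multicast address your cluster uses.
    group = "230.0.0.4"
    output = subprocess.run(
        ["netstat", "-gn"], capture_output=True, text=True, check=True
    ).stdout
    print(f"{group} joined: {is_member(output, group)}")
```

If the group is listed while the cluster is healthy and missing once the warnings start, that supports the multicast delivery problem described under Root Cause.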

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
