"Unable to announce backup" issue when using HornetQ with NFS in JBoss EAP

Solution Unverified - Updated -

Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 5.1.2
    • 6.2

Issue

When the server starts, the log repeatedly shows the following exception and HornetQ does not start:

2013-04-17 14:36:12,942 WARN  [org.hornetq.core.server.cluster.impl.ClusterConnectionImpl] Unable to announce backup, retrying
HornetQException[errorCode=3 message=Timed out waiting to receive initial broadcast from cluster]
        at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:726)
        at org.hornetq.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:603)
        at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$2.run(ClusterConnectionImpl.java:485)
        at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

This occurs on the live server, with the same result whether the backup server is up or down.

Resolution

Delete all the journal files and the server.lock file, then restart JBoss EAP and let HornetQ rebuild the journal.
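
The cleanup step can be sketched as a small standalone Java helper. This is only an illustration, not a Red Hat or HornetQ tool: the class name JournalCleaner, the clear method and the --delete switch are hypothetical, and the journal directory path must be supplied by the administrator because it depends on the installation. It treats every regular file directly inside the given directory as a journal file or the server.lock file, and by default only lists what it would remove. Run it only while JBoss EAP is stopped.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: removes the regular files found directly in a journal directory.
class JournalCleaner {

    /**
     * Collects every regular file directly inside journalDir (journal files and
     * server.lock alike) and deletes them unless dryRun is true.
     * Subdirectories are left untouched. Returns the affected files.
     */
    public static List<Path> clear(Path journalDir, boolean dryRun) throws IOException {
        List<Path> affected = new ArrayList<Path>();
        // First pass: collect the candidates without modifying the directory.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(journalDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    affected.add(entry);
                }
            }
        }
        // Second pass: delete only when explicitly requested.
        if (!dryRun) {
            for (Path file : affected) {
                Files.delete(file);
            }
        }
        return affected;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JournalCleaner <journal-dir> [--delete]");
            return;
        }
        // Assumption: "--delete" is this sketch's own switch; without it nothing is removed.
        boolean dryRun = !(args.length > 1 && "--delete".equals(args[1]));
        for (Path file : clear(Paths.get(args[0]), dryRun)) {
            System.out.println((dryRun ? "would delete " : "deleted ") + file);
        }
    }
}
```

Running it first without --delete shows which files would be removed, which makes it easy to confirm that the path really is the HornetQ journal directory before anything is deleted.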

Root Cause

  • Correlating the TRACE logging with the HornetQ code shows that the node gets confused because it cannot obtain a proper lock on the "server.lock" file in the journal directory.
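
Whether file locking works at all on the NFS mount can be checked with a small probe. The sketch below is not HornetQ code; it assumes HornetQ takes its lock through the standard java.nio file-locking API, and the class name LockProbe and method canLock are hypothetical. Point it at a scratch file on the same mount as the journal directory rather than at the real server.lock of a running server.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical probe: tries to take and release an exclusive lock on a file.
class LockProbe {

    /**
     * Returns true if an exclusive lock on the whole file could be acquired
     * (and was then released), false if the lock is already held elsewhere.
     * An IOException means the file system refused the lock request itself.
     */
    public static boolean canLock(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            FileLock lock;
            try {
                lock = channel.tryLock();
            } catch (OverlappingFileLockException e) {
                // The lock is already held by this same JVM.
                return false;
            }
            if (lock == null) {
                // The lock is held by another process.
                return false;
            }
            lock.release();
            return true;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java LockProbe <file-on-the-nfs-mount>");
            return;
        }
        try {
            System.out.println(canLock(Paths.get(args[0]))
                    ? "lock acquired and released"
                    : "lock is held by someone else");
        } catch (IOException e) {
            System.out.println("locking failed: " + e);
        }
    }
}
```

If the probe fails with an IOException, or reports a held lock on a file that nothing else should be using, the locking problem most likely lies with the NFS mount rather than with HornetQ itself.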

Diagnostic Steps

  • Send the server.log with TRACE logging for "org.hornetq" to support for analysis.

