State transfer issue during startup of a new instance joining the cluster in RHDG

Solution Verified - Updated -


  • A large number of caches are configured, and sometimes the startup of a new instance fails: the instance never becomes ready, and the management, HTTP, and HotRod ports are not listening.
DEBUG [org.infinispan.statetransfer.StateConsumerImpl] Adding inbound state transfer for segments [...] of cache demo
DEBUG [org.infinispan.statetransfer.StateConsumerImpl] Removing no longer owned entries for cache demo
DEBUG [org.infinispan.statetransfer.InboundTransferTask] Finished receiving state for segments [...] of cache demo
DEBUG [org.infinispan.statetransfer.StateConsumerImpl] Finished receiving of segments for cache demo for topology 123.
DEBUG [org.infinispan.statetransfer.StateConsumerImpl] Removing no longer owned entries for cache demo
INFO  [] DGISPN0001: Started demo cache from clustered container
  • Startup of a new instance fails even though every cache either does not wait for the initial state transfer or is configured with a proper timeout setting. What is the reason?
  • Startup fails during state transfer after 60 seconds for a cache we do not use. That cache is empty, so it should not cause timeout issues.
ERROR [] (MSC service thread 1-5) MSC000001: Failed to start service jboss.datagrid-infinispan.clustered.memcachedCache: org.jboss.msc.service.StartException in service jboss.datagrid-infinispan.clustered.memcachedCache: Failed to start service
    at org.jboss.msc.service.ServiceControllerImpl$ [jboss-msc-1.2.6.Final-redhat-1.jar:1.2.6.Final-redhat-1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker( [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$ [rt.jar:1.8.0_121]
    at [rt.jar:1.8.0_121]
Caused by: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.start() throws java.lang.Exception on object of type StateTransferManagerImpl
Caused by: org.infinispan.util.concurrent.TimeoutException: Replication timeout for (flags=0), site-id=s1, rack-id=r2, machine-id=m2)
  • Startup fails randomly with timeouts, but a retry immediately afterwards succeeds without any issue. What is the reason for this, and how can it be avoided?
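
When many caches are configured, it can help to find out which cache never completed its inbound state transfer before the timeout. The following is a minimal sketch, not a Red Hat tool. It assumes DEBUG logging is enabled for org.infinispan.statetransfer and that the messages are worded exactly like the excerpts above. The function name pending_state_transfers and the sample lines are made up for illustration.

```python
import re

# Patterns copied from the DEBUG messages quoted in this article;
# other RHDG versions may word these messages differently (assumption).
STARTED = re.compile(r"Adding inbound state transfer for segments .* of cache (\S+)")
FINISHED = re.compile(r"Finished receiving of segments for cache (\S+) for topology \d+")


def pending_state_transfers(lines):
    """Return the sorted names of caches that logged the start of an
    inbound state transfer without a later 'Finished receiving' message."""
    pending = set()
    for line in lines:
        started = STARTED.search(line)
        if started:
            pending.add(started.group(1))
            continue
        finished = FINISHED.search(line)
        if finished:
            pending.discard(finished.group(1))
    return sorted(pending)


if __name__ == "__main__":
    # Hypothetical sample input built from the message formats shown above.
    sample = [
        "DEBUG [org.infinispan.statetransfer.StateConsumerImpl] Adding inbound state transfer for segments [...] of cache demo",
        "DEBUG [org.infinispan.statetransfer.StateConsumerImpl] Adding inbound state transfer for segments [...] of cache memcachedCache",
        "DEBUG [org.infinispan.statetransfer.StateConsumerImpl] Finished receiving of segments for cache demo for topology 123.",
    ]
    print(pending_state_transfers(sample))
```

To check a real log, pass an open file object instead of the sample list. Any cache name that is returned started receiving state but had not finished when the log ends.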


  • Red Hat Data Grid (RHDG)
