JBoss ON storage node cluster maintenance fails when an agent is offline, leaving the storage node stuck in MAINTENANCE operation mode

Solution Unverified - Updated -

Issue

  • Storage node's cluster status is DOWN
  • Operation mode is stuck in MAINTENANCE
  • Scheduled repair has been aborted due to failed resource operation [Repair]
  • The failed Repair operation reports the following status:

    CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator
    
  • The server log contains the following:

    INFO  [org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean] (http-/0.0.0.0:7080-89) Starting anti-entropy repair on storage cluster
    ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] (RHQScheduler_Worker-3) {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.security-token=SE+Gt1qV0UEp5VFusV1mK20a6OmeDA1nNk1LZDf3Wn5GG/a5KZ2lXGcI2n0ve0G+7h4=, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[invokeOperation], targetInterfaceName=org.rhq.core.clientapi.agent.operation.OperationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
    ERROR [org.rhq.enterprise.server.operation.ResourceOperationJob] (RHQScheduler_Worker-3) Failed to execute scheduled operation [ResourceOperationSchedule: resource=[Resource[id=10033, uuid=809f1b46-933f-4175-8791-f54207134a0b, type={RHQStorage}StorageService, key=org.apache.cassandra.db:type=StorageService, name=Storage Service]],job-name=[rhq-resource-10033--1783761045-1402096288183], job-group=[rhq-resource-10033], operation-name=[takeSnapshot], subject=[Subject[id=1,name=admin]], description=[Run by StorageNodeManagerBean]]: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
        at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:855) [jboss-remoting-2.5.4.SP5.jar:]
        at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:169) [jboss-remoting-2.5.4.SP5.jar:]
        at org.jboss.remoting.Client.invoke(Client.java:2084) [jboss-remoting-2.5.4.SP5.jar:]
        at org.jboss.remoting.Client.invoke(Client.java:879) [jboss-remoting-2.5.4.SP5.jar:]
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.rawSend(JBossRemotingRemoteCommunicator.java:514) [rhq-enterprise-comm-4.9.0.JON320GA.jar:4.9.0.JON320GA]
        ...
    Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method) [rt.jar:1.7.0_55]
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) [rt.jar:1.7.0_55]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) [rt.jar:1.7.0_55]
        ...
    

Environment

  • Red Hat JBoss Operations Network (ON) 3.2
  • JBoss ON storage cluster containing two or more storage nodes
  • JBoss ON agent managing a storage node is down or unreachable
  • Storage node cluster maintenance is executed by the storage cluster weekly maintenance task or by invoking the StorageNodeManager.runClusterMaintenance JBoss ON remote API method
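
One way to confirm that the agent is unreachable is to test the agent endpoint named in the InvokerLocator entry of the server log (jboss-01.example.com:16163 in the excerpt above). The following is a minimal sketch, not part of JBoss ON: the function names agent_endpoints and agent_port_open are hypothetical, and the regular expression assumes the log uses the same socket:// InvokerLocator format shown in the Issue section. A successful TCP connection only shows that something is listening on the port; it does not prove that the agent is healthy.

```python
import re
import socket

# Matches the agent endpoint inside an InvokerLocator entry such as
# "InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200..."
# (format assumed from the server log excerpt in this article).
LOCATOR_RE = re.compile(r"InvokerLocator \[socket://([^:/\s]+):(\d+)/")


def agent_endpoints(log_text):
    """Return the unique (host, port) pairs found in InvokerLocator entries."""
    seen = []
    for host, port in LOCATOR_RE.findall(log_text):
        endpoint = (host, int(port))
        if endpoint not in seen:
            seen.append(endpoint)
    return seen


def agent_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and name resolution failures.
        return False
```

For example, passing the contents of the server log to agent_endpoints and then calling agent_port_open on each returned pair from the JBoss ON server host indicates whether the "Connection refused" condition still applies.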
