JON storage node cluster maintenance fails because an offline agent leaves the storage node stuck in MAINTENANCE operation mode
Issue
- Storage node's cluster status is DOWN
- Operation mode is stuck in MAINTENANCE
- Scheduled repair has been aborted due to failed resource operation [Repair]
- Operation Repair Failed status:
  CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator -
The server log contains the following:
INFO [org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean] (http-/0.0.0.0:7080-89) Starting anti-entropy repair on storage cluster
ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] (RHQScheduler_Worker-3) {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.security-token=SE+Gt1qV0UEp5VFusV1mK20a6OmeDA1nNk1LZDf3Wn5GG/a5KZ2lXGcI2n0ve0G+7h4=, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[invokeOperation], targetInterfaceName=org.rhq.core.clientapi.agent.operation.OperationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
ERROR [org.rhq.enterprise.server.operation.ResourceOperationJob] (RHQScheduler_Worker-3) Failed to execute scheduled operation [ResourceOperationSchedule: resource=[Resource[id=10033, uuid=809f1b46-933f-4175-8791-f54207134a0b, type={RHQStorage}StorageService, key=org.apache.cassandra.db:type=StorageService, name=Storage Service]],job-name=[rhq-resource-10033--1783761045-1402096288183], job-group=[rhq-resource-10033], operation-name=[takeSnapshot], subject=[Subject[id=1,name=admin]], description=[Run by StorageNodeManagerBean]]: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
    at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:855) [jboss-remoting-2.5.4.SP5.jar:]
    at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:169) [jboss-remoting-2.5.4.SP5.jar:]
    at org.jboss.remoting.Client.invoke(Client.java:2084) [jboss-remoting-2.5.4.SP5.jar:]
    at org.jboss.remoting.Client.invoke(Client.java:879) [jboss-remoting-2.5.4.SP5.jar:]
    at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.rawSend(JBossRemotingRemoteCommunicator.java:514) [rhq-enterprise-comm-4.9.0.JON320GA.jar:4.9.0.JON320GA]
    ...
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method) [rt.jar:1.7.0_55]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) [rt.jar:1.7.0_55]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) [rt.jar:1.7.0_55]
    ...
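To confirm which agent the server failed to reach, you can extract the InvokerLocator URI from the CannotConnectException entries in the server log. The Python sketch below shows one way to do this. It is a diagnostic aid and not part of JBoss ON. The function name `unreachable_agents` is hypothetical, and the sketch assumes that each ERROR entry, including its InvokerLocator URI, sits on a single log line.

```python
import re
from urllib.parse import urlsplit

# Matches the URI inside "InvokerLocator [socket://host:port/?...]".
# The URI ends at the closing bracket or at whitespace.
LOCATOR_RE = re.compile(r"InvokerLocator \[(socket://[^\]\s]+)\]")


def unreachable_agents(log_text: str) -> set[tuple[str, int]]:
    """Return (host, port) pairs of agent endpoints the server failed to reach.

    Hypothetical helper: only lines that mention CannotConnectException are
    considered, so successful connections that log a locator are ignored.
    """
    endpoints: set[tuple[str, int]] = set()
    for line in log_text.splitlines():
        if "CannotConnectException" not in line:
            continue
        for uri in LOCATOR_RE.findall(line):
            parts = urlsplit(uri)  # e.g. hostname "jboss-01.example.com", port 16163
            if parts.hostname and parts.port:
                endpoints.add((parts.hostname, parts.port))
    return endpoints
```

If you run it against the log excerpt above, it should report a single endpoint: `jboss-01.example.com` on port 16163. This is the agent that must be started or made reachable.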
Environment
- Red Hat JBoss Operations Network (ON) 3.2
- JBoss ON storage cluster containing two or more storage nodes
- JBoss ON agent managing a storage node is down or unreachable
- Storage node cluster maintenance is run either by the weekly storage cluster maintenance task or by calling the StorageNodeManager.runClusterMaintenance JBoss ON remote API method
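An unreachable storage node agent is enough to make the maintenance operation fail. For that reason, it may help to confirm that every agent accepts TCP connections before the weekly task runs or before you call runClusterMaintenance. The Python sketch below is a hypothetical pre-check and is not part of JBoss ON or its remote API. It assumes you already know each storage node agent's host and port (16163 is the agent port that appears in the log above). It only tests whether a plain socket connection succeeds, which is the step that failed with "Connection refused". The names `agent_reachable` and `unreachable_endpoints` are illustrative.

```python
import socket


def agent_reachable(host: str, port: int = 16163, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the agent's port can be opened.

    Connection refused, timeouts and DNS failures all raise OSError
    subclasses, so any of them is reported as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_endpoints(endpoints, timeout: float = 5.0):
    """Return the (host, port) pairs from `endpoints` that refuse or time out."""
    return [(h, p) for h, p in endpoints if not agent_reachable(h, p, timeout)]
```

For example, `unreachable_endpoints([("jboss-01.example.com", 16163)])` returns an empty list only when that agent is listening. A non-empty result suggests that you should bring the agent back online before you start cluster maintenance. A successful TCP connect does not prove that the agent is healthy. It only rules out the "Connection refused" failure shown above.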
