JON storage node cluster maintenance failed due to offline agent leaving storage node stuck in maintenance operation mode
Issue
- Storage node's cluster status is DOWN
- Operation mode is stuck in MAINTENANCE
- Scheduled repair has been aborted due to failed resource operation [Repair]
- Operation Repair Failed status:
    CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator
- Server log contains the following:
    INFO [org.rhq.enterprise.server.storage.StorageNodeOperationsHandlerBean] (http-/0.0.0.0:7080-89) Starting anti-entropy repair on storage cluster
    ERROR [org.rhq.enterprise.communications.command.client.ClientCommandSenderTask] (RHQScheduler_Worker-3) {ClientCommandSenderTask.send-failed}Failed to send command [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.security-token=SE+Gt1qV0UEp5VFusV1mK20a6OmeDA1nNk1LZDf3Wn5GG/a5KZ2lXGcI2n0ve0G+7h4=, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[invokeOperation], targetInterfaceName=org.rhq.core.clientapi.agent.operation.OperationAgentService}]]. Cause: org.jboss.remoting.CannotConnectException:Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000] -> java.net.ConnectException:Connection refused. Cause: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
    ERROR [org.rhq.enterprise.server.operation.ResourceOperationJob] (RHQScheduler_Worker-3) Failed to execute scheduled operation [ResourceOperationSchedule: resource=[Resource[id=10033, uuid=809f1b46-933f-4175-8791-f54207134a0b, type={RHQStorage}StorageService, key=org.apache.cassandra.db:type=StorageService, name=Storage Service]],job-name=[rhq-resource-10033--1783761045-1402096288183], job-group=[rhq-resource-10033], operation-name=[takeSnapshot], subject=[Subject[id=1,name=admin]], description=[Run by StorageNodeManagerBean]]: org.jboss.remoting.CannotConnectException: Can not get connection to server. Problem establishing socket connection for InvokerLocator [socket://jboss-01.example.com:16163/?backlog=200&clientMaxPoolSize=304&enableTcpNoDelay=true&maxPoolSize=303&numAcceptThreads=1&rhq.communications.connector.rhqtype=agent&socketTimeout=60000]
        at org.jboss.remoting.transport.socket.MicroSocketClientInvoker.transport(MicroSocketClientInvoker.java:855) [jboss-remoting-2.5.4.SP5.jar:]
        at org.jboss.remoting.MicroRemoteClientInvoker.invoke(MicroRemoteClientInvoker.java:169) [jboss-remoting-2.5.4.SP5.jar:]
        at org.jboss.remoting.Client.invoke(Client.java:2084) [jboss-remoting-2.5.4.SP5.jar:]
        at org.jboss.remoting.Client.invoke(Client.java:879) [jboss-remoting-2.5.4.SP5.jar:]
        at org.rhq.enterprise.communications.command.client.JBossRemotingRemoteCommunicator.rawSend(JBossRemotingRemoteCommunicator.java:514) [rhq-enterprise-comm-4.9.0.JON320GA.jar:4.9.0.JON320GA]
        ...
    Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method) [rt.jar:1.7.0_55]
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) [rt.jar:1.7.0_55]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) [rt.jar:1.7.0_55]
        ...
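The agent that the server could not reach is named in the InvokerLocator portion of the log messages above. As a rough illustration only, the following Python sketch pulls the distinct host and port pairs out of server log text. The function name agent_endpoints, the regular expression, and the abbreviated sample line are assumptions made for this example; they are not part of JBoss ON.

```python
import re

# Matches the host and port inside a JBoss Remoting locator such as
# socket://host:port/?backlog=200&...
LOCATOR_RE = re.compile(r"socket://([^:/\s\]]+):(\d+)")


def agent_endpoints(log_text):
    """Return the distinct (host, port) pairs found in log_text, in first-seen order."""
    seen = []
    for host, port in LOCATOR_RE.findall(log_text):
        endpoint = (host, int(port))
        if endpoint not in seen:
            seen.append(endpoint)
    return seen


# Abbreviated copy of the locator from the server log shown above
# (some query parameters dropped for readability).
sample = (
    "Problem establishing socket connection for InvokerLocator "
    "[socket://jboss-01.example.com:16163/?backlog=200&socketTimeout=60000] "
    "-> java.net.ConnectException:Connection refused."
)

print(agent_endpoints(sample))  # prints [('jboss-01.example.com', 16163)]
```

Each pair returned identifies an agent listener that the server tried to contact when the operation failed.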
Environment
- Red Hat JBoss Operations Network (ON) 3.2
- JBoss ON storage cluster containing two or more storage nodes
- JBoss ON agent managing a storage node is down or unreachable
- Storage node cluster maintenance is executed by the storage cluster weekly maintenance task or by invoking the StorageNodeManager.runClusterMaintenance JBoss ON remote API method
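Whether an agent is down or unreachable, as described in the environment above, can be checked from the JBoss ON server host by attempting a plain TCP connection to the agent's listener. The following Python sketch is one possible way to do that. The function name agent_port_open and the five-second default timeout are assumptions made for this example; the host and port to test would come from the InvokerLocator in the server log.

```python
import socket


def agent_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For the agent in the log above, the call would be agent_port_open("jboss-01.example.com", 16163). A result of False is consistent with the "Connection refused" error reported by the server, although it does not by itself say why the agent is not listening.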