Configuring jboss.as.management.blocking.timeout in JBoss EAP 6/7

Solution Verified

Environment

  • Red Hat JBoss Enterprise Application Platform (EAP)
    • 6.3 or later
    • 7
  • Red Hat Single Sign-On (RH-SSO) 7.x

Issue

  • The deployment produces a ".failed" marker file in the deployments folder rather than a ".deployed" marker file.
  • Slave servers do not start, and the master logs an "Operation timeout awaiting service container stability" error.
  • java.util.concurrent.TimeoutException errors occur on startup:

    • In JBoss EAP 7 (the timeout seconds and operation details can vary)

      ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0348: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[
          ("core-service" => "management"),
          ("management-interface" => "http-interface")
      ]'
      
    • In JBoss EAP 6 (the timeout seconds and operation details can vary)

      ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) JBAS013412: Timeout after [300] seconds waiting for service container stability. Operation will roll back. Step that first updated the service container was 'add' at address '[("interface" => "management")]'
      
  • Unable to deploy a WAR/EAR file; the master server returns a deployment error:

    "WFLYDC0074: Operation failed or was rolled back on all servers. Server failures:" => {"server-group" => {"main-server-group" => {"host" => {"slave1" => {"mainserver1" => "WFLYCTL0409: Execution of operation 'deploy' on remote process at address '[
        (\"host\" => \"slave1\"),
        (\"server\" => \"mainserver1\")
    ]' timed out after 305000 ms while awaiting initial response; remote process has been notified to terminate operation"}}}}}},
        "rolled-back" => true,
        "server-groups" => {"cbx-server-group" => {"host" => {"slave1" => {"mainserver1" => {"response" => {
            "outcome" => "failed",
            "result" => undefined,
            "failure-description" => "WFLYCTL0409: Execution of operation 'deploy' on remote process at address '[
        (\"host\" => \"slave1\"),
        (\"server\" => \"mainserver1\")
    ]' timed out after 305000 ms while awaiting initial response; remote process has been notified to terminate operation",
            "rolled-back" => true
        }}}}}}
    
  • Has Bugzilla issue 1117945 been fixed?

Resolution

Set the system property jboss.as.management.blocking.timeout to tune how long, in seconds, a management operation waits for service container stability. The default is 300 seconds.

See the solution "Add/remove/update system properties in JBoss EAP 6/7" for how to set system properties in JBoss EAP in its various modes of operation.

For standalone mode, use the CLI command /system-property=jboss.as.management.blocking.timeout:add(value=N), where N is the timeout in seconds. Example:

[standalone@localhost:9990 /] /system-property=jboss.as.management.blocking.timeout:add(value=600)
{"outcome" => "success"}

The property then appears in the configuration file, standalone.xml, as:

    </extensions>
    <system-properties>
        <property name="jboss.as.management.blocking.timeout" value="600"/>
    </system-properties>
    <management>

Alternatively, set it in standalone.conf, or start JBoss EAP with the system property on the command line:

./bin/standalone.sh -Djboss.as.management.blocking.timeout=600
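
For the standalone.conf option, a minimal sketch is to append the property to JAVA_OPTS at the end of bin/standalone.conf. This assumes the stock script layout, in which standalone.sh sources that file and passes JAVA_OPTS to the JVM; adjust it if your installation customises the startup scripts.

```shell
# bin/standalone.conf (sketch): add this at the end of the file so that it
# extends whatever JAVA_OPTS already holds instead of replacing it.
JAVA_OPTS="$JAVA_OPTS -Djboss.as.management.blocking.timeout=600"
```

The value 600 mirrors the examples above; pick a value that suits your startup time.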

In OpenShift, set it with a CLI script such as the following:

embed-server --std-out=echo  --server-config=standalone-openshift.xml
batch

/system-property=jboss.as.management.blocking.timeout:add(value=900)

run-batch
quit

In domain mode, set it on each server:

/host=master/server-config=server-one/system-property=jboss.as.management.blocking.timeout:add(boot-time=true,value=600)  

Also set it for the host controllers (the master domain controller and any remote slave host controllers):

  • Command line

    • Red Hat Enterprise Linux: bin/domain.sh -Djboss.as.management.blocking.timeout=600
    • Microsoft Windows: bin\domain.bat -Djboss.as.management.blocking.timeout=600
  • Configuration file for service startup

    • domain.conf

      PROCESS_CONTROLLER_JAVA_OPTS="$PROCESS_CONTROLLER_JAVA_OPTS -Djboss.as.management.blocking.timeout=600"
      ...
      HOST_CONTROLLER_JAVA_OPTS="$HOST_CONTROLLER_JAVA_OPTS -Djboss.as.management.blocking.timeout=600"
      
    • Edit domain.bat

      "%JAVA%" %PROCESS_CONTROLLER_JAVA_OPTS% ^
          "-Dorg.jboss.boot.log.file=%JBOSS_LOG_DIR%\process-controller.log" ^
          "-Dlogging.configuration=file:%JBOSS_CONFIG_DIR%/logging.properties" ^
          -jar "%JBOSS_HOME%\jboss-modules.jar" ^
          %MODULE_OPTS% ^
          -mp "%JBOSS_MODULEPATH%" ^
          org.jboss.as.process-controller ^
          -jboss-home "%JBOSS_HOME%" ^
          -jvm "%JAVA%" ^
          %MODULE_OPTS% ^
          -mp "%JBOSS_MODULEPATH%" ^
          -- ^
          "-Dorg.jboss.boot.log.file=%JBOSS_LOG_DIR%\host-controller.log" ^
          "-Dlogging.configuration=file:%JBOSS_CONFIG_DIR%/logging.properties" ^
          %HOST_CONTROLLER_JAVA_OPTS% ^
          -- ^
          -default-jvm "%JAVA%" ^
          -Djboss.as.management.blocking.timeout=600 ^
          %*
      

Notes

  • The only use case for setting the property at the server-group or individual domain server level is to give those servers a lower timeout than the one configured on the host controllers.
  • The valid range is 1 to 2147483 seconds. Setting the value to 0 logs a message and falls back to 300 seconds.
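
Since 0 falls back to the default rather than disabling the timeout, it can help to check a candidate value before applying it. The helper below is an illustrative sketch and not part of JBoss EAP: the function name valid_blocking_timeout is hypothetical, and it encodes only the 1 to 2147483 range stated in the note above.

```shell
# Return 0 if $1 is a whole number of seconds within the documented range
# 1..2147483, and 1 otherwise. (Hypothetical helper, not shipped with EAP.)
valid_blocking_timeout() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, negative, or non-numeric input
    esac
    # The upper bound has 7 digits; reject longer strings first so the
    # numeric comparison below never sees a value the shell cannot hold.
    [ "${#1}" -le 7 ] || return 1
    [ "$1" -ge 1 ] && [ "$1" -le 2147483 ]
}
```

For example, `valid_blocking_timeout 600 && echo ok` prints "ok", while a call with 0 or 2147484 returns a non-zero status.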

Root Cause

The org.jboss.as.controller.BlockingTimeout class loads the value of system property jboss.as.management.blocking.timeout or defaults to 300 (seconds).

This property is not a per-deployment timeout but a timeout on service container stability. If jboss.as.management.blocking.timeout is reached during startup, all applications are undeployed and the container shuts down.

The reasoning behind this is that having a half-working server is potentially dangerous as users may not notice major failures.

Diagnostic Steps

  1. Confirm whether a virus scanner is installed on the server and active.

  2. If the issue appeared after changes to the application, check exactly what has changed since it last worked. Was something upgraded or added? Does it now point to a new database or another external system?

  3. Collect a series of thread dumps during the startup period so that Red Hat can see where startup might be getting stuck. If you are using JBoss EAP 7.4.8 or later, the thread dump is automatically logged to server.log. This enhancement was implemented by JBEAP-23951.

  4. Make sure the setting has been added to domain.conf on both the domain controller and the slave host controllers.
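
To confirm step 4 on a running host, one option is to inspect the command line of the host controller JVM. The sketch below is illustrative: timeout_from_cmdline is a hypothetical helper that extracts the flag from a command-line string. It only sees values passed as -D arguments (for example through domain.conf) and does not inspect values defined in the XML configuration.

```shell
# Print the value of -Djboss.as.management.blocking.timeout found in a JVM
# command-line string. If the flag occurs more than once, the last occurrence
# is printed. Prints nothing and returns 1 when the flag is absent.
# (Hypothetical helper, not shipped with EAP.)
timeout_from_cmdline() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' \
        | sed -n 's/^-Djboss\.as\.management\.blocking\.timeout=\([0-9][0-9]*\)$/\1/p' \
        | tail -n 1)
    [ -n "$value" ] && printf '%s\n' "$value"
}
```

A possible invocation, assuming PID holds the process ID of the host controller JVM, is `timeout_from_cmdline "$(ps -o args= -p "$PID")"`.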

Once the thread dumps have been collected, Red Hat can analyse them to see whether a deadlock or a contended resource is preventing the threads from completing the deployment.
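
One way to capture such a series is a small loop around the JDK's jstack tool. The following is a sketch under assumptions: collect_thread_dumps and the DUMP_CMD override are hypothetical names, the defaults of 6 dumps taken 10 seconds apart are arbitrary, and jstack must be on the PATH and run as the same user as the JVM.

```shell
# Capture COUNT thread dumps of the JVM with process ID PID, INTERVAL seconds
# apart, writing one numbered file per dump into OUTDIR.
# Usage: collect_thread_dumps PID [COUNT] [INTERVAL] [OUTDIR]
# (Hypothetical helper, not shipped with EAP.)
collect_thread_dumps() {
    pid=$1
    count=${2:-6}
    interval=${3:-10}
    outdir=${4:-.}
    # Defaults to "jstack -l"; set DUMP_CMD to use another dump tool.
    dump_cmd=${DUMP_CMD:-jstack -l}
    mkdir -p "$outdir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # $dump_cmd is left unquoted on purpose so its options are split.
        $dump_cmd "$pid" > "$outdir/threaddump-$pid-$i.txt" 2>&1
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

Start the loop as soon as the server begins to boot, for example `collect_thread_dumps "$PID" 6 10 /tmp/dumps`, so that the dumps cover the period leading up to the timeout.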

