Defining a Unique Node Identifier in Your JBoss Operations Network High Availability System

A Red Hat JBoss Operations Network system can be configured in a high-availability (HA) mode, in which two or more JBoss Operations Network servers share the same back-end database. This allows for scaling and fault tolerance by distributing management and monitoring operations across multiple server processes or hosts.

Because each of the JBoss Operations Network servers in an HA configuration uses the same database, each requires a separate and unique node identifier for the purpose of managing its database transactions. However, the default JBoss Operations Network server configuration does not specify a node identifier value for database transactions, meaning that the underlying application server will use a default node identifier value of 1. This can result in seemingly sporadic transaction execution and recovery failures and can put extra load on the JBoss Operations Network server or the database it is using.

This article describes the necessary steps to set a unique node identifier for your JBoss Operations Network server.

New server installation

Before running the JBoss Operations Network server installer using the rhqctl command, you will need to apply JBoss Operations Network 3.3 Update 11 or later, and then define the jboss.tx.node.id property with an acceptable value in rhq-server.properties.

  1. Apply JBoss Operations Network 3.3 Update 11 or later if you have not already.

    This should be done following the normal update instructions.

  2. Open bin/rhq-server.properties and define the following property/value pair:

    jboss.tx.node.id=<UNIQUE-ID>
    

    If you merged the new or changed properties found in rhq-server.properties.new with your rhq-server.properties file, the jboss.tx.node.id property will be commented out. You can remove the comment mark and change the value from 1 to your desired unique identifier.

    UNIQUE-ID can be no longer than 23 characters and must be unique for each server in a high-availability cluster. For example:

    jboss.tx.node.id=my-jon-server-host-01
    

    Be sure you have saved the changes.

  3. Proceed with configuring and installing your JBoss Operations Network server normally.
  4. Be sure to repeat the necessary steps for each server that will be added to the cluster.
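
As a quick sanity check before running the installer, the value can be read back from the properties file and tested against the 23-character limit described above. The following is a minimal sketch and not part of the product: the function names get_node_id and is_valid_node_id are hypothetical, and the path to rhq-server.properties is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: print the jboss.tx.node.id value defined in the given
# properties file. Commented-out lines are ignored; if the property is
# defined more than once, the last definition is printed.
get_node_id() {
    sed -n 's/^[[:space:]]*jboss\.tx\.node\.id[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: succeed only if the identifier is non-empty and
# no longer than 23 characters.
is_valid_node_id() {
    [ -n "$1" ] && [ "${#1}" -le 23 ]
}

# Example usage (assumes RHQ_SERVER_HOME points at the server directory):
#   id=$(get_node_id "$RHQ_SERVER_HOME"/bin/rhq-server.properties)
#   is_valid_node_id "$id" && echo "node identifier OK: $id"
```

This only checks that the property is set and short enough. It cannot confirm that the value is unique within the cluster.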

Existing server installation

Because this configuration property was not exposed or used until JBoss Operations Network 3.3 Update 11, it is likely that you are already running a high-availability cluster in which all the servers are using the default node identifier value of 1. This will result in transaction execution and recovery failures that could lead to performance problems, noisy logs, or even data loss.

You can define the unique node identifier for the existing servers in the system, but you will first need to shut down all servers in the cluster. It is also recommended that you perform a complete database and file system backup of your JBoss Operations Network servers.

  1. If you are certain you have not manually changed the node-identifier attribute for your JBoss Operations Network server using the jboss-cli command, skip to the next step. Otherwise, you will need to remove or properly define the attribute.

    • If you have not applied JBoss Operations Network 3.3 Update 11 or later, you can simply remove the attribute and the update process will fix the configuration for you. To remove the attribute, make sure your JBoss Operations Network server is running and use the following jboss-cli command:

      "$RHQ_SERVER_HOME"/jbossas/bin/jboss-cli.sh --controller=127.0.0.1:6999 --connect --command='/subsystem=transactions:undefine-attribute(name=node-identifier)'
      
    • If you have already applied the update, make sure your JBoss Operations Network server is running and use the following jboss-cli command:

      "$RHQ_SERVER_HOME"/jbossas/bin/jboss-cli.sh --controller=127.0.0.1:6999 --connect --command='/subsystem=transactions:write-attribute(name=node-identifier,value=${jboss.tx.node.id:1})'
      

    You will need to perform this process on each server in your cluster on which you have manually set the node-identifier attribute.

  2. Shut down all servers in the JBoss Operations Network high-availability cluster.

    IMPORTANT: Each server should be allowed to shut down gracefully. If you terminate a server before it completes its shutdown operation, or are forced to kill it because it is taking too long, it will leave in-progress transactions that will not be recoverable after you complete these steps.

  3. Back up the file system of each JBoss Operations Network server in the cluster.

    For example:

    tar czf ~/jboss-operationsnetwork-server-"$HOSTNAME".tar.gz /opt/operationsnetwork/jon-server-3.3.0.GA
    
  4. Back up the database used by the JBoss Operations Network system.

  5. Apply JBoss Operations Network 3.3 Update 11 or later to each server in the cluster, if you have not already.

    This should be done following the normal update instructions.

  6. Open bin/rhq-server.properties on each server in the cluster and define the following property/value pair:

    jboss.tx.node.id=<UNIQUE-ID>
    

    If you merged the new or changed properties found in rhq-server.properties.new with your rhq-server.properties file, the jboss.tx.node.id property will be commented out. You can remove the comment mark and change the value from 1 to your desired unique identifier.

    UNIQUE-ID can be no longer than 23 characters and must be unique for each server in a high-availability cluster. For example:

    jboss.tx.node.id=my-jon-server-host-01
    

    You must repeat this step for each JBoss Operations Network server in the cluster and confirm you have saved the changes.

  7. Start your JBoss Operations Network system normally.
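
Because every server in the cluster must end up with a different identifier, it can help to compare the values chosen for all servers in one place. The following is a minimal sketch under the assumption that you have collected the identifiers yourself (for example, by reading each server's rhq-server.properties). The function name find_duplicate_node_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every identifier that appears more than once
# among the arguments. No output means all identifiers are distinct.
find_duplicate_node_ids() {
    [ "$#" -gt 0 ] || return 0
    printf '%s\n' "$@" | sort | uniq -d
}

# Example usage with illustrative identifiers:
#   find_duplicate_node_ids my-jon-server-host-01 my-jon-server-host-02
```

Any identifier printed by the function is shared by at least two servers and needs to be changed on all but one of them.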

After your server has been running for several minutes, you should review the server.log file for any recent transaction recovery warnings similar to the following:

WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016037: Could not find new XAResource to use for recovering non-serializable XAResource XAResourceRecord < resource:null, txid:< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff0a2c0a0d:-140df526:50d1ddd3:ac883, node_name=1, ...
WARN  [com.arjuna.ats.jta] (Periodic Recovery) ARJUNA016038: No XAResource to recover < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff0a2c0a0d:-140df526:50d1ddd3:ac883, node_name=1, ...

If you see these warnings with a node_name value of 1, this indicates that transactions were left in an in-progress or unknown state before the node identifier was changed. To clean up these abandoned transactions, see the JBoss EAP 6 section of Removing traces of failed transactions from the object store in JBoss EAP.
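
The log review can be partly automated. The following is a minimal sketch that counts recovery warnings still referring to the old default identifier. It assumes the log lines look like the samples above (message codes ARJUNA016037 and ARJUNA016038, with node_name=1 followed by a comma). The function name count_stale_recovery_warnings is hypothetical, and the path to server.log is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: count the ARJUNA016037/ARJUNA016038 warnings in the
# given log file that reference node_name=1. The trailing comma keeps
# identifiers such as 10 or 11 from being counted.
# Note: grep -c prints 0 and exits non-zero when nothing matches.
count_stale_recovery_warnings() {
    grep -E 'ARJUNA01603[78]' "$1" | grep -c 'node_name=1,'
}

# Example usage (the log path is supplied by the caller):
#   count_stale_recovery_warnings /path/to/server.log
```

A non-zero count suggests that abandoned transactions from before the change are still present in the object store.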