Chapter 5. Handling Transaction Manager Exceptions

5.1. Debugging a Timed-out Transaction

There can be many reasons for a transaction timeout, such as:

  • Slow server performance
  • Thread is stuck waiting for something or hangs up
  • Thread needs more than the configured transaction timeout time to complete the processing

You can look at the logs for following error message to identify a timed-out transaction:

WARN ARJUNA012117 "TransactionReaper::check timeout for TX {0} in state {1}"

where {0} is the Uid of the transaction and {1} is the transaction manager’s view of the state {1} of the timed-out transaction.

Transaction Manager provides the following options to debug the transaction timeouts:

  • You can configure timeout values for transactions to control the transaction lifetimes. The transactions subsystem rolls back the transaction if the timeout value elapses before a transaction terminates because of committing or rolling back.
  • You can use the setTransactionTimeout method of the XAResource interface to propagate the current transaction to the resource manager. If supported, this operation overrides any default timeout associated with the resource manager. Overriding the timeout is useful in situations like the following:

    • when long-running transactions have lifetimes that exceed the default
    • when using the default timeout might cause the resource manager to roll back before the transaction terminates, causing the transaction to roll back as well.
  • If you do not specify a timeout value or use a value of 0, transaction manager uses an implementation-specific default value. In JBoss EAP transaction manager, the CoordinatorEnvironmentBean.defaultTimeout property represents this implementation-specific default value. The default value is 300 seconds. A value of 0 disables the default transaction timeouts.

    You can modify the default transaction timeout using the following management CLI command:

    /subsystem=transactions:write-attribute(name=default-timeout,value=VALUE)

    When running in a managed domain, you must specify which profile to update by preceding the command with /profile=PROFILE_NAME

  • JBoss EAP Transaction Manager supports an all-or-nothing approach to call the setTransactionTimeout method on the XAResource instances. You can set the JTAEnvironmentBean.xaTransactionTimeoutEnabled property to true, which is the default, to call the method on all the instances. Otherwise, you can use the setXATransactionTimeoutEnabled method of the com.arjuna.ats.jta.common.JTAEnvironmentBean class to disable timeout and specify them on a per-transaction basis.

5.2. Migrating Logs to a New JBoss EAP Server

Prerequisites

Ensure that the transactions subsystem is configured identically between the old and the new JBoss EAP. An identical configuration, which includes the list of Jakarta Transactions datasources, is required because any logs that need to be recovered must contact the datasources.

5.2.1. Migrating the File-based Log Storage

To migrate the transaction manager logs to a new JBoss EAP server, you can copy the logs to the new JBoss EAP server.

You can use the following commands to copy the file-based logs:

  1. Browse to your EAP_HOME directory.
  2. Create an archive of the logs using the following command:

    $ tar -cf logs.tar ./standalone/data/tx-object-store
  3. Extract the archived logs to the new EAP_HOME directory using the following command:

    $ tar -xf logs.tar -C NEW_EAP_HOME

5.2.2. Migrating the JDBC Store-based Log Storage

  • You can configure the new JBoss EAP server to use the old database and tables as described in Using a JDBC as a Transactions Object Store.
  • Alternatively, you can determine the database and the tables used for the transaction logs. Then, you can use an SQL tool to back up the tables and restore them to the new database.

    Note

    You can find an SQL query tool in the h2 JAR file shipped with JBoss EAP.

5.3. Enabling XTS on JBoss EAP

XML Transaction Service (XTS) component of the transaction manager supports the coordination of private and public web services in a business transaction. XTS provides WS-AT and WS-BA support for web services hosted on the JBoss EAP server. It is an optional subsystem, which can be enabled using the standalone-xts.xml configuration.

Starting JBoss EAP Server with XTS Enabled

  1. Change to the JBoss EAP server directory:

    cd $EAP_HOME
  2. Copy the example XTS configuration file into the /configuration directory:

    cp docs/examples/configs/standalone-xts.xml standalone/configuration
  3. Start the JBoss EAP server, specifying the xts configuration:

    Linux:

    bin/standalone.sh --server-config=standalone-xts.xml

    Windows:

    bin\standalone.bat --server-config=standalone-xts.xml

5.4. Clearing Up Expired Transactions

The following properties allow you to clear up the expired transactions:

ExpiryEntryMonitor

When the Recovery Manager initializes an expiry scanner thread, the ExpiryEntryMonitor object is created, which is used to remove dead items from the object store. A number of scanner modules are loaded dynamically, which removes the dead items for a particular type.

You can configure the scanner modules in the properties file using the RecoveryEnvironmentBean.expiryScanners system property. The scanner modules are loaded at the time of initialization.

$ EAP_HOME/bin/standalone.sh -DRecoveryEnvironmentBean.expiryScanners=CLASSNAME1,CLASSNAME2
expiryScanInterval

All the scanner modules are called periodically to scan for dead items by the ExpiryEntryMonitor thread. You can configure this period, in hours, using the expiryScanInterval system property, as shown in the example below:

$ EAP_HOME/bin/standalone.sh -DRecoveryEnvironmentBean.expiryScanInterval=EXPIRY_SCAN_INTERVAL

All scanner modules inherit the same behavior from the ExpiryScanner interface. This interface provides a scan method that is implemented by all the scanner modules, including the following. The scanner thread calls this scan method.

ExpiredTransactionStatusManagerScanner

The ExpiredTransactionStatusManagerScanner removes the dead TransactionStatusManagerItems from the object store. These items remain in the object store for a certain period before they are deleted, which is 12 hours by default. You can configure this time period, in hours, using the transactionStatusManagerExpiryTime system property as shown in the example below:

$ EAP_HOME/bin/standalone.sh -DRecoveryEnvironmentBean.transactionStatusManagerExpiryTime=TRANSACTION_STATUS_MANAGER_EXPIRY_TIME
AtomicActionExpiryScanner

The AtomicActionExpiryScanner moves transaction logs for AtomicActions that are assumed to have completed. For example, if a failure occurs after a participant has been told to commit but before the transactions subsystem can update the logs, then upon recovery the JBoss EAP transaction manager attempts to replay the commit request. This replay will obviously fail, thus preventing the log from being removed. The AtomicActionExpiryScanner is also used when logs cannot be recovered automatically for reasons such as being corrupt or zero length. All logs are moved to a specific location based on the old location appended with /Expired.

Note

AtomicActionExpiryScanner is disabled by default. You can enable it by adding it to the transaction manager properties file. You need not enable it to cope with corrupt logs.

5.5. Recovering Heuristic Outcomes

A heuristic completion occurs when a transaction resource makes a one-sided decision, during the completion stage of a distributed transaction, to commit or rollback the transaction updates. This can leave distributed data in an indeterminate state. Network failures or resource timeouts are possible causes for heuristic completion. Heuristic completion throws one of the following heuristic outcome exceptions:

HEURISTIC_COMMIT
This exception is thrown when the transaction manager decides to rollback, but somehow all the resources had already committed on their own. In this case, you need not do anything because a consistent termination was reached.
HEURISTIC_ROLLBACK
This exception implies that the resources have all done a rollback because the commit decision from the transaction manager was delayed. Similar to HEURISTIC_COMMIT, in this case also you need not do anything because a consistent termination was reached.
HEURISTIC_HAZARD
This exception occurs when the disposition of some of the updates is unknown. For those that are known, they have either all been committed or all rolled back.
HEURISTIC_MIXED
This exception occurs when some parts of the transaction were rolled back while others were committed.

This procedure shows how to handle a heuristic outcome of a transaction using the Jakarta Transactions.

  1. The cause of a heuristic outcome in a transaction is that a resource manager promised it could commit or rollback, and then failed to fulfill the promise. This could be due to a problem with a third-party component, the integration layer between the third-party component and JBoss EAP, or JBoss EAP itself.

    By far, the most common two causes of heuristic errors are transient failures in the environment and coding errors dealing with resource managers.

  2. Usually, if there is a transient failure in your environment, you will know about it before you find out about the heuristic error. This could be due to a network outage, hardware failure, database failure, power outage, or a host of other things.

    If you come across a heuristic outcome in a test environment during stress testing, it implies weaknesses in your test environment.

    Warning

    JBoss EAP automatically recovers transactions that were in a non-heuristic state at the time of failure, but it does not attempt to recover the heuristic transactions.

  3. If you have no obvious failure in your environment, or if the heuristic outcome is easily reproducible, it is probably due to a coding error. You must contact the third-party vendors to find out if a solution is available.

    If you suspect the problem is in the transaction manager of JBoss EAP itself, you must raise a support ticket.

  4. You can attempt to recover the transaction manually using the management CLI. For instructions on manually recovering a transaction, see the Recovering a Transaction Participant section.
  5. The process of resolving the transaction outcome manually is dependent on the exact circumstance of the failure. Perform the following steps, as applicable to your environment:

    1. Identify which resource managers were involved.
    2. Examine the state of the transaction manager and the resource managers.
    3. Manually force log cleanup and data reconciliation in one or more of the involved components.
  6. In a test environment, or if you do not care about the integrity of the data, deleting the transaction logs and restarting JBoss EAP gets rid of the heuristic outcome. By default, the transaction logs are located in the EAP_HOME/standalone/data/tx-object-store/ directory for a standalone server, or the EAP_HOME/domain/servers/SERVER_NAME/data/tx-object-store/ directory in a managed domain. In the case of a managed domain, SERVER_NAME refers to the name of the individual server participating in a server group.

    Note

    The location of the transaction log also depends on the object store in use and the values set for the object-store-relative-to and object-store-path parameters. For file system logs, such as a standard shadow and Apache ActiveMQ Artemis logs, the default directory location is used, but when using a JDBC object store, the transaction logs are stored in a database.

5.5.1. Guidelines on Making Decisions for Heuristic Outcomes

Problem Detection

A heuristic decision is one of the most critical errors that can happen in a transaction system. It can lead to parts of the transaction being committed, while other parts are rolled back. Thus, it can violate the atomicity property of the transaction and can possibly lead to corruption of the data integrity.

A recoverable resource maintains all the information about the heuristic decision in stable storage until it is required by the transaction manager. The actual data saved in stable storage depends on the type of recoverable resource and is not standardized. You can parse through the data and possibly edit the resource to correct any data integrity problems.

Heuristic outcomes are stored in the server log and can be identified using the resource manager and transaction manager.

Manually Committing or Rolling Back a Transaction

Generally, you cannot manually commit or rollback a transaction. From the JBoss EAP transaction management perspective, you can move a transaction back to the pending list for automated recovery to try again or delete the record. For example:

You can use the read-resource operation to check the status of the participants in the transaction:

/subsystem=transactions/log-store=log-store/transactions=0\:ffff7f000001\:-b66efc2\:4f9e6f8f\:9/participants=2:read-resource

The result will look similar to this:

{
   "outcome" => "success",
   "result" => {
       "eis-product-name" => "ArtemisMQ",
       "eis-product-version" => "2.0",
       "jndi-name" => "java:/JmsXA",
       "status" => "HEURISTIC_HAZARD",
       "type" => "/StateManager/AbstractRecord/XAResourceRecord"
   }
}

The outcome status shown here is a HEURISTIC_HAZARD state and is eligible for recovery.

Recovering the HEURISTIC_HAZARD Exception

The following steps show an example of how to recover a hazard type heuristic outcome.

  1. To begin the recovery, you must consult each resource manager and establish the outcomes of the various branches that are identifiable from the transaction manager tooling. However, you should not need to force a resource manager to commit or rollback. You must rather inspect the resource manager to know the state of the heuristic exception.

    The following are reference links for listing and resolving heuristic outcomes for various resource managers:

    Note

    These links are for reference purpose only and are subject to change. Please consult the vendor documentation for details.

  2. You must execute the recover operation, as shown in the following example:

    /subsystem=transactions/log-store=log-store/transactions=0\:ffff7f000001\:-b66efc2\:4f9e6f8f\:9/participants=2:recover

    Running the recover operation changes the state of the transaction to PREPARE and triggers a recovery attempt by replaying the commit operation. If the recovery attempt is successful, the participant is removed from the transaction log.

    You can verify this by running the probe operation on the log-store element again. The participant should no longer be listed. If this is the last participant, the transaction is also deleted.

Recovering the HEURISTIC_ROLLBACK and HEURISTIC_COMMIT Exceptions

If the heuristic outcome is a rollback type, then:

  • The resource should not be able to commit the transaction, provided the resource manager is well implemented.
  • You must decide whether you should delete the branch from the resource manager, using a forget call, so that the rest of the transaction can commit normally and be cleaned from the transaction store.
  • If you do not delete the branch from the resource manager, then the transaction will remain in the transaction store forever.

On the other hand, if the heuristic outcome was a commit type, then you must use the business semantics to deal with the inconsistent outcome.

Further Actions When Manual Reconciliation Fails

You can check the database transaction table, which is the DBA_2PC_PENDING table for Oracle. However, these will depend upon the specific resource managers. Transaction Manager can provide you with the branches to inspect in each resource manager.

You should consult the vendor’s documentation on this resource manager for details. If you suspect that the problem is caused by the third party resource manager, you must consider raising a support ticket with your supplier.





Revised on 2021-10-07 13:46:08 UTC