EJB/Remoting XA transaction across multiple servers can leave unfinished transactions if JVM/Network crashes in 2PC prepare

Solution In Progress - Updated -

Issue

  • EJB/Remoting XA transaction across multiple servers can leave unfinished transactions if JVM/Network crashes in 2PC prepare

Issue description: An EJB remote server-to-server sub-transaction may remain locked and never be rolled back after a network failure or JVM crash.

Details: When server A calls server B through JBoss Remoting EJB calls (this does not apply to EJB2 IIOP communication), EJB remoting behaves as an XAResource from the transaction manager's perspective. To allow transaction recovery after a failure, EJB remoting stores a persistent record on server A (the side that initiates the remote call). If a system failure occurs (e.g. an intermittent network failure or a JVM crash), the rollback of the participant on the remote server (server B) may never complete, because EJB remoting erases the persistent record before the prepare on server B can be processed.

The system failure has to happen during the first phase of the 2PC protocol, when the participants on server B are prepared: they hold their locks and wait for the final commit command from the transaction manager. Because some other participant may fail to prepare, the final outcome can be a rollback, and the transaction manager then tries to roll back the participants on server B. If the system failure occurs at this particular time, the participants on server B may never be rolled back.
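The following minimal Java sketch illustrates the timing described above. It is a simplified model, not the transaction manager's implementation: the `Participant` class, its `reachable` flag and the `run` method are hypothetical names introduced only for illustration, and no real JTA or XA API is used. One participant votes "no", so the outcome is rollback, but the rollback cannot be delivered to the already prepared server B.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPhaseSketch {
    enum State { ACTIVE, PREPARED, ROLLED_BACK, COMMITTED }

    /** Hypothetical participant model; this is NOT javax.transaction.xa.XAResource. */
    static class Participant {
        final String name;
        final boolean votesYes;
        boolean reachable = true;
        State state = State.ACTIVE;

        Participant(String name, boolean votesYes) {
            this.name = name;
            this.votesYes = votesYes;
        }

        /** Phase 1: a "yes" vote leaves the participant prepared and holding its locks. */
        boolean prepare() {
            state = votesYes ? State.PREPARED : State.ROLLED_BACK;
            return votesYes;
        }

        void commit() { state = State.COMMITTED; }

        void rollback() {
            if (!reachable) {
                throw new IllegalStateException(name + " is unreachable");
            }
            state = State.ROLLED_BACK;
        }
    }

    /** Runs 2PC and returns the names of participants whose rollback was not delivered. */
    static List<String> run(List<Participant> participants, Runnable failureAfterPrepare) {
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) { allYes = false; break; }
        }
        failureAfterPrepare.run(); // the system failure hits between the two phases

        List<String> unfinished = new ArrayList<>();
        for (Participant p : participants) {
            if (allYes) { p.commit(); continue; }
            if (p.state == State.ROLLED_BACK) { continue; } // it already aborted itself
            try {
                p.rollback();
            } catch (IllegalStateException e) {
                unfinished.add(p.name); // left for periodic recovery to finish
            }
        }
        return unfinished;
    }

    public static void main(String[] args) {
        Participant serverB = new Participant("serverB", true);
        Participant db = new Participant("db", false); // this participant fails to prepare
        List<String> unfinished =
                run(Arrays.asList(serverB, db), () -> serverB.reachable = false);
        System.out.println("unfinished=" + unfinished + " serverB=" + serverB.state);
    }
}
```

In this model server B stays in the `PREPARED` state after the run, which corresponds to a participant that keeps its locks until something later completes the rollback.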

Some of the scenarios in which this may happen: server A calls server B, all business activity succeeds and the EJB method finishes. The EJB method worked with two resources on server A: an insertion into a DB, followed by a call to server B (which behaves as an XAResource). The transaction manager then starts 2PC.

  • Server A calls prepare on server B, all participants/resources on server B prepare, and then a network error occurs. The response reporting the successful prepare is lost and server A receives only a network exception. The transaction manager decides to roll back the whole transaction. The DB is rolled back, but because the network is down the abort on server B fails. Periodic recovery should then roll back the prepared resources on server B; because of this issue, that may never happen.

  • Server A calls prepare on server B, all participants on server B prepare, and success is returned to server A. Then the JVM of server A crashes. Server B is left with prepared participants (XA resources). When server A is restarted, all of the transaction's prepared participants are expected to be finished by rolling back. This may never happen.
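Both scenarios depend on whether the persistent record on server A still exists when recovery runs. The sketch below models only that dependency, under stated assumptions: `Coordinator`, `RemoteServer`, `recoveryPass` and the transaction id `tx-1` are hypothetical names, and the in-memory map merely stands in for whatever persistent storage EJB remoting actually uses. It is not the EAP recovery implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RecoveryRecordSketch {
    /** Models server B: ids of branches that are prepared and still holding locks. */
    static class RemoteServer {
        final Set<String> preparedBranches = new HashSet<>();

        void prepare(String txId) { preparedBranches.add(txId); }

        void rollback(String txId) { preparedBranches.remove(txId); }
    }

    /** Models server A: one record per transaction that called a remote server. */
    static class Coordinator {
        final Map<String, RemoteServer> recoveryRecords = new HashMap<>();

        void recordRemoteCall(String txId, RemoteServer remote) {
            recoveryRecords.put(txId, remote);
        }

        void eraseRecord(String txId) { recoveryRecords.remove(txId); }

        /** One recovery pass: only branches that still have a record can be rolled back. */
        int recoveryPass() {
            int finished = 0;
            for (Map.Entry<String, RemoteServer> e : recoveryRecords.entrySet()) {
                e.getValue().rollback(e.getKey());
                finished++;
            }
            recoveryRecords.clear();
            return finished;
        }
    }

    /** Returns true if server B still holds the prepared branch after recovery ran. */
    static boolean branchStuckAfterRecovery(boolean recordErasedEarly) {
        RemoteServer serverB = new RemoteServer();
        Coordinator serverA = new Coordinator();

        serverA.recordRemoteCall("tx-1", serverB);
        if (recordErasedEarly) {
            serverA.eraseRecord("tx-1"); // record erased before B's prepare is processed
        }
        serverB.prepare("tx-1"); // prepare succeeds; then the reply or server A is lost

        serverA.recoveryPass(); // recovery after the network or JVM comes back
        return serverB.preparedBranches.contains("tx-1");
    }

    public static void main(String[] args) {
        System.out.println("record kept: stuck=" + branchStuckAfterRecovery(false));
        System.out.println("record erased early: stuck=" + branchStuckAfterRecovery(true));
    }
}
```

With the record kept, the recovery pass finds the branch and rolls it back. With the record erased early, recovery has nothing pointing at server B, so the prepared branch stays in place, which matches the "may never happen" outcome of both scenarios.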

Environment

  • Red Hat JBoss Enterprise Application Platform (EAP) 7.2
    • Update 8
    • Update 7
