Red Hat Training

A Red Hat training course is available for JBoss Enterprise Application Platform Common Criteria Certification

Chapter 4. Transactions and Concurrency

The most important point about Hibernate Entity Manager and concurrency control is that it is very easy to understand. Hibernate Entity Manager directly uses JDBC connections and JTA resources without adding any additional locking behavior. We highly recommend you spend some time with the JDBC, ANSI, and transaction isolation specification of your database management system. Hibernate Entity Manager only adds automatic versioning but does not lock objects in memory or change the isolation level of your database transactions. Basically, use Hibernate Entity Manager like you would use direct JDBC (or JTA/CMT) with your database resources.
We start the discussion of concurrency control in Hibernate with the granularity of EntityManagerFactory, and EntityManager, as well as database transactions and long units of work.
In this chapter, and unless explicitly expressed, we will mix and match the concept of entity manager and persistence context. One is an API and programming object, the other a definition of scope. However, keep in mind the essential difference. A persistence context is usually bound to a JTA transaction in Java EE, and a persistence context starts and ends at transaction boundaries (transaction-scoped) unless you use an extended entity manager. Please refer to Section 1.2.3, “Persistence context scope” for more information.

4.1. Entity manager and transaction scopes

A EntityManagerFactory is an expensive-to-create, threadsafe object intended to be shared by all application threads. It is created once, usually on application startup.
An EntityManager is an inexpensive, non-threadsafe object that should be used once, for a single business process, a single unit of work, and then discarded. An EntityManager will not obtain a JDBC Connection (or a Datasource) unless it is needed, so you may safely open and close an EntityManager even if you are not sure that data access will be needed to serve a particular request. (This becomes important as soon as you are implementing some of the following patterns using request interception.)
To complete this picture you also have to think about database transactions. A database transaction has to be as short as possible, to reduce lock contention in the database. Long database transactions will prevent your application from scaling to highly concurrent load.
What is the scope of a unit of work? Can a single Hibernate EntityManager span several database transactions or is this a one-to-one relationship of scopes? When should you open and close a Session and how do you demarcate the database transaction boundaries?

4.1.1. Unit of work

First, don't use the entitymanager-per-operation antipattern, that is, don't open and close an EntityManager for every simple database call in a single thread! Of course, the same is true for database transactions. Database calls in an application are made using a planned sequence, they are grouped into atomic units of work. (Note that this also means that auto-commit after every single SQL statement is useless in an application, this mode is intended for ad-hoc SQL console work.)
The most common pattern in a multi-user client/server application is entitymanager-per-request. In this model, a request from the client is send to the server (where the EJB3 persistence layer runs), a new EntityManager is opened, and all database operations are executed in this unit of work. Once the work has been completed (and the response for the client has been prepared), the persistence context is flushed and closed, as well as the entity manager object. You would also use a single database transaction to serve the clients request. The relationship between the two is one-to-one and this model is a perfect fit for many applications.
This is the default EJB3 persistence model in a Java EE environment (JTA bounded, transaction-scoped persistence context); injected (or looked up) entity managers share the same persistence context for a particular JTA transaction. The beauty of EJB3 is that you don't have to care about that anymore and just see data access through entity manager and demaraction of transaction scope on session beans as completely orthogonal.
The challenge is the implementation of this (and other) behavior outside an EJB3 container: not only has the EntityManager and resource-local transaction to be started and ended correctly, but they also have to be accessible for data access operations. The demarcation of a unit of work is ideally implemented using an interceptor that runs when a request hits the non-EJB3 container server and before the response will be send (i.e. a ServletFilter if you are using a standalone servlet container). We recommend to bind the EntityManager to the thread that serves the request, using a ThreadLocal variable. This allows easy access (like accessing a static variable) in all code that runs in this thread. Depending on the database transaction demarcation mechanism you chose, you might also keep the transaction context in a ThreadLocal variable. The implementation patterns for this are known as ThreadLocal Session and Open Session in View in the Hibernate community. You can easily extend the HibernateUtil shown in the Hibernate reference documentation to implement this pattern, you don't need any external software (it's in fact very trivial). Of course, you'd have to find a way to implement an interceptor and set it up in your environment. See the Hibernate website for tips and examples. Once again, remember that your first choice is naturally an EJB3 container - preferably a light and modular one such as JBoss application server.

4.1.2. Long units of work

The entitymanager-per-request pattern is not the only useful concept you can use to design units of work. Many business processes require a whole series of interactions with the user interleaved with database accesses. In web and enterprise applications it is not acceptable for a database transaction to span a user interaction with possibly long waiting time between requests. Consider the following example:
  • The first screen of a dialog opens, the data seen by the user has been loaded in a particular EntityManager and resource-local transaction. The user is free to modify the detached objects.
  • The user clicks "Save" after 5 minutes and expects his modifications to be made persistent; he also expects that he was the only person editing this information and that no conflicting modification can occur.
We call this unit of work, from the point of view of the user, a long running application transaction. There are many ways how you can implement this in your application.
A first naive implementation might keep the EntityManager and database transaction open during user think time, with locks held in the database to prevent concurrent modification, and to guarantee isolation and atomicity. This is of course an anti-pattern, a pessimistic approach, since lock contention would not allow the application to scale with the number of concurrent users.
Clearly, we have to use several database transactions to implement the application transaction. In this case, maintaining isolation of business processes becomes the partial responsibility of the application tier. A single application transaction usually spans several database transactions. It will be atomic if only one of these database transactions (the last one) stores the updated data, all others simply read data (e.g. in a wizard-style dialog spanning several request/response cycles). This is easier to implement than it might sound, especially if you use EJB3 entity manager and persistence context features:
  • Automatic Versioning - An entity manager can do automatic optimistic concurrency control for you, it can automatically detect if a concurrent modification occured during user think time (usually by comparing version numbers or timestamps when updating the data in the final resource-local transaction).
  • Detached Entities - If you decide to use the already discussed entity-per-request pattern, all loaded instances will be in detached state during user think time. The entity manager allows you to merge the detached (modified) state and persist the modifications, the pattern is called entitymanager-per-request-with-detached-entities. Automatic versioning is used to isolate concurrent modifications.
  • Extended Entity Manager - The Hibernate Entity Manager may be disconnected from the underlying JDBC connection between two client calls and reconnected when a new client request occurs. This pattern is known as entitymanager-per-application-transaction and makes even merging unnecessary. An extend persistence context is responsible to collect and retain any modification (persist, merge, remove) made outside a transaction. The next client call made inside an active transaction (typically the last operation of a user conversation) will execute all queued modifications. Automatic versioning is used to isolate concurrent modifications.
Both entitymanager-per-request-with-detached-objects and entitymanager-per-application-transaction have advantages and disadvantages, we discuss them later in this chapter in the context of optimistic concurrency control.

4.1.3. Considering object identity

An application may concurrently access the same persistent state in two different persistence contexts. However, an instance of a managed class is never shared between two persistence contexts. Hence there are two different notions of identity:
Database Identity
foo.getId().equals( bar.getId() )
JVM Identity
foo==bar
Then for objects attached to a particular persistence context (i.e. in the scope of an EntityManager) the two notions are equivalent, and JVM identity for database identity is guaranteed by the Hibernate Entity Manager. However, while the application might concurrently access the "same" (persistent identity) business object in two different persistence contexts, the two instances will actually be "different" (JVM identity). Conflicts are resolved using (automatic versioning) at flush/commit time, using an optimistic approach.
This approach leaves Hibernate and the database to worry about concurrency; it also provides the best scalability, since guaranteeing identity in single-threaded units of work only doesn't need expensive locking or other means of synchronization. The application never needs to synchronize on any business object, as long as it sticks to a single thread per EntityManager. Within a persistence context, the application may safely use == to compare entities.
However, an application that uses == outside of a persistence context might see unexpected results. This might occur even in some unexpected places, for example, if you put two detached instances into the same Set. Both might have the same database identity (i.e. they represent the same row), but JVM identity is by definition not guaranteed for instances in detached state. The developer has to override the equals() and hashCode() methods in persistent classes and implement his own notion of object equality. There is one caveat: Never use the database identifier to implement equality, use a business key, a combination of unique, usually immutable, attributes. The database identifier will change if a transient entity is made persistent (see the contract of the persist() operation). If the transient instance (usually together with detached instances) is held in a Set, changing the hashcode breaks the contract of the Set. Attributes for good business keys don't have to be as stable as database primary keys, you only have to guarantee stability as long as the objects are in the same Set. See the Hibernate website for a more thorough discussion of this issue. Also note that this is not a Hibernate issue, but simply how Java object identity and equality has to be implemented.

4.1.4. Common concurrency control issues

Never use the anti-patterns entitymanager-per-user-session or entitymanager-per-application (of course, there are rare exceptions to this rule, e.g. entitymanager-per-application might be acceptable in a desktop application, with manual flushing of the persistence context). Note that some of the following issues might also appear with the recommended patterns, make sure you understand the implications before making a design decision:
  • An entity manager is not thread-safe. Things which are supposed to work concurrently, like HTTP requests, session beans, or Swing workers, will cause race conditions if an EntityManager instance would be shared. If you keep your Hibernate EntityManager in your HttpSession (discussed later), you should consider synchronizing access to your Http session. Otherwise, a user that clicks reload fast enough may use the same EntityManager in two concurrently running threads. You will very likely have provisions for this case already in place, for other non-threadsafe but session-scoped objects.
  • An exception thrown by the Entity Manager means you have to rollback your database transaction and close the EntityManager immediately (discussed later in more detail). If your EntityManager is bound to the application, you have to stop the application. Rolling back the database transaction doesn't put your business objects back into the state they were at the start of the transaction. This means the database state and the business objects do get out of sync. Usually this is not a problem, because exceptions are not recoverable and you have to start over your unit of work after rollback anyway.
  • The persistence context caches every object that is in managed state (watched and checked for dirty state by Hibernate). This means it grows endlessly until you get an OutOfMemoryException, if you keep it open for a long time or simply load too much data. One solution for this is some kind batch processing with regular flushing of the persistence context, but you should consider using a database stored procedure if you need mass data operations. Some solutions for this problem are shown in Chapter 6, Batch processing. Keeping a persistence context open for the duration of a user session also means a high probability of stale data, which you have to know about and control appropriately.