4.5. Entity Beans

Some developers avoid entity beans because of historical issues with EJB 1 and EJB 2, but with EJB 3 their use is rising. To get the best possible throughput when using entity beans, there are four topics to cover:
  • second-level cache
  • prepared statements
  • batch inserts
  • batching database operations

4.5.1. Second level cache

Because Hibernate is the JPA provider, entity beans in EJB 3 sit on top of Hibernate. This is in stark contrast to the old EJB 2.x entities, which had their own complete implementation separate from Hibernate. In fact, Hibernate and other object-relational mapping (ORM) frameworks were the inspiration for JPA and EJB 3 entities. Since Hibernate is the underlying implementation, its second level cache is available to entities just as it would be when using Hibernate directly. Hibernate's second level cache has evolved over the years and has improved with each release. JBoss Cache, used as the second level cache implementation, has some very useful features, and the default configuration for EJB 3 entities is very good. The second level cache is enabled for the entities in an EJB 3 application through properties on the persistence unit, which is defined in the persistence.xml file packaged with the application. Here is an extract from persistence.xml:
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd"
   version="1.0">
   <persistence-unit name="services" transaction-type="JTA">
      <provider>org.hibernate.ejb.HibernatePersistence</provider>
      <jta-data-source>java:/MySqlDS</jta-data-source>
      <properties>
         <property name="hibernate.cache.region.factory_class" value="org.hibernate.cache.jbc2.JndiMultiplexedJBossCacheRegionFactory"/>
         <property name="hibernate.cache.region.jbc2.cachefactory" value="java:CacheManager"/>
         <property name="hibernate.cache.use_second_level_cache" value="true"/>
         <property name="hibernate.cache.use_query_cache" value="false"/>
         <property name="hibernate.cache.use_minimal_puts" value="true"/>
         <property name="hibernate.cache.region.jbc2.cfg.entity" value="mvcc-entity"/>
         <property name="hibernate.cache.region_prefix" value="services"/>
      </properties>
   </persistence-unit>
</persistence>
The configuration parameters relevant to the second level cache are:
hibernate.cache.region.factory_class specifies the cache region factory used by the underlying Hibernate session factory, in this example JndiMultiplexedJBossCacheRegionFactory. This factory implementation creates a single cache for all types of data that can be cached (entities, collections, query results and timestamps). Other factory options create separate cache instances, each tailored to one type of data. In this example there is only one cache instance, and only entities and collections are cached.
hibernate.cache.use_second_level_cache is set to true, which enables the cache.
hibernate.cache.use_query_cache is set to false, which disables the query cache.
hibernate.cache.use_minimal_puts, set to true, makes Hibernate minimize writes to the cache at the expense of more frequent reads. This is the default for a clustered cache.
hibernate.cache.region.jbc2.cfg.entity specifies the underlying JBoss Cache configuration, in this case the multiversion concurrency control (MVCC) entity cache (mvcc-entity).
hibernate.cache.region_prefix is set to the same name as the persistence unit itself. Specifying a name here is optional, but if you do not specify one, a long default name is generated.
The mvcc-entity configuration is in the file jboss-cache-manager-jboss-beans.xml, in the directory JBOSS_EAP_DIST/jboss-as/server/<PROFILE>/deploy/cluster/jboss-cache-manager.sar/META-INF. Note that the default, standard and minimal configurations do not have JBoss Cache configured or deployed.
Below is an extract from the configuration of the MVCC cache:
<!-- A config appropriate for entity/collection caching that uses MVCC locking -->
   <entry><key>mvcc-entity</key>
   <value>
      <bean name="MVCCEntityCache" class="org.jboss.cache.config.Configuration">
         <!-- Node locking scheme -->
         <property name="nodeLockingScheme">MVCC</property>
         <!-- READ_COMMITTED is as strong as necessary for most
              2nd Level Cache use cases. -->
         <property name="isolationLevel">READ_COMMITTED</property>
         <property name="useLockStriping">false</property>
         <!-- Mode of communication with peer caches.
              INVALIDATION_SYNC is highly recommended as the mode for use
              with entity and collection caches. -->
         <property name="cacheMode">INVALIDATION_SYNC</property>
         <property name="evictionConfig">
            <bean class="org.jboss.cache.config.EvictionConfig">
               <property name="wakeupInterval">5000</property>
               <!-- Overall default -->
               <property name="defaultEvictionRegionConfig">
                  <bean class="org.jboss.cache.config.EvictionRegionConfig">
                     <property name="regionName">/</property>
                     <property name="evictionAlgorithmConfig">
                        <bean class="org.jboss.cache.eviction.LRUAlgorithmConfig">
                           <!-- Evict LRU node once we have more than this number of nodes -->
                           <property name="maxNodes">500000</property>
                           <!-- And, evict any node that hasn't been accessed in this many seconds -->
                           <property name="timeToLiveSeconds">7200</property>
                           <!-- Do not evict a node that's been accessed within this many seconds.
                                Set this to a value greater than your max expected transaction length. -->
                           <property name="minTimeToLiveSeconds">300</property>
                        </bean>
                     </property>
                  </bean>
               </property>
In the configuration above, the following parameters are of particular interest in tuning the cache:
  • isolationLevel
  • cacheMode
  • maxNodes
  • timeToLiveSeconds
  • minTimeToLiveSeconds
isolationLevel is similar to the database isolation level for transactions. JBoss Cache is fully transactional and can participate as a full resource in transactions, so that stale data is not stored in the cache. Some applications may not be affected by stale data in a cache, so the configuration can vary accordingly. The example configuration sets READ_COMMITTED, which is the same as in the example data source configuration for the database connection pool. It is recommended to use the same level as the data source, to avoid odd behavior in the application. JBoss Cache supports the following isolation levels:
  • NONE
  • READ_UNCOMMITTED
  • READ_COMMITTED
  • REPEATABLE_READ
  • SERIALIZABLE
If isolationLevel is not set, JBoss Cache itself defaults to REPEATABLE_READ; the example configuration overrides this with READ_COMMITTED.
cacheMode specifies how a node communicates changes to the peer caches in the cluster. In this example it is set to INVALIDATION_SYNC: when an entity is updated on one node, the cached copies on the other nodes are invalidated, so that no node returns a stale value. Invalidation is done synchronously, which ensures that the cache is in a correct state when the invalidation request completes. This is very important for caches in a cluster, and is the recommended setting. Replication, instead of invalidation, is an option, but it is much more expensive and limits scalability, possibly preventing caching from being effective in increasing throughput.
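The effect of synchronous invalidation can be illustrated with a toy model. The sketch below is not JBoss Cache code: the class names, the in-memory "database" map and the two-node "cluster" are all hypothetical, and networking and transactions are left out. It only shows the idea that an update removes the entry from every peer before it returns, so a peer's next read misses and reloads the current value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of INVALIDATION_SYNC. All names are hypothetical; this is not JBoss Cache code.
class InvalidationSketch {

    // Stands in for the database that backs the entities.
    static final Map<Integer, String> database = new HashMap<>();

    // One cache per cluster node.
    static class NodeCache {
        final Map<Integer, String> entries = new HashMap<>();
        final List<NodeCache> peers = new ArrayList<>();
        int databaseReads = 0;

        // Read through the cache; a miss costs one database read.
        String read(int id) {
            String cached = entries.get(id);
            if (cached == null) {
                databaseReads++;
                cached = database.get(id);
                entries.put(id, cached);
            }
            return cached;
        }

        // Write to the database, cache the value locally, then invalidate the
        // entry on every peer before returning (the "SYNC" part).
        void update(int id, String value) {
            database.put(id, value);
            entries.put(id, value);
            for (NodeCache peer : peers) {
                peer.entries.remove(id); // only the key is sent, not the new state
            }
        }
    }

    // Build two caches that know about each other.
    static NodeCache[] twoNodeCluster() {
        NodeCache a = new NodeCache();
        NodeCache b = new NodeCache();
        a.peers.add(b);
        b.peers.add(a);
        return new NodeCache[] { a, b };
    }

    public static void main(String[] args) {
        database.put(1, "v1");
        NodeCache[] nodes = twoNodeCluster();
        System.out.println(nodes[1].read(1)); // miss on node B, loads v1
        nodes[0].update(1, "v2");             // node A updates, node B's copy is invalidated
        System.out.println(nodes[1].read(1)); // miss again on node B, loads v2
    }
}
```

In this model a replicating cache would instead copy the new value to every peer on each update, which is the extra cost the text refers to.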
The following three parameters - maxNodes, timeToLiveSeconds, and minTimeToLiveSeconds - define the size of the cache and how long entries live in it.
maxNodes specifies the maximum number of nodes that can be in the cache at any one time. The default for maxNodes is 10,000, which is quite small, and in the example configuration it was set to 500,000. Deciding how large to make this value depends on the entities being cached, the access pattern of those entities, and how much memory is available to use. If the cache uses too much memory, other platform components could be starved of resources and so performance may be degraded. If the cache is too small, not enough entities may be stored in the cache to be of benefit.
timeToLiveSeconds specifies how long an entry remains in the cache before it becomes eligible to be evicted. The default value is 1,000 seconds, or about 17 minutes, which is quite short. Understanding the access and load pattern is important. Some applications have very predictable load patterns, where the majority of the load occurs at certain times of day and lasts a known duration. Tailoring the time that entities stay in the cache to that pattern helps tune performance.
minTimeToLiveSeconds sets the minimum amount of time an entity will remain in the cache, the default being 120 seconds, or two minutes. This parameter should be set to a value equal to or greater than the maximum transaction timeout; otherwise a cached entity can be evicted from the cache before the transaction completes.
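How the three settings interact can be sketched with a simplified model. The code below is not the JBoss Cache eviction implementation; the class and method names are hypothetical, and it follows only the rules stated in the configuration comments: evict the least recently used node once there are more than maxNodes, evict any node not accessed for timeToLiveSeconds, and never evict a node accessed within minTimeToLiveSeconds. It assumes that access times are supplied in non-decreasing order.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the three LRU eviction settings; not JBoss Cache code.
class LruEvictionSketch {
    final int maxNodes;
    final long timeToLiveSeconds;
    final long minTimeToLiveSeconds;

    // Access-ordered map: iteration starts at the least recently used node.
    // Each value is the time, in seconds, of the node's last access.
    final LinkedHashMap<String, Long> lastAccess = new LinkedHashMap<>(16, 0.75f, true);

    LruEvictionSketch(int maxNodes, long timeToLiveSeconds, long minTimeToLiveSeconds) {
        this.maxNodes = maxNodes;
        this.timeToLiveSeconds = timeToLiveSeconds;
        this.minTimeToLiveSeconds = minTimeToLiveSeconds;
    }

    // Record a read or write of a node at the given time.
    void touch(String node, long nowSeconds) {
        lastAccess.put(node, nowSeconds);
    }

    // One eviction pass (the real cache runs a pass every wakeupInterval
    // milliseconds). Returns the number of nodes evicted.
    int evict(long nowSeconds) {
        int evicted = 0;
        Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            long idle = nowSeconds - it.next().getValue();
            if (idle < minTimeToLiveSeconds) {
                break; // this node and all more recently used nodes are protected
            }
            boolean expired = idle > timeToLiveSeconds;
            boolean overCapacity = lastAccess.size() > maxNodes;
            if (!expired && !overCapacity) {
                break; // later nodes were used more recently, so none of them qualifies
            }
            it.remove();
            evicted++;
        }
        return evicted;
    }

    public static void main(String[] args) {
        LruEvictionSketch cache = new LruEvictionSketch(2, 100, 10);
        cache.touch("a", 0);
        cache.touch("b", 1);
        cache.touch("c", 2);
        System.out.println(cache.evict(5));   // over capacity, but every node is inside the minimum
        System.out.println(cache.evict(20));  // the least recently used node goes
        System.out.println(cache.evict(200)); // the remaining nodes have exceeded timeToLiveSeconds
    }
}
```

The first pass in main evicts nothing even though the cache is over capacity, which is why a minTimeToLiveSeconds value longer than the longest transaction keeps in-use entities from disappearing mid-transaction.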

4.5.1.1. Marking entities to be cached

The @Cache annotation is added to each entity bean you want to cache. Its required argument, usage, is a CacheConcurrencyStrategy. The @Cache annotation requires the following two imports:
  • import org.hibernate.annotations.Cache;
  • import org.hibernate.annotations.CacheConcurrencyStrategy;
The @Cache annotation looks like the following in the entity code: @Cache(usage = CacheConcurrencyStrategy.READ_ONLY), where the CacheConcurrencyStrategy can be:
  • NONE
  • NONSTRICT_READ_WRITE
  • READ_ONLY
  • READ_WRITE
  • TRANSACTIONAL
Of these options, only two strategies are relevant to JBoss Cache as the second level cache provider: READ_ONLY and TRANSACTIONAL.
READ_ONLY declares that the entity will never change while the application is running. This allows the use of read-only semantics, which is by far the best-performing cache concurrency strategy.
TRANSACTIONAL allows the use of database ACID semantics on entities in the cache. Any cached entity that can be updated while the application is running should be marked TRANSACTIONAL. Avoid caching entities that are likely to be updated frequently. If an entity is updated too frequently, caching can actually increase overhead and so reduce throughput. Each update sends an invalidation request across the cluster so that the state is correctly maintained, which means the overhead affects every node, not only the node where the update occurs.
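As an illustration, a cached entity might be declared as in the sketch below. The Product entity and its fields are hypothetical and exist only for this example; the annotations and imports are the ones named above plus the standard JPA ones. The class needs the JPA and Hibernate annotation libraries that the platform provides, so it does not compile on a bare JDK.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Hypothetical entity that is cached and can be updated while the application runs.
// For reference data that never changes, CacheConcurrencyStrategy.READ_ONLY
// would be used instead.
@Entity
@Cache(usage = CacheConcurrencyStrategy.TRANSACTIONAL)
public class Product {

    @Id
    private Long id;

    private String name;

    // JPA requires a no-argument constructor.
    protected Product() {
    }

    public Product(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}
```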
Starting with the JBoss Cache 3.x series, which made its debut in EAP 5.0.0, the platform has a transactional cache that uses MVCC, and the mvcc-entity configuration described earlier is the platform default for entities. MVCC is a well-known algorithm that lets an update proceed without blocking other transactions from reading the data: writers do not block readers, and readers see a consistent image of the data. This is very important for concurrency, because there is no pessimistic lock that blocks readers. While a transaction is updating a particular entity, other in-flight transactions that read that entity do not block; they continue to see a consistent image of the data, and the update becomes visible to them only once the updating transaction commits. This provides a level of scalability for updates that no second level cache provider offered until JBoss Cache 3.x introduced it. Multiple updates to the same entity in different transactions still block one another. Once again, the read/write ratio is extremely important to getting good throughput with any entities that are cached and can be updated while the application is running.
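The core idea, that writers do not block readers, can be shown with a toy single-entity model. This is a sketch under simplifying assumptions, not how JBoss Cache implements MVCC: the class and method names are hypothetical, there is only one value, and rollback is left out. Readers always see the last committed version, while a writer works on a private copy and still has to wait for any other writer.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration of MVCC for a single cached entity; not JBoss Cache code.
class MvccSketch {
    // The last committed version; readers only ever look at this field.
    private volatile String committed;

    // Writers still serialize against each other.
    private final ReentrantLock writeLock = new ReentrantLock();

    // Private working copy of the transaction that currently holds writeLock.
    private String pending;

    MvccSketch(String initial) {
        committed = initial;
    }

    // Readers never take the lock, so an in-flight update cannot block them.
    String read() {
        return committed;
    }

    // Start an update; this blocks only if another writer is in flight.
    void beginUpdate(String newValue) {
        writeLock.lock();
        pending = newValue;
    }

    // Publish the working copy and let the next writer in.
    void commit() {
        committed = pending;
        pending = null;
        writeLock.unlock();
    }

    // True while some transaction is updating the entity.
    boolean isUpdateInFlight() {
        return writeLock.isLocked();
    }

    public static void main(String[] args) {
        MvccSketch entity = new MvccSketch("v1");
        entity.beginUpdate("v2");
        System.out.println(entity.read()); // v1: the uncommitted change is not visible
        entity.commit();
        System.out.println(entity.read()); // v2: visible after commit
    }
}
```

A second transaction calling beginUpdate from another thread while the first is in flight would wait on the lock, which corresponds to the remark that concurrent updates to the same entity still block.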