4.5.3. Batch Inserts

Batch inserts allow a set of inserts destined for a single table to be sent to the database as one insert statement instead of as individual statements. Reducing the number of round trips improves both latency and overall data insert times. These improvements occur not only with batch processing, such as loading large quantities of data, but also with OLTP workloads.
The following is an example of regular inserts:
INSERT INTO EJB3.OrderLine (OrderId, LineNumber, ProductId, Quantity, Price, ExtendedPrice) VALUES('67ef590kalk4568901thbn7190akioe1', 1, 25, 10, 1.00, 10.00);
INSERT INTO EJB3.OrderLine (OrderId, LineNumber, ProductId, Quantity, Price, ExtendedPrice) VALUES('67ef590kalk4568901thbn7190akioe1', 2, 16, 1, 1.59, 1.59);
INSERT INTO EJB3.OrderLine (OrderId, LineNumber, ProductId, Quantity, Price, ExtendedPrice) VALUES('67ef590kalk4568901thbn7190akioe1', 3, 55, 5, 25.00, 125.00);
INSERT INTO EJB3.OrderLine (OrderId, LineNumber, ProductId, Quantity, Price, ExtendedPrice) VALUES('67ef590kalk4568901thbn7190akioe1', 4, 109, 1, 29.98, 29.98);
The following is an example of batch inserts:
INSERT INTO EJB3.OrderLine (OrderId, LineNumber, ProductId, Quantity, Price, ExtendedPrice) VALUES
   ('67ef590kalk4568901thbn7190akioe1', 1, 25, 10, 1.00, 10.00),
   ('67ef590kalk4568901thbn7190akioe1', 2, 16, 1, 1.59, 1.59),
   ('67ef590kalk4568901thbn7190akioe1', 3, 55, 5, 25.00, 125.00),
   ('67ef590kalk4568901thbn7190akioe1', 4, 109, 1, 29.98, 29.98);
Before discussing how to enable this behavior from an EJB 3 application using Hibernate, the example data model above needs further explanation. The primary key of the EJB3.OrderLine table is a composite of the OrderId and the LineNumber. Originally, the Order entity used a MySQL auto increment column as its OrderId primary key, and that value also formed the first half of the composite primary key of the OrderLine entity. The SQL syntax for normal inserts is unaffected by this, because Hibernate does not need to know the primary key value ahead of time; in fact, the order in which those SQL statements are sent to the database does not matter.

With batch inserts, since the statements are sent over as one set, and eventually as one SQL statement, Hibernate needs to know the primary key ahead of time. The reason is simple: Hibernate pushes each "persist" request from an EJB 3 application into an action queue, and inserts to various tables may be interleaved within a single transaction. Hibernate has to sort the inserts by primary key in order to batch them at all. An auto increment column value is known only after the insert, so this approach fails. The model therefore has to use a primary key whose value can be known before the row is inserted into the database. A generated value can still be used as the primary key, but the generation strategy must allow the value to be retrieved via JDBC before the inserts are flushed to the database. IDENTITY cannot be used as a primary key generation strategy, but TABLE, SEQUENCE and UUID can. It is also important to know that when Hibernate sorts the insert statements for batching, it uses the hash code value of the entity, so it is imperative that the hashCode method is properly defined.

Note

UUID is a Hibernate-specific key generation technique that is not in the JPA or JPA 2 specification, so code that uses it will not be portable.
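Because Hibernate orders queued inserts by entity hash code, composite key classes must define equals and hashCode over all key fields. The following is a minimal sketch of a composite key class mirroring the OrderId and LineNumber columns described above; it is illustrative and not taken from the actual application source (a real JPA version would also be annotated with @Embeddable).

```java
import java.io.Serializable;
import java.util.Objects;

// Illustrative composite primary key class for an order line.
public class OrderLinePK implements Serializable {

    private String orderId;   // known before insert, e.g. a UUID value
    private int lineNumber;

    public OrderLinePK() {
    }

    public OrderLinePK(String orderId, int lineNumber) {
        this.orderId = orderId;
        this.lineNumber = lineNumber;
    }

    public String getOrderId() { return orderId; }
    public int getLineNumber() { return lineNumber; }

    // The key must be fully populated before persist/flush, since
    // the hash code participates in Hibernate's insert ordering.
    public void setOrderId(String orderId) { this.orderId = orderId; }

    // equals and hashCode must cover every key field: Hibernate relies
    // on them when sorting queued inserts for batching.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof OrderLinePK)) return false;
        OrderLinePK pk = (OrderLinePK) other;
        return lineNumber == pk.lineNumber
                && Objects.equals(orderId, pk.orderId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(orderId, lineNumber);
    }
}
```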
The following code sample illustrates batch processing. It's an extract of the Order performance application, and inserts reference data that is needed for the test runs.
@TransactionTimeout(4800)
public void createInventories(int batchSize) throws CreateDataException {
   if (numberOfInventoryRecords() > 0) {
      throw new CreateDataException("Inventory already exists!");
   }
   int rowsToFlush = 0, totalRows = 0;
   Random quantity = new Random(System.currentTimeMillis());
   List<Product> products = productManager.findAllProducts();
   List<DistributionCenter> distributionCenters =
         distributionCenterManager.findAllDistributionCenters();
   InventoryPK inventoryPk = null;
   Inventory inventory = null;
   for (Product product : products) {
      for (DistributionCenter distributionCenter : distributionCenters) {
         inventoryPk = new InventoryPK();
         inventory = new Inventory();
         inventoryPk.setProductId(product.getProductId());
         inventoryPk.setDistributionCenterId(distributionCenter.getDistributionCenterId());
         inventory.setPrimaryKey(inventoryPk);
         inventory.setQuantityOnHand(quantity.nextInt(25000));
         inventory.setBackorderQuantity(0);
         inventory.setVersion(1);
         batchEntityManager.persist(inventory);
         rowsToFlush++;
         totalRows++;
         if (rowsToFlush == batchSize) {
            batchEntityManager.flush();
            rowsToFlush = 0;
            batchEntityManager.clear();
            if (log.isTraceEnabled()) {
               log.trace("Just flushed " + batchSize + " rows to the database.");
               log.trace("Total rows flushed is " + totalRows);
            }
         }
      }
   }
   return;
}
The method is annotated with a TransactionTimeout annotation that specifies a longer transaction timeout value, because the default of 300 seconds is too short in this instance. The method takes a single parameter, batchSize, which allows the batch insert size to be tuned. This is good practice because it allows different batch insert sizes to be tested to optimize the results. After creating each inventory entity, persist is called on the entity manager. The code that follows the persist call is what demonstrates the batch processing method.
After calling persist, a check is made to see whether the number of entities persisted equals the batchSize parameter. If so, the entity manager's flush method is called. Since this is a single large transaction, the normal Hibernate behavior is to flush everything at transaction commit, or whenever the JDBC batch size is reached. Here an explicit call to flush is made so that a specific number of rows can be flushed, instead of letting Hibernate decide. Strictly speaking, there is no need to call flush at all: because Hibernate flushes both at transaction commit and whenever the JDBC batch size parameter is reached, the batchSize method parameter could be dropped entirely in favor of the JDBC batch size setting. In this case, however, the JDBC batch size parameter was set once to the maximum, and different batch sizes less than or equal to it were then tested; the alternative would have been to change the Hibernate configuration and redeploy the application between tests. The Hibernate JDBC batch size parameter (hibernate.jdbc.batch_size) is specified in the properties of persistence.xml for the persistence unit of the application entities.
The parameter hibernate.order_inserts tells Hibernate to order the inserts by primary key (in practice, by the entity's hash code value). If entities use a generated primary key, fetching of generated key values must be enabled, but this is enabled in Hibernate by default.
Below is an extract from the configuration file with both these parameters set:
<persistence-unit name="batch-services" transaction-type="JTA">
   <provider>org.hibernate.ejb.HibernatePersistence</provider>
   <jta-data-source>java:/MySqlDS</jta-data-source>
   <properties>
          <property name="hibernate.hbm2ddl.auto" value="none"/>
          <property name="hibernate.default_catalog" value="EJB3"/>
          <property name="hibernate.dialect" value="org.hibernate.dialect.MySQL5InnoDBDialect"/>
          <property name="hibernate.order_updates" value="true"/>
          <property name="hibernate.order_inserts" value="true"/>
          <property name="hibernate.jdbc.batch_versioned_data" value="true"/>
          <property name="hibernate.jdbc.fetch_size" value="50000"/>
          <property name="hibernate.jdbc.batch_size" value="50000"/>
          <property name="hibernate.default_batch_fetch_size" value="50000"/>
          <property name="hibernate.connection.release_mode" value="auto"/>
   </properties>
</persistence-unit>
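The ordering behavior that hibernate.order_inserts enables can be sketched in plain Java. This is an illustrative model, not Hibernate's actual implementation: queued insert actions are grouped by entity type and, within a type, ordered by the entity's hash code, so inserts destined for the same table end up adjacent and can be batched together.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative model of how an ORM can order queued inserts so
// that rows destined for the same table become adjacent.
public class InsertOrdering {

    // A queued insert: the target entity name plus the entity's
    // hash code (for Hibernate, derived from hashCode()).
    public static final class InsertAction {
        public final String entityName;
        public final int entityHash;

        public InsertAction(String entityName, int entityHash) {
            this.entityName = entityName;
            this.entityHash = entityHash;
        }
    }

    // Sort by entity name first, then by hash code, so interleaved
    // inserts to different tables are grouped into batchable runs.
    public static List<InsertAction> order(List<InsertAction> queue) {
        List<InsertAction> sorted = new ArrayList<>(queue);
        sorted.sort(Comparator
                .comparing((InsertAction a) -> a.entityName)
                .thenComparingInt(a -> a.entityHash));
        return sorted;
    }
}
```

With ordering applied, a queue such as Order, OrderLine, Order, OrderLine becomes Order, Order, OrderLine, OrderLine, and each run of inserts to the same table can then be flushed as one JDBC batch.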
The following code sample illustrates the OLTP case. It creates an Order which contains OrderLine entities (a has-a relationship in the object model). The OrderLine entities can be batch inserted, because what is passed into this method is the Order object with its entire OrderLine collection.
public Order createOrder(Customer customer,
      List<OrderLine> orderLines,
      BigDecimal totalOrderAmount) {
   String addressLine2 = customer.getAddressLine2();
   if (addressLine2 == null) {
      addressLine2 = "";
   }
   Address shippingAddress = new Address();
   shippingAddress.setAddressLine1(customer.getAddressLine1());
   shippingAddress.setAddressLine2(addressLine2);
   shippingAddress.setCity(customer.getCity());
   shippingAddress.setState(customer.getState());
   shippingAddress.setZipCode(customer.getZipCode());
   shippingAddress.setZipCodePlusFour(customer.getZipCodePlusFour());
   Order newOrder = new Order();
   newOrder.setCustomerId(customer.getCustomerId());
   newOrder.setDistributionCenterId(customer.getDistributionCenterId());
   newOrder.setShippingAddressLine1(shippingAddress.getAddressLine1());
   newOrder.setShippingAddressLine2(shippingAddress.getAddressLine2());
   newOrder.setShippingCity(shippingAddress.getCity());
   newOrder.setShippingState(shippingAddress.getState());
   newOrder.setShippingZipCode(shippingAddress.getZipCode());
   newOrder.setShippingZipCodePlusFour(shippingAddress.getZipCodePlusFour());
   newOrder.setTotalOrderAmount(totalOrderAmount);
   newOrder.setOrderDate(new Date());
   newOrder.setCustomer(customer);
   entityManager.persist(newOrder);
   String orderId = newOrder.getOrderId();
   for (OrderLine orderLine : orderLines) {
      orderLine.getOrderLinePK().setOrderId(orderId);
   }
   newOrder.setOrderLines(orderLines);
   entityManager.persist(newOrder);
   return newOrder;
}
Note there is no call to the entity manager's flush method. Everything is accomplished through the persistence unit's Hibernate configuration, which is as follows:
<persistence-unit name="services" transaction-type="JTA">
   <provider>org.hibernate.ejb.HibernatePersistence</provider>
   <jta-data-source>java:/MySqlDS</jta-data-source>
   <properties>
          <property name="hibernate.hbm2ddl.auto" value="none"/>
          <property name="hibernate.default_catalog" value="EJB3"/>
          <property name="hibernate.dialect" value="org.hibernate.dialect.MySQL5InnoDBDialect"/>
          <property name="hibernate.order_updates" value="true"/>
          <property name="hibernate.order_inserts" value="true"/>
          <property name="hibernate.jdbc.batch_versioned_data" value="true"/>
          <property name="hibernate.jdbc.fetch_size" value="500"/>
          <property name="hibernate.jdbc.batch_size" value="500"/>
          <property name="hibernate.default_batch_fetch_size" value="16"/>
          <property name="hibernate.connection.release_mode" value="auto"/>
          <property name="hibernate.cache.region.factory_class" value="org.hibernate.cache.jbc2.JndiMultiplexedJBossCacheRegionFactory"/>
          <property name="hibernate.cache.region.jbc2.cachefactory" value="java:CacheManager"/>
          <property name="hibernate.cache.use_second_level_cache" value="true"/>
          <property name="hibernate.cache.use_query_cache" value="false"/>
          <property name="hibernate.cache.use_minimal_puts" value="true"/>
          <property name="hibernate.cache.region.jbc2.cfg.entity" value="mvcc-entity"/>
          <property name="hibernate.cache.region_prefix" value="services"/>
   </properties>
</persistence-unit>
The same two Hibernate configuration parameters are used: the inserts are ordered, and the JDBC batch size is set. In this case, an order with up to 500 line items could be batched. Because the OrderLine entities are persisted and the transaction commits upon exiting the createOrder method, nothing further needs to be done. Of course, the data model change to a generated key strategy whose values can be known before the inserts are executed is still required. Finally, a database-specific configuration parameter is needed to make this work as intended.
With the MySQL JDBC driver, a connection property, rewriteBatchedStatements, must be set so the driver can detect inserts to the same table and combine them into a single insert statement. Since Hibernate sends the inserts it is batching via the JDBC executeBatch method, the JDBC driver can rewrite the multiple insert statements into a single insert statement, as illustrated at the beginning of this section. This property must be set to true, which can be done through the connection property in the data source XML file for MySQL as such:
<datasources>
   <local-tx-datasource>
          <jndi-name>MySqlDS</jndi-name>
          <connection-url>jdbc:mysql://localhost:3306/EJB3</connection-url>
          <driver-class>com.mysql.jdbc.Driver</driver-class>
          <user-name>username</user-name>
          <password>password</password>
          <exception-sorter-class-name>org.jboss.resource.adapter.jdbc.vendor.MySQLExceptionSorter</exception-sorter-class-name>
          <min-pool-size>400</min-pool-size>
          <max-pool-size>450</max-pool-size>
           <!-- The transaction isolation level must be read committed for optimistic locking to work properly -->
          <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
          <connection-property name="rewriteBatchedStatements">true</connection-property>
          <!-- corresponding type-mapping in the standardjbosscmp-jdbc.xml (optional) -->
          <metadata>
                 <type-mapping>mySQL</type-mapping>
          </metadata>
   </local-tx-datasource>
</datasources>
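What the driver's rewriting amounts to can be sketched with plain string manipulation. This is an illustrative simplification of what rewriteBatchedStatements does, not the driver's actual code; it assumes every statement in the batch is an insert into the same table with an identical column list, and that the keyword VALUES does not appear inside any identifier or literal.

```java
import java.util.List;

// Illustrative sketch of rewriting a batch of single-row INSERT
// statements into one multi-row INSERT, as the MySQL driver does
// when rewriteBatchedStatements=true.
public class BatchRewriter {

    public static String rewrite(List<String> inserts) {
        String first = inserts.get(0);
        int valuesAt = first.toUpperCase().indexOf("VALUES");
        // Shared prefix: "INSERT INTO table (cols) VALUES"
        StringBuilder combined =
                new StringBuilder(first.substring(0, valuesAt + "VALUES".length()));
        for (int i = 0; i < inserts.size(); i++) {
            String stmt = inserts.get(i);
            int at = stmt.toUpperCase().indexOf("VALUES");
            // Keep just the "(...)" value group of each statement,
            // dropping any trailing semicolon.
            String values = stmt.substring(at + "VALUES".length()).trim();
            if (values.endsWith(";")) {
                values = values.substring(0, values.length() - 1);
            }
            if (i > 0) {
                combined.append(", ");
            }
            combined.append(values);
        }
        return combined.toString();
    }
}
```

Run against the four OrderLine statements at the top of this section, this produces the single multi-row insert shown there.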

Warning

MySQL has a default maximum SQL statement (packet) size, and depending on the number of columns, column types, and values, a rewritten batch insert could exceed that limit. The default limit is 1MB, but it can be raised if necessary by setting the max_allowed_packet variable in my.cnf to a larger value, for example 2M or 16M. If the inserted columns contain BLOB or other large data, keep this limit in mind.
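A quick way to sanity-check the limit is to estimate the size of one rewritten statement. The sketch below is illustrative arithmetic only: the statement prefix length and average bytes per VALUES group are assumptions that must be measured for a real schema and data.

```java
// Illustrative estimate of how many rows fit in one rewritten
// multi-row INSERT under a given max_allowed_packet. The per-row
// byte count is an assumption to be measured for your own data.
public class PacketBudget {

    public static int rowsPerPacket(int maxAllowedPacketBytes,
                                    int statementPrefixBytes,
                                    int avgBytesPerValuesGroup) {
        int available = maxAllowedPacketBytes - statementPrefixBytes;
        return available / avgBytesPerValuesGroup;
    }
}
```

For example, with the default 1MB packet, a 120-byte statement prefix, and roughly 80 bytes per row, on the order of 13,000 rows fit in a single rewritten insert; with large BLOB values the count drops sharply.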

4.5.3.1. Batch Processing: Tests and Results

Below are the parameters and results of a series of performance tests conducted to illustrate the effectiveness of the examples above.
Each set of tests has an in-context result comment, and there is a Conclusions overview at the end of the section.

Example 4.1. Batch Processing Test

Testing performed with the reference data load function of the Order performance application.

Test Parameters

  • Environment:
    • Quad-core Intel i7
    • 8 GB of RAM
    • MySQL database
    • JBoss Enterprise Application Platform production profile.
  • Table Contents:
    • 41,106 rows
    • 250,000 products
    • 6.25 million inventory records
    • 2,000 suppliers
    • 1.25 million supplier inventory records
    • 2 million rows for customers and pricing methods.
  • Batching enabled in Hibernate.
  • Batch insert size of 10,000 rows.
  • MySQL Batching refers to the rewriteBatchedStatements parameter (which is set in mysql-ds.xml).

Table 4.1. Batch Processing Test Results

MySQL Batching    Throughput (rows per second)
Off               6,734.01
On                39,575.75

Result: 487.7% increase in throughput.
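The improvement figure in Table 4.1 (and the percentage results in the tables that follow) can be reproduced from the raw throughput numbers with simple arithmetic:

```java
// Percent increase between two throughput measurements, as used
// for the result figures in the tables in this section.
public class PercentIncrease {

    public static double increase(double before, double after) {
        return (after - before) / before * 100.0;
    }
}
```

For Table 4.1, increase(6734.01, 39575.75) evaluates to roughly 487.7, matching the stated result.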

Example 4.2. OLTP Baseline Testing

Three tests were run against the OLTP example. This test does not use any batching features and thus sets the baseline for further tests.

Test Parameters

  • Environment:
    • Quad-core Intel i7
    • 8 GB of RAM
    • MySQL database
    • JBoss Enterprise Application Platform production profile.
  • The composite benchmark is run with randomly created orders with 1 to 9 order line items. The average is around 4.5 to 5 order line items per order. This mimics a real world scenario.
  • Batch processing off.
  • rewriteBatchedStatements parameter unset in mysql-ds.xml.

Table 4.2. OLTP Baseline Results

Order Inserts                                        Throughput (orders per second)
10,000 order inserts with a single order line item   45.80
10,000 order inserts with five order line items      43.96
10,000 order inserts with nine order line items      43.14

Result: Single item orders have the highest throughput, with five and nine line item orders following respectively.

Example 4.3. OLTP Hibernate Batch Processing Test

The second OLTP test was run with Hibernate batching switched on. All other parameters stayed the same as the previous test.

Test Parameters

  • Environment:
    • Quad-core Intel i7
    • 8 GB of RAM
    • MySQL database
    • JBoss Enterprise Application Platform production profile.
  • The composite benchmark is run with randomly created orders with 1 to 9 order line items. The average is around 4.5 to 5 order line items per order. This mimics a real world scenario.
  • Hibernate batch processing on.
  • MySQL batching (rewriteBatchedStatements) unset in mysql-ds.xml.

Table 4.3. OLTP Hibernate Batch Processing Results

Order Inserts                                        Throughput (orders per second)
10,000 order inserts with a single order line item   45.64
10,000 order inserts with five order line items      43.62
10,000 order inserts with nine order line items      42.78

Result: Ordering the inserts lowered throughput slightly; the differences from the baseline range from .16 to .36 orders per second.

Example 4.4. OLTP Hibernate and MySQL Batch Processing Test

The third OLTP test enabled both Hibernate and MySQL batching features.

Test Parameters

  • Environment:
    • Quad-core Intel i7
    • 8 GB of RAM
    • MySQL database
    • JBoss Enterprise Application Platform production profile.
  • The composite benchmark is run with randomly created orders with 1 to 9 order line items. The average is around 4.5 to 5 order line items per order. This mimics a real world scenario.
  • Hibernate batch processing on.
  • MySQL batching (rewriteBatchedStatements) on in mysql-ds.xml.

Table 4.4. OLTP Hibernate and MySQL Batch Processing Results

Order Inserts                                        Throughput (orders per second)
10,000 order inserts with a single order line item   46.11
10,000 order inserts with five order line items      44.25
10,000 order inserts with nine order line items      43.77

Result: An increase in throughput over the baseline ranging from .29 to .63 orders per second, with nine line item orders showing the highest improvement. Five line item orders show a .66% improvement over the baseline and nine line item orders show a 1.46% improvement.

Example 4.5. Higher Line Items Baseline Test

Some additional testing was performed using higher line item counts. This test does not use any batching features and thus sets the baseline for the tests that follow.

Test Parameters

  • Environment:
    • Quad-core Intel i7
    • 8 GB of RAM
    • MySQL database
    • JBoss Enterprise Application Platform production profile.
  • The composite benchmark is run with randomly created orders with 1 to 9 order line items. The average is around 4.5 to 5 order line items per order. This mimics a real world scenario.
  • Hibernate batch processing off.
  • MySQL batching (rewriteBatchedStatements) unset in mysql-ds.xml.

Table 4.5. Higher Line Items Baseline Results

Order Inserts                                        Throughput (orders per second)
10,000 order inserts with a single order line item   53.34
10,000 order inserts with 15 order line items        31.25
10,000 order inserts with 20 order line items        30.09

Result: Because 15 and 20 line item orders take much longer to process, the server can process even more single line item orders.

Example 4.6. Higher Line Items Hibernate Batch Processing Test

In this test, Hibernate's batching feature is enabled. All other parameters (including higher line items) remain the same.

Test Parameters

  • Environment:
    • Quad-core Intel i7
    • 8 GB of RAM
    • MySQL database
    • JBoss Enterprise Application Platform production profile.
  • The composite benchmark is run with randomly created orders with 1 to 9 order line items. The average is around 4.5 to 5 order line items per order. This mimics a real world scenario.
  • Hibernate batch processing on.
  • MySQL batching (rewriteBatchedStatements) unset in mysql-ds.xml.

Table 4.6. Higher Line Items Hibernate Batch Processing Results

Order Inserts                                        Throughput (orders per second)
10,000 order inserts with a single order line item   54.83
10,000 order inserts with 15 order line items        31.31
10,000 order inserts with 20 order line items        30.06

Result: Single line item orders showed a 2.79% increase (an extra 1.49 orders per second). 15 line item orders increased by .06 orders per second and 20 line item orders decreased by .03 orders per second; these results are within the expected margin of test variation.

Example 4.7. Higher Line Items Hibernate And MySQL Batch Processing Test

In this last test, both the Hibernate and MySQL batching features are tested using higher line items.

Test Parameters

  • Environment:
    • Quad-core Intel i7
    • 8 GB of RAM
    • MySQL database
    • JBoss Enterprise Application Platform production profile.
  • The composite benchmark is run with randomly created orders with 1 to 9 order line items. The average is around 4.5 to 5 order line items per order. This mimics a real world scenario.
  • Hibernate batch processing on.
  • MySQL batching (rewriteBatchedStatements) on in mysql-ds.xml.

Table 4.7. Higher Line Items Hibernate And MySQL Batch Processing Results

Order Inserts                                        Throughput (orders per second)
10,000 order inserts with a single order line item   60.79
10,000 order inserts with 15 order line items        37.39
10,000 order inserts with 20 order line items        35.52

Result: Single line item orders showed a 13.97% increase, 15 line item orders showed a 19.65% increase, and 20 line item orders increased by 18.05%.
Conclusions
For OLTP workloads, if there are situations where tens of rows are regularly inserted and could be batch inserted, batch insertion is worth considering. Of course, this requires the JDBC driver to have the same capability as the MySQL JDBC driver to convert multiple inserts into a single insert.
The fact that a throughput improvement was found even in the smaller test is significant, considering that other queries were being executed. The customer must be found for each order processed, which requires a three-way join from customer to pricing method to distribution center. Each line item also looks up its product cost information, then the line items are priced and a total is calculated for the entire order.
So the insert does not dominate the processing time, which is a significant factor in any decision to use these settings for an application. Batch inserts clearly offer an advantage for batch operations, but whether they also improve OLTP workloads is less clear-cut.
It is necessary to understand how often enough rows are involved in an insert for it to be converted into a batch insert, and how much of the overall processing time the inserts represent compared with the rest of the processing.
Additionally, batch inserts are much more efficient for the database itself, so while transaction response times may not benefit directly, the efficiency gained on the database may allow other processing to be better served.