Red Hat Training

A Red Hat training course is available for Red Hat JBoss Web Server

19.2. Understanding Collection Performance

19.2.1. Taxonomy

Hibernate defines three basic kinds of collections:
  • collections of values
  • one-to-many associations
  • many-to-many associations
This classification distinguishes the various table and foreign key relationships but does not tell us quite everything we need to know about the relational model. To fully understand the relational structure and performance characteristics, we must also consider the structure of the primary key that is used by Hibernate to update or delete collection rows. This suggests the following classification:
  • indexed collections
  • sets
  • bags
All indexed collections (maps, lists, and arrays) have a primary key consisting of the <key> and <index> columns. In this case, collection updates are extremely efficient. The primary key can be efficiently indexed and a particular row can be efficiently located when Hibernate tries to update or delete it.
Sets have a primary key consisting of <key> and element columns. This can be less efficient for some types of collection element, particularly composite elements or large text or binary fields, as the database may not be able to index a complex primary key as efficiently. However, for one-to-many or many-to-many associations, particularly in the case of synthetic identifiers, it is likely to be just as efficient. If you want SchemaExport to actually create the primary key of a <set>, you must declare all columns as not-null="true".
<idbag> mappings define a surrogate key, so they are efficient to update. In fact, they are the best case.
Bags are the worst case since they permit duplicate element values and, as they have no index column, no primary key can be defined. Hibernate has no way of distinguishing between duplicate rows. Hibernate resolves this problem by completely removing in a single DELETE and recreating the collection whenever it changes. This can be inefficient.
For a one-to-many association, the "primary key" may not be the physical primary key of the database table. Even in this case, the above classification is still useful. It reflects how Hibernate "locates" individual rows of the collection.

19.2.2. Lists, Maps, idbags and Sets

From the discussion above, it should be clear that indexed collections and sets allow the most efficient operation in terms of adding, removing and updating elements.
There is, arguably, one more advantage that indexed collections have over sets for many-to-many associations or collections of values. Because of the structure of a Set, Hibernate does not UPDATE a row when an element is "changed". Changes to a Set always work via INSERT and DELETE of individual rows. Once again, this consideration does not apply to one-to-many associations.
After observing that arrays cannot be lazy, you can conclude that lists, maps and idbags are the most performant (non-inverse) collection types, with sets not far behind. You can expect sets to be the most common kind of collection in Hibernate applications. This is because the "set" semantics are most natural in the relational model.
However, in well-designed Hibernate domain models, most collections are in fact one-to-many associations with inverse="true". For these associations, the update is handled by the many-to-one end of the association, and so considerations of collection update performance simply do not apply.

19.2.3. Bags and Lists as Inverse Collections

There is a particular case, however, in which bags, and also lists, are much more performant than sets. For a collection with inverse="true", the standard bidirectional one-to-many relationship idiom, for example, we can add elements to a bag or list without needing to initialize (fetch) the bag elements. This is because, unlike a set, Collection.add() or Collection.addAll() must always return true for a bag or List. This can make the following common code much faster:
Parent p = (Parent) sess.load(Parent.class, id);
Child c = new Child();
p.getChildren().add(c);  //no need to fetch the collection!

19.2.4. One Shot Delete

Deleting collection elements one by one can sometimes be extremely inefficient. Hibernate knows not to do that in the case of an newly-empty collection (if you called list.clear(), for example). In this case, Hibernate will issue a single DELETE.
Suppose you added a single element to a collection of size twenty and then remove two elements. Hibernate will issue one INSERT statement and two DELETE statements, unless the collection is a bag. This is certainly desirable.
However, suppose that we remove eighteen elements, leaving two and then add thee new elements. There are two possible ways to proceed
  • delete eighteen rows one by one and then insert three rows
  • remove the whole collection in one SQL DELETE and insert all five current elements one by one
Hibernate cannot know that the second option is probably quicker. It would probably be undesirable for Hibernate to be that intuitive as such behavior might confuse database triggers, etc.
Fortunately, you can force this behavior (i.e. the second strategy) at any time by discarding (i.e. dereferencing) the original collection and returning a newly instantiated collection with all the current elements.
One-shot-delete does not apply to collections mapped inverse="true".