Red Hat Training

A Red Hat training course is available for Red Hat JBoss Enterprise Application Platform

14.4. Manual Index Changes

As Hibernate Core applies changes to the database, Hibernate Search detects these changes and will update the index automatically (unless the EventListeners are disabled). Sometimes changes are made to the database without using Hibernate, as when backup is restored or your data is otherwise affected. In these cases Hibernate Search exposes the Manual Index APIs to explicitly update or remove a single entity from the index, rebuild the index for the whole database, or remove all references to a specific type.
All these methods affect the Lucene Index only, no changes are applied to the database.

14.4.1. Adding Instances to the Index

Using FullTextSession.index(T entity) you can directly add or update a specific object instance to the index. If this entity was already indexed, then the index will be updated. Changes to the index are only applied at transaction commit.

Example 14.61. Indexing an entity via FullTextSession.index(T entity)

FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
Object customer = fullTextSession.load( Customer.class, 8 );
fullTextSession.index(customer);
tx.commit(); //index only updated at commit time
In case you want to add all instances for a type, or for all indexed types, the recommended approach is to use a MassIndexer: see Section 14.4.3.2, “Using a MassIndexer” for more details.

14.4.2. Deleting Instances from the Index

It is equally possible to remove an entity or all entities of a given type from a Lucene index without the need to physically remove them from the database. This operation is named purging and is also done through the FullTextSession.

Example 14.62. Purging a specific instance of an entity from the index

FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
    fullTextSession.purge( Customer.class, customer.getId() );
}
tx.commit(); //index is updated at commit time
Purging will remove the entity with the given id from the Lucene index but will not touch the database.
If you need to remove all entities of a given type, you can use the purgeAll method. This operation removes all entities of the type passed as a parameter as well as all its subtypes.

Example 14.63. Purging all instances of an entity from the index

FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
fullTextSession.purgeAll( Customer.class );
//optionally optimize the index
//fullTextSession.getSearchFactory().optimize( Customer.class );
tx.commit(); //index changes are applied at commit time
It is recommended to optimize the index after such an operation.

Note

Methods index, purge, and purgeAll are available on FullTextEntityManager as well.

Note

All manual indexing methods (index, purge, and purgeAll) only affect the index, not the database, nevertheless they are transactional and as such they won't be applied until the transaction is successfully committed, or you make use of flushToIndexes.

14.4.3. Rebuilding the Index

If you change the entity mapping to the index, chances are that the whole Index needs to be updated; For example if you decide to index a an existing field using a different analyzer you'll need to rebuild the index for affected types. Also if the Database is replaced (like restored from a backup, imported from a legacy system) you'll want to be able to rebuild the index from existing data. Hibernate Search provides two main strategies to choose from:
  • Using FullTextSession.flushToIndexes() periodically, while using FullTextSession.index() on all entities.
  • Use a MassIndexer.

14.4.3.1. Using flushToIndexes()

This strategy consists of removing the existing index and then adding all entities back to the index using FullTextSession.purgeAll() and FullTextSession.index(), however there are some memory and efficiency constraints. For maximum efficiency Hibernate Search batches index operations and executes them at commit time. If you expect to index a lot of data you need to be careful about memory consumption since all documents are kept in a queue until the transaction commit. You can potentially face an OutOfMemoryException if you don't empty the queue periodically; to do this use fullTextSession.flushToIndexes(). Every time fullTextSession.flushToIndexes() is called (or if the transaction is committed), the batch queue is processed, applying all index changes. Be aware that, once flushed, the changes cannot be rolled back.

Example 14.64. Index rebuilding using index() and flushToIndexes()

fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria( Email.class )
    .setFetchSize(BATCH_SIZE)
    .scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
    index++;
    fullTextSession.index( results.get(0) ); //index each element
    if (index % BATCH_SIZE == 0) {
        fullTextSession.flushToIndexes(); //apply changes to indexes
        fullTextSession.clear(); //free memory since the queue is processed
    }
}
transaction.commit();

Note

hibernate.search.default.worker.batch_size has been deprecated in favor of this explicit API which provides better control
Try to use a batch size that guarantees that your application will not be out of memory: with a bigger batch size objects are fetched faster from database but more memory is needed.

14.4.3.2. Using a MassIndexer

Hibernate Search's MassIndexer uses several parallel threads to rebuild the index. You can optionally select which entities need to be reloaded or have it reindex all entities. This approach is optimized for best performance but requires to set the application in maintenance mode. Querying the index is not recommended when a MassIndexer is busy.

Example 14.65. Rebuild the Index Using a MassIndexer

fullTextSession.createIndexer().startAndWait();
This will rebuild the index, deleting it and then reloading all entities from the database. Although it is simple to use, some tweaking is recommended to speed up the process.

Warning

During the progress of a MassIndexer the content of the index is undefined! If a query is performed while the MassIndexer is working most likely some results will be missing.

Example 14.66. Using a Tuned MassIndexer

fullTextSession
 .createIndexer( User.class )
 .batchSizeToLoadObjects( 25 )
 .cacheMode( CacheMode.NORMAL )
 .threadsToLoadObjects( 12 )
 .idFetchSize( 150 )
 .progressMonitor( monitor ) //a MassIndexerProgressMonitor implementation
 .startAndWait();
This will rebuild the index of all User instances (and subtypes), and will create 12 parallel threads to load the User instances using batches of 25 objects per query. These same 12 threads will also need to process indexed embedded relations and custom FieldBridges or ClassBridges to output a Lucene document. The threads trigger lazyloading of additional attributes during the conversion process. Because of this, a high number of threads working in parallel is required. The number of threads working on actual index writing is defined by the backend configuration of each index.
It is recommended to leave cacheMode to CacheMode.IGNORE (the default), as in most reindexing situations the cache will be a useless additional overhead. It might be useful to enable some other CacheMode depending on your data as it could increase performance if the main entity is relating to enum-like data included in the index.

Note

The ideal of number of threads to achieve best performance is highly dependent on your overall architecture, database design and data values. All internal thread groups have meaningful names so they should be easily identified with most diagnostic tools, including threaddumps.

Note

The MassIndexer is unaware of transactions, therefore there is no need to begin one or commit afterward. Because it is not transactional it is not recommended to let users use the system during its processing, as it is unlikely people will be able to find results and the system load might be too high anyway.
Other parameters which affect indexing time and memory consumption are:
  • hibernate.search.[default|<indexname>].exclusive_index_use
  • hibernate.search.[default|<indexname>].indexwriter.max_buffered_docs
  • hibernate.search.[default|<indexname>].indexwriter.max_merge_docs
  • hibernate.search.[default|<indexname>].indexwriter.merge_factor
  • hibernate.search.[default|<indexname>].indexwriter.merge_min_size
  • hibernate.search.[default|<indexname>].indexwriter.merge_max_size
  • hibernate.search.[default|<indexname>].indexwriter.merge_max_optimize_size
  • hibernate.search.[default|<indexname>].indexwriter.merge_calibrate_by_deletes
  • hibernate.search.[default|<indexname>].indexwriter.ram_buffer_size
  • hibernate.search.[default|<indexname>].indexwriter.term_index_interval
Previous versions also had a max_field_length but this was removed from Lucene, it's possible to obtain a similar effect by using a LimitTokenCountAnalyzer.
All .indexwriter parameters are Lucene specific and Hibernate Search passes these parameters through.
The MassIndexer uses a forward only scrollable result to iterate on the primary keys to be loaded, but MySQL's JDBC driver will load all values in memory. To avoid this "optimization" set idFetchSize to Integer.MIN_VALUE.