Chapter 23. Hibernate Search
23.1. Getting Started with Hibernate Search
23.1.1. About Hibernate Search
Hibernate Search provides full-text search capability to Hibernate applications. It is especially suited to search tasks that SQL-based solutions handle poorly, such as full-text, fuzzy and geolocation searches. Hibernate Search uses Apache Lucene as its full-text search engine, but is designed to minimize the maintenance overhead. Once it is configured, indexing, clustering and data synchronization are maintained transparently, allowing you to focus on meeting your business requirements.
Hibernate Search consists of an indexing component and an index search component, both backed by Apache Lucene. Each time an entity is inserted, updated or removed in the database, Hibernate Search tracks this event (through the Hibernate event system) and schedules an index update. All these updates are handled without you having to interact with the Apache Lucene APIs directly. Instead, interaction with the underlying Lucene indexes is handled via an IndexManager.
Once the index is created, you can search for entities and get back lists of managed entities instead of dealing with the underlying Lucene infrastructure. The same persistence context is shared between Hibernate and Hibernate Search. The FullTextSession class is built on top of the Hibernate Session class, so application code can use the unified javax.persistence.Query APIs in exactly the same way as for an HQL, JPA-QL, or native query.
For both your database and Hibernate Search, it is recommended that you execute your operations in a transaction, whether JDBC or JTA.
Hibernate Search works perfectly fine in the Hibernate / EntityManager long conversation pattern, known as atomic conversation.
23.1.3. About the Index Manager
Each time an entity is inserted, updated or removed from the database, Hibernate Search keeps track of this event through the Hibernate event system and schedules an index update. Interaction with the underlying Lucene indexes is handled by an IndexManager, each of which is uniquely identified by name. By default there is a one-to-one relationship between IndexManager and Lucene index. The IndexManager abstracts the specific index configuration, including the selected backend, reader strategy and the chosen DirectoryProvider.
23.1.4. About the Directory Provider
Apache Lucene, which is part of the Hibernate Search infrastructure, has the concept of a Directory for storage of indexes. Hibernate Search handles the initialization and configuration of a Lucene Directory instance via a Directory Provider.
The directory_provider property specifies the directory provider used to store the indexes. The default directory provider is
filesystem, which uses the local filesystem to store indexes.
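For illustration, a configuration sketch assuming Hibernate Search 4.x property names, where the default prefix applies to every index and an index name overrides it for that one index. The path and the index name Rules are hypothetical:

```properties
# Store all indexes on the local filesystem (the default provider)
hibernate.search.default.directory_provider = filesystem
# Root directory under which the index files are created (hypothetical path)
hibernate.search.default.indexBase = /var/lucene/indexes
# Keep one index (hypothetical name "Rules") in memory instead
hibernate.search.Rules.directory_provider = ram
```

A RAM-based index is lost when the JVM stops, so it mainly suits tests or indexes that are cheap to rebuild.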
23.1.5. About the Worker
Updates to Lucene indexes are handled by the Hibernate Search Worker, which receives all entity changes, queues them by context and applies them once a context ends. The most common context is the transaction, but the context can also depend on the number of entity changes or on other application (life cycle) events.
For better efficiency, interactions are batched and generally applied once the context ends. Outside a transaction, the index update operation is executed right after the actual database operation. In the case of an ongoing transaction, the index update operation is scheduled for the transaction commit phase and discarded in case of transaction rollback. A worker may be configured with a specific batch size limit, after which indexing occurs regardless of the context.
For details of Worker configuration options see Section 23.2.5, “Worker Configuration”.
There are two immediate benefits to this method of handling index updates:
- Performance: Lucene indexing works better when operations are executed in batches.
- ACIDity: The work executed has the same scope as the work executed by the database transaction, and is executed if and only if the transaction is committed. This is not ACID in the strict sense, but strict ACID behavior is rarely needed for full-text search indexes, since they can be rebuilt from the source at any time.
The two batch modes, no scope and transactional, are the equivalent of autocommit and transactional behavior. From a performance perspective, the transactional mode is recommended. The scoping choice is made transparently: Hibernate Search detects the presence of a transaction and adjusts the scoping (see Section 23.2.5, “Worker Configuration”).
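A minimal sketch of worker-related settings, assuming Hibernate Search 4.x property names; Section 23.2.5 is the authoritative reference for the options your version supports. The values shown are intended to match the defaults:

```properties
# Use the default transactional worker (leaving the property unset has the same effect)
hibernate.search.worker.scope = transaction
# Apply the queued index work synchronously when the context ends (default);
# "async" hands the work to a background thread pool instead
hibernate.search.default.worker.execution = sync
# Only relevant in async mode: number of indexing threads
hibernate.search.default.worker.thread_pool.size = 1
```

Switching to async execution trades immediate visibility of index changes for shorter transaction commit times.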
23.1.6. Back End Setup and Operations
23.1.6.1. Back End
Hibernate Search uses various back ends to process batches of work. The back end is not limited to the configuration option
default.worker.backend. This property specifies an implementation of the
BackendQueueProcessor interface, which is part of a back end configuration. Additional settings are required to set up a back end, for example the JMS back end.
In Lucene mode, all index updates for a node (JVM) are applied by that same node to the Lucene directories, using the directory providers. Use this mode in a non-clustered environment or in clustered environments with a shared directory store.
Figure 23.1. Lucene Back End Configuration
Lucene mode targets non-clustered applications, or clustered applications where the
Directory manages the locking strategy. The primary advantage of Lucene mode is simplicity and immediate visibility of changes in Lucene queries. The Near Real Time (NRT) back end is an alternative back end for non-clustered and non-shared index configurations.
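A minimal configuration sketch, assuming Hibernate Search 4.x property names. In that version the NRT option is, as far as we know, selected through the indexmanager property rather than through worker.backend, so verify the second setting against the version you use:

```properties
# Lucene back end (the default): each node applies its own index changes
hibernate.search.default.worker.backend = lucene

# Near Real Time alternative for a non-clustered, non-shared index:
# uncomment to replace the default directory-based index manager
# hibernate.search.default.indexmanager = near-real-time
```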
In JMS mode, index updates for a node are sent to the JMS queue. A unique reader processes the queue and updates the master index. The master index is then replicated regularly to slave copies, establishing the master/slave pattern. The master is solely responsible for updating the Lucene index: only the master applies the changes in an update operation. The slaves accept read and write operations, but process read operations on their local index copies.
Figure 23.2. JMS Backend Configuration
This mode targets clustered environments where throughput is critical and index update delays are affordable. The JMS provider ensures reliability and uses the slaves to change the local index copies.
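A minimal sketch of both sides of this setup, assuming Hibernate Search 4.x property names and the filesystem-master and filesystem-slave directory providers. The paths, the refresh interval and the JNDI names of the connection factory and queue are hypothetical and depend on your environment and JMS provider. The master also needs a component that consumes the queue (for example a message-driven bean), which is not shown:

```properties
### Master node
# Local index that the master writes to (hypothetical path)
hibernate.search.default.indexBase = /var/lucene/master
# Shared location the master copies its index to (hypothetical path)
hibernate.search.default.sourceBase = /mnt/shared/lucene/mastercopy
# Copy the index to the shared location every 1800 seconds
hibernate.search.default.refresh = 1800
hibernate.search.default.directory_provider = filesystem-master
# No backend property: the master uses the default lucene back end

### Slave node (separate configuration file)
# Shared location to copy the index from (same as the master's sourceBase)
hibernate.search.default.sourceBase = /mnt/shared/lucene/mastercopy
# Local copy that queries run against (hypothetical path)
hibernate.search.default.indexBase = /var/lucene/slave
# Refresh the local copy every 1800 seconds
hibernate.search.default.refresh = 1800
hibernate.search.default.directory_provider = filesystem-slave
# Send index changes to the master through JMS (hypothetical JNDI names)
hibernate.search.default.worker.backend = jms
hibernate.search.default.worker.jms.connection_factory = java:/ConnectionFactory
hibernate.search.default.worker.jms.queue = queue/hibernatesearch
```

The refresh interval bounds how stale a slave's local copy can be, which is the index update delay this mode accepts in exchange for throughput.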
23.1.7. Reader Strategies
When executing a query, Hibernate Search uses a reader strategy to interact with the Apache Lucene indexes. Choose a reader strategy based on the profile of the application (frequent updates, read mostly, asynchronous index update, etc.).
23.1.7.1. The Shared Strategy
With the shared strategy, Hibernate Search shares the same IndexReader for a given Lucene index across multiple queries and threads, provided that the IndexReader is still up to date. If the IndexReader is not up to date, a new one is opened and provided. Each IndexReader is made of several SegmentReaders. The shared strategy reopens segments that have been modified or created since the last opening and shares the already loaded segments from the previous instance. This is the default strategy.
23.1.7.2. The Not-shared Strategy
With the not-shared strategy, a Lucene IndexReader is opened every time a query executes. Opening and starting up an IndexReader is an expensive operation, so opening one for each query execution is not an efficient strategy.
23.1.7.3. Custom Reader Strategies
You can write a custom reader strategy using an implementation of org.hibernate.search.reader.ReaderProvider. The implementation must be thread safe.
23.1.7.4. Reader Strategy Configuration
Change the strategy from the default (shared) to not-shared:
hibernate.search.[default|<indexname>].reader.strategy = not-shared
Alternatively, use a custom reader strategy by replacing
my.corp.myapp.CustomReaderProvider with the custom strategy implementation:
hibernate.search.[default|<indexname>].reader.strategy = my.corp.myapp.CustomReaderProvider