Apache Lucene Integration for use with Red Hat JBoss Enterprise Application Platform
Legal Notice
Abstract
- 1. Hibernate Search Future Distribution
- 2. Introduction to Red Hat JBoss Web Framework Kit
- 3. Getting Started with Hibernate Search
- 4. Architecture
- 5. Configuration
- 5.1. Hibernate Search and Automatic Indexing
- 5.2. Configuring the IndexManager
- 5.3. Directory Configuration
- 5.4. Worker Configuration
- 5.5. Tuning Lucene Indexing
- 5.6. LockFactory Configuration
- 5.7. Exception Handling Configuration
- 5.8. Index Format Compatibility
- 5.9. Hibernate Search as Red Hat JBoss EAP Module
- 6. Mapping Entities to the Index Structure
- 7. Querying
- 8. Manual Index Changes
- 9. Index Optimization
- 10. Monitoring
- 11. Advanced Features
- A. Revision History
Chapter 1. Hibernate Search Future Distribution
Chapter 2. Introduction to Red Hat JBoss Web Framework Kit
2.1. About Red Hat JBoss Web Framework Kit
2.2. About the JBoss Web Framework Kit Tiers
- Tier 1 - Included Components
- These are components that are based wholly or partly on open source technologies that support broad collaboration and where Red Hat maintains a leadership role; as such Red Hat is able to support these components and provide upgrades and fixes under our standard support terms and conditions.
- Tier 2 - Tested Frameworks
- These are third party frameworks where Red Hat does not have sufficient influence and does not provide upgrades and fixes under our standard support terms and conditions. Commercially reasonable support is provided by Red Hat Global Support Services for these frameworks.
- Tier 3 - Frameworks in Tested Examples
- These are third party frameworks where Red Hat does not have sufficient influence and does not provide upgrades and fixes under our standard support terms and conditions. Red Hat supports the examples these frameworks are used in and the generic use cases that these examples intend to demonstrate.
- Tier 4 - Confirmed Frameworks
- These are third party frameworks that do not receive any support from Red Hat, but Red Hat verifies that the frameworks run successfully on Red Hat JBoss Enterprise Application Platform. Frameworks and versions not listed here have not been explicitly tested and certified, and thus may be subject to support limitations.
2.3. About the JBoss Web Framework Kit Distribution
- TicketMonster is a moderately complex application demonstrating a number of the JBoss Web Framework Kit frameworks working together.
- Quickstarts illustrate subsets of the JBoss Web Framework Kit frameworks used to create simple applications.
- RichFaces, Snowdrop and Seam demonstrations showcase the power of each framework in web application development.
Chapter 3. Getting Started with Hibernate Search
3.1. Getting Started
3.2. System Requirements
Table 3.1. System requirements
| Java Runtime | A JDK or JRE, version 6 or greater, for Windows, Linux, or Solaris. If using Java version 7, make sure you avoid builds 0 and 1: those versions contained an optimization bug which would be triggered by Lucene. |
| Hibernate Search | hibernate-search-4.4.4.Final-redhat-wfk-1.jar and all runtime dependencies. You can get the jar artifacts from your distribution of Red Hat JBoss Web Framework Kit. |
| Hibernate ORM | These instructions have been tested against Hibernate 4.2 distributed with Red Hat JBoss Enterprise Application Platform 6. |
| JPA 2 | Even though Hibernate Search can be used without JPA annotations, the following instructions will use them for basic entity configuration (@Entity, @Id, @OneToMany, ...). This part of the configuration could also be expressed in XML or code. |
3.3. Using Maven
Import the Red Hat JBoss Web Framework Kit BOM in your pom.xml and add the Hibernate Search dependency (a BOM must be imported with type pom and scope import):

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.jboss.bom.wfk</groupId>
      <artifactId>jboss-javaee-6.0-with-hsearch</artifactId>
      <version>2.7.0-redhat-1</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search</artifactId>
  </dependency>
</dependencies>
3.4. Configuration
Add the configuration properties to hibernate.properties or hibernate.cfg.xml. If you are using Hibernate via JPA you can also add the properties to persistence.xml. For standard use, most properties offer a sensible default. An example persistence.xml configuration could look like this:
Example 3.1. Basic configuration options to be added to hibernate.properties, hibernate.cfg.xml or persistence.xml
...
<property name="hibernate.search.default.directory_provider" value="filesystem"/>
<property name="hibernate.search.default.indexBase" value="/var/lucene/indexes"/>
...
First, you have to tell Hibernate Search which DirectoryProvider to use. This can be achieved by setting the hibernate.search.default.directory_provider property. Apache Lucene has the notion of a Directory to store the index files. Hibernate Search handles the initialization and configuration of a Lucene Directory instance via a DirectoryProvider. In this tutorial we will use a directory provider storing the index in the file system. This will give us the ability to physically inspect the Lucene indexes created by Hibernate Search (e.g. via Luke). Once you have a working configuration you can start experimenting with other directory providers (see Section 5.3, "Directory Configuration"). Besides the directory provider, you also have to specify the default base directory for all indexes via hibernate.search.default.indexBase.
Assume that your application contains the Hibernate managed classes example.Book and example.Author and that you want to add free text search capabilities to your application in order to search the books contained in your database.
Example 3.2. Example Entities Book and Author Before Adding Hibernate Search Specific Annotations
package example;
...
@Entity
public class Book {

  @Id
  @GeneratedValue
  private Integer id;

  private String title;

  private String subtitle;

  @ManyToMany
  private Set<Author> authors = new HashSet<Author>();

  private Date publicationDate;

  public Book() {}

  // standard getters/setters follow here
  ...
}
package example;
...
@Entity
public class Author {

  @Id
  @GeneratedValue
  private Integer id;

  private String name;

  public Author() {}

  // standard getters/setters follow here
  ...
}
Start by adding a few annotations to the Book and Author class. The first annotation, @Indexed, marks Book as indexable. By design Hibernate Search needs to store an untokenized id in the index to ensure index uniqueness for a given entity. @DocumentId marks the property to use for this purpose; in most cases it is the same as the database primary key. The @DocumentId annotation is optional when an @Id annotation exists.
Next, mark the fields title and subtitle as searchable and annotate both with @Field. The parameter index=Index.YES will ensure that the text is indexed, while analyze=Analyze.YES ensures that the text is analyzed using the default Lucene analyzer. Usually, analyzing means chunking a sentence into individual words and potentially excluding common words like 'a' or 'the'. We will talk more about analyzers a little later on. The third parameter we specify within @Field, store=Store.NO, ensures that the actual data will not be stored in the index. Whether this data is stored in the index or not has nothing to do with the ability to search for it. From Lucene's perspective it is not necessary to keep the data once the index is created. The benefit of storing it is the ability to retrieve it via projections (see Section 7.1.10.5, "Projection").
index=Index.YES, analyze=Analyze.YES and store=Store.NO are the default values for these parameters and could be omitted.
These settings cover the string properties of the Book class. Another annotation we have not yet discussed is @DateBridge. This annotation is one of the built-in field bridges in Hibernate Search. The Lucene index is purely string based. For this reason Hibernate Search must convert the data types of the indexed fields to strings and vice versa. A range of predefined bridges is provided, including the DateBridge, which converts a java.util.Date into a String with the specified resolution. For more details see Section 6.4, "Bridges".
Finally, the authors association is annotated with @IndexedEmbedded. This annotation is used to index associated entities (@ManyToMany, @*ToOne, @Embedded and @ElementCollection) as part of the owning entity. This is needed since a Lucene index document is a flat data structure which does not know anything about object relations. To ensure that the authors' names will be searchable you have to make sure that the names are indexed as part of the book itself. On top of @IndexedEmbedded you will also have to mark all fields of the associated entity you want to have included in the index with @Field. For more details see Section 6.1.3, "Embedded and Associated Objects".
Example 3.3. Example entities after adding Hibernate Search annotations
package example;
...
@Entity
@Indexed
public class Book {

  @Id
  @GeneratedValue
  private Integer id;

  @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
  private String title;

  @Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
  private String subtitle;

  @Field(index = Index.YES, analyze = Analyze.NO, store = Store.YES)
  @DateBridge(resolution = Resolution.DAY)
  private Date publicationDate;

  @IndexedEmbedded
  @ManyToMany
  private Set<Author> authors = new HashSet<Author>();

  public Book() {}

  // standard getters/setters follow here
  ...
}
package example;
...
@Entity
public class Author {

  @Id
  @GeneratedValue
  private Integer id;

  @Field
  private String name;

  public Author() {}

  // standard getters/setters follow here
  ...
}
3.5. Indexing
Example 3.4. Using the Hibernate Session to Index Data
FullTextSession fullTextSession = org.hibernate.search.Search.getFullTextSession(session);
fullTextSession.createIndexer().startAndWait();
Example 3.5. Using JPA to Index Data
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
fullTextEntityManager.createIndexer().startAndWait();
After executing the above code, you should be able to see a Lucene index under /var/lucene/indexes/example.Book. Go ahead and inspect this index with Luke. It will help you to understand how Hibernate Search works.
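If indexing a large data set with the defaults is too slow, the MassIndexer returned by createIndexer() exposes several tuning knobs. A minimal sketch; the parameter values below are illustrative, not recommendations:

import org.hibernate.CacheMode;

FullTextSession fullTextSession = org.hibernate.search.Search.getFullTextSession(session);
fullTextSession.createIndexer(Book.class)   // restrict indexing to one entity type
    .batchSizeToLoadObjects(25)             // entities loaded per batch query
    .threadsToLoadObjects(4)                // parallel entity loading threads
    .cacheMode(CacheMode.IGNORE)            // bypass the second-level cache while indexing
    .startAndWait();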
3.6. Searching
Now it is time to execute a first search. The general approach is to create a Lucene query, either via the Lucene API or via the Hibernate Search query DSL, and then wrap this query into an org.hibernate.Query to get the required functionality from the Hibernate API. The following code prepares a query against the indexed fields. Executing the code returns a list of Books.
Example 3.6. Using a Hibernate Search Session to Create and Execute a Search
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();

// create a native Lucene query using the query DSL
// alternatively you can write the Lucene query using the Lucene query parser
// or the Lucene programmatic API. The Hibernate Search DSL is recommended though
QueryBuilder qb = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity( Book.class ).get();
org.apache.lucene.search.Query query = qb
    .keyword()
    .onFields("title", "subtitle", "authors.name", "publicationDate")
    .matching("Java rocks!")
    .createQuery();

// wrap the Lucene query in an org.hibernate.Query
org.hibernate.Query hibQuery = fullTextSession.createFullTextQuery(query, Book.class);

// execute search
List result = hibQuery.list();

tx.commit();
session.close();
Example 3.7. Using JPA to Create and Execute a Search
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
em.getTransaction().begin();

// create a native Lucene query using the query DSL
// alternatively you can write the Lucene query using the Lucene query parser
// or the Lucene programmatic API. The Hibernate Search DSL is recommended though
QueryBuilder qb = fullTextEntityManager.getSearchFactory()
    .buildQueryBuilder().forEntity( Book.class ).get();
org.apache.lucene.search.Query query = qb
    .keyword()
    .onFields("title", "subtitle", "authors.name", "publicationDate")
    .matching("Java rocks!")
    .createQuery();

// wrap the Lucene query in a javax.persistence.Query
javax.persistence.Query persistenceQuery =
    fullTextEntityManager.createFullTextQuery(query, Book.class);

// execute search
List result = persistenceQuery.getResultList();

em.getTransaction().commit();
em.close();
3.7. Analyzer
Assume that one of the books is entitled "Refactoring: Improving the Design of Existing Code" and that hits are required for the following queries: refactor, refactors, refactored, and refactoring. Select an analyzer class in Lucene that applies word stemming when indexing and searching. Hibernate Search offers several ways to configure the analyzer (see Section 6.3.1, "Default Analyzer and Analyzer by Class" for more information):
- Set the analyzer property in the configuration file. The specified class becomes the default analyzer.
- Set the @Analyzer annotation at the entity level.
- Set the @Analyzer annotation at the field level.
When using the @Analyzer annotation, either specify the fully qualified class name of the analyzer, or combine an @AnalyzerDef annotation with the @Analyzer annotation. The Solr analyzer framework with its factories is utilized for the latter option. For more information about factory classes, see the Solr JavaDoc or read the corresponding section on the Solr Wiki (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters).
In the example below, a StandardTokenizerFactory is used, followed by two filter factories: LowerCaseFilterFactory and SnowballPorterFilterFactory. The tokenizer splits words at punctuation characters and hyphens but keeps email addresses and internet hostnames intact. The standard tokenizer is ideal for this and other general operations. The lowercase filter converts all letters in a token into lowercase and the snowball filter applies language-specific stemming.
Example 3.8. Using @AnalyzerDef and the Solr Framework to Define and Use an Analyzer
@Indexed
@AnalyzerDef(name = "customanalyzer",
  tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
  filters = {
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
      @Parameter(name = "language", value = "English")
    })
  })
public class Book implements Serializable {

  @Field
  @Analyzer(definition = "customanalyzer")
  private String title;

  @Field
  @Analyzer(definition = "customanalyzer")
  private String subtitle;

  @IndexedEmbedded
  private Set authors = new HashSet();

  @Field(index = Index.YES, analyze = Analyze.NO, store = Store.YES)
  @DateBridge(resolution = Resolution.DAY)
  private Date publicationDate;

  public Book() {
  }

  // standard getters/setters follow here
  ...
}
Use @AnalyzerDef to define an analyzer, then apply it to entities and properties using @Analyzer. In the example, the customanalyzer is defined but not applied on the entity: the analyzer is only applied to the title and subtitle properties. An analyzer definition is global; define the analyzer for one entity and reuse the definition for other entities as required.
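Because a query against an analyzed field should use the same analyzer that indexed it, the named analyzer can also be retrieved from the SearchFactory at query time. A minimal sketch, assuming an open fullTextSession and the Lucene 3.x classic QueryParser (the version constant depends on your Lucene release; parse() throws ParseException):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

// retrieve the analyzer registered under the @AnalyzerDef name
Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("customanalyzer");

// parse user input with the same analyzer used at index time,
// so "refactored" stems to the same token that was indexed
QueryParser parser = new QueryParser(Version.LUCENE_36, "title", analyzer);
org.apache.lucene.search.Query luceneQuery = parser.parse("refactored");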
Chapter 4. Architecture
4.1. Overview
Interactions with the underlying Lucene indexes are handled via so-called IndexManagers.
By default, there is a one-to-one relationship between an index and an IndexManager. The exceptions are the use cases of index sharding and index sharing. The former can be applied when the index for a single entity becomes too big and indexing operations are slowing down the application; in this case a single entity is indexed into multiple indexes, each with its own index manager (see Section 11.4, "Sharding Indexes"). The latter, index sharing, is the ability to index multiple entities into the same Lucene index (see Section 11.5, "Sharing Indexes").
An IndexManager abstracts over the index configuration details, most notably the chosen back end, the reader strategy, and the DirectoryProvider. These components will be discussed in greater detail later on. It is recommended that you start with the default index manager, which uses different Lucene Directory types to manage the indexes (see Section 5.3, "Directory Configuration"). You can, however, also provide your own IndexManager implementation (see Section 5.2, "Configuring the IndexManager").
Hibernate Search uses the Lucene index to search for an entity and return a list of managed entities, saving you the tedious object-to-Lucene-Document mapping. The same persistence context is shared between Hibernate and Hibernate Search. In fact, the FullTextSession is built on top of the Hibernate Session, so that the application code can use the unified org.hibernate.Query or javax.persistence.Query APIs exactly the same way an HQL, JPA-QL or native query would.
To be as efficient as possible, Hibernate Search batches index update operations and sends them via the Worker. There are currently two types of batching. Outside a transaction, the index update operation is executed right after the actual database operation; this is effectively a no-batching setup. In the case of an ongoing transaction, the index update operation is scheduled for the transaction commit phase and discarded in case of transaction rollback. The batching scope is the transaction. There are two immediate benefits:
- Performance: Lucene indexing works better when operations are executed in batch.
- ACIDity: The work executed has the same scoping as the one executed by the database transaction and is executed if and only if the transaction is committed. This is not ACID in the strict sense, but ACID behavior is rarely useful for full text search indexes since they can be rebuilt from the source at any time.
4.2. Back End Setup and Operations
4.2.1. Back End
The back end is configured via the hibernate.search.default.worker.backend property. This property specifies an implementation of the BackendQueueProcessor interface, which is part of a back end configuration. Additional settings are required to set up some back ends, for example the JMS back end.
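For example, selecting the JMS back end for all indexes (described in detail in Section 5.4.1, "JMS Master/Slave Back End") looks like this:

hibernate.search.default.worker.backend = jms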
4.2.2. Lucene
In the Lucene mode, all index update operations are applied directly to the Lucene Directory, and the Directory manages the locking strategy. The primary advantage of Lucene mode is simplicity and immediate visibility of changes in Lucene queries. The Near Real Time (NRT) back end is an alternate back end for non-clustered and non-shared index configurations.
4.2.3. JMS
With the JMS back end, index update operations are sent to a JMS queue and executed by a master node; see Section 5.4.1, "JMS Master/Slave Back End" for the detailed setup.
4.3. Reader Strategies
4.3.1. The Shared Strategy
With the shared strategy, Hibernate Search shares the same IndexReader for a given Lucene index across multiple queries and threads, provided that the IndexReader remains up to date. If the IndexReader is not up to date, a new one is opened and provided. Each IndexReader is made up of several SegmentReaders. The shared strategy reopens only the segments that have been modified or created after the last opening and shares the already loaded segments from the previous instance. This is the default strategy.
4.3.2. The Not-shared Strategy
With the not-shared strategy, a new Lucene IndexReader is opened every time a query executes. Opening and starting up an IndexReader is an expensive operation. As a result, opening an IndexReader for each query execution is not an efficient strategy.
4.3.3. Custom Reader Strategies
Custom reader strategies are implementations of org.hibernate.search.reader.ReaderProvider. The implementation must be thread safe.
4.3.4. Reader Strategy Configuration
Change the default reader strategy (shared) to not-shared as follows:
hibernate.search.[default|<indexname>].reader.strategy = not-shared
Alternatively, specify a fully qualified class name, for example my.corp.myapp.CustomReaderProvider, to use a custom strategy implementation:
hibernate.search.[default|<indexname>].reader.strategy = my.corp.myapp.CustomReaderProvider
Chapter 5. Configuration
- 5.1. Hibernate Search and Automatic Indexing
- 5.2. Configuring the IndexManager
- 5.3. Directory Configuration
- 5.4. Worker Configuration
- 5.5. Tuning Lucene Indexing
- 5.6. LockFactory Configuration
- 5.7. Exception Handling Configuration
- 5.8. Index Format Compatibility
- 5.9. Hibernate Search as Red Hat JBoss EAP Module
5.1. Hibernate Search and Automatic Indexing
5.1.1. Enable and Disable Hibernate Search
Hibernate Search is enabled by default when it is detected on the classpath. To disable it, set:

hibernate.search.autoregister_listeners = false
5.1.2. Automatic Indexing
By default, every time an object is inserted, updated or deleted through Hibernate, Hibernate Search updates the corresponding Lucene index. To disable this automatic, event-based indexing and index manually instead, set:

hibernate.search.indexing_strategy = manual
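With manual indexing, the application triggers index updates explicitly through the FullTextSession API; a minimal sketch (the bookId variable is illustrative):

FullTextSession fullTextSession = org.hibernate.search.Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
Book book = (Book) fullTextSession.get(Book.class, bookId);
fullTextSession.index(book);  // add or update this entity in the index
tx.commit();                  // the index change is applied on commit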
5.2. Configuring the IndexManager
- directory-based: the default implementation, which uses the Lucene Directory abstraction to manage index files.
- near-real-time: avoids flushing writes to disk at each commit. This index manager is also Directory based, but uses Lucene's NRT functionality.
Specify the index manager for an index as follows, for example to select the near-real-time implementation:

hibernate.search.[default|<indexname>].indexmanager = near-real-time
5.2.1. Directory-based
The directory-based implementation is the default IndexManager implementation. It is highly configurable and allows separate configurations for the reader strategy, back ends, and directory providers. Refer to Section 5.3, "Directory Configuration", Section 5.4, "Worker Configuration" and Section 4.3.4, "Reader Strategy Configuration" for more details.
5.2.2. Near Real Time
The NRTIndexManager is an extension of the default IndexManager that leverages the Lucene NRT (Near Real Time) feature for low-latency index writes. However, it ignores configuration settings for back ends other than lucene and acquires exclusive write locks on the Directory.
To provide low latency, the IndexWriter does not flush every change to disk. Queries can read the updated state from the unflushed index writer buffers. However, this means that if the IndexWriter is killed or the application crashes, updates can be lost and the indexes must be rebuilt.
5.2.3. Custom
Custom implementations of the IndexManager interface can also be configured. Provide a no-argument constructor for the implementation and set it as follows:

hibernate.search.[default|<indexname>].indexmanager = my.corp.myapp.CustomIndexManager

A custom index manager does not need to use the same components as the built-in implementations; for example, it can delegate to a remote indexing service that does not expose a Lucene Directory interface.
5.3. Directory Configuration
Lucene has the notion of a Directory to store the index files. The Directory implementation can be customized, and Lucene comes bundled with a file system and an in-memory implementation. DirectoryProvider is the Hibernate Search abstraction around a Lucene Directory and handles the configuration and the initialization of the underlying Lucene resources. Table 5.1, "List of built-in DirectoryProvider" shows the list of the directory providers available in Hibernate Search together with their corresponding options.
To configure a DirectoryProvider, you have to understand that each indexed entity is associated to a Lucene index (except in the case where multiple entities share the same index, see Section 11.5, "Sharing Indexes"). The name of the index is given by the index property of the @Indexed annotation. If the index property is not specified, the fully qualified name of the indexed class is used as the name (recommended).
Each directory provider is configured through properties prefixed with hibernate.search.<indexname>. The name default (hibernate.search.default) is reserved and can be used to define properties which apply to all indexes. Example 5.2, "Configuring Directory Providers" shows how hibernate.search.default.directory_provider is used to set the default directory provider to the filesystem one. hibernate.search.default.indexBase then sets the default base directory for the indexes. As a result, the index for the entity Status is created in /usr/lucene/indexes/org.hibernate.example.Status.
The Rule entity, however, uses an in-memory directory, because the default directory provider for this entity is overridden by the property hibernate.search.Rules.directory_provider.
The Action entity uses a custom directory provider, CustomDirectoryProvider, specified via hibernate.search.Actions.directory_provider.
Example 5.1. Specifying the Index Name
package org.hibernate.example;

@Indexed
public class Status { ... }

@Indexed(index="Rules")
public class Rule { ... }

@Indexed(index="Actions")
public class Action { ... }
Example 5.2. Configuring Directory Providers
hibernate.search.default.directory_provider = filesystem
hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.Rules.directory_provider = ram
hibernate.search.Actions.directory_provider = com.acme.hibernate.CustomDirectoryProvider
Note
Using this configuration scheme you can define common rules, such as the directory provider and base directory, once as defaults and override them later on a per-index basis.
Table 5.1. List of built-in DirectoryProvider
ram: Memory based directory. The directory will be uniquely identified (in the same deployment unit) by the @Indexed.index element.

Properties: none.

filesystem: File system based directory. The directory used will be <indexBase>/<indexName>.

Properties:
- indexBase: base directory.
- indexName: overrides @Indexed.index (useful for sharded indexes).
- locking_strategy: optional, see Section 5.6, "LockFactory Configuration".
- filesystem_access_type: determines the exact type of FSDirectory implementation used by this DirectoryProvider. Allowed values are auto (the default; selects NIOFSDirectory on non-Windows systems and SimpleFSDirectory on Windows), simple (SimpleFSDirectory), nio (NIOFSDirectory) and mmap (MMapDirectory). Refer to the Javadocs of these Directory implementations before changing this setting: even though NIOFSDirectory or MMapDirectory can bring substantial performance boosts, they also have their issues.

filesystem-master: File system based directory, like filesystem. It also copies the index to a source directory (also known as the copy directory) on a regular basis. The recommended value for the refresh period is at least 50% higher than the time it takes to copy the information (default 3600 seconds, that is 60 minutes). Note that the copy is based on an incremental copy mechanism, reducing the average copy time. This DirectoryProvider is typically used on the master node in a JMS back end cluster. The buffer_size_on_copy optimum depends on your operating system and available RAM; most people report good results using values between 16 and 64 MB.

Properties:
- indexBase: base directory.
- indexName: overrides @Indexed.index (useful for sharded indexes).
- sourceBase: source (copy) base directory.
- source: source directory suffix (defaults to @Indexed.index). The actual source directory name is <sourceBase>/<source>.
- refresh: refresh period in seconds (the copy takes place every refresh seconds). If a copy is still in progress when the following refresh period elapses, the second copy operation is skipped.
- buffer_size_on_copy: the number of megabytes to move in a single low-level copy instruction; defaults to 16 MB.
- locking_strategy: optional, see Section 5.6, "LockFactory Configuration".
- filesystem_access_type: determines the exact type of FSDirectory implementation used by this DirectoryProvider, as described for filesystem above.

filesystem-slave: File system based directory, like filesystem, but it retrieves a master version (source) on a regular basis. To avoid locking and inconsistent search results, two local copies are kept. The recommended value for the refresh period is at least 50% higher than the time it takes to copy the information (default 3600 seconds, that is 60 minutes). Note that the copy is based on an incremental copy mechanism, reducing the average copy time; if a copy is still in progress when the refresh period elapses, the second copy operation is skipped. This DirectoryProvider is typically used on slave nodes using a JMS back end. The buffer_size_on_copy optimum depends on your operating system and available RAM; most people report good results using values between 16 and 64 MB.

Properties:
- indexBase: base directory.
- indexName: overrides @Indexed.index (useful for sharded indexes).
- sourceBase: source (copy) base directory.
- source: source directory suffix (defaults to @Indexed.index). The actual source directory name is <sourceBase>/<source>.
- refresh: refresh period in seconds (the copy takes place every refresh seconds).
- buffer_size_on_copy: the number of megabytes to move in a single low-level copy instruction; defaults to 16 MB.
- locking_strategy: optional, see Section 5.6, "LockFactory Configuration".
- retry_marker_lookup: optional, defaults to 0. Defines how many times to look for the marker files in the source directory before failing, waiting 5 seconds between each attempt.
- retry_initialize_period: optional. Set an integer value in seconds to enable the retry initialize feature: if the slave cannot find the master index it keeps trying in the background until it is found, without preventing the application from starting; full-text queries performed before the index is initialized are not blocked and return empty results. When the option is not enabled, or is explicitly set to zero, a missing master index fails with an exception instead of scheduling a retry timer. To prevent the application from starting with an invalid index while still controlling an initialization timeout, see retry_marker_lookup instead.
- filesystem_access_type: determines the exact type of FSDirectory implementation used by this DirectoryProvider, as described for filesystem above.
Note
If the built-in directory providers do not fit your needs, you can write your own by implementing the org.hibernate.search.store.DirectoryProvider interface. In this case, pass the fully qualified class name of your provider into the directory_provider property. You can pass any additional properties using the prefix hibernate.search.<indexname>.
5.4. Worker Configuration
Hibernate Search interacts with the Lucene index through a Worker. An implementation of the Worker interface is responsible for receiving all entity changes, queuing them by context and applying them once a context ends. The most intuitive context, especially in connection with ORM, is the transaction. For this reason, Hibernate Search by default uses the TransactionalWorker to scope all changes per transaction. One can, however, imagine a scenario where the context depends, for example, on the number of entity changes or some other application (lifecycle) events. For this reason, the Worker implementation is configurable, as shown in Table 5.2, "Scope configuration".
Table 5.2. Scope configuration
| Property | Description |
| hibernate.search.worker.scope | The fully qualified class name of the Worker implementation to use. If this property is not set, is empty, or is set to transaction, the default TransactionalWorker is used. |
| hibernate.search.worker.* | All configuration properties prefixed with hibernate.search.worker are passed to the Worker during initialization. This allows adding custom, worker specific parameters. |
| hibernate.search.worker.batch_size | Defines the maximum number of indexing operations batched per context. Once the limit is reached, indexing is triggered even though the context has not ended yet. This property only works if the Worker implementation delegates the queued work to BatchedQueueingProcessor (which is what the TransactionalWorker does). |
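For illustration, the following settings make the default transactional scope explicit and cap the number of operations queued per context (the batch size value is arbitrary, not a recommendation):

hibernate.search.worker.scope = transaction
hibernate.search.worker.batch_size = 1000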
Note
Replace <indexName> in the following properties with default to set the default value for all indexes.
Table 5.3. Execution configuration
| Property | Description |
| hibernate.search.<indexName>.worker.execution | sync: synchronous execution (default); async: asynchronous execution |
| hibernate.search.<indexName>.worker.thread_pool.size | The backend can apply updates from the same transaction context (or batch) in parallel, using a threadpool. The default value is 1. You can experiment with larger values if you have many operations per transaction. |
| hibernate.search.<indexName>.worker.buffer_queue.max | Defines the maximum size of the work queue when the thread pool is starved. Useful only for asynchronous execution. Defaults to infinite. If the limit is reached, the work is executed by the main thread. |
Table 5.4. Backend configuration
| Property | Description |
| hibernate.search.<indexName>.worker.backend | lucene: the default back end which runs index updates in the same VM; also used when the property is undefined or empty. jms: JMS back end; index updates are sent to a JMS queue to be processed by an indexing master (see Table 5.5, "JMS backend configuration" for additional configuration options and Section 5.4.1, "JMS Master/Slave Back End" for a more detailed description of this setup). blackhole: mainly a test/developer setting which ignores all indexing work. You can also specify the fully qualified name of a class implementing BackendQueueProcessor; this way you can implement your own communication layer. The implementation is responsible for returning a Runnable instance which, on execution, will process the index work. |
Table 5.5. JMS backend configuration
| Property | Description |
| hibernate.search.<indexName>.worker.jndi.* | Defines the JNDI properties to initiate the InitialContext (if needed). JNDI is only used by the JMS back end. |
| hibernate.search.<indexName>.worker.jms.connection_factory | Mandatory for the JMS back end. Defines the JNDI name from which to look up the JMS connection factory (/ConnectionFactory by default in Red Hat JBoss Enterprise Application Platform). |
| hibernate.search.<indexName>.worker.jms.queue | Mandatory for the JMS back end. Defines the JNDI name from which to look up the JMS queue. The queue will be used to post work messages. |
Warning
Use caution when providing a custom Worker or BackendQueueProcessor implementation; a faulty implementation can lose index updates.
5.4.1. JMS Master/Slave Back End
5.4.2. Slave Nodes
Example 5.3. JMS Slave configuration
### slave configuration

## DirectoryProvider
# (remote) master location
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy

# local copy location
hibernate.search.default.indexBase = /Users/prod/lucenedirs

# refresh every half hour
hibernate.search.default.refresh = 1800

# appropriate directory provider
hibernate.search.default.directory_provider = filesystem-slave

## Backend configuration
hibernate.search.default.worker.backend = jms
hibernate.search.default.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.default.worker.jms.queue = queue/hibernatesearch
# optional jndi configuration (check your JMS provider for more information)

## Optional asynchronous execution strategy
# hibernate.search.default.worker.execution = async
# hibernate.search.default.worker.thread_pool.size = 2
# hibernate.search.default.worker.buffer_queue.max = 50
5.4.3. Master Node
Example 5.4. JMS Master configuration
### master configuration

## DirectoryProvider
# (remote) master location where information is copied to
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy

# local master location
hibernate.search.default.indexBase = /Users/prod/lucenedirs

# refresh every half hour
hibernate.search.default.refresh = 1800

# appropriate directory provider
hibernate.search.default.directory_provider = filesystem-master

## Backend configuration
# Backend is the default lucene one
Example 5.5. Message Driven Bean processing the indexing queue
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName="destinationType",
                              propertyValue="javax.jms.Queue"),
    @ActivationConfigProperty(propertyName="destination",
                              propertyValue="queue/hibernatesearch"),
    @ActivationConfigProperty(propertyName="DLQMaxResent", propertyValue="1")
})
public class MDBSearchController extends AbstractJMSHibernateSearchController
                                 implements MessageListener {

  @PersistenceContext EntityManager em;

  // method retrieving the appropriate session
  protected Session getSession() {
    return (Session) em.getDelegate();
  }

  // potentially close the session opened in #getSession(), not needed here
  protected void cleanSessionIfNeeded(Session session) {
  }
}
For more information about getSession() and cleanSessionIfNeeded(), see AbstractJMSHibernateSearchController's javadoc.
5.5. Tuning Lucene Indexing
5.5.1. Tuning Lucene Indexing Performance
Hibernate Search tunes Lucene indexing performance through a set of parameters that are passed to the underlying Lucene IndexWriter, such as mergeFactor, maxMergeDocs, and maxBufferedDocs. Specify these parameters either as default values applying to all indexes, on a per-index basis, or even per shard.
Hibernate Search provides several IndexWriter settings which can be tuned for different use cases. These parameters are grouped by the indexwriter keyword:
hibernate.search.[default|<indexname>].indexwriter.<parameter_name>
If no value is set for an indexwriter parameter in a specific shard configuration, Hibernate Search checks the index section, then the default section.
For example, the configuration in Example 5.6, "Example performance option configuration" (see Section 5.5.3, "Performance Option Configuration") results in the following settings for the Animal index:

- max_merge_docs = 10
- merge_factor = 20
- ram_buffer_size = 64MB
- term_index_interval = Lucene default

All other values use the defaults defined in Lucene. The listed default values correspond to Lucene 2.4. For more information about Lucene indexing performance, see the Lucene documentation.
Note
Previous versions of Hibernate Search allowed specifying separate batch and transaction properties. This is no longer the case, as the back end always performs work using the same settings.
Table 5.6. List of indexing performance and behavior properties
| Property | Description | Default Value |
|---|---|---|
| hibernate.search.[default|<indexname>].exclusive_index_use | Set to true when no other process will need to write to the same index. This enables Hibernate Search to work in exclusive mode on the index and improves performance when writing changes to the index. | true (improved performance, releases locks only at shutdown) |
| hibernate.search.[default|<indexname>].max_queue_length | Each index has a separate "pipeline" which contains the updates to be applied to the index. When this queue is full, adding more operations becomes a blocking operation. Configuring this setting is only meaningful when worker.execution is configured as async. | 1000 |
| hibernate.search.[default|<indexname>].indexwriter.max_buffered_delete_terms | Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed. If there are documents buffered in memory at the time, they are merged and a new segment is created. | Disabled (flushes by RAM usage) |
| hibernate.search.[default|<indexname>].indexwriter.max_buffered_docs | Controls the number of documents buffered in memory during indexing. The larger the value, the more RAM is consumed. | Disabled (flushes by RAM usage) |
| hibernate.search.[default|<indexname>].indexwriter.max_merge_docs | Defines the largest number of documents allowed in a segment. Smaller values perform better on frequently changing indexes; larger values provide better search performance if the index does not change often. | Unlimited (Integer.MAX_VALUE) |
| hibernate.search.[default|<indexname>].indexwriter.merge_factor | Controls segment merge frequency and size. Determines how often segment indexes are merged when insertion occurs. With smaller values, less RAM is used while indexing, and searches on unoptimized indexes are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indexes are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indexes that are interactively maintained. The value must not be lower than 2. | 10 |
| hibernate.search.[default|<indexname>].indexwriter.merge_min_size | Controls segment merge frequency and size. Segments smaller than this size (in MB) are always considered for the next segment merge operation. Setting this too large might result in expensive merge operations, even though they are less frequent. See also org.apache.lucene.index.LogDocMergePolicy.minMergeSize. | 0 MB (actually ~1K) |
| hibernate.search.[default|<indexname>].indexwriter.merge_max_size | Controls segment merge frequency and size. Segments larger than this size (in MB) are never merged into bigger segments. This helps reduce memory requirements and avoids some merging operations at the cost of optimal search speed. When optimizing an index this value is ignored. See also org.apache.lucene.index.LogDocMergePolicy.maxMergeSize. | Unlimited |
| hibernate.search.[default|<indexname>].indexwriter.merge_max_optimize_size | Controls segment merge frequency and size. Segments larger than this size (in MB) are not merged into bigger segments even when optimizing the index (see the merge_max_size setting as well). Applied to org.apache.lucene.index.LogDocMergePolicy.maxMergeSizeForOptimize. | Unlimited |
| hibernate.search.[default|<indexname>].indexwriter.merge_calibrate_by_deletes | Controls segment merge frequency and size. Set to false to not consider deleted documents when estimating the merge policy. Applied to org.apache.lucene.index.LogMergePolicy.calibrateSizeByDeletes. | true |
| hibernate.search.[default|<indexname>].indexwriter.ram_buffer_size | Controls the amount of RAM in MB dedicated to document buffers. When used together with max_buffered_docs, a flush occurs for whichever event happens first. Generally for faster indexing performance it is best to flush by RAM usage instead of document count, and to use as large a RAM buffer as you can. | 16 MB |
| hibernate.search.[default|<indexname>].indexwriter.term_index_interval | Expert: sets the interval between indexed terms. Large values cause less memory to be used by an IndexReader, but slow random access to terms. Small values cause more memory to be used by an IndexReader, and speed random access to terms. See the Lucene documentation for more details. | 128 |
| hibernate.search.[default|<indexname>].indexwriter.use_compound_file | The advantage of using the compound file format is that fewer file descriptors are used. The disadvantage is that indexing takes more time and temporary disk space. You can set this parameter to false in an attempt to improve the indexing time, but you could run out of file descriptors if mergeFactor is also large. Boolean parameter, use "true" or "false". | true |
| hibernate.search.enable_dirty_check | Not all entity changes require a Lucene index update. If none of the updated entity properties (dirty properties) are indexed, Hibernate Search skips the re-indexing process. Disable this option if you use custom FieldBridges which need to be invoked at each update event (even though the property for which the field bridge is configured has not changed). This optimization is not applied to classes using a @ClassBridge or a @DynamicBoost. Boolean parameter, use "true" or "false". | true |
Warning
The blackhole back end is not meant to be used in production; it is only a tool to identify indexing bottlenecks.
5.5.2. The Lucene IndexWriter
As noted above, Hibernate Search provides several IndexWriter settings which can be tuned for different use cases. These parameters are grouped by the indexwriter keyword:
hibernate.search.[default|<indexname>].indexwriter.<parameter_name>
If no value is set for an indexwriter parameter in a shard configuration, Hibernate Search looks at the index section and then at the default section.
5.5.3. Performance Option Configuration
The following example sets performance options for shard 2 of the Animal index, and default values for all other indexes:
Example 5.6. Example performance option configuration
hibernate.search.Animals.2.indexwriter.max_merge_docs = 10
hibernate.search.Animals.2.indexwriter.merge_factor = 20
hibernate.search.Animals.2.indexwriter.term_index_interval = default
hibernate.search.default.indexwriter.max_merge_docs = 100
hibernate.search.default.indexwriter.ram_buffer_size = 64
This configuration results in the following settings for the Animal index:

- max_merge_docs = 10
- merge_factor = 20
- ram_buffer_size = 64MB
- term_index_interval = Lucene default

All other values use the defaults defined in Lucene. The listed default values correspond to Lucene 2.4. For more information about Lucene indexing performance, see the Lucene documentation.
The available parameters and their default values are listed in Table 5.6, "List of indexing performance and behavior properties".
5.5.4. Tuning the Indexing Speed
Keep hibernate.search.default.exclusive_index_use = true for improved index writing efficiency.
To measure object loading in isolation from index writes, set blackhole as the worker back end and start your indexing routines. This back end does not disable Hibernate Search: it generates the required change sets to the index, but discards them instead of flushing them to the index. In contrast to setting hibernate.search.indexing_strategy to manual, using blackhole will possibly load more data from the database, because associated entities are re-indexed as well.
hibernate.search.[default|<indexname>].worker.backend = blackhole
Warning
blackhole back end is not meant to be used in production, only as a tool to identify indexing bottlenecks.
5.5.5. Control Segment Size
The size of the index segments is controlled by the following parameters:

- merge_max_size
- merge_max_optimize_size
- merge_calibrate_by_deletes
# to be fairly confident no files grow above 15MB, use:
hibernate.search.default.indexwriter.ram_buffer_size = 10
hibernate.search.default.indexwriter.merge_max_optimize_size = 7
hibernate.search.default.indexwriter.merge_max_size = 7
Set the max_size for merge operations to less than half of the hard limit segment size, as merging segments combines two segments into one larger segment.
New segments are initially sized according to ram_buffer_size; this threshold is checked as an estimate.
5.6. LockFactory Configuration
Lucene directories come with default locking strategies that work well in most cases, but it is possible to specify a custom LockingFactory for each index managed by Hibernate Search.
Some locking strategies require a filesystem-level lock. For these strategies the indexBase configuration option must be specified to point to a filesystem location in which to store the lock marker files.
To select a locking factory, set the hibernate.search.<index>.locking_strategy option to one of the following:

- simple
- native
- single
- none
Table 5.8. List of available LockFactory implementations
| name | Class | Description |
|---|---|---|
| simple | org.apache.lucene.store.SimpleFSLockFactory | Safe implementation based on Java's File API; it marks the usage of the index by creating a marker file. If for some reason you had to kill your application, you will need to remove this file before restarting it. |
| native | org.apache.lucene.store.NativeFSLockFactory | As with simple, this marks the usage of the index by creating a marker file, but it uses native OS file locks, so that even if the JVM is terminated the locks are cleaned up. This implementation has known problems on NFS; avoid it on network shares. native is the default implementation for the filesystem, filesystem-master and filesystem-slave directory providers. |
| single | org.apache.lucene.store.SingleInstanceLockFactory | This LockFactory does not use a file marker but is a Java object lock held in memory; therefore it can only be used when you are sure the index is not going to be shared by any other process. This is the default implementation for the ram directory provider. |
| none | org.apache.lucene.store.NoLockFactory | Changes to this index are not coordinated by a lock. |
hibernate.search.default.locking_strategy = simple
hibernate.search.Animals.locking_strategy = native
hibernate.search.Books.locking_strategy = org.custom.components.MyLockingFactory
5.7. Exception Handling Configuration
By default, exceptions occurring during the indexing process are logged. This default mechanism can be declared explicitly:

hibernate.search.error_handler = log
To customize error handling, implement the ErrorHandler interface, which provides the handle(ErrorContext context) method. ErrorContext provides a reference to the primary LuceneWork instance, the underlying exception, and any subsequent LuceneWork instances that could not be processed due to the primary exception.
public interface ErrorContext {
List<LuceneWork> getFailingOperations();
LuceneWork getOperationAtFault();
Throwable getThrowable();
boolean hasErrors();
}
To register a custom handler, declare the fully qualified class name of the ErrorHandler implementation in the configuration properties:
hibernate.search.error_handler = CustomerErrorHandler
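A custom handler might simply log the primary failure and every skipped operation. A minimal sketch against the ErrorContext interface shown above; the package names follow Hibernate Search 4.x conventions, and recent 4.x versions of the ErrorHandler interface also declare handleException, so verify the method set against your version:

import org.hibernate.search.backend.LuceneWork;
import org.hibernate.search.exception.ErrorContext;
import org.hibernate.search.exception.ErrorHandler;

public class CustomerErrorHandler implements ErrorHandler {

  @Override
  public void handle(ErrorContext context) {
    // log the primary failure...
    System.err.println("Indexing failed: " + context.getThrowable());
    // ...and every LuceneWork instance that could not be processed because of it
    for (LuceneWork work : context.getFailingOperations()) {
      System.err.println("Skipped operation: " + work);
    }
  }

  @Override
  public void handleException(String errorMsg, Throwable exception) {
    System.err.println(errorMsg);
  }
}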
5.8. Index Format Compatibility
Hibernate Search defines the index format compatibility level via the hibernate.search.lucene_version configuration property. This property instructs Analyzers and other Lucene classes to conform to their behaviour as defined in an older version of Lucene. See also org.apache.lucene.util.Version contained in lucene-core.jar. If the option is not specified, Hibernate Search instructs Lucene to use the version default. It is recommended to define the version explicitly in the configuration to prevent automatic changes when an upgrade occurs. After an upgrade, the configuration value can then be updated explicitly if required.
Example 5.7. Force Analyzers to be compatible with a Lucene 3.0 created index
hibernate.search.lucene_version = LUCENE_30
This setting is global to the SearchFactory and affects all Lucene APIs that accept the version parameter. If Lucene is used directly and Hibernate Search is bypassed, apply the same value to it for consistent results.
5.9. Hibernate Search as Red Hat JBoss EAP Module
The Hibernate Search modules are shipped in the jboss-wfk-2.7.0-maven-repository.zip file, which can be downloaded from the Red Hat Customer Portal. Within the repository, the module distribution is located at:
org/hibernate/hibernate-search-modules/4.4.4.Final-redhat-wfk-1/hibernate-search-modules-4.4.4.Final-redhat-wfk-1-jbossas-74-dist.zip
Unpack the distribution into the modules/ directory of the target JBoss Enterprise Application Platform installation. Modules for Hibernate Search, Apache Lucene, and some useful Solr libraries will be added. The Hibernate Search modules are:
- org.hibernate.search.orm:main for users of Hibernate Search with Hibernate; this transitively includes Hibernate ORM.
- org.hibernate.search.engine:main for projects depending on the internal indexing engine that do not require other dependencies to Hibernate.
- Using the MANIFEST.MF file
- Add the following entry to the MANIFEST.MF file in the project archive:

  Dependencies: org.hibernate.search.orm services
- Using the jboss-deployment-structure.xml file
- Add a WEB-INF/jboss-deployment-structure.xml file to the project archive with the following content:

  <jboss-deployment-structure>
    <deployment>
      <dependencies>
        <module name="org.hibernate.search.orm" services="export" />
      </dependencies>
    </deployment>
  </jboss-deployment-structure>
Chapter 6. Mapping Entities to the Index Structure
6.1. Mapping an Entity
6.1.1. Basic Mapping
- @Indexed
- @Field
- @NumericField
- @Id
6.1.1.1. @Indexed
Declare a persistent class as indexable by annotating it with @Indexed (all entities not annotated with @Indexed are ignored by the indexing process).
You can optionally set the index attribute of the @Indexed annotation to change the default name of the index. For more information see Section 5.3, "Directory Configuration".
6.1.1.2. @Field
@Field declares a property as indexed and allows you to configure several aspects of the indexing process by setting one or more of the following attributes:
name: describe under which name, the property should be stored in the Lucene Document. The default value is the property name (following the JavaBeans convention)store: describe whether or not the property is stored in the Lucene index. You can store the valueStore.YES(consuming more space in the index but allowing projection, see Section 7.1.10.5, “Projection”), store it in a compressed wayStore.COMPRESS(this does consume more CPU), or avoid any storageStore.NO(this is the default value). When a property is stored, you can retrieve its original value from the Lucene Document. This is not related to whether the element is indexed or not.index: describe whether the property is indexed or not. The different values areIndex.NO(no indexing, ie cannot be found by a query),Index.YES(the element gets indexed and is searchable). The default value isIndex.YES.Index.NOcan be useful for cases where a property is not required to be searchable, but should be available for projection.Note
Index.NOin combination withAnalyze.YESorNorms.YESis not useful, sinceanalyzeandnormsrequire the property to be indexedanalyze: determines whether the property is analyzed (Analyze.YES) or not (Analyze.NO). The default value isAnalyze.YES.Note
Whether or not you want to analyze a property depends on whether you wish to search the element as is, or by the words it contains. It make sense to analyze a text field, but probably not a date field.Note
Fields used for sorting must not be analyzed.norms: describes whether index time boosting information should be stored (Norms.YES) or not (Norms.NO). Not storing it can save a considerable amount of memory, but there won't be any index time boosting information available. The default value isNorms.YES.termVector: describes collections of term-frequency pairs. This attribute enables the storing of the term vectors within the documents during indexing. The default value isTermVector.NO.The different values of this attribute are:Value Definition TermVector.YES Store the term vectors of each document. This produces two synchronized arrays, one contains document terms and the other contains the term's frequency. TermVector.NO Do not store term vectors. TermVector.WITH_OFFSETS Store the term vector and token offset information. This is the same as TermVector.YES plus it contains the starting and ending offset position information for the terms. TermVector.WITH_POSITIONS Store the term vector and token position information. This is the same as TermVector.YES plus it contains the ordinal positions of each occurrence of a term in a document. TermVector.WITH_POSITION_OFFSETS Store the term vector, token position and offset information. This is a combination of the YES, WITH_OFFSETS and WITH_POSITIONS. indexNullAs: Per default null values are ignored and not indexed. However, usingindexNullAsyou can specify a string which will be inserted as token for thenullvalue. Per default this value is set toField.DO_NOT_INDEX_NULLindicating thatnullvalues should not be indexed. You can set this value toField.DEFAULT_NULL_TOKENto indicate that a defaultnulltoken should be used. This defaultnulltoken can be specified in the configuration usinghibernate.search.default_null_token. If this property is not set and you specifyField.DEFAULT_NULL_TOKENthe string "_null_" will be used as default.Note
When theindexNullAsparameter is used it is important to use the same token in the search query (see Chapter 7, Querying) to search fornullvalues. It is also advisable to use this feature only with un-analyzed fields ().analyze=Analyze.NOWarning
When implementing a customFieldBridgeorTwoWayFieldBridgeit is up to the developer to handle the indexing of null values (see JavaDocs ofLuceneOptions.indexNullAs()).
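As an illustration, here is a minimal sketch combining several of these attributes on one property (the entity and values are hypothetical, not taken from the examples in this guide):

@Entity
@Indexed
public class Article {

    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    // stored for projection, analyzed for full-text search,
    // null values indexed using the configurable default token
    @Field(store = Store.YES,
           analyze = Analyze.YES,
           termVector = TermVector.WITH_POSITION_OFFSETS,
           indexNullAs = Field.DEFAULT_NULL_TOKEN)
    private String body;
}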
6.1.1.3. @NumericField
There is a companion annotation to @Field called @NumericField that can be specified in the same scope as @Field or @DocumentId. It can be specified for Integer, Long, Float, and Double properties. At index time the value will be indexed using a Trie structure. When a property is indexed as a numeric field, it enables efficient range queries and sorting, orders of magnitude faster than doing the same query on standard @Field properties. The @NumericField annotation accepts the following parameters:
| Value | Definition |
|---|---|
| forField | (Optional) Specifies the name of the related @Field that will be indexed as numeric. It is only mandatory when the property contains more than one @Field declaration. |
| precisionStep | (Optional) Changes the way that the Trie structure is stored in the index. Smaller precisionSteps lead to more disk space usage and faster range and sort queries. Larger values lead to less space used and range query performance closer to the range query in normal @Fields. Default value is 4. |
@NumericField supports only Double, Long, Integer and Float. Lucene offers no similar optimization for the other numeric types, so the remaining types should use the string encoding via the default or a custom TwoWayFieldBridge.
For example, a BigDecimal property can still be indexed numerically through a custom NumericFieldBridge, assuming you can deal with the approximation during type transformation:
Example 6.2. Defining a custom NumericFieldBridge
public class BigDecimalNumericFieldBridge extends NumericFieldBridge {

    private static final BigDecimal storeFactor = BigDecimal.valueOf(100);

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        if ( value != null ) {
            BigDecimal decimalValue = (BigDecimal) value;
            Long indexedValue = Long.valueOf( decimalValue.multiply( storeFactor ).longValue() );
            luceneOptions.addNumericFieldToDocument( name, indexedValue, document );
        }
    }

    @Override
    public Object get(String name, Document document) {
        String fromLucene = document.get( name );
        BigDecimal storedBigDecimal = new BigDecimal( fromLucene );
        return storedBigDecimal.divide( storeFactor );
    }
}
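The bridge can then be applied to a property; for instance, a sketch (the property name is hypothetical):

@Field
@FieldBridge(impl = BigDecimalNumericFieldBridge.class)
private BigDecimal price;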
6.1.1.4. @Id
The id property of an entity is a special property used by Hibernate Search to ensure index unicity of a given entity; mark it with the @DocumentId annotation. If you are using JPA and you have specified @Id you can omit @DocumentId. The chosen entity id will also be used as document id.
Example 6.3. Specifying indexed properties
@Entity
@Indexed
public class Essay {
    ...

    @Id
    @DocumentId
    public Long getId() { return id; }

    @Field(name="Abstract", store=Store.YES)
    public String getSummary() { return summary; }

    @Lob
    @Field
    public String getText() { return text; }

    @Field
    @NumericField( precisionStep = 6)
    public float getGrade() { return grade; }
}

Example 6.3, “Specifying indexed properties” defines an index with four fields: id, Abstract, text and grade. Note that by default the field name is decapitalized, following the JavaBean specification. The grade field is annotated as numeric with a slightly larger precision step than the default.
6.1.2. Mapping Properties Multiple Times
Example 6.4. Using @Fields to map a property multiple times
@Entity @Indexed(index = "Book" ) public class Book { @Fields( { @Field, @Field(name = "summary_forSort", analyze = Analyze.NO, store = Store.YES) } ) public String getSummary() { return summary; } ... }
summary is indexed twice, once as summary in a tokenized way, and once as summary_forSort in an untokenized way. @Field supports 2 attributes useful when @Fields is used:
6.1.3. Embedded and Associated Objects
Associated objects, as well as embedded objects, can be indexed as part of the root entity index. This is useful if you expect to search a given entity based on properties of an associated object. In Example 6.5, “Indexing associations” the aim is to return places where the associated city is Atlanta (in the Lucene query parser language, address.city:Atlanta). The Place fields will be indexed in the Place index. The Place index documents will also contain the fields address.id, address.street, and address.city which you will be able to query.
Example 6.5. Indexing associations
@Entity
@Indexed
public class Place {

    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field
    private String name;

    @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
    @IndexedEmbedded
    private Address address;
    ....
}

@Entity
public class Address {

    @Id
    @GeneratedValue
    private Long id;

    @Field
    private String street;

    @Field
    private String city;

    @ContainedIn
    @OneToMany(mappedBy="address")
    private Set<Place> places;
    ...
}

Because of the denormalization performed by the @IndexedEmbedded technique, Hibernate Search needs to be aware of any change in the Place object and any change in the Address object to keep the index up to date. To make sure the Place Lucene document is updated when its Address changes, you need to mark the other side of the bidirectional relationship with @ContainedIn.
Note
@ContainedIn is useful on both associations pointing to entities and on embedded (collection of) objects.
The example can be made more complex by nesting associations, as shown in Example 6.6, “Nested usage of @IndexedEmbedded and @ContainedIn”.
Example 6.6. Nested usage of @IndexedEmbedded and @ContainedIn
@Entity
@Indexed
public class Place {

    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field
    private String name;

    @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
    @IndexedEmbedded
    private Address address;
    ....
}

@Entity
public class Address {

    @Id
    @GeneratedValue
    private Long id;

    @Field
    private String street;

    @Field
    private String city;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_")
    private Owner ownedBy;

    @ContainedIn
    @OneToMany(mappedBy="address")
    private Set<Place> places;
    ...
}

@Embeddable
public class Owner {
    @Field
    private String name;
    ...
}

Any @*ToMany, @*ToOne or @Embedded attribute can be annotated with @IndexedEmbedded. The attributes of the associated class will then be added to the main entity index. In Example 6.6, “Nested usage of @IndexedEmbedded and @ContainedIn” the index will contain the following fields:
- id
- name
- address.street
- address.city
- address.ownedBy_name
The default prefix is propertyName., following the traditional object navigation convention. You can override it using the prefix attribute, as shown on the ownedBy property.
Note
The depth property is necessary when the object graph contains a cyclic dependency of classes (not instances), for example if Owner points to Place. Hibernate Search stops including indexed embedded attributes after reaching the expected depth (or the object graph boundaries). A class having a self reference is an example of cyclic dependency. In our example, because depth is set to 1, any @IndexedEmbedded attribute in Owner (if any) will be ignored.

Using @IndexedEmbedded for object associations allows you to express queries (using Lucene's query syntax) such as:
- Return places where name contains JBoss and where address city is Atlanta. In Lucene query this would be
+name:jboss +address.city:atlanta
- Return places where name contains JBoss and where owner's name contain Joe. In Lucene query this would be
+name:jboss +address.ownedBy_name:joe
Note
When @IndexedEmbedded points to an entity, the association has to be directional and the other side has to be annotated with @ContainedIn (as seen in the previous example). If not, Hibernate Search has no way to update the root index when the associated entity is updated (in our example, a Place index document has to be updated when the associated Address instance is updated).

Sometimes, the object type annotated by @IndexedEmbedded is not the object type targeted by Hibernate and Hibernate Search. This is especially the case when interfaces are used in lieu of their implementation. For this reason you can override the object type targeted by Hibernate Search using the targetElement parameter.
Example 6.7. Using the targetElement property of @IndexedEmbedded
@Entity
@Indexed
public class Address {

    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field
    private String street;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_", targetElement = Owner.class)
    @Target(Owner.class)
    private Person ownedBy;
    ...
}

@Embeddable
public class Owner implements Person {
    ...
}
6.1.4. Limiting Object Embedding to Specific Paths
The @IndexedEmbedded annotation also provides an attribute includePaths, which can be used as an alternative to depth, or in combination with it.

When using only depth, all indexed fields of the embedded type are added recursively at the same depth; this makes it harder to pick only a specific path without adding all other fields as well, which might not be needed.

To avoid this, you can specify the paths explicitly, as shown in Example 6.8, “Using the includePaths property of @IndexedEmbedded”:
Example 6.8. Using the includePaths property of @IndexedEmbedded
@Entity
@Indexed
public class Person {

    @Id
    public int getId() { return id; }

    @Field
    public String getName() { return name; }

    @Field
    public String getSurname() { return surname; }

    @OneToMany
    @IndexedEmbedded(includePaths = { "name" })
    public Set<Person> getParents() { return parents; }

    @ContainedIn
    @ManyToOne
    public Person getChild() { return child; }

    ...//other fields omitted
}

Using the mapping in Example 6.8, “Using the includePaths property of @IndexedEmbedded”, you would be able to search on a Person by name and/or surname, and/or the name of the parent. It will not index the surname of the parent, so searching on parents' surnames will not be possible, but it speeds up indexing, saves space and improves overall performance.

The paths specified in includePaths are included in addition to what you would index normally by specifying a limited value for depth. When using includePaths and leaving depth undefined, the behavior is equivalent to setting depth=0: only the included paths are indexed.
Example 6.9. Using the includePaths property of @IndexedEmbedded
@Entity
@Indexed
public class Human {

    @Id
    public int getId() { return id; }

    @Field
    public String getName() { return name; }

    @Field
    public String getSurname() { return surname; }

    @OneToMany
    @IndexedEmbedded(depth = 2, includePaths = { "parents.parents.name" })
    public Set<Human> getParents() { return parents; }

    @ContainedIn
    @ManyToOne
    public Human getChild() { return child; }

    ...//other fields omitted
}

With the mapping of Example 6.9, “Using the includePaths property of @IndexedEmbedded”, every human will have its name and surname attributes indexed. The name and surname of parents will be indexed too, recursively up to the second level because of the depth attribute. It will be possible to search by name or surname of the person directly, of his parents, or of his grandparents. Beyond the second level, one additional level is indexed, but only the name, not the surname. The index will contain the following fields:

- id - as primary key
- _hibernate_class - stores the entity type
- name - as direct field
- surname - as direct field
- parents.name - as embedded field at depth 1
- parents.surname - as embedded field at depth 1
- parents.parents.name - as embedded field at depth 2
- parents.parents.surname - as embedded field at depth 2
- parents.parents.parents.name - as additional path as specified by includePaths. The first parents. is inferred from the field name, the remaining path is the attribute of includePaths
6.2. Boosting
6.2.1. Static Index Time Boosting
To define a static boost value for an indexed class or property, use the @Boost annotation. You can use this annotation within @Field or specify it directly on method or class level.
Example 6.10. Different ways of using @Boost
@Entity
@Indexed
@Boost(1.7f)
public class Essay {
    ...

    @Id
    @DocumentId
    public Long getId() { return id; }

    @Field(name="Abstract", store=Store.YES, boost=@Boost(2f))
    @Boost(1.5f)
    public String getSummary() { return summary; }

    @Lob
    @Field(boost=@Boost(1.2f))
    public String getText() { return text; }

    @Field
    public String getISBN() { return isbn; }
}

In Example 6.10, “Different ways of using @Boost”, Essay's probability to reach the top of the search list will be multiplied by 1.7. The summary field will be 3.0 times (2 * 1.5, because @Field.boost and @Boost on a property are cumulative) more important than the isbn field. The text field will be 1.2 times more important than the isbn field. Note that this explanation is wrong in strictest terms, but it is simple and close enough to reality for all practical purposes.
6.2.2. Dynamic Index Time Boosting
The @Boost annotation used in Section 6.2.1, “Static Index Time Boosting” defines a static boost factor which is independent of the state of the indexed entity at runtime. However, there are use cases in which the boost factor may depend on the actual state of the entity. In this case you can use the @DynamicBoost annotation together with an accompanying custom BoostStrategy.
Example 6.11. Dynamic boost example
public enum PersonType {
    NORMAL,
    VIP
}

@Entity
@Indexed
@DynamicBoost(impl = VIPBoostStrategy.class)
public class Person {
    private PersonType type;
    // ....
}

public class VIPBoostStrategy implements BoostStrategy {
    public float defineBoost(Object value) {
        Person person = ( Person ) value;
        if ( person.getType().equals( PersonType.VIP ) ) {
            return 2.0f;
        }
        else {
            return 1.0f;
        }
    }
}

Example 6.11, “Dynamic boost example” specifies VIPBoostStrategy as the implementation of the BoostStrategy interface to be used at indexing time. You can place the @DynamicBoost either at class or field level. Depending on the placement of the annotation, either the whole entity or just the annotated field/property value is passed to the defineBoost method. It is up to you to cast the passed object to the correct type. In the example, all indexed values of a VIP person would be twice as important as the values of a normal person.
Note
The BoostStrategy implementation must define a public no-arg constructor.

You can mix and match @Boost and @DynamicBoost annotations in your entity. All defined boost factors are cumulative, as illustrated in the sketch below.
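For instance, a sketch combining both annotation types (VIPBoostStrategy is the strategy from Example 6.11; the entity is illustrative):

@Entity
@Indexed
@Boost(1.2f)
@DynamicBoost(impl = VIPBoostStrategy.class)
public class Person {
    // the effective index time boost factor of this entity is
    // 1.2f multiplied by the value returned by VIPBoostStrategy
    ...
}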
6.3. Analysis
Analysis is the process of converting text into single terms (words) and can be considered as one of the key features of a full-text search engine. Lucene uses the concept of Analyzers to control this process. In the following section we cover the multiple ways Hibernate Search offers to configure the analyzers.
6.3.1. Default Analyzer and Analyzer by Class
The default analyzer class used to index tokenized fields is configurable through the hibernate.search.analyzer property. The default value for this property is org.apache.lucene.analysis.standard.StandardAnalyzer.
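For instance, the default could be made explicit (or replaced by another analyzer class) in your Hibernate configuration properties:

hibernate.search.analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer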
Example 6.12. Different ways of using @Analyzer
@Entity
@Indexed
@Analyzer(impl = EntityAnalyzer.class)
public class MyEntity {

    @Id
    @GeneratedValue
    @DocumentId
    private Integer id;

    @Field
    private String name;

    @Field
    @Analyzer(impl = PropertyAnalyzer.class)
    private String summary;

    @Field(analyzer = @Analyzer(impl = FieldAnalyzer.class))
    private String body;

    ...
}

In this example, EntityAnalyzer is used to index all tokenized properties (for example, name), except summary and body, which are indexed with PropertyAnalyzer and FieldAnalyzer respectively.

Warning
Mixing different analyzers in the same entity is most of the time a bad practice. It makes query building more complex and results less predictable, especially if you are using a QueryParser (which uses the same analyzer for the whole query). As a rule of thumb, use the same analyzer both at indexing time and at query time for a given field.
6.3.2. Named Analyzers
Analyzers can become quite complex to deal with. For this reason Hibernate Search introduces the notion of analyzer definitions. An analyzer definition can be reused by many @Analyzer declarations and is composed of:

- a name: the unique string used to refer to the definition
- a list of char filters: each char filter is responsible for pre-processing input characters before the tokenization. Char filters can add, change, or remove characters; one common usage is for character normalization
- a tokenizer: responsible for tokenizing the input stream into individual words
- a list of filters: each filter is responsible for removing, modifying, or sometimes even adding words into the stream provided by the tokenizer

The tokenizer starts the tokenizing process by turning the character input into tokens which are then further processed by the TokenFilters. Hibernate Search supports this infrastructure by utilizing the Solr analyzer framework.
Note
Some of the analyzers and filters require additional dependencies. For example, to use the snowball stemmer you also need the lucene-snowball jar, and for the PhoneticFilterFactory you need the commons-codec jar. Your distribution of Hibernate Search provides these dependencies in its lib/optional directory. Alternatively, you can add the hibernate-search-analyzers artifact as a dependency:

<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-analyzers</artifactId>
   <version>4.4.4.Final-redhat-wfk-1</version>
</dependency>

Analyzer definitions are declared with @AnalyzerDef, as shown in Example 6.13, “@AnalyzerDef and the Solr framework”. First a char filter is defined by its factory. In our example, a mapping char filter is used, which will replace characters in the input based on the rules specified in the mapping file. Next a tokenizer is defined. This example uses the standard tokenizer. Last but not least, a list of filters is defined by their factories. In our example, the StopFilter filter is built reading the dedicated words property file. The filter is also expected to ignore case.
Example 6.13. @AnalyzerDef and the Solr framework
@AnalyzerDef(name="customanalyzer", charFilters = { @CharFilterDef(factory = MappingCharFilterFactory.class, params = { @Parameter(name = "mapping", value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties") }) }, tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = { @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class), @TokenFilterDef(factory = LowerCaseFilterFactory.class), @TokenFilterDef(factory = StopFilterFactory.class, params = { @Parameter(name="words", value= "org/hibernate/search/test/analyzer/solr/stoplist.properties" ), @Parameter(name="ignoreCase", value="true") }) }) public class Team { ... }
Note
Filters and char filters are applied in the order they are defined in the @AnalyzerDef annotation. Order matters!

Some tokenizers, token filters or char filters load resources such as a configuration or metadata file. If the charset of such a resource is not the VM default, you can explicitly specify it by adding a resource_charset parameter.
Example 6.14. Use a specific charset to load the property file
@AnalyzerDef(name="customanalyzer", charFilters = { @CharFilterDef(factory = MappingCharFilterFactory.class, params = { @Parameter(name = "mapping", value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties") }) }, tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = { @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class), @TokenFilterDef(factory = LowerCaseFilterFactory.class), @TokenFilterDef(factory = StopFilterFactory.class, params = { @Parameter(name="words", value= "org/hibernate/search/test/analyzer/solr/stoplist.properties" ), @Parameter(name="resource_charset", value = "UTF-16BE"), @Parameter(name="ignoreCase", value="true") }) }) public class Team { ... }
@Analyzer declaration as seen in Example 6.15, “Referencing an analyzer by name”.
Example 6.15. Referencing an analyzer by name
@Entity @Indexed @AnalyzerDef(name="customanalyzer", ... ) public class Team { @Id @DocumentId @GeneratedValue private Integer id; @Field private String name; @Field private String location; @Field @Analyzer(definition = "customanalyzer") private String description; }
@AnalyzerDef are also available by their name in the SearchFactory which is quite useful wen building queries.
Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("customanalyzer");
6.3.3. Available Analyzers
Table 6.1. Example of available char filters
| Factory | Description | Parameters | Additional dependencies |
|---|---|---|---|
MappingCharFilterFactory | Replaces one or more characters with one or more characters, based on mappings specified in the resource file | mapping: points to a resource file containing the mappings | none |
HTMLStripCharFilterFactory | Remove HTML standard tags, keeping the text | none | none |
Table 6.2. Example of available tokenizers
| Factory | Description | Parameters | Additional dependencies |
|---|---|---|---|
StandardTokenizerFactory | Use the Lucene StandardTokenizer | none | none |
HTMLStripCharFilterFactory | Remove HTML tags, keep the text and pass it to a StandardTokenizer. | none | solr-core |
PatternTokenizerFactory | Breaks text at the specified regular expression pattern. | pattern: the regular expression to use for tokenizing; group: says which pattern group to extract into tokens | solr-core |
Table 6.3. Examples of available filters
| Factory | Description | Parameters | Additional dependencies |
|---|---|---|---|
StandardFilterFactory | Remove dots from acronyms and 's from words | none | solr-core |
LowerCaseFilterFactory | Lowercases all words | none | solr-core |
StopFilterFactory | Remove words (tokens) matching a list of stop words | words: points to a resource file containing the stop words; ignoreCase: true if case should be ignored when comparing stop words, false otherwise | solr-core |
SnowballPorterFilterFactory | Reduces a word to its root in a given language (for example protect, protects and protection share the same root). Using such a filter allows searches to match related words. | language: Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish and a few more | solr-core |
ISOLatin1AccentFilterFactory | Remove accents for languages like French | none | solr-core |
PhoneticFilterFactory | Inserts phonetically similar tokens into the token stream | encoder: one of DoubleMetaphone, Metaphone, Soundex or RefinedSoundex; inject: true will add tokens to the stream, false will replace the existing token; maxCodeLength: sets the maximum length of the code to be generated, supported only for Metaphone and DoubleMetaphone encodings | solr-core and commons-codec |
CollationKeyFilterFactory | Converts each token into its java.text.CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. | custom, language, country, variant, strength, decomposition: see Lucene's CollationKeyFilter javadocs for more info | solr-core and commons-io |
Check the implementations of org.apache.solr.analysis.TokenizerFactory and org.apache.solr.analysis.TokenFilterFactory in your IDE to see the full set of implementations available.
6.3.4. Dynamic Analyzer Selection
In some situations it is useful to select the analyzer depending on the current state of the entity to be indexed. In a BlogEntry class, for example, the analyzer could depend on the language property of the entry. Depending on this property, the correct language specific stemmer should be chosen to index the actual text.

To enable this dynamic analyzer selection, Hibernate Search introduces the AnalyzerDiscriminator annotation. Example 6.16, “Usage of @AnalyzerDiscriminator” demonstrates the usage of this annotation.
Example 6.16. Usage of @AnalyzerDiscriminator
@Entity
@Indexed
@AnalyzerDefs({
    @AnalyzerDef(name = "en",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = EnglishPorterFilterFactory.class )
        }),
    @AnalyzerDef(name = "de",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = GermanStemFilterFactory.class)
        })
})
public class BlogEntry {

    @Id
    @GeneratedValue
    @DocumentId
    private Integer id;

    @Field
    @AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
    private String language;

    @Field
    private String text;

    private Set<BlogEntry> references;

    // standard getter/setter
    ...
}

public class LanguageDiscriminator implements Discriminator {
    public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        if ( value == null || !( entity instanceof BlogEntry ) ) {
            return null;
        }
        return (String) value;
    }
}

The prerequisite for using @AnalyzerDiscriminator is that all analyzers which are going to be used dynamically are predefined via @AnalyzerDef definitions. If this is the case, you can place the @AnalyzerDiscriminator annotation either on the class or on a specific property of the entity for which to dynamically select an analyzer. Via the impl parameter of the @AnalyzerDiscriminator you specify a concrete implementation of the Discriminator interface. It is up to you to provide an implementation for this interface. The only method you have to implement is getAnalyzerDefinitionName(), which gets called for each field added to the Lucene document. The entity which is getting indexed is also passed to the interface method. The value parameter is only set if the AnalyzerDiscriminator is placed on property level instead of class level; in this case the value represents the current value of this property.

An implementation of the Discriminator interface has to return the name of an existing analyzer definition, or null if the default analyzer should not be overridden. Example 6.16, “Usage of @AnalyzerDiscriminator” assumes that the language parameter is either 'de' or 'en', which matches the specified names in the @AnalyzerDefs.
6.3.5. Retrieving an Analyzer
Note
When using the Hibernate Search query DSL, the appropriate analyzer is selected transparently. Retrieving an analyzer is mainly useful when building queries with the native Lucene APIs, in which case the same analyzers used for indexing should be applied to the query, as shown in Example 6.17, “Using the scoped analyzer when building a full-text query”.
Example 6.17. Using the scoped analyzer when building a full-text query
org.apache.lucene.queryParser.QueryParser parser = new QueryParser(
    "title",
    fullTextSession.getSearchFactory().getAnalyzer( Song.class )
);

org.apache.lucene.search.Query luceneQuery =
    parser.parse( "title:sky OR title_stemmed:diamond" );
org.hibernate.Query fullTextQuery =
    fullTextSession.createFullTextQuery( luceneQuery, Song.class );
List result = fullTextQuery.list(); //return a list of managed objects

In this example, the song title is indexed in two fields: the standard analyzer is used in the field title and a stemming analyzer is used in the field title_stemmed. By using the analyzer provided by the search factory, the query uses the appropriate analyzer depending on the field targeted.
Note
You can also retrieve analyzers defined via @AnalyzerDef by their definition name using searchFactory.getAnalyzer(String).
6.4. Bridges
In Lucene, all index fields have to be represented as strings. All entity properties annotated with @Field have to be converted to strings to be indexed. The reason we have not mentioned this so far is that, for most of your properties, Hibernate Search does the translation job for you thanks to a set of built-in bridges. However, in some cases you need more fine grained control over the translation process.
6.4.1. Built-in Bridges
- null
- Per default null elements are not indexed. Lucene does not support null elements. However, in some situations it can be useful to insert a custom token representing the null value. See Section 6.1.1.2, “@Field” for more information.
- java.lang.String
- Strings are indexed as is.
- short, Short, int, Integer, long, Long, float, Float, double, Double, BigInteger, BigDecimal
- Numbers are converted into their string representation. Note that numbers cannot be compared by Lucene (that is, used in ranged queries) out of the box: they have to be padded.

Note
Using a Range query has drawbacks; an alternative approach is to use a Filter query, which will filter the result query to the appropriate range. Hibernate Search also supports the use of a custom StringBridge as described in Section 6.4.2, “Custom Bridges”.

- java.util.Date
- Dates are stored as yyyyMMddHHmmssSSS in GMT time (200611072203012 for Nov 7th of 2006, 4:03PM and 12ms EST). You shouldn't really bother with the internal format. What is important is that when using a TermRangeQuery, the dates have to be expressed in GMT time. Usually, storing the date up to the millisecond is not necessary. @DateBridge defines the appropriate resolution you are willing to store in the index (@DateBridge(resolution=Resolution.DAY)). The date pattern will then be truncated accordingly.

@Entity
@Indexed
public class Meeting {
    @Field(analyze=Analyze.NO)
    @DateBridge(resolution=Resolution.MINUTE)
    private Date date;
    ...

Warning
A Date whose resolution is lower than MILLISECOND cannot be a @DocumentId.

Important
The default Date bridge uses Lucene's DateTools to convert from and to String. This means that all dates are expressed in GMT time. If your requirements are to store dates in a fixed time zone you have to implement a custom date bridge. Make sure you understand the requirements of your application regarding date indexing and searching.

- java.net.URI, java.net.URL
- URI and URL are converted to their string representation
- java.lang.Class
- Classes are converted to their fully qualified class name. The thread context classloader is used when the class is rehydrated.
6.4.2. Custom Bridges
6.4.2.1. StringBridge
The simplest custom solution is to give Hibernate Search an implementation of your expected Object to String bridge. To do so you need to implement the org.hibernate.search.bridge.StringBridge interface. All implementations have to be thread-safe as they are used concurrently.
Example 6.18. Custom StringBridge implementation
/**
 * Padding Integer bridge.
 * All numbers will be padded with 0 to match 5 digits
 *
 * @author Emmanuel Bernard
 */
public class PaddedIntegerBridge implements StringBridge {

    private int PADDING = 5;

    public String objectToString(Object object) {
        String rawInteger = ( (Integer) object ).toString();
        if (rawInteger.length() > PADDING)
            throw new IllegalArgumentException( "Try to pad on a number too big" );
        StringBuilder paddedInteger = new StringBuilder( );
        for ( int padIndex = rawInteger.length() ; padIndex < PADDING ; padIndex++ ) {
            paddedInteger.append('0');
        }
        return paddedInteger.append( rawInteger ).toString();
    }
}

Given the string bridge defined in Example 6.18, “Custom StringBridge implementation”, any property or field can use this bridge thanks to the @FieldBridge annotation:
@FieldBridge(impl = PaddedIntegerBridge.class)
private Integer length;
6.4.2.2. Parameterized Bridge
Parameters can also be passed to the bridge implementation, making it more flexible. In that case the bridge implements the ParameterizedBridge interface and parameters are passed through the @FieldBridge annotation.
Example 6.19. Passing parameters to your bridge implementation
public class PaddedIntegerBridge implements StringBridge, ParameterizedBridge {

    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; //default

    public void setParameterValues(Map<String,String> parameters) {
        String padding = parameters.get( PADDING_PROPERTY );
        if (padding != null) this.padding = Integer.parseInt( padding );
    }

    public String objectToString(Object object) {
        String rawInteger = ( (Integer) object ).toString();
        if (rawInteger.length() > padding)
            throw new IllegalArgumentException( "Try to pad on a number too big" );
        StringBuilder paddedInteger = new StringBuilder( );
        for ( int padIndex = rawInteger.length() ; padIndex < padding ; padIndex++ ) {
            paddedInteger.append('0');
        }
        return paddedInteger.append( rawInteger ).toString();
    }
}

//property
@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name="padding", value="10") )
private Integer length;

The ParameterizedBridge interface can be implemented by StringBridge, TwoWayStringBridge, and FieldBridge implementations.
6.4.2.3. Type Aware Bridge
It is sometimes useful to get the type the bridge is applied on:

- the return type of the property for field/getter-level bridges.
- the class type for class-level bridges.

Any bridge implementing AppliedOnTypeAwareBridge will get the type the bridge is applied on injected. Like parameters, the type injected needs no particular care with regard to thread-safety.
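A minimal sketch of a type aware bridge, assuming the AppliedOnTypeAwareBridge contract of injecting the type via setAppliedOnType (the enum handling is illustrative, not a built-in bridge):

public class GenericEnumBridge implements StringBridge, AppliedOnTypeAwareBridge {

    private Class<?> appliedOnType;

    @Override
    public void setAppliedOnType(Class<?> returnType) {
        // called by Hibernate Search before the bridge is used
        this.appliedOnType = returnType;
    }

    @Override
    public String objectToString(Object object) {
        // index the enum by its name; appliedOnType is available
        // if the conversion needs the concrete enum type
        return object == null ? null : ( (Enum<?>) object ).name();
    }
}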
6.4.2.4. Two-Way Bridge
If you expect to use your bridge implementation on an id property (that is, a property annotated with @DocumentId), you need to use a slightly extended version of StringBridge named TwoWayStringBridge. Hibernate Search needs to read the string representation of the identifier and generate the object out of it. There is no difference in the way the @FieldBridge annotation is used.
Example 6.20. Implementing a TwoWayStringBridge usable for id properties
public class PaddedIntegerBridge implements TwoWayStringBridge, ParameterizedBridge {

    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; //default

    public void setParameterValues(Map<String,String> parameters) {
        String padding = parameters.get( PADDING_PROPERTY );
        if (padding != null) this.padding = Integer.parseInt( padding );
    }

    public String objectToString(Object object) {
        String rawInteger = ( (Integer) object ).toString();
        if (rawInteger.length() > padding)
            throw new IllegalArgumentException( "Try to pad on a number too big" );
        StringBuilder paddedInteger = new StringBuilder( );
        for ( int padIndex = rawInteger.length() ; padIndex < padding ; padIndex++ ) {
            paddedInteger.append('0');
        }
        return paddedInteger.append( rawInteger ).toString();
    }

    public Object stringToObject(String stringValue) {
        return new Integer(stringValue);
    }
}

//id property
@DocumentId
@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name="padding", value="10") )
private Integer id;

Important
It is important for the two-way process to be idempotent (that is, object = stringToObject( objectToString( object ) )).
6.4.2.5. FieldBridge
Some use cases require more than a simple object to string translation when mapping a property to a Lucene index. To give you this flexibility you can also implement a bridge as a FieldBridge. This interface gives you a property value and lets you map it the way you want in your Lucene Document. You can, for example, store a property in two different document fields. The interface is very similar in its concept to the Hibernate UserTypes.
Example 6.21. Implementing the FieldBridge Interface
/**
 * Store the date in 3 different fields - year, month, day - to ease Range Query per
 * year, month or day (eg get all the elements of December for the last 5 years).
 * @author Emmanuel Bernard
 */
public class DateSplitBridge implements FieldBridge {

    private final static TimeZone GMT = TimeZone.getTimeZone("GMT");

    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        Date date = (Date) value;
        Calendar cal = GregorianCalendar.getInstance(GMT);
        cal.setTime(date);
        int year = cal.get(Calendar.YEAR);
        int month = cal.get(Calendar.MONTH) + 1;
        int day = cal.get(Calendar.DAY_OF_MONTH);

        // set year
        luceneOptions.addFieldToDocument( name + ".year", String.valueOf( year ), document );

        // set month and pad it if needed
        luceneOptions.addFieldToDocument( name + ".month",
            ( month < 10 ? "0" : "" ) + month, document );

        // set day and pad it if needed
        luceneOptions.addFieldToDocument( name + ".day",
            ( day < 10 ? "0" : "" ) + day, document );
    }
}

//property
@FieldBridge(impl = DateSplitBridge.class)
private Date date;

It is recommended that you add fields to the Document using the LuceneOptions helper; this helper will apply the options you have selected on @Field, like Store or TermVector, or apply the chosen @Boost value. It is especially useful to encapsulate the complexity of COMPRESS implementations. Even though it is recommended to delegate to LuceneOptions to add fields to the Document, nothing stops you from editing the Document directly and ignoring the LuceneOptions if you need to.
Note
LuceneOptions are created to shield your application from changes in Lucene API and simplify your code. Use them if you can, but if you need more flexibility you're not required to.
6.4.2.6. ClassBridge
The @ClassBridge and @ClassBridges annotations can be defined at class level (as opposed to the property level). In this case the custom field bridge implementation receives the entity instance as the value parameter, instead of a particular property. Though not shown in Example 6.22, “Implementing a class bridge”, @ClassBridge supports the termVector attribute discussed in Section 6.1.1, “Basic Mapping”.
Example 6.22. Implementing a class bridge
@Entity @Indexed @ClassBridge(name="branchnetwork", store=Store.YES, impl = CatFieldsClassBridge.class, params = @Parameter( name="sepChar", value=" " ) ) public class Department { private int id; private String network; private String branchHead; private String branch; private Integer maxEmployees ... } public class CatFieldsClassBridge implements FieldBridge, ParameterizedBridge { private String sepChar; public void setParameterValues(Map parameters) { this.sepChar = (String) parameters.get( "sepChar" ); } public void set( String name, Object value, Document document, LuceneOptions luceneOptions) { // In this particular class the name of the new field was passed // from the name field of the ClassBridge Annotation. This is not // a requirement. It just works that way in this instance. The // actual name could be supplied by hard coding it below. Department dep = (Department) value; String fieldValue1 = dep.getBranch(); if ( fieldValue1 == null ) { fieldValue1 = ""; } String fieldValue2 = dep.getNetwork(); if ( fieldValue2 == null ) { fieldValue2 = ""; } String fieldValue = fieldValue1 + sepChar + fieldValue2; Field field = new Field( name, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector() ); field.setBoost( luceneOptions.getBoost() ); document.add( field ); } }
CatFieldsClassBridge is applied to the department instance, the field bridge then concatenate both branch and network and index the concatenation.
Chapter 7. Querying
A full-text search with Hibernate Search consists of four steps:

- Creating a FullTextSession
- Creating a Lucene query, using either the Hibernate Search query DSL (recommended) or the Lucene Query API
- Wrapping the Lucene query in an org.hibernate.Query
- Executing the search by calling for example list() or scroll()

To access the querying facilities you use a FullTextSession. This Search specific session wraps a regular org.hibernate.Session in order to provide query and indexing capabilities.
Example 7.1. Creating a FullTextSession
Session session = sessionFactory.openSession();
...
FullTextSession fullTextSession = Search.getFullTextSession(session);

Use the FullTextSession to build a full-text query using either the Hibernate Search query DSL or the native Lucene query API. The following example uses the query DSL:

final QueryBuilder b = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity( Myth.class ).get();
org.apache.lucene.search.Query luceneQuery = b.keyword()
    .onField("history").boostedTo(3)
    .matching("storm")
    .createQuery();
org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery );
List result = fullTextQuery.list(); //return a list of managed objects
Example 7.2. Creating a Lucene query via the QueryParser
SearchFactory searchFactory = fullTextSession.getSearchFactory();
org.apache.lucene.queryParser.QueryParser parser =
    new QueryParser("title", searchFactory.getAnalyzer(Myth.class) );
org.apache.lucene.search.Query luceneQuery = null;
try {
    luceneQuery = parser.parse( "history:storm^3" );
}
catch (ParseException e) {
    //handle parsing failure
}
org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery(luceneQuery);
List result = fullTextQuery.list(); //return a list of managed objects

The Hibernate query built on top of the Lucene query is a regular org.hibernate.Query. This query remains in the same paradigm as other Hibernate query facilities, such as HQL (Hibernate Query Language), Native, and Criteria. Use methods such as list(), uniqueResult(), iterate() and scroll() with the query.
Example 7.3. Creating a Search query using the JPA API
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager =
    org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
...
final QueryBuilder b = fullTextEntityManager.getSearchFactory()
    .buildQueryBuilder().forEntity( Myth.class ).get();
org.apache.lucene.search.Query luceneQuery = b.keyword()
    .onField("history").boostedTo(3)
    .matching("storm")
    .createQuery();
javax.persistence.Query fullTextQuery =
    fullTextEntityManager.createFullTextQuery( luceneQuery );
List result = fullTextQuery.getResultList(); //return a list of managed objects
Note
The following examples use the Hibernate APIs, but the same examples can easily be rewritten with the Java Persistence API by adjusting the way the FullTextQuery is retrieved.
7.1. Building Queries
7.1.1. Building a Lucene Query Using the Lucene API
With the Lucene API, you use either the query parser (simple queries) or the Lucene programmatic API (complex queries). Building a Lucene query is out of the scope of this documentation; refer to the Lucene documentation for details.
7.1.2. Building a Lucene Query
Hibernate Search provides a fluent DSL for building Lucene queries; you use a QueryBuilder for this task. The benefits of this API are:

- Method names are in English. As a result, API operations can be read and understood as a series of English phrases and instructions.
- It lends itself to IDE autocompletion, which suggests possible completions for the current input prefix and allows the user to choose the right option.
- It often uses the chaining method pattern.
- The API operations are easy to use and read.

The QueryBuilder knows what analyzer to use and what field bridge to apply. Several QueryBuilders can be created (one for each entity type involved in the root of your query). The QueryBuilder is derived from the SearchFactory:
QueryBuilder mythQB = searchFactory.buildQueryBuilder().forEntity( Myth.class ).get();
The analyzer used for a given field or fields can also be overridden:

QueryBuilder mythQB = searchFactory.buildQueryBuilder()
    .forEntity( Myth.class )
    .overridesForField("history","stem_analyzer_definition")
    .get();

The Hibernate Search DSL and the Lucene programmatic API can be mixed and matched: Query objects assembled using the Lucene programmatic API can be used together with queries built using the Hibernate Search DSL, as the sketch below illustrates.
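For instance, a sketch combining a DSL-built query with a hand-built Lucene query inside a BooleanQuery (the field names and terms are hypothetical):

org.apache.lucene.search.Query dslQuery = mythQB.keyword()
    .onField("history").matching("storm").createQuery();

BooleanQuery combinedQuery = new BooleanQuery();
// the DSL-built query must match
combinedQuery.add( dslQuery, BooleanClause.Occur.MUST );
// a programmatically built term query may match
combinedQuery.add( new TermQuery( new Term("name", "zeus") ),
    BooleanClause.Occur.SHOULD );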
7.1.3. Keyword Queries
Query luceneQuery = mythQB.keyword().onField("history").matching("storm").createQuery();
Table 7.1. Keyword query parameters
| Parameter | Description |
|---|---|
| keyword() | Use this parameter to find a specific word. |
| onField() | Use this parameter to specify in which Lucene field to search for the word. |
| matching() | Use this parameter to specify the string to match. |
| createQuery() | Creates the Lucene query object. |
- The value "storm" is passed through the
historyFieldBridge. This is useful when numbers or dates are involved. - The field bridge value is then passed to the analyzer used to index the field
history. This ensures that the query uses the same term transformation than the indexing (lower case, ngram, stemming and so on). If the analyzing process generates several terms for a given word, a boolean query is used with theSHOULDlogic (roughly anORlogic).
@Indexed public class Myth { @Field(analyze = Analyze.NO) @DateBridge(resolution = Resolution.YEAR) public Date getCreationDate() { return creationDate; } public Date setCreationDate(Date creationDate) { this.creationDate = creationDate; } private Date creationDate; ... } Date birthdate = ...; Query luceneQuery = mythQb.keyword().onField("creationDate").matching(birthdate).createQuery();
Note
In plain Lucene, the Date object would have had to be converted to its string representation (in this case the year).

This conversion works for any object, provided that the FieldBridge has an objectToString method (and all built-in FieldBridge implementations do).

The next example searches a field that uses ngram analyzers. Ngram analyzers index succession of ngrams of your words, which helps to recover from user typos. For example, the 3-grams of the word hibernate are hib, ibe, ber, ern, rna, nat, ate.

@AnalyzerDef(name = "ngram",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class ),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class),
        @TokenFilterDef(factory = NGramFilterFactory.class,
            params = {
                @Parameter(name = "minGramSize", value = "3"),
                @Parameter(name = "maxGramSize", value = "3") } )
    } )
public class Myth {
    @Field(analyzer = @Analyzer(definition = "ngram"))
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    private String name;
    ...
}

Query luceneQuery = mythQb.keyword().onField("name").matching("Sisiphus")
    .createQuery();

The matching word "Sisiphus" will be lower-cased and divided into 3-grams: sis, isi, sip, iph, phu, hus. Each of these n-grams will be part of the query. The user is then able to find the Sysiphus myth (with a y). All of that is done transparently for the user.
Note
If, for some reason, you do not want a field to use its field bridge or its analyzer, the ignoreAnalyzer() or ignoreFieldBridge() functions can be called.

To search for multiple possible words in the same field, add them all to the matching clause:

//search documents with storm or lightning in their history
Query luceneQuery =
    mythQB.keyword().onField("history").matching("storm lightning").createQuery();

To search the same word on multiple fields, use the onFields method:

Query luceneQuery = mythQB
    .keyword()
    .onFields("history","description","name")
    .matching("storm")
    .createQuery();

Sometimes one field should be treated differently from another field even when searching the same term; use the andField() method for that:

Query luceneQuery = mythQB.keyword()
    .onField("history")
    .andField("name")
    .boostedTo(5)
    .andField("description")
    .matching("storm")
    .createQuery();

In the previous example, only the field name is boosted to 5.
7.1.4. Fuzzy Queries
To execute a fuzzy query (one based on the Levenshtein distance algorithm), start like a keyword query and add the fuzzy flag:

Query luceneQuery = mythQB
    .keyword()
    .fuzzy()
    .withThreshold( .8f )
    .withPrefixLength( 1 )
    .onField("history")
    .matching("starm")
    .createQuery();

The threshold is the limit above which two terms are considered matching. It is a decimal between 0 and 1 and the default value is 0.5. The prefixLength is the length of the prefix ignored by the "fuzzyness". While the default value is 0, a non-zero value is recommended for indexes containing a huge number of distinct terms.
7.1.5. Wildcard Queries
Wildcard queries are useful when only part of the word is known: ? represents a single character and * represents any character sequence. Note that for performance purposes, it is recommended that the query does not start with either ? or *.
Query luceneQuery = mythQB .keyword() .wildcard() .onField("history") .matching("sto*") .createQuery();
Note
Wildcard queries do not apply the analyzer to the matching terms: otherwise the risk of * or ? being mangled would be too high.
7.1.6. Phrase Queries
To search for exact or approximate sentences, use phrase():

Query luceneQuery = mythQB
    .phrase()
    .onField("history")
    .sentence("Thou shalt not kill")
    .createQuery();

Approximate sentences can be searched by adding a slop factor. The slop factor represents the number of other words permitted in the sentence:

Query luceneQuery = mythQB
    .phrase()
    .withSlop(3)
    .onField("history")
    .sentence("Thou kill")
    .createQuery();
7.1.7. Range Queries
Range queries search for a value in between two bounds (included or not) or for a value below or above a given bound (included or not):

//look for 0 <= starred < 3
Query luceneQuery = mythQB
    .range()
    .onField("starred")
    .from(0).to(3).excludeLimit()
    .createQuery();

//look for myths strictly BC
Date beforeChrist = ...;
Query luceneQuery = mythQB
    .range()
    .onField("creationDate")
    .below(beforeChrist).excludeLimit()
    .createQuery();
7.1.8. Combining Queries
Queries can be combined to create more complex queries. The following aggregation operators are available:

- SHOULD: the query should contain the matching elements of the subquery.
- MUST: the query must contain the matching elements of the subquery.
- MUST NOT: the query must not contain the matching elements of the subquery.

The subqueries can be any Lucene query or a boolean query itself. Some examples:

//look for popular modern myths that are not urban
Date twentiethCentury = ...;
Query luceneQuery = mythQB
    .bool()
    .must( mythQB.keyword().onField("description").matching("urban").createQuery() )
        .not()
    .must( mythQB.range().onField("starred").above(4).createQuery() )
    .must( mythQB
        .range()
        .onField("creationDate")
        .above(twentiethCentury)
        .createQuery() )
    .createQuery();

//look for popular myths that are preferably urban
Query luceneQuery = mythQB
    .bool()
    .should( mythQB.keyword().onField("description").matching("urban").createQuery() )
    .must( mythQB.range().onField("starred").above(4).createQuery() )
    .createQuery();

//look for all myths except religious ones
Query luceneQuery = mythQB
    .all()
    .except( mythQB
        .keyword()
        .onField( "description_stem" )
        .matching( "religion" )
        .createQuery() )
    .createQuery();
7.1.9. Query Options
The following options are available, either at the query or at the field level:

- boostedTo (on query type and on field): boosts the whole query or the specific field to a given factor
- withConstantScore (on query): all results matching the query have a constant score equal to the boost
- filteredBy(Filter) (on query): filters query results using the Filter instance
- ignoreAnalyzer (on field): ignores the analyzer when processing this field
- ignoreFieldBridge (on field): ignores the field bridge when processing this field

The following example makes use of these options:

Query luceneQuery = mythQB
    .bool()
    .should( mythQB.keyword().onField("description").matching("urban").createQuery() )
    .should( mythQB
        .keyword()
        .onField("name")
        .boostedTo(3)
        .ignoreAnalyzer()
        .matching("urban").createQuery() )
    .must( mythQB
        .range()
        .boostedTo(5).withConstantScore()
        .onField("starred").above(4).createQuery() )
    .createQuery();
7.1.10. Build a Hibernate Search Query
7.1.10.1. Generality
Example 7.4. Wrapping a Lucene Query in a Hibernate Query
FullTextSession fullTextSession = Search.getFullTextSession( session );
org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery );
Example 7.5. Filtering the Search Result by Entity Type
fullTextQuery = fullTextSession
    .createFullTextQuery( luceneQuery, Customer.class );
// or
fullTextQuery = fullTextSession
    .createFullTextQuery( luceneQuery, Item.class, Actor.class );

The first query of Example 7.5, “Filtering the Search Result by Entity Type” returns only matching Customers; the second returns matching Actors and Items. The type restriction is polymorphic: if the two subclasses Salesman and Customer of the base class Person should both be returned, specify Person.class to filter based on result types.
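For instance, a sketch of the polymorphic case described above:

// returns both matching Salesman and matching Customer instances,
// since both subclasses extend Person
fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery, Person.class );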
7.1.10.2. Pagination
Example 7.6. Defining pagination for a search query
org.hibernate.Query fullTextQuery =
    fullTextSession.createFullTextQuery( luceneQuery, Customer.class );
fullTextQuery.setFirstResult(15); //start from the 15th element
fullTextQuery.setMaxResults(10); //return 10 elements
Note
It is still possible to get the total number of matching elements regardless of the pagination via fulltextQuery.getResultSize().
7.1.10.3. Sorting
Example 7.7. Specifying a Lucene Sort
org.hibernate.search.FullTextQuery query =
    s.createFullTextQuery( luceneQuery, Book.class );
org.apache.lucene.search.Sort sort = new Sort( new SortField("title", SortField.STRING) );
query.setSort(sort);
List results = query.list();

Note
Fields used for sorting must not be tokenized (see Section 6.1.1.2, “@Field”).
7.1.10.4. Fetching Strategy
Example 7.8. Specifying FetchMode on a query
Criteria criteria = s.createCriteria( Book.class )
    .setFetchMode( "authors", FetchMode.JOIN );
s.createFullTextQuery( luceneQuery ).setCriteriaQuery( criteria );
Important
Use the Criteria query only to define the fetching strategy; do not restrict the results through the Criteria query, because getResultSize() throws a SearchException if used in conjunction with a Criteria with restriction.

When defining a Criteria query, it is not necessary to restrict the returned entity types in createFullTextQuery, since the entity type is already defined by setCriteriaQuery.
7.1.10.5. Projection
For some use cases, returning the whole domain object (including its associations) is overkill; only a small subset of the properties is needed. Hibernate Search allows you to return a subset of properties, projected as a list of Object[]. Projections prevent a time consuming database round-trip. However, they have the following constraints:

- The properties projected must be stored in the index (@Field(store=Store.YES)), which increases the index size.
- The properties projected must use a FieldBridge implementing org.hibernate.search.bridge.TwoWayFieldBridge or org.hibernate.search.bridge.TwoWayStringBridge, the latter being the simpler version.

Note
All Hibernate Search built-in types are two-way.

- Only the simple properties of the indexed entity or its embedded associations can be projected. Therefore a whole embedded entity cannot be projected.
- Projection does not work on collections or maps which are indexed via @IndexedEmbedded.
Example 7.9. Using Projection to Retrieve Metadata
org.hibernate.search.FullTextQuery query =
    s.createFullTextQuery( luceneQuery, Book.class );
query.setProjection( FullTextQuery.SCORE, FullTextQuery.THIS, "mainAuthor.name" );
List results = query.list();
Object[] firstResult = (Object[]) results.get(0);
float score = (Float) firstResult[0];
Book book = (Book) firstResult[1];
String authorName = (String) firstResult[2];

Projection is also useful to retrieve metadata. Besides property names, the following projection constants are supported:

- FullTextQuery.THIS: returns the initialized and managed entity (as a non projected query would have done).
- FullTextQuery.DOCUMENT: returns the Lucene Document related to the projected object.
- FullTextQuery.OBJECT_CLASS: returns the class of the indexed entity.
- FullTextQuery.SCORE: returns the document score in the query. Scores are handy to compare one result against another for a given query, but are useless when comparing the results of different queries.
- FullTextQuery.ID: the id property value of the projected object.
- FullTextQuery.DOCUMENT_ID: the Lucene document id. Be careful: the Lucene document id can change over time between two different IndexReader openings.
- FullTextQuery.EXPLANATION: returns the Lucene Explanation object for the matching object/document in the given query. This is not suitable for retrieving large amounts of data. Running explanation is typically as costly as running the whole Lucene query per matching element. As a result, projection is recommended.
7.1.10.6. Customizing Object Initialization Strategies
Example 7.10. Check the second-level cache before using a query
FullTextQuery query = session.createFullTextQuery(luceneQuery, User.class);
query.initializeObjectsWith(
    ObjectLookupMethod.SECOND_LEVEL_CACHE,
    DatabaseRetrievalMethod.QUERY
);

ObjectLookupMethod defines the strategy used to check if an object is easily accessible (without fetching it from the database). The available options are:

- ObjectLookupMethod.PERSISTENCE_CONTEXT is used if many matching entities are already loaded into the persistence context (loaded in the Session or EntityManager).
- ObjectLookupMethod.SECOND_LEVEL_CACHE checks the persistence context and then the second-level cache.
To use ObjectLookupMethod.SECOND_LEVEL_CACHE you must (see also the snippet after this list):

- Correctly configure and activate the second-level cache.
- Enable the second-level cache for the relevant entity. This is done using annotations such as @Cacheable.
- Enable second-level cache read access for either Session, EntityManager or Query. Use CacheMode.NORMAL in Hibernate native APIs or CacheRetrieveMode.USE in Java Persistence APIs.
Warning
Only use ObjectLookupMethod.SECOND_LEVEL_CACHE with a second-level cache provider that performs this lookup operation efficiently; other second-level cache providers do not implement this operation efficiently.
You can also customize how objects are loaded from the database. Use DatabaseRetrievalMethod as follows:

- QUERY (default): uses a set of queries to load several objects in each batch. This approach is recommended.
- FIND_BY_ID: loads one object at a time using the Session.get or EntityManager.find semantic. This is recommended if the batch size is set for the entity, which allows Hibernate Core to load entities in batches.
7.1.10.7. Limiting the Time of a Query
There are two ways to limit the time a query takes in Hibernate Search:

- Raise an exception when the limit is reached.
- Limit the number of results retrieved when the time limit is reached.
7.1.10.8. Raise an Exception on Time Limit
When the limit is reached, a QueryTimeoutException is raised (org.hibernate.QueryTimeoutException or javax.persistence.QueryTimeoutException depending on the programmatic API).
Example 7.11. Defining a Timeout in Query Execution
Query luceneQuery = ...;
FullTextQuery query = fullTextSession.createFullTextQuery(luceneQuery, User.class);

//define the timeout in seconds
query.setTimeout(5);

//alternatively, define the timeout in any given time unit
query.setTimeout(450, TimeUnit.MILLISECONDS);

try {
    query.list();
}
catch (org.hibernate.QueryTimeoutException e) {
    //do something, too slow
}

getResultSize(), iterate() and scroll() honor the timeout until the end of the method call. As a result, the Iterable or the ScrollableResults obtained afterwards ignore the timeout. Additionally, explain() does not honor this timeout period; this method is used for debugging and to check the reasons for slow performance of a query. When using the Java Persistence API, the timeout is set via a query hint, as shown in Example 7.12, “Defining a Timeout in Query Execution”.
Example 7.12. Defining a Timeout in Query Execution
Query luceneQuery = ...;
FullTextQuery query = fullTextEM.createFullTextQuery(luceneQuery, User.class);

//define the timeout in milliseconds
query.setHint( "javax.persistence.query.timeout", 450 );

try {
    query.getResultList();
}
catch (javax.persistence.QueryTimeoutException e) {
    //do something, too slow
}
Important
7.2. Retrieving the Results
Once the Hibernate Search query is built, executing it is no different from executing an HQL or Criteria query. The same paradigm and object semantics apply, and all the common operations are available: list(), uniqueResult(), iterate(), scroll().
7.2.1. Performance Considerations
If you expect a reasonable number of results (for example when using pagination) and expect to work on all of them, list() or uniqueResult() are recommended. list() works best if the entity batch-size is set up properly. Note that Hibernate Search has to process all Lucene Hits elements (within the pagination) when using list(), uniqueResult() and iterate().

If you expect to process a very large number of results, scroll() is more appropriate. Don't forget to close the ScrollableResults object when you're done, since it keeps Lucene resources. If you expect to use scroll but wish to load objects in batches, you can use query.setFetchSize(). When an object is accessed, and if not already loaded, Hibernate Search will load the next fetchSize objects in one pass. A minimal sketch of this pattern follows.
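A minimal sketch of the scroll pattern (the entity type is illustrative):

FullTextQuery query = fullTextSession.createFullTextQuery( luceneQuery, Book.class );
query.setFetchSize( 100 ); // load the next 100 objects in one pass when needed
ScrollableResults scroll = query.scroll();
try {
    while ( scroll.next() ) {
        Book book = (Book) scroll.get()[0];
        // process the entity
    }
}
finally {
    scroll.close(); // releases the underlying Lucene resources
}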
Important
7.2.2. Result Size
It is sometimes useful to know the total number of matching documents:

- To provide a total search results feature, as provided by Google searches. For example, "1-10 of about 888,000,000 results".
- To implement fast pagination navigation.
- To implement a multi-step search engine that adds approximation if the restricted query returns zero or not enough results.

Of course, it would be too costly to retrieve all the matching documents. Hibernate Search allows you to retrieve the total number of matching documents regardless of the pagination parameters.
Example 7.13. Determining the Result Size of a Query
org.hibernate.search.FullTextQuery query =
    s.createFullTextQuery( luceneQuery, Book.class );
//return the number of matching books without loading a single one
assert 3245 == query.getResultSize();

org.hibernate.search.FullTextQuery query =
    s.createFullTextQuery( luceneQuery, Book.class );
query.setMaxResults(10);
List results = query.list();
//return the total number of matching books regardless of pagination
assert 3245 == query.getResultSize();

Note
Like Google, the number of results is an approximation if the index is not fully up-to-date with the database (asynchronous clustering, for example).
7.2.3. ResultTransformer
As seen in Section 7.1.10.5, “Projection”, projection results are returned as Object arrays. This data structure does not always match the application's needs. In such cases it is possible to apply a ResultTransformer, which builds the needed data structure after the query execution:
Example 7.14. Using ResultTransformer with Projections
org.hibernate.search.FullTextQuery query =
    s.createFullTextQuery( luceneQuery, Book.class );
query.setProjection( "title", "mainAuthor.name" );

query.setResultTransformer(
    new StaticAliasToBeanResultTransformer( BookView.class, "title", "author" )
);
List<BookView> results = (List<BookView>) query.list();
for (BookView view : results) {
    log.info( "Book: " + view.getTitle() + ", " + view.getAuthor() );
}

Examples of ResultTransformer implementations can be found in the Hibernate Core codebase.
7.2.4. Understanding Results
Hibernate Search gives you access to the Lucene Explanation object for a given result (in a given query). This class is considered fairly advanced even for Lucene users, but it can provide a good understanding of the scoring of an object. You have two ways to access the Explanation object for a given result:

- Use the fullTextQuery.explain(int) method
- Use projection

The first approach takes a document id as a parameter and returns the Explanation object. The document id can be retrieved using projection and the FullTextQuery.DOCUMENT_ID constant.
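A sketch of the first approach, retrieving the document id via projection and then asking for the Explanation (the entity type is borrowed from Example 7.15):

FullTextQuery ftQuery = s.createFullTextQuery( luceneQuery, Dvd.class )
    .setProjection( FullTextQuery.DOCUMENT_ID );
Object[] firstResult = (Object[]) ftQuery.list().get( 0 );
int documentId = (Integer) firstResult[0];
Explanation explanation = ftQuery.explain( documentId );
display( explanation.toString() );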
Warning
The document id has nothing to do with the entity id; do not confuse the two notions. The second approach projects the Explanation object using the FullTextQuery.EXPLANATION constant.
Example 7.15. Retrieving the Lucene Explanation Object Using Projection
FullTextQuery ftQuery = s.createFullTextQuery( luceneQuery, Dvd.class )
    .setProjection(
        FullTextQuery.DOCUMENT_ID,
        FullTextQuery.EXPLANATION,
        FullTextQuery.THIS );
@SuppressWarnings("unchecked")
List<Object[]> results = ftQuery.list();
for (Object[] result : results) {
    Explanation e = (Explanation) result[1];
    display( e.toString() );
}
7.3. Filters
Apache Lucene allows you to filter query results according to a custom filtering process. This is a powerful way to apply additional data restrictions, especially since filters can be cached and reused. Some interesting use cases are:

- security
- temporal data (for example, view only last month's data)
- population filter (for example, search limited to a given category)
- and many more
7.3.1. Using Filters in a Sharded Environment
In a sharded environment it is possible to execute queries on a subset of the available shards. This is done in two steps:

- create a sharding strategy that selects a subset of IndexManagers depending on some filter configuration
- activate the proper filter at query time
The following example shows a sharding strategy that queries a specific shard if the customer filter is activated:

public class CustomerShardingStrategy implements IndexShardingStrategy {

    // stored IndexManagers in an array indexed by customerID
    private IndexManager[] indexManagers;

    public void initialize(Properties properties, IndexManager[] indexManagers) {
        this.indexManagers = indexManagers;
    }

    public IndexManager[] getIndexManagersForAllShards() {
        return indexManagers;
    }

    public IndexManager getIndexManagerForAddition(
            Class<?> entity, Serializable id, String idInString, Document document) {
        Integer customerID = Integer.parseInt(
            document.getFieldable("customerID").stringValue());
        return indexManagers[customerID];
    }

    public IndexManager[] getIndexManagersForDeletion(
            Class<?> entity, Serializable id, String idInString) {
        return getIndexManagersForAllShards();
    }

    /**
     * Optimization; don't search ALL shards and union the results; in this case, we
     * can be certain that all the data for a particular customer Filter is in a single
     * shard; simply return that shard by customerID.
     */
    public IndexManager[] getIndexManagersForQuery(
            FullTextFilterImplementor[] filters) {
        FullTextFilter filter = getCustomerFilter(filters, "customer");
        if (filter == null) {
            return getIndexManagersForAllShards();
        }
        else {
            return new IndexManager[] { indexManagers[Integer.parseInt(
                filter.getParameter("customerID").toString())] };
        }
    }

    private FullTextFilter getCustomerFilter(FullTextFilterImplementor[] filters, String name) {
        for (FullTextFilterImplementor filter: filters) {
            if (filter.getName().equals(name)) return filter;
        }
        return null;
    }
}
customer is present, we make sure to only use the shard dedicated to this customer. Otherwise, we return all shards. A given Sharding strategy can react to one or more filters and depends on their parameters.
ShardSensitiveOnlyFilter class when declaring your filter.
@Indexed @FullTextFilterDef(name="customer", impl=ShardSensitiveOnlyFilter.class) public class Customer { ... } FullTextQuery query = ftEm.createFullTextQuery(luceneQuery, Customer.class); query.enableFulltextFilter("customer").setParameter("CustomerID", 5); @SuppressWarnings("unchecked") List<Customer> results = query.getResultList();
Note that by using the ShardSensitiveOnlyFilter you do not have to implement any Lucene filter. Using filters and a sharding strategy that reacts to these filters is recommended to speed up queries in a sharded environment.
7.4. Faceting
Faceted search is a technique which allows the results of a query to be divided into multiple categories. This categorization includes the calculation of hit counts for each category and the ability to further restrict search results based on these facets (categories). Example 7.16, "Search for Hibernate Search on Amazon" shows a typical use: the search results are grouped by category, and each category shows its hit count.
Example 7.16. Search for Hibernate Search on Amazon
In Hibernate Search the classes QueryBuilder and FullTextQuery are the entry point into the faceting API. The former creates faceting requests while the latter gives access to the FacetManager. The FacetManager applies faceting requests to a query and selects facets that are added to an existing query to refine search results. The examples use the entity Cd as shown in Example 7.17, "Entity Cd":
Example 7.17. Entity Cd
@Indexed
public class Cd {

    private int id;

    @Fields( {
        @Field,
        @Field(name = "name_un_analyzed", analyze = Analyze.NO)
    })
    private String name;

    @Field(analyze = Analyze.NO)
    @NumericField
    private int price;

    @Field(analyze = Analyze.NO)
    @DateBridge(resolution = Resolution.YEAR)
    private Date releaseYear;

    @Field(analyze = Analyze.NO)
    private String label;

    // setter/getter
    ...
7.4.1. Creating a Faceting Request
The first step towards a faceted search is to create the FacetingRequest. Currently two types of faceting requests are supported: discrete faceting and range faceting. In the case of a discrete faceting request you specify on which index field you want to facet (categorize) and which faceting options to apply. An example of a discrete faceting request can be seen in Example 7.18, "Creating a discrete faceting request":
Example 7.18. Creating a discrete faceting request
QueryBuilder builder = fullTextSession.getSearchFactory()
    .buildQueryBuilder()
    .forEntity( Cd.class )
    .get();
FacetingRequest labelFacetingRequest = builder.facet()
    .name( "labelFaceting" )
    .onField( "label")
    .discrete()
    .orderedBy( FacetSortOrder.COUNT_DESC )
    .includeZeroCounts( false )
    .maxFacetCount( 1 )
    .createFacetingRequest();
When this faceting request is applied to a query, a Facet instance will be created for each discrete value of the indexed field label. The Facet instance records the actual field value and how often this particular field value occurs within the original query results. orderedBy, includeZeroCounts and maxFacetCount are optional parameters which can be applied to any faceting request. orderedBy allows you to specify in which order the created facets will be returned: the default is FacetSortOrder.COUNT_DESC, but you can also sort on the field value or on the order in which ranges were specified. includeZeroCounts determines whether facets with a count of 0 will be included in the result (by default they are), and maxFacetCount limits the maximum number of facets returned.
Note
In order to facet on a given indexed field, the field must be of type String, Date or a subtype of Number, and null values should be avoided. Furthermore, the property has to be indexed with Analyze.NO, and in the case of a numeric property @NumericField needs to be specified.
The creation of a range faceting request is quite similar, except that you specify the range boundaries via below, from - to and above. below and above can only be specified once, but you can specify as many from - to ranges as you want. For each range boundary you can also specify via excludeLimit whether the boundary is included in the range or not. An example is shown in Example 7.19, "Creating a range faceting request":
Example 7.19. Creating a range faceting request
QueryBuilder builder = fullTextSession.getSearchFactory()
    .buildQueryBuilder()
    .forEntity( Cd.class )
    .get();
FacetingRequest priceFacetingRequest = builder.facet()
    .name( "priceFaceting" )
    .onField( "price" )
    .range()
    .below( 1000 )
    .from( 1001 ).to( 1500 )
    .above( 1500 ).excludeLimit()
    .createFacetingRequest();
7.4.2. Applying a Faceting Request
In order to apply a faceting request to a query you need the FacetManager, which can be retrieved via the FullTextQuery (see Example 7.20, "Applying a faceting request").
Example 7.20. Applying a faceting request
// create a fulltext query
Query luceneQuery = builder.all().createQuery(); // match all query
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery, Cd.class );

// retrieve facet manager and apply faceting request
FacetManager facetManager = fullTextQuery.getFacetManager();
facetManager.enableFaceting( priceFacetingRequest );

// get the list of Cds
List<Cd> cds = fullTextQuery.list();
...

// retrieve the faceting results
List<Facet> facets = facetManager.getFacets( "priceFaceting" );
...
Once the query has been executed, the facets can be retrieved via getFacets(), specifying the faceting request name. There is also a disableFaceting() method which allows you to disable a faceting request by specifying its name.
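For illustration, a minimal sketch of consuming the returned facets; getValue() and getCount() are part of the Facet API:
List<Facet> facets = facetManager.getFacets( "priceFaceting" );
for ( Facet facet : facets ) {
    // each facet carries its (discrete or range) value and its hit count
    System.out.println( facet.getValue() + " (" + facet.getCount() + ")" );
}
// stop collecting facets for this request once it is no longer needed
facetManager.disableFaceting( "priceFaceting" );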
7.4.3. Restricting Query Results
You can apply any of the returned Facets as additional criteria on your original query in order to implement a "drill-down" functionality. For this purpose a FacetSelection can be utilized. FacetSelections are available via the FacetManager and allow you to select a facet as a query criterion (selectFacets), remove a facet restriction (deselectFacets), remove all facet restrictions (clearSelectedFacets) and retrieve all currently selected facets (getSelectedFacets). Example 7.21, "Restricting query results via the application of a FacetSelection" shows an example.
Example 7.21. Restricting query results via the application of a FacetSelection
// create a fulltext query
Query luceneQuery = builder.all().createQuery(); // match all query
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery, clazz );

// retrieve facet manager and apply faceting request
FacetManager facetManager = fullTextQuery.getFacetManager();
facetManager.enableFaceting( priceFacetingRequest );

// get the list of Cd
List<Cd> cds = fullTextQuery.list();
assertTrue(cds.size() == 10);

// retrieve the faceting results
List<Facet> facets = facetManager.getFacets( "priceFaceting" );
assertTrue(facets.get(0).getCount() == 2);

// apply first facet as additional search criteria
facetManager.getFacetGroup( "priceFaceting" ).selectFacets( facets.get( 0 ) );

// re-execute the query
cds = fullTextQuery.list();
assertTrue(cds.size() == 2);
7.5. Optimizing the Query Process
Query performance depends on several criteria:
- the Lucene query itself: read the literature on this subject.
- the number of objects loaded: use pagination (always) or index projection (if needed); see the sketch after this list.
- the way Hibernate Search interacts with the Lucene readers: define the appropriate reader strategy. For details, refer to Section 4.3, "Reader Strategies".
- caching frequently extracted values from the index: see Section 7.5.1, "Caching Index Values: FieldCache".
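For example, pagination keeps the number of loaded objects low. A minimal sketch (the entity and variable names are illustrative):
FullTextQuery query = fullTextSession.createFullTextQuery( luceneQuery, Book.class );
query.setFirstResult( 20 ); // start from the 21st matching element
query.setMaxResults( 10 );  // load at most 10 objects
@SuppressWarnings("unchecked")
List<Book> page = query.list();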
7.5.1. Caching Index Values: FieldCache
By using the @CacheFromIndex annotation you can experiment with different kinds of caching of the main metadata fields required by Hibernate Search:
import static org.hibernate.search.annotations.FieldCacheType.CLASS;
import static org.hibernate.search.annotations.FieldCacheType.ID;

@Indexed
@CacheFromIndex( { CLASS, ID } )
public class Essay {
    ...
It is possible to enable two kinds of caching:
- CLASS: Hibernate Search will use a Lucene FieldCache to improve performance of the Class type extraction from the index. This value is enabled by default, and is what Hibernate Search applies if you do not specify the @CacheFromIndex annotation.
- ID: extracting the primary identifier will use a cache. This likely provides the best performing queries, but will consume much more memory, which in turn might reduce performance.
Note
Be aware of the following trade-offs:
- Memory usage: these caches can be quite memory hungry. Typically the CLASS cache has lower requirements than the ID cache.
- Index warmup: when using field caches, the first query on a new index or segment will be slower than when caching is not enabled.
With some queries the class type will not be needed at all; in that case, even if you enabled the CLASS field cache, it might not be used: for example, if you are targeting a single class, obviously all returned values will be of that type (this is evaluated at each query execution).
For the ID FieldCache to be used, the ids of targeted entities must use a TwoWayFieldBridge (as all built-in bridges do), and all types being loaded in a specific query must use the same field name for the id and have ids of the same type (this is evaluated at each query execution).
Chapter 8. Manual Index Changes
8.1. Adding Instances to the Index
Using FullTextSession.index(T entity) you can directly add or update a specific object instance in the index. If this entity was already indexed, the index will be updated. Changes to the index are only applied at transaction commit.
Example 8.1. Indexing an entity via FullTextSession.index(T entity)
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
Object customer = fullTextSession.load( Customer.class, 8 );
fullTextSession.index(customer);
tx.commit(); //index only updated at commit time
In case you want to add all instances of a particular class, or all indexed classes, the recommended approach is to use a MassIndexer: see Section 8.3.2, "Using a MassIndexer" for more details.
8.2. Deleting Instances from the Index
It is equally possible to remove an entity, or all entities of a given type, from a Lucene index without physically removing them from the database. This operation is named purging and is also done through the FullTextSession.
Example 8.2. Purging a specific instance of an entity from the index
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
    fullTextSession.purge( Customer.class, customer.getId() );
}
tx.commit(); //index is updated at commit time
Purging will remove the entity with the given id from the Lucene index but will not touch the database. If you need to remove all entities of a given type, use the purgeAll method. This operation removes all entities of the type passed as a parameter as well as all its subtypes.
Example 8.3. Purging all instances of an entity from the index
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
fullTextSession.purgeAll( Customer.class );
//optionally optimize the index
//fullTextSession.getSearchFactory().optimize( Customer.class );
tx.commit(); //index changes are applied at commit time
Note
The methods index, purge, and purgeAll are available on FullTextEntityManager as well.
Note
All manual indexing methods (index, purge, and purgeAll) only affect the index, not the database; nevertheless, they are transactional, and as such will not be applied until the transaction is successfully committed, unless you make use of flushToIndexes.
8.3. Rebuilding the Index
If you change the way entities are mapped to the index, chances are that the whole index needs to be rebuilt. Hibernate Search offers two main strategies for this:
- Using FullTextSession.flushToIndexes() periodically, while using FullTextSession.index() on all entities.
- Using a MassIndexer.
8.3.1. Using flushToIndexes()
This strategy consists of removing the existing index and then adding all entities back to the index using FullTextSession.purgeAll() and FullTextSession.index(); however, there are some memory and efficiency constraints. For maximum efficiency Hibernate Search batches index operations and executes them at commit time. If you expect to index a lot of data you need to be careful about memory consumption, since all documents are kept in a queue until the transaction commit. You can potentially face an OutOfMemoryException if you don't empty the queue periodically: to do this you can use fullTextSession.flushToIndexes(). Every time fullTextSession.flushToIndexes() is called (or if the transaction is committed), the batch queue is processed, applying all index changes. Be aware that, once flushed, the changes cannot be rolled back.
Example 8.4. Index rebuilding using index() and flushToIndexes()
fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria( Email.class )
    .setFetchSize(BATCH_SIZE)
    .scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
    index++;
    fullTextSession.index( results.get(0) ); //index each element
    if (index % BATCH_SIZE == 0) {
        fullTextSession.flushToIndexes(); //apply changes to indexes
        fullTextSession.clear(); //free memory since the queue is processed
    }
}
transaction.commit();
Note
The configuration property hibernate.search.default.worker.batch_size has been deprecated in favor of this explicit API, which provides better control.
8.3.2. Using a MassIndexer
Hibernate Search's MassIndexer uses several parallel threads to rebuild the index; you can optionally select which entities need to be reloaded or have it reindex all entities. This approach is optimized for best performance but requires setting the application in maintenance mode: making queries on the index is not recommended while a MassIndexer is busy.
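In its simplest form, rebuilding the whole index with a MassIndexer is a one-liner. A minimal sketch:
try {
    // index all indexed entity types and block until the work is done
    fullTextSession.createIndexer().startAndWait();
} catch (InterruptedException e) {
    // indexing was interrupted before completion
    Thread.currentThread().interrupt();
}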
Warning
During the progress of a MassIndexer the content of the index is undefined: queries executed while the MassIndexer is working will not fail, but most results will likely be missing.
Example 8.6. Using a Tuned MassIndexer
fullTextSession
    .createIndexer( User.class )
    .batchSizeToLoadObjects( 25 )
    .cacheMode( CacheMode.NORMAL )
    .threadsToLoadObjects( 12 )
    .idFetchSize( 150 )
    .progressMonitor( monitor ) //a MassIndexerProgressMonitor implementation
    .startAndWait();
The loaded entities are processed through indexed embedded relations and custom FieldBridges or ClassBridges to output a Lucene document. The threads trigger lazy loading of additional attributes during the conversion process; because of this, a high number of threads working in parallel is useful. The number of threads working on actual index writing is defined by the backend configuration of each index.
It is recommended to leave cacheMode set to CacheMode.IGNORE (the default), as in most reindexing situations the cache will be a useless additional overhead; it might however be useful to enable some other CacheMode depending on your data: it could increase performance if the main entity relates to enum-like data included in the index.
Note
The MassIndexer is unaware of transactions: there is no need to begin or commit a transaction around it.
Note
Other parameters which also affect indexing time and memory consumption are:
- hibernate.search.[default|<indexname>].exclusive_index_use
- hibernate.search.[default|<indexname>].indexwriter.max_buffered_docs
- hibernate.search.[default|<indexname>].indexwriter.max_merge_docs
- hibernate.search.[default|<indexname>].indexwriter.merge_factor
- hibernate.search.[default|<indexname>].indexwriter.merge_min_size
- hibernate.search.[default|<indexname>].indexwriter.merge_max_size
- hibernate.search.[default|<indexname>].indexwriter.merge_max_optimize_size
- hibernate.search.[default|<indexname>].indexwriter.merge_calibrate_by_deletes
- hibernate.search.[default|<indexname>].indexwriter.ram_buffer_size
- hibernate.search.[default|<indexname>].indexwriter.term_index_interval
Previous versions also had a max_field_length setting, but this was removed from Lucene; it is possible to obtain a similar effect by using a LimitTokenCountAnalyzer.
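For instance, a minimal sketch of wrapping an analyzer with LimitTokenCountAnalyzer; the token limit of 10000 is an arbitrary illustration and the Version constant should match the Lucene version in use:
// index at most 10000 tokens per field, silently ignoring the rest
Analyzer base = new StandardAnalyzer( Version.LUCENE_36 );
Analyzer limited = new LimitTokenCountAnalyzer( base, 10000 );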
All .indexwriter parameters are Lucene specific and Hibernate Search just passes them through; see Section 5.5.1, "Tuning Lucene Indexing Performance" for more details.
The MassIndexer uses a forward-only scrollable result to iterate on the primary keys to be loaded, but MySQL's JDBC driver will load all values in memory; to avoid this "optimization" set idFetchSize to Integer.MIN_VALUE.
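A sketch of this setting in practice (the entity name is illustrative):
fullTextSession
    .createIndexer( User.class )
    .idFetchSize( Integer.MIN_VALUE ) // hints the MySQL JDBC driver to stream results
    .startAndWait();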
Chapter 9. Index Optimization
It is recommended to schedule optimization:
- on an idle system or when searches are least frequent.
- after a large number of index modifications are applied.
When using a MassIndexer (see Section 8.3.2, "Using a MassIndexer"), the involved indexes are optimized by default at the start and at the end of processing. Use MassIndexer.optimizeAfterPurge and MassIndexer.optimizeOnFinish to change this default behavior.
9.1. Automatic Optimization
Hibernate Search can automatically optimize an index after:
- a certain amount of operations (insertions or deletions), or
- a certain amount of transactions.
Example 9.1. Defining automatic optimization parameters
hibernate.search.default.optimizer.operation_limit.max = 1000
hibernate.search.default.optimizer.transaction_limit.max = 100
hibernate.search.Animal.optimizer.transaction_limit.max = 50
With this configuration, an optimization will be triggered for the Animal index as soon as either:
- the number of additions and deletions reaches 1000, or
- the number of transactions reaches 50 (hibernate.search.Animal.optimizer.transaction_limit.max has priority over hibernate.search.default.optimizer.transaction_limit.max).
You can also define a custom optimization strategy by implementing org.hibernate.search.store.optimization.OptimizerStrategy and setting the optimizer.implementation property to the fully qualified name of your implementation. The implementation must implement the interface, be a public class and have a public constructor taking no arguments.
Example 9.2. Loading a custom OptimizerStrategy
hibernate.search.default.optimizer.implementation = com.acme.worlddomination.SmartOptimizer
hibernate.search.default.optimizer.SomeOption = CustomConfigurationValue
hibernate.search.humans.optimizer.implementation = default
The keyword default can be used to select the Hibernate Search default implementation; all properties after the .optimizer key separator will be passed to the implementation's initialize method at start.
9.2. Manual Optimization
You can programmatically optimize (defragment) a Lucene index from Hibernate Search through the SearchFactory:
Example 9.3. Programmatic Index Optimization
FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();

searchFactory.optimize(Order.class);
// or
searchFactory.optimize();
The first example optimizes the Lucene index holding Orders and the second optimizes all indexes.
Note
searchFactory.optimize() has no effect on a JMS backend. You must apply the optimize operation on the Master node.
9.3. Adjusting Optimization
Apache Lucene has a few parameters which influence how optimization is performed; Hibernate Search exposes them via the following properties:
- hibernate.search.[default|<indexname>].indexwriter.max_buffered_docs
- hibernate.search.[default|<indexname>].indexwriter.max_merge_docs
- hibernate.search.[default|<indexname>].indexwriter.merge_factor
- hibernate.search.[default|<indexname>].indexwriter.ram_buffer_size
- hibernate.search.[default|<indexname>].indexwriter.term_index_interval
Chapter 10. Monitoring
Hibernate Search offers access to a Statistics object via SearchFactory.getStatistics(). It allows you, for example, to determine which classes are indexed and how many entities are in the index. This information is always available. However, by specifying the hibernate.search.generate_statistics property in your configuration you can also collect total and average Lucene query and object-loading timings.
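A minimal sketch of reading some of these statistics; the method names below are taken from the Statistics API:
Statistics statistics = fullTextSession.getSearchFactory().getStatistics();
Set<String> indexedClasses = statistics.getIndexedClassNames(); // which classes are indexed
long queryCount = statistics.getSearchQueryExecutionCount();
// average timings require hibernate.search.generate_statistics = true
long avgQueryTime = statistics.getSearchQueryExecutionAvgTime();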
10.1. JMX
10.1.1. About JMX
Setting the property hibernate.search.jmx_enabled will automatically register the StatisticsInfoMBean. Depending on your configuration the IndexControlMBean and IndexingProgressMonitorMBean will also be registered. Let's have a closer look at the different MBeans.
Note
If you want to access the MBeans remotely, for example via JConsole, make sure to set the system property com.sun.management.jmxremote to true.
10.1.2. StatisticsInfoMBean
This MBean gives access to the Statistics object described in the previous section.
10.1.3. IndexControlMBean
This MBean allows you to build, optimize and purge the index for a given entity, and is only usable if the SessionFactory is bound to JNDI via the hibernate.session_factory_name property. Refer to the Hibernate Core manual for more information on how to configure JNDI. The IndexControlMBean and its API are for now experimental.
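A configuration sketch enabling JMX and binding the SessionFactory to JNDI; the JNDI name below is illustrative:
hibernate.search.jmx_enabled = true
hibernate.session_factory_name = java:comp/env/hibernate/SessionFactory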
10.1.4. IndexingProgressMonitorMBean
This MBean is a JMX implementation of the MassIndexerProgressMonitor interface. If hibernate.search.jmx_enabled is enabled and the mass indexer API is used, the indexing progress can be followed via this bean. The bean is only bound to JMX while indexing is in progress; once indexing is completed the MBean is no longer available.
Chapter 11. Advanced Features
11.1. Accessing the SearchFactory
The SearchFactory object keeps track of the underlying Lucene resources for Hibernate Search. It is a convenient way to access Lucene natively. The SearchFactory can be accessed from a FullTextSession:
Example 11.1. Accessing the SearchFactory
FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
11.2. Using an IndexReader
Queries in Lucene are executed on an IndexReader. Hibernate Search might cache index readers to maximize performance, or provide other efficient strategies to retrieve an updated IndexReader minimizing IO operations. Your code can access these cached resources, but you have to follow some "good citizen" rules.
Example 11.2. Accessing an IndexReader
IndexReader reader = searchFactory.getIndexReaderAccessor().open(Order.class);
try {
    //perform read-only operations on the reader
}
finally {
    searchFactory.getIndexReaderAccessor().close(reader);
}
In this example the SearchFactory figures out which indexes are needed to query this entity (considering a sharding strategy). Using the configured ReaderProvider (described in Section 4.3, "Reader Strategies") on each index, it returns a compound IndexReader on top of all involved indexes. Because this IndexReader is shared amongst several clients, you must adhere to the following rules:
- Never call indexReader.close(), instead use readerProvider.closeReader(reader) when necessary, preferably in a finally block.
- Don't use this IndexReader for modification operations (it is a read-only IndexReader; you would get an exception).
Aside from those rules, you can use the IndexReader freely, especially to perform native Lucene queries. Using the shared IndexReaders will make most queries more efficient than opening one directly from, for example, the filesystem.
As an alternative to open(Class... types) you can use open(String... indexNames), passing in one or more index names; using this strategy you can also select a subset of the indexes for any indexed type if sharding is used.
Example 11.3. Accessing an IndexReader by index names
IndexReader reader = searchFactory.getIndexReaderAccessor().open("Products.1", "Products.3");
11.3. Accessing a Lucene Directory
A Directory is the most common abstraction used by Lucene to represent the index storage; Hibernate Search does not interact directly with a Lucene Directory but abstracts these interactions via an IndexManager: an index does not necessarily need to be implemented by a Directory.
If you know your index is implemented by a Directory and need to access it, you can get a reference to the Directory via the IndexManager: cast the IndexManager to a DirectoryBasedIndexManager and then use getDirectoryProvider().getDirectory() to get a reference to the underlying Directory. This is not recommended; we would encourage you to use the IndexReader instead.
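If you do need the cast, a minimal sketch follows; how the indexManager reference is obtained depends on your integration and is assumed here:
// "indexManager" is assumed to have been obtained beforehand
DirectoryBasedIndexManager directoryBased = (DirectoryBasedIndexManager) indexManager;
Directory directory = directoryBased.getDirectoryProvider().getDirectory();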
11.4. Sharding Indexes
Warning
Sharding is an advanced use case; a single index per entity is the recommended configuration unless one of the following applies:
- A single index is so huge that index update times are slowing the application down.
- A typical search will only hit a sub-set of the index, such as when data is naturally segmented by customer, region or application.
To enable sharding, set the hibernate.search.<indexName>.sharding_strategy.nbr_of_shards property as seen in Example 11.4, "Enabling Index Sharding". In this example 5 shards are enabled.
Example 11.4. Enabling Index Sharding
hibernate.search.<indexName>.sharding_strategy.nbr_of_shards = 5
Responsibility for splitting the data into the different shards lies with the IndexShardingStrategy. The default sharding strategy splits the data according to the hash value of the id string representation (generated by the FieldBridge). This ensures fairly balanced sharding. You can replace the default strategy by implementing a custom IndexShardingStrategy. To use your custom strategy you have to set the hibernate.search.<indexName>.sharding_strategy property.
Example 11.5. Specifying a Custom Sharding Strategy
hibernate.search.<indexName>.sharding_strategy = my.shardingstrategy.Implementation
The IndexShardingStrategy also allows for optimizing searches by selecting which shard to run the query against. By activating a filter (see Section 7.3.1, "Using Filters in a Sharded Environment"), a sharding strategy can select a subset of the shards used to answer a query (IndexShardingStrategy.getIndexManagersForQuery) and thus speed up the query execution.
Each shard has an independent IndexManager and so can be configured to use a different directory provider and backend configuration. The IndexManager index names for the Animal entity in Example 11.6, "Sharding Configuration for Entity Animal" are Animal.0 to Animal.4. In other words, each shard has the name of its owning index followed by . (dot) and its index number (see also Section 5.3, "Directory Configuration").
Example 11.6. Sharding Configuration for Entity Animal
hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.Animal.sharding_strategy.nbr_of_shards = 5
hibernate.search.Animal.directory_provider = filesystem
hibernate.search.Animal.0.indexName = Animal00
hibernate.search.Animal.3.indexBase = /usr/lucene/sharded
hibernate.search.Animal.3.indexName = Animal03
The above configuration splits the Animal index into 5 sub-indexes. All sub-indexes are filesystem instances and the directory where each sub-index is stored is as follows:
- for sub-index 0: /usr/lucene/indexes/Animal00 (shared indexBase but overridden indexName)
- for sub-index 1: /usr/lucene/indexes/Animal.1 (shared indexBase, default indexName)
- for sub-index 2: /usr/lucene/indexes/Animal.2 (shared indexBase, default indexName)
- for sub-index 3: /usr/lucene/sharded/Animal03 (overridden indexBase, overridden indexName)
- for sub-index 4: /usr/lucene/indexes/Animal.4 (shared indexBase, default indexName)
When implementing a custom IndexShardingStrategy, any field can be used to determine the sharding selection. Consider that, to handle deletion, purge and purgeAll operations, the implementation might need to return one or more indexes without being able to read all the field values or the primary identifier; in case the information is not enough to pick a single index, all indexes should be returned, so that the delete operation will be propagated to all indexes potentially containing the documents to be deleted.
11.5. Sharing Indexes
It is technically possible to store the information of more than one entity into a single Lucene index. There are two ways to accomplish this:
- Configuring the underlying directory providers to point to the same physical index directory. In practice, you set the property hibernate.search.[fully qualified entity name].indexName to the same value. As an example, let's use the same index (directory) for the Furniture and Animal entities: we just set indexName for both entities to "Animal", and both entities will then be stored in the Animal directory.
  hibernate.search.org.hibernate.search.test.shards.Furniture.indexName = Animal
  hibernate.search.org.hibernate.search.test.shards.Animal.indexName = Animal
- Setting the @Indexed annotation's index attribute of the entities you want to merge to the same value. If we again wanted all Furniture instances to be indexed in the Animal index along with all instances of Animal, we would specify @Indexed(index="Animal") on both the Animal and Furniture classes.
Note
This is only presented here so that you know the option is available. There is really not much benefit in sharing indexes.
11.6. About Using External Services
Several of Hibernate Search's pluggable contracts can make use of an external service, the most notable example being the DirectoryProvider. The full list is:
- DirectoryProvider
- ReaderProvider
- OptimizerStrategy
- BackendQueueProcessor
- Worker
- ErrorHandler
- MassIndexerProgressMonitor
All these contracts may need access to a service whose lifecycle is tied to that of the SearchFactory. Sometimes, you even want the same service to be shared amongst several instances of these contracts.
11.6.1. Exposing a Service
To expose a service, you need to implement org.hibernate.search.spi.ServiceProvider<T>, where T is the type of the service you want to use. Services are retrieved by components via their ServiceProvider class implementation.
11.6.1.1. Managed Services
The lifecycle of a managed service is handled by Hibernate Search: the service is started and stopped via the start and stop methods of the ServiceProvider. When the service is requested, the getService method is called.
Example 11.7. Example of ServiceProvider implementation
public class CacheServiceProvider implements ServiceProvider<Cache> {
    private CacheManager manager;

    public void start(Properties properties) {
        //read configuration
        manager = new CacheManager(properties);
    }

    public Cache getService() {
        return manager.getCache(DEFAULT);
    }

    public void stop() {
        manager.close();
    }
}
Note
The ServiceProvider implementation must have a no-arg constructor.
To be transparently discoverable, such a service must be listed in a file named META-INF/services/org.hibernate.search.spi.ServiceProvider, whose content lists the (various) service provider implementation(s).
Example 11.8. Content of META-INF/services/org.hibernate.search.spi.ServiceProvider
com.acme.infra.hibernate.CacheServiceProvider
11.6.1.2. Provided Services
Alternatively, a service instance can be provided by the environment bootstrapping Hibernate Search; for example, Infinispan, which uses Hibernate Search as its internal search engine, passes its CacheContainer to Hibernate Search. In this case, the CacheContainer instance is not managed by Hibernate Search and the start/stop methods of its corresponding service provider will not be used.
Note
If a service is registered with the same ServiceProvider class as a managed service, the provided service will be used.
Provided services are passed to Hibernate Search via the SearchConfiguration interface (getProvidedServices).
Important
Provided services are intended to be used by frameworks integrating with Hibernate Search and controlling its lifecycle, not by regular applications.
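A sketch of what such an integration could look like, assuming a custom SearchConfiguration implementation and reusing the CacheServiceProvider from Example 11.7; the externallyManagedCache variable is an assumption for illustration:
public Map<Class<? extends ServiceProvider<?>>, Object> getProvidedServices() {
    Map<Class<? extends ServiceProvider<?>>, Object> services =
            new HashMap<Class<? extends ServiceProvider<?>>, Object>();
    // the value is the service instance itself, here a Cache managed by the environment
    services.put( CacheServiceProvider.class, externallyManagedCache );
    return services;
}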
11.6.2. Using a Service
Any of the pluggable contracts above can use a registered service. Services are accessed via the BuildContext interface.
Example 11.9. Example of a Directory Provider Using a Cache Service
public class CustomDirectoryProvider implements DirectoryProvider<RAMDirectory> {
    private BuildContext context;

    public void initialize(
            String directoryProviderName,
            Properties properties,
            BuildContext context) {
        //initialize
        this.context = context;
    }

    public void start() {
        Cache cache = context.requestService( CacheServiceProvider.class );
        //use cache
    }

    public RAMDirectory getDirectory() {
        // use cache
    }

    public void stop() {
        //stop services
        context.releaseService( CacheServiceProvider.class );
    }
}
The service can be released in the DirectoryProvider.stop method if the DirectoryProvider uses the service during its lifetime, or it can be released right away if the service is only used at initialization time.
11.7. Customizing Lucene's Scoring Formula
You can override Lucene's scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula, which calculates the score of query q for document d:

score(q,d) = coord(q,d) · queryNorm(q) · ∑ t in q ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
| Factor | Description |
|---|---|
| tf(t in d) | Term frequency factor for the term (t) in the document (d). |
| idf(t) | Inverse document frequency of the term. |
| coord(q,d) | Score factor based on how many of the query terms are found in the specified document. |
| queryNorm(q) | Normalizing factor used to make scores between queries comparable. |
| t.getBoost() | Field boost. |
| norm(t,d) | Encapsulates a few (indexing time) boost and length factors. |
Refer to Similarity's Javadocs for more information.
You can configure a different Similarity implementation using the hibernate.search.similarity property. The default value is org.apache.lucene.search.DefaultSimilarity.
You can also override the Similarity used for a specific index with the similarity property:
hibernate.search.default.similarity = my.custom.Similarity
Finally, you can override the Similarity used for a specific class with the @Similarity annotation.
@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
...
}
As an example, assume that it is not important how often a term appears in a document: documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case your implementation of the method tf(float freq) should return 1.0.
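A minimal sketch of such an implementation, extending DefaultSimilarity; the class name matches the annotation example above:
public class DummySimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
        // score a term the same no matter how often it occurs in a document
        return 1.0f;
    }
}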
Warning
When two entities share the same index they must declare the same Similarity implementation. Classes in the same class hierarchy always share the index, so it is not allowed to override the Similarity implementation in a subtype.