Chapter 26. Red Hat JBoss Data Grid as Lucene Directory

26.1. Red Hat JBoss Data Grid as Lucene Directory

Red Hat JBoss Data Grid can be used as a shared, in-memory index (Infinispan Directory) for Hibernate Search queries on a relational database. By default, Hibernate Search uses a local filesystem to store the Lucene indexes, but it can optionally be configured to use JBoss Data Grid as storage to achieve real-time replication across multiple server nodes.

In the Infinispan Directory, the index is stored in memory and shared across multiple nodes. The Infinispan Directory acts as a single directory distributed across all participating nodes: an index update on one node updates the index on all nodes, and the index can be searched across the cluster immediately after the update. The default Hibernate Search configuration replicates the data defining the index across all nodes.

Data distribution may be enabled for large indexes to consume less memory; however, this comes at the cost of locality, making query operations less efficient. The indexed data can also be offloaded to a CacheStore configured on each node, or to a single centralized CacheStore shared by all nodes.

Note

While enabling distribution rather than replication might save memory, queries will be slower. Enabling a CacheStore might save even more memory, but at the cost of additional performance overhead if used for passivation.

26.2. Configuration

The directory provider is enabled by specifying it per index. If it is set on the default index, all indexes use the directory provider unless overridden per index:

hibernate.search.[default|<indexname>].directory_provider = infinispan

This gives a cluster-replicated index, but the default configuration does not enable any form of permanent persistence for the index. To enable persistence, provide an Infinispan configuration file.

Hibernate Search requires a CacheManager to use Infinispan. It can look up and reuse an existing CacheManager via JNDI, or start and manage a new one. When an existing CacheManager is looked up, it is provided by the Infinispan subsystem where it was originally registered; for instance, if it was registered via JBoss EAP, then JBoss EAP’s Infinispan subsystem provides the CacheManager.

Note

When using JNDI to register a CacheManager, it must be done using Red Hat JBoss Data Grid configuration files only.

To use an existing CacheManager via JNDI (optional parameter):

hibernate.search.infinispan.cachemanager_jndiname = [jndiname]

To start a new CacheManager from a configuration file (optional parameter):

hibernate.search.infinispan.configuration_resourcename = [infinispan configuration filename]

If both parameters are defined, the JNDI lookup takes priority. If neither is defined, Hibernate Search uses the default Infinispan configuration, which does not store the index in a persistent cache store.
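
For example, if both properties were set as below, the CacheManager would be obtained via JNDI and the configuration resource would be ignored. The JNDI name and file name are placeholders for illustration only, not defaults shipped with the product:

# Both values below are placeholders for illustration only
hibernate.search.infinispan.cachemanager_jndiname = java:jboss/infinispan/container/hibernate-search
# Ignored when the JNDI name above resolves, because JNDI takes priority
hibernate.search.infinispan.configuration_resourcename = infinispan-hibernatesearch.xml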

26.3. Red Hat JBoss Data Grid Modules

The Red Hat JBoss Data Grid directory provider for Hibernate Search is distributed as part of the JBoss Data Grid Library Modules for JBoss EAP. Download the files from the Red Hat Customer Portal.

Unpack the archive into the modules/ directory in the target JBoss Enterprise Application Platform folder.

Add the following entry to the MANIFEST.MF file in the project archive:

Dependencies: org.hibernate.search.orm services

For more information, see the Generate MANIFEST.MF entries using Maven section in the Red Hat JBoss EAP Development Guide.

26.4. Lucene Directory Configuration for Replicated Indexing

Define the following properties in the Hibernate configuration or, when using standard JPA, in the persistence unit configuration file. For instance, to change the storage of all indexes from the default, configure the following property:

hibernate.search.default.directory_provider=infinispan

This may also be set on individual indexes. In the following example, tickets and actors are index names:

hibernate.search.tickets.directory_provider=infinispan
hibernate.search.actors.directory_provider=filesystem

The Infinispan DirectoryProvider uses the following options to configure the cache names:

  • locking_cachename - Cache name where Lucene’s locks are stored. Defaults to LuceneIndexesLocking.
  • data_cachename - Cache name where Lucene’s data is stored, including the largest data chunks and objects. Defaults to LuceneIndexesData.
  • metadata_cachename - Cache name where Lucene’s metadata is stored. Defaults to LuceneIndexesMetadata.

To adjust the name of the locking cache to CustomLockingCache, use the following:

hibernate.search.default.locking_cachename = CustomLockingCache
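
The data and metadata cache names can be overridden in the same way. A minimal sketch, assuming the same index-scoped property pattern as the locking cache; the cache names CustomDataCache and CustomMetadataCache are placeholders:

# Placeholder cache names for illustration; the defaults are LuceneIndexesData and LuceneIndexesMetadata
hibernate.search.default.data_cachename = CustomDataCache
hibernate.search.default.metadata_cachename = CustomMetadataCache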

In addition, large index files are split into smaller, configurable chunks. It is often recommended to set the index’s chunk_size to the highest value that can be handled efficiently by the network.
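
For example, to raise the chunk size of the default index to 128 KB, assuming the chunk_size option is set as an index-scoped property like the cache names above (the value is illustrative and expressed in bytes):

# Illustrative value in bytes; tune to what the network handles efficiently
hibernate.search.default.chunk_size = 131072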

Hibernate Search already contains an internal default configuration that uses replicated caches to hold the indexes.

If more than one node writes to the index at the same time, it is important to configure a JMS backend. For more information on the configuration, see the Hibernate Search documentation.

Important

In settings where distribution mode is required, the LuceneIndexesMetadata and LuceneIndexesLocking caches should always use replication mode.

26.5. JMS Master and Slave Back End Configuration

While using an Infinispan directory, it is recommended to use the JMS Master/Slave backend. With Infinispan, all nodes share the same index, and because an IndexWriter active on any node acquires the lock on that shared index, updates should not be sent directly to the index. Instead, send them to a JMS queue and have a single node apply all changes on behalf of all other nodes.

Warning

Not enabling a JMS-based backend will lead to timeout exceptions when multiple nodes attempt to write to the index.

To configure a JMS slave, replace only the backend and set the directory provider to Infinispan. Set the same directory provider on the master; it will connect to the same index without the need to set up an index copy job across nodes.
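
A minimal sketch of the slave-side properties, assuming a JMS queue named queue/hibernatesearch and the default connection factory; both JNDI names are placeholders that must match the actual JMS setup:

# Slave node: route index updates through JMS instead of writing directly to the index
hibernate.search.default.directory_provider = infinispan
hibernate.search.default.worker.backend = jms
# Placeholder JNDI names; adjust to the JMS provider in use
hibernate.search.default.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.default.worker.jms.queue = queue/hibernatesearch

The master keeps the default backend with the same Infinispan directory provider and applies the work read from the queue; see the Hibernate Search documentation for the master-side queue consumer setup.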

For Master and Slave backend configuration examples, see the Back End Setup and Operations section of the Red Hat JBoss EAP Administration and Configuration Guide.