Chapter 11. Set Up and Configure the Infinispan Query API

11.1. Set Up Infinispan Query

11.1.1. Infinispan Query Dependencies in Library Mode

To use Infinispan Query in JBoss Data Grid via Maven, add the following dependency:

<dependency>
    <groupId>org.infinispan</groupId>
    <artifactId>infinispan-embedded-query</artifactId>
    <version>${infinispan.version}</version>
</dependency>

Non-Maven users must install the infinispan-embedded-query.jar and infinispan-embedded.jar files from the JBoss Data Grid distribution.

Warning

The Infinispan Query API directly exposes the Hibernate Search and the Lucene APIs and cannot be embedded within the infinispan-embedded-query.jar file. Do not include other versions of Hibernate Search and Lucene in the same deployment as infinispan-embedded-query. Doing so causes classpath conflicts and results in unexpected behavior.

11.2. Indexing Modes

11.2.1. Managing Indexes

In Red Hat JBoss Data Grid’s Query Module there are two options for storing indexes:

  1. Each node can maintain an individual copy of the global index.
  2. The index can be shared across all nodes.

When the indexes are stored locally, by setting indexLocalOnly to false, each write to the cache must be forwarded to all other nodes so that they can update their indexes. If the index is shared, by setting indexLocalOnly to true, only the node where the write originates is required to update the shared index.

Lucene provides an abstraction of the directory structure called directory provider, which is used to store the index. The index can be stored, for example, as in-memory, on filesystem, or in distributed cache.

11.2.2. Managing the Index in Local Mode

In local mode, any Lucene Directory implementation may be used. The indexLocalOnly option is meaningless in local mode.

11.2.3. Managing the Index in Replicated Mode

In replicated mode, each node can store its own local copy of the index. To store indexes locally on each node, set indexLocalOnly to false, so that each node applies the updates it receives from other nodes in addition to the updates started locally.

Any Directory implementation can be used. When a new node is started, it must receive an up-to-date copy of the index. Usually this can be done via resync; however, because this is an external operation, the index may end up slightly out of sync, particularly where updates are frequent.

Alternatively, if shared storage is used for the indexes (see Infinispan Directory Provider), indexLocalOnly must be set to true so that each node applies only the changes that originate locally. While there is no risk of an out-of-sync index, this causes contention on the node used for updating the index.

The following diagram demonstrates a replicated deployment where each node has a local index.

Figure 11.1. Replicated Cache Querying

Indexing in Replicated Mode
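
As a minimal sketch of the local-index option described above, a replicated cache could be configured as follows. It assumes the ALL indexing mode, which this chapter equates with indexLocalOnly="false", and reuses the filesystem directory provider properties shown elsewhere in this chapter. The cache name and index path are hypothetical.

<!-- Hypothetical cache name: each node keeps its own copy of the index,
     so every node applies remote updates as well as local ones -->
<replicated-cache name="replLocalIndexCache">
    <indexing index="ALL">
        <property name="default.directory_provider">filesystem</property>
        <!-- Illustrative path only -->
        <property name="default.indexBase">/tmp/ispn_index</property>
    </indexing>
</replicated-cache>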

11.2.4. Managing the Index in Distribution Mode

In both Distribution modes, a shared index must be used, with indexLocalOnly set to true.

The following diagram shows a deployment with a shared index.

Figure 11.2. Querying with a Shared Index

Querying with a shared index
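
A shared index for a distributed cache might be sketched as follows. This assumes the LOCAL indexing mode, which this chapter equates with indexLocalOnly="true", together with the Infinispan Directory Provider and InfinispanIndexManager described later in this chapter. The cache name is hypothetical.

<!-- Hypothetical cache name: the index is shared by all nodes, so each
     node applies only the changes that originate locally -->
<distributed-cache name="distSharedIndexCache">
    <indexing index="LOCAL">
        <property name="default.directory_provider">infinispan</property>
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
    </indexing>
</distributed-cache>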

11.2.5. Managing the Index in Invalidation Mode

Indexing and searching of elements in Invalidation mode is not supported.

11.3. Directory Providers

11.3.1. Directory Providers

The following directory providers are supported in Infinispan Query:

  • RAM Directory Provider
  • Filesystem Directory Provider
  • Infinispan Directory Provider

11.3.2. RAM Directory Provider

Storing the global index locally in Red Hat JBoss Data Grid’s Query Module allows each node to

  • maintain its own index.
  • use Lucene's in-memory or filesystem-based index directory.

The following example demonstrates an in-memory, RAM-based index store:

<local-cache name="indexesInMemory">
    <indexing index="LOCAL">
        <property name="default.directory_provider">ram</property>
    </indexing>
</local-cache>

11.3.3. Filesystem Directory Provider

To configure the storage of indexes, set the appropriate properties when enabling indexing in the JBoss Data Grid configuration.

This example shows a disk-based index store:

Disk-based Index Store

<local-cache name="indexesInInfinispan">
    <indexing index="ALL">
        <property name="default.directory_provider">filesystem</property>
        <property name="default.indexBase">/tmp/ispn_index</property>
    </indexing>
</local-cache>

11.3.4. Infinispan Directory Provider

In addition to the Lucene directory implementations, Red Hat JBoss Data Grid also ships with an infinispan-directory module.

Note

Red Hat JBoss Data Grid only supports infinispan-directory in the context of the Querying feature, not as a standalone feature.

The infinispan-directory allows Lucene to store indexes within the distributed data grid. This allows the indexes to be distributed, stored in-memory, and optionally written to disk using the cache store for durability.

Sharing the same index instance using the Infinispan Directory Provider introduces a write contention point, as only one instance can write on the same index at the same time.

Important

By default, exclusive_index_use is set to true because this provides a major performance increase; however, if external applications access the same index in use by Infinispan, this property must be set to false. The default value is recommended for most applications and use cases, so change it only if absolutely necessary.
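
If an external application does need to access the same index, the property could be set as in the following sketch. It assumes that exclusive_index_use takes the same default. prefix as the other indexing properties in this chapter; the cache name is illustrative.

<local-cache name="indexesInInfinispan">
    <indexing index="ALL">
        <property name="default.directory_provider">infinispan</property>
        <!-- Set to false only when an external application shares this index -->
        <property name="default.exclusive_index_use">false</property>
    </indexing>
</local-cache>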

InfinispanIndexManager provides a default back end that sends all updates to the master node, which later applies the updates to the index. If the master node fails, an update can be lost, leaving the cache and the index out of sync. Non-default back ends are not supported.

Enable Shared Indexes

<local-cache name="indexesInInfinispan">
    <indexing index="ALL">
        <property name="default.directory_provider">infinispan</property>
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
    </indexing>
</local-cache>

When using an indexed, clustered cache, ensure that the caches containing the index data are also clustered, as described in Tuning Infinispan Directory.

11.4. Configure Indexing

11.4.1. Configure the Index in Remote Client-Server Mode

In Remote Client-Server Mode, index configuration depends on the provider and its configuration. The indexing mode depends on the provider and whether it is local or distributed. The following indexing modes are supported:

  • NONE
  • LOCAL = indexLocalOnly="true"
  • ALL = indexLocalOnly="false"

Index configuration in Remote Client-Server Mode is as follows:

Configuration in Remote Client-Server Mode

<indexing index="LOCAL">
    <property name="default.directory_provider">ram</property>
    <!-- Additional configuration information here -->
</indexing>

Configure Lucene Caches

By default, the Lucene caches are created as local caches; however, with this configuration the Lucene search results are not shared between nodes in the cluster. To prevent this, define the caches required by Lucene in a clustered mode, as seen in the following configuration snippet:

Configuring the Lucene cache in Remote Client-Server Mode

<cache-container name="clustered" default-cache="repltestcache">
    [...]
    <replicated-cache name="LuceneIndexesMetadata" />
    <distributed-cache name="LuceneIndexesData" />
    <replicated-cache name="LuceneIndexesLocking" />
    [...]
</cache-container>

These caches are discussed in further detail in the Red Hat JBoss Data Grid Developer Guide.

11.4.2. Rebuilding the Index

The Lucene index can be rebuilt, if required, by reconstructing it from the data store in the cache.

The index must be rebuilt if:

  • The definition of what is indexed in the types has changed.
  • A parameter affecting how the index is defined, such as the Analyser, changes.
  • The index is destroyed or corrupted, possibly due to a system administration error.

Rebuilding the index may be performed by executing the Start operation on the MassIndexer MBean.

This operation reprocesses all data in the grid, and therefore may take some time.

11.5. Tuning the Index

11.5.1. Near-Realtime Index Manager

By default, each update is immediately flushed into the index. To achieve better throughput, the updates can be batched. However, this can result in a lag between the update and the query, so a query can see outdated data. If this is acceptable, you can use the Near-Realtime Index Manager by setting the following property:

<property name="default.indexmanager">near-real-time</property>
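
For context, the property is placed inside the indexing element of a cache. The following sketch assumes a local cache with a filesystem-based index, reusing the properties shown in the Filesystem Directory Provider section. The cache name and index path are hypothetical.

<!-- Hypothetical cache name: updates are batched, so queries may briefly
     see outdated data -->
<local-cache name="nearRealTimeIndexedCache">
    <indexing index="ALL">
        <property name="default.indexmanager">near-real-time</property>
        <property name="default.directory_provider">filesystem</property>
        <!-- Illustrative path only -->
        <property name="default.indexBase">/tmp/ispn_index</property>
    </indexing>
</local-cache>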

11.5.2. Tuning Infinispan Directory

The Infinispan Directory uses three caches to store the index:

  • Data cache
  • Metadata cache
  • Locking cache

Configuration for these caches can be set explicitly, specifying the cache names as in the example below, and configuring those caches as usual. All of these caches must be clustered unless Infinispan Directory is used in local mode.

Tuning the Infinispan Directory

<distributed-cache name="indexedCache" >
    <indexing index="LOCAL">
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
        <property name="default.metadata_cachename">lucene_metadata_repl</property>
        <property name="default.data_cachename">lucene_data_dist</property>
        <property name="default.locking_cachename">lucene_locking_repl</property>
    </indexing>
</distributed-cache>

<replicated-cache name="lucene_metadata_repl" />

<distributed-cache name="lucene_data_dist" />

<replicated-cache name="lucene_locking_repl" />

11.5.3. Per-Index Configuration

The indexing properties in the examples above apply to all indexes because each property uses the default. prefix. To specify a different configuration for a particular index, replace default with the index name. By default, the index name is the fully qualified class name of the indexed object; however, you can override the index name in the @Indexed annotation.
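
As a sketch of a per-index override, the following configuration keeps a filesystem-based default and stores one index in memory. The index name books is hypothetical and is assumed to have been set on the indexed class with @Indexed(index = "books"); the cache name and index path are also illustrative.

<local-cache name="perIndexConfiguredCache">
    <indexing index="ALL">
        <!-- Applies to every index without a more specific setting -->
        <property name="default.directory_provider">filesystem</property>
        <property name="default.indexBase">/tmp/ispn_index</property>
        <!-- Applies only to the hypothetical index named "books" -->
        <property name="books.directory_provider">ram</property>
    </indexing>
</local-cache>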