Chapter 10. Set Up and Configure the Infinispan Query API

10.1. Set Up Infinispan Query

10.1.1. Infinispan Query Dependencies in Library Mode

To use JBoss Data Grid Infinispan Query with Maven, add the following dependency:

<dependency>
    <groupId>org.infinispan</groupId>
    <artifactId>infinispan-embedded-query</artifactId>
    <version>${infinispan.version}</version>
</dependency>

Non-Maven users must install both the infinispan-embedded-query.jar and infinispan-embedded.jar files from the JBoss Data Grid distribution.

Warning

The Infinispan Query API directly exposes the Hibernate Search and Lucene APIs and cannot be embedded within the infinispan-embedded-query.jar file. Do not include other versions of Hibernate Search or Lucene in the same deployment as infinispan-embedded-query. Doing so causes classpath conflicts and results in unexpected behavior.

10.2. Directory Providers

10.2.1. Directory Providers

The following directory providers are supported in Infinispan Query:

  • RAM Directory Provider
  • Filesystem Directory Provider
  • Infinispan Directory Provider

10.2.2. RAM Directory Provider

Storing the global index locally in Red Hat JBoss Data Grid’s Query Module allows each node to:

  • maintain its own index.
  • use Lucene's in-memory or filesystem-based index directory.

The following example demonstrates an in-memory, RAM-based index store:

<local-cache name="indexesInMemory">
    <indexing index="LOCAL">
        <property name="default.directory_provider">ram</property>
    </indexing>
</local-cache>

10.2.3. Filesystem Directory Provider

To configure the storage of indexes, set the appropriate properties when enabling indexing in the JBoss Data Grid configuration.

This example shows a disk-based index store:

Disk-based Index Store

<local-cache name="indexesInInfinispan">
    <indexing index="ALL">
        <property name="default.directory_provider">filesystem</property>
        <property name="default.indexBase">/tmp/ispn_index</property>
    </indexing>
</local-cache>

10.2.4. Infinispan Directory Provider

In addition to the Lucene directory implementations, Red Hat JBoss Data Grid also ships with an infinispan-directory module.

Note

Red Hat JBoss Data Grid only supports infinispan-directory in the context of the Querying feature, not as a standalone feature.

The infinispan-directory allows Lucene to store indexes within the distributed data grid. This allows the indexes to be distributed, stored in-memory, and optionally written to disk using the cache store for durability.

Sharing the same index instance using the Infinispan Directory Provider introduces a write contention point, because only one instance can write to a given index at a time.

Important

By default, exclusive_index_use is set to true, as this provides a major performance increase; however, if external applications access the same index that Infinispan uses, this property must be set to false. The default value is recommended for most applications and use cases, so change it only if absolutely necessary.

InfinispanIndexManager provides a default back end that sends all updates to the master node, which later applies the updates to the index. If the master node fails, an update can be lost, leaving the cache and the index out of sync. Non-default back ends are not supported.

Enable Shared Indexes

<local-cache name="indexesInInfinispan">
    <indexing index="ALL">
        <property name="default.directory_provider">infinispan</property>
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
    </indexing>
</local-cache>

When using an indexed, clustered cache, ensure that the caches containing the index data are also clustered, as described in Tuning Infinispan Directory.
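
As a minimal sketch of the exclusive_index_use setting described in the Important note above, the following variant of the shared index configuration disables exclusive index use so that an external application could open the same index. It reuses the cache name and properties from the example above, and the property name is taken from the auto-config tables later in this chapter. Treat it as an illustration only; the default value of true remains the recommended setting.

<local-cache name="indexesInInfinispan">
    <indexing index="ALL">
        <property name="default.directory_provider">infinispan</property>
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
        <!-- Assumption: an external application reads or writes the same index,
             so exclusive use is switched off at the cost of write performance -->
        <property name="default.exclusive_index_use">false</property>
    </indexing>
</local-cache>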

10.3. Configure Indexing

10.3.1. Configure the Index in Remote Client-Server Mode

In Remote Client-Server Mode, index configuration depends on the provider and its configuration. The indexing mode depends on the provider and on whether it is local or distributed.

The following indexing modes are supported:

  • NONE
  • LOCAL = indexLocalOnly="true"
  • ALL = indexLocalOnly="false"

Index configuration in Remote Client-Server Mode is as follows:

Configuration in Remote Client-Server Mode

<indexing index="LOCAL">
    <property name="default.directory_provider">ram</property>
    <!-- Additional configuration information here -->
</indexing>

Configure Lucene Caches

By default, the Lucene caches are created as local caches; however, with this configuration the Lucene search results are not shared between nodes in the cluster. To prevent this, define the caches required by Lucene in a clustered mode, as seen in the following configuration snippet:

Configuring the Lucene cache in Remote Client-Server Mode

<cache-container name="clustered" default-cache="repltestcache">
    [...]
    <replicated-cache name="LuceneIndexesMetadata" />
    <distributed-cache name="LuceneIndexesData" />
    <replicated-cache name="LuceneIndexesLocking" />
    [...]
</cache-container>

These caches are discussed in further detail in the Red Hat JBoss Data Grid Developer Guide.

10.3.2. Automatic Indexing

You can use the auto-config attribute to automatically configure indexing based on the cache type.

  • Replicated and local caches: Indexing is persisted to disk and is not shared with other processes. Indexing is also configured so that there is minimum delay between the time an object is indexed and the time it becomes available for searches.
  • Distributed caches: Indexing is handled internally to Red Hat JBoss Data Grid as a master-slave mechanism so that indexing operations are delegated to a single node that writes to the index.

The following XML snippet shows a local cache configuration with the auto-config attribute:

<local-cache name="default">
   <indexing index="LOCAL" auto-config="true">
   </indexing>
</local-cache>
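
For comparison, a distributed cache could enable the same attribute, as sketched below. The cache name is hypothetical, and the index mode is an assumption borrowed from the distributed cache example in Tuning Infinispan Directory. With this cache type, the properties applied would be those listed for distributed caches rather than those for local caches.

<!-- Hypothetical cache name; index mode assumed from the distributed example in section 10.4.2 -->
<distributed-cache name="distributedIndexed">
   <indexing index="LOCAL" auto-config="true">
   </indexing>
</distributed-cache>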

The auto-config attribute adds properties to the cache. You can tune the indexing behavior by re-defining the properties or adding new properties.

Table 10.1. Properties for Replicated and Local Caches

| Property | Value | Description |
|---|---|---|
| default.directory_provider | filesystem | Use a filesystem to store the index. |
| default.exclusive_index_use | true | Perform indexing operations in exclusive mode. This mode allows Hibernate Search to optimize writes. |
| default.indexmanager | near-real-time | Use Lucene’s Near-Real-Time (NRT) search feature. |
| default.reader.strategy | shared | Reuse the index reader across several queries. |

Table 10.2. Properties for Distributed Caches

| Property | Value | Description |
|---|---|---|
| default.directory_provider | infinispan | Store indexes internally to JBoss Data Grid. |
| default.exclusive_index_use | true | Perform indexing operations in exclusive mode. This mode allows Hibernate Search to optimize writes. |
| default.indexmanager | org.infinispan.query.indexmanager.InfinispanIndexManager | Delegate index write operations to a single node in the cluster. |
| default.reader.strategy | shared | Reuse the index reader across several queries. |
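
As a sketch of tuning the auto-configured behavior, the following local cache keeps auto-config enabled and adds one property to choose where the filesystem index is written. The default.indexBase property and the path are taken from the Filesystem Directory Provider example earlier in this chapter. Combining that property with auto-config in this way is an assumption made for illustration.

<local-cache name="default">
   <indexing index="LOCAL" auto-config="true">
      <!-- Added property: location of the filesystem index that auto-config selects -->
      <property name="default.indexBase">/tmp/ispn_index</property>
   </indexing>
</local-cache>

A property that auto-config already sets, such as default.reader.strategy, could be re-defined in the same place.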

10.3.3. Rebuilding the Index

You can manually rebuild the Lucene index if required. However, you do not usually need to rebuild the index manually because JBoss Data Grid maintains the index during normal operation.

Rebuilding the index actually reconstructs the entire index from the data store, which requires JBoss Data Grid to process all data in the grid and can take a very long time to complete. You should only need to rebuild the Lucene index if:

  • The definition of what is indexed in the types has changed.
  • A parameter affecting how the index is defined, such as the Analyzer, changes.
  • The index is destroyed or corrupted, possibly due to a system administration error.

Rebuilding the index may be performed by executing the Start operation on the MassIndexer MBean.

10.4. Tuning the Index

10.4.1. Near-Realtime Index Manager

By default, each update is immediately flushed into the index. To achieve better throughput, updates can be batched. However, batching can introduce a lag between an update and the queries that follow it, so a query can see outdated data. If this is acceptable, use the Near-Realtime Index Manager by setting the following property:

<property name="default.indexmanager">near-real-time</property>
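
To show where this property sits, here is a minimal sketch of a complete cache definition that uses it. The cache name is hypothetical. The filesystem provider and index path are borrowed from the Filesystem Directory Provider example, which matches the combination that auto-config applies to local caches.

<!-- Hypothetical cache name; provider and path reuse values from section 10.2.3 -->
<local-cache name="indexesNearRealTime">
    <indexing index="LOCAL">
        <property name="default.directory_provider">filesystem</property>
        <property name="default.indexBase">/tmp/ispn_index</property>
        <!-- Batch index updates; queries may briefly see outdated data -->
        <property name="default.indexmanager">near-real-time</property>
    </indexing>
</local-cache>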

10.4.2. Tuning Infinispan Directory

The Infinispan Lucene directory uses three caches to store the index:

  • Data cache
  • Metadata cache
  • Locking cache

Configuration for these caches can be set explicitly, specifying the cache names as in the example below, and configuring those caches as usual. All of these caches must be clustered unless Infinispan Directory is used in local mode.

Tuning the Infinispan Directory

<distributed-cache name="indexedCache" >
    <indexing index="LOCAL">
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
        <property name="default.metadata_cachename">lucene_metadata_repl</property>
        <property name="default.data_cachename">lucene_data_dist</property>
        <property name="default.locking_cachename">lucene_locking_repl</property>
    </indexing>
</distributed-cache>

<replicated-cache name="lucene_metadata_repl" />

<distributed-cache name="lucene_data_dist" />

<replicated-cache name="lucene_locking_repl" />

10.4.3. Per-Index Configuration

The indexing properties in the examples above apply to all indexes because each property uses the default. prefix. To specify a different configuration for each index, replace default with the index name. By default, the index name is the fully qualified class name of the indexed object; however, you can override the index name in the @Indexed annotation.
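
The following sketch illustrates the prefix replacement. The cache name and the class com.example.Book are hypothetical; the sketch assumes that class is an indexed type that keeps the default index name. All indexes use the filesystem provider except the one for com.example.Book, which is kept in memory.

<!-- Hypothetical cache name and indexed class -->
<local-cache name="perIndexExample">
    <indexing index="LOCAL">
        <!-- Applies to every index that has no more specific setting -->
        <property name="default.directory_provider">filesystem</property>
        <property name="default.indexBase">/tmp/ispn_index</property>
        <!-- Applies only to the index named com.example.Book -->
        <property name="com.example.Book.directory_provider">ram</property>
    </indexing>
</local-cache>

If the index name were overridden in the @Indexed annotation, that name would replace com.example.Book in the property key.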