Chapter 10. Set Up and Configure the Infinispan Query API
10.1. Set Up Infinispan Query
10.1.1. Infinispan Query Dependencies in Library Mode
To use JBoss Data Grid Infinispan Query via Maven, add the following dependency:
    <dependency>
        <groupId>org.infinispan</groupId>
        <artifactId>infinispan-embedded-query</artifactId>
        <version>${infinispan.version}</version>
    </dependency>
Non-Maven users must install the infinispan-embedded-query.jar and infinispan-embedded.jar files from the JBoss Data Grid distribution.
The Infinispan Query API directly exposes the Hibernate Search and the Lucene APIs and cannot be embedded within the infinispan-embedded-query.jar file. Do not include other versions of Hibernate Search and Lucene in the same deployment as infinispan-embedded-query. Doing so causes classpath conflicts and results in unexpected behavior.
10.2. Directory Providers
10.2.1. Directory Providers
The following directory providers are supported in Infinispan Query:
- RAM Directory Provider
- Filesystem Directory Provider
- Infinispan Directory Provider
10.2.2. RAM Directory Provider
Storing the global index locally in Red Hat JBoss Data Grid’s Query Module allows each node to:
- maintain its own index.
- use Lucene's in-memory or filesystem-based index directory.
The following example demonstrates an in-memory, RAM-based index store:
    <local-cache name="indexesInMemory">
        <indexing index="LOCAL">
            <property name="default.directory_provider">ram</property>
        </indexing>
    </local-cache>
10.2.3. Filesystem Directory Provider
To configure the storage of indexes, set the appropriate properties when enabling indexing in the JBoss Data Grid configuration.
This example shows a disk-based index store:
Disk-based Index Store
    <local-cache name="indexesInInfinispan">
        <indexing index="ALL">
            <property name="default.directory_provider">filesystem</property>
            <property name="default.indexBase">/tmp/ispn_index</property>
        </indexing>
    </local-cache>
10.2.4. Infinispan Directory Provider
In addition to the Lucene directory implementations, Red Hat JBoss Data Grid also ships with an infinispan-directory module.
Red Hat JBoss Data Grid only supports infinispan-directory in the context of the Querying feature, not as a standalone feature.
The infinispan-directory module allows Lucene to store indexes within the distributed data grid. This allows the indexes to be distributed, stored in-memory, and optionally written to disk using the cache store for durability.
Sharing the same index instance using the Infinispan Directory Provider introduces a write contention point, as only one instance can write to the same index at a time.
By default, exclusive_index_use is set to true, as this provides major performance increases; however, if external applications access the same index in use by Infinispan, this property must be set to false. The default value is recommended for the majority of applications and use cases because of the performance increases, so only change it if absolutely necessary.
InfinispanIndexManager provides a default back end that sends all updates to the master node, which later applies the updates to the index. If the master node fails, an update can be lost, leaving the cache and the index out of sync. Non-default back ends are not supported.
Enable Shared Indexes
    <local-cache name="indexesInInfinispan">
        <indexing index="ALL">
            <property name="default.directory_provider">infinispan</property>
            <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
        </indexing>
    </local-cache>
When using an indexed, clustered cache, ensure that the caches containing the index data are also clustered, as described in Tuning Infinispan Directory.
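As a minimal sketch of the exception described above, where an external application also accesses the index, the shared-index configuration could disable exclusive index use. The default. prefix on the exclusive_index_use property is an assumption that follows the pattern of the other indexing properties in this chapter; the cache name is reused from the preceding example.

```xml
<local-cache name="indexesInInfinispan">
    <indexing index="ALL">
        <property name="default.directory_provider">infinispan</property>
        <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
        <!-- Assumption: disabled only because an external application shares this index -->
        <property name="default.exclusive_index_use">false</property>
    </indexing>
</local-cache>
```

Leave the property at its default of true unless another application genuinely needs access to the same index.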
10.3. Configure Indexing
10.3.1. Configure the Index in Remote Client-Server Mode
In Remote Client-Server Mode, index configuration depends on the provider and its configuration. The indexing mode depends on the provider and on whether it is local or distributed.
The following indexing modes are supported:
- NONE
- LOCAL = indexLocalOnly="true"
- ALL = indexLocalOnly="false"
Index configuration in Remote Client-Server Mode is as follows:
Configuration in Remote Client-Server Mode
    <indexing index="LOCAL">
        <property name="default.directory_provider">ram</property>
        <!-- Additional configuration information here -->
    </indexing>
Configure Lucene Caches
By default, the Lucene caches are created as local caches; however, with this configuration the Lucene search results are not shared between nodes in the cluster. To prevent this, define the caches required by Lucene in a clustered mode, as seen in the following configuration snippet:
Configuring the Lucene cache in Remote Client-Server Mode
    <cache-container name="clustered" default-cache="repltestcache">
        [...]
        <replicated-cache name="LuceneIndexesMetadata" />
        <distributed-cache name="LuceneIndexesData" />
        <replicated-cache name="LuceneIndexesLocking" />
        [...]
    </cache-container>
These caches are discussed in further detail in the Red Hat JBoss Data Grid Developer Guide.
10.3.2. Automatic Indexing
You can use the auto-config attribute to automatically configure indexing based on the cache type.
- Replicated and local caches: Indexing is persisted to disk and is not shared with other processes. Indexing is also configured so that there is minimum delay between the time an object is indexed and the time it becomes available for searches.
- Distributed caches: Indexing is handled internally to Red Hat JBoss Data Grid as a master-slave mechanism so that indexing operations are delegated to a single node that writes to the index.
The following XML snippet shows a local cache configuration with the auto-config attribute:
    <local-cache name="default">
        <indexing index="LOCAL" auto-config="true">
        </indexing>
    </local-cache>
The auto-config attribute adds properties to the cache. You can tune the indexing behavior by re-defining the properties or adding new properties.
Table 10.1. Properties for Replicated and Local Caches
| Property | Value | Description |
|---|---|---|
| | filesystem | Use a filesystem to store the index. |
| | true | Perform indexing operations in exclusive mode. This mode allows Hibernate Search to optimize writes. |
| | near-real-time | Use Lucene’s Near-Real-Time (NRT) search feature. |
| | shared | Reuse the index reader across several queries. |
Table 10.2. Properties for Distributed Caches
| Property | Value | Description |
|---|---|---|
| | infinispan | Store indexes internally to JBoss Data Grid. |
| | true | Perform indexing operations in exclusive mode. This mode allows Hibernate Search to optimize writes. |
| | org.infinispan.query.indexmanager.InfinispanIndexManager | Delegate index write operations to a single node in the cluster. |
| | shared | Reuse the index reader across several queries. |
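As a sketch of re-defining an auto-configured property, the following local cache keeps auto-config enabled but replaces the filesystem index storage with the RAM directory provider. It assumes that an explicitly declared property takes precedence over the value that auto-config adds, which is how the text above describes tuning.

```xml
<local-cache name="default">
    <indexing index="LOCAL" auto-config="true">
        <!-- Re-defines the directory provider that auto-config would otherwise set to filesystem -->
        <property name="default.directory_provider">ram</property>
    </indexing>
</local-cache>
```

The remaining auto-configured properties are left untouched, so only the storage location of the index changes.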
10.3.3. Rebuilding the Index
You can manually rebuild the Lucene index if required. However, you do not usually need to rebuild the index manually because JBoss Data Grid maintains the index during normal operation.
Rebuilding the index actually reconstructs the entire index from the data store, which requires JBoss Data Grid to process all data in the grid and can take a very long time to complete. You should only need to rebuild the Lucene index if:
- The definition of what is indexed in the types has changed.
- A parameter affecting how the index is defined, such as the Analyzer, has changed.
- The index is destroyed or corrupted, possibly due to a system administration error.
Rebuilding the index may be performed by executing the Start operation on the MassIndexer MBean.
10.4. Tuning the Index
10.4.1. Near-Realtime Index Manager
By default, each update is immediately flushed into the index. To achieve better throughput, the updates can be batched. However, batching can introduce a lag between an update and a query, so a query can see outdated data. If this is acceptable, you can use the Near-Realtime Index Manager by setting the following property:
<property name="default.indexmanager">near-real-time</property>
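For context, the property sits inside the indexing element of a cache definition. The sketch below assumes a local cache with a filesystem-based index, the combination that auto-config applies to local caches; the cache name nrtIndexedCache is hypothetical, and the index location is reused from the filesystem example earlier in this chapter.

```xml
<local-cache name="nrtIndexedCache">
    <indexing index="LOCAL">
        <property name="default.directory_provider">filesystem</property>
        <property name="default.indexBase">/tmp/ispn_index</property>
        <!-- Batch index updates; queries may briefly see outdated data -->
        <property name="default.indexmanager">near-real-time</property>
    </indexing>
</local-cache>
```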
10.4.2. Tuning Infinispan Directory
The Lucene directory uses three caches to store the index:
- Data cache
- Metadata cache
- Locking cache
Configuration for these caches can be set explicitly, specifying the cache names as in the example below, and configuring those caches as usual. All of these caches must be clustered unless Infinispan Directory is used in local mode.
Tuning the Infinispan Directory
    <distributed-cache name="indexedCache">
        <indexing index="LOCAL">
            <property name="default.indexmanager">org.infinispan.query.indexmanager.InfinispanIndexManager</property>
            <property name="default.metadata_cachename">lucene_metadata_repl</property>
            <property name="default.data_cachename">lucene_data_dist</property>
            <property name="default.locking_cachename">lucene_locking_repl</property>
        </indexing>
    </distributed-cache>
    <replicated-cache name="lucene_metadata_repl" />
    <distributed-cache name="lucene_data_dist" />
    <replicated-cache name="lucene_locking_repl" />
10.4.3. Per-Index Configuration
The indexing properties in the examples above apply to all indexes because each property uses the default. prefix. To specify a different configuration for an individual index, replace default with the index name. By default, the index name is the fully qualified class name of the indexed object; however, you can override the index name in the @Indexed annotation.
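A minimal sketch of a per-index override follows. It assumes an indexed class named com.example.Book, which is hypothetical and stands in for whatever index name applies in your deployment; the cache name perIndexCache is hypothetical as well.

```xml
<local-cache name="perIndexCache">
    <indexing index="LOCAL">
        <!-- Applies to every index that has no more specific setting -->
        <property name="default.directory_provider">filesystem</property>
        <property name="default.indexBase">/tmp/ispn_index</property>
        <!-- Applies only to the index named com.example.Book (hypothetical class) -->
        <property name="com.example.Book.directory_provider">ram</property>
    </indexing>
</local-cache>
```

With this configuration, the index for com.example.Book is held in memory, while all other indexes are stored on the filesystem.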