Chapter 17. Getting Started with Infinispan Query

17.1. Introduction

The Red Hat JBoss Data Grid Library mode Querying API enables you to search for entries in the grid using properties of the values instead of keys. It provides features such as:

  • Keyword, Range, Fuzzy, Wildcard, and Phrase queries
  • Combining queries
  • Sorting, filtering, and pagination of query results

This API, which is based on Apache Lucene and Hibernate Search, is supported in Red Hat JBoss Data Grid. Additionally, Red Hat JBoss Data Grid provides an alternate mechanism that allows both indexless and indexed searching. See The Infinispan Query DSL for details.

Enabling Querying

The Querying API is enabled by default in Remote Client-Server Mode. Instructions for enabling Querying in Library Mode are found in the Red Hat JBoss Data Grid Administration and Configuration Guide.

17.2. Installing Querying for Red Hat JBoss Data Grid

In Red Hat JBoss Data Grid, the JAR files required to perform queries are packaged within the Red Hat JBoss Data Grid Library and Remote Client-Server mode downloads.

For details about downloading and installing Red Hat JBoss Data Grid, see the Download and Install JBoss Data Grid chapter in the Getting Started Guide.

In addition, the following Maven dependency must be defined:

<dependency>
 <groupId>org.infinispan</groupId>
 <artifactId>infinispan-embedded-query</artifactId>
 <version>${version.infinispan}</version>
</dependency>

Warning

The Infinispan query API directly exposes the Hibernate Search and Lucene APIs, which cannot be embedded within the infinispan-embedded-query.jar file. Do not include other versions of Hibernate Search and Lucene in the same deployment as infinispan-embedded-query. Doing so causes classpath conflicts and results in unexpected behavior.

17.3. About Querying in Red Hat JBoss Data Grid

17.3.1. Hibernate Search and the Query Module

Red Hat JBoss Data Grid lets users query the entire stored data set for specific items. Applications may not always be aware of specific keys; however, different parts of a value can be queried using the Query Module.

Objects can be searched for based on some of their properties. For example:

  • Retrieve all red cars (an exact metadata match).
  • Search for all books about a specific topic (full text search and relevance scoring).

An exact data match can also be implemented with the MapReduce function; however, full text search and relevance-based scoring can only be performed via the Query Module.

Warning

The query capability is currently intended for rich domain objects, and primitive values are not currently supported for querying.

17.3.2. Apache Lucene and the Query Module

To query the entire data set stored in the distributed grid, Red Hat JBoss Data Grid uses the capabilities of the Apache Lucene indexing tool, as well as Hibernate Search.

  • Apache Lucene is a document indexing tool and search engine. JBoss Data Grid uses Apache Lucene 5.5.1.
  • JBoss Data Grid’s Query Module is a toolkit based on Hibernate Search that reduces Java objects into a format similar to a document, which can then be indexed and queried by Apache Lucene.

In JBoss Data Grid, the Query Module indexes values annotated with Hibernate Search indexing annotations, then updates the Apache Lucene index accordingly.

Hibernate Search intercepts changes to entries stored in the data grid to generate corresponding indexing operations.
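
The idea of reducing an object to a document and indexing it can be illustrated with a small, self-contained sketch. This is not how Lucene or Hibernate Search is implemented; the class DocumentIndexSketch, the Book type, and the "field:term" posting key are all hypothetical names chosen for illustration. It only shows the general shape: an object is flattened into named text fields, and each term in a field points back to the keys of the entries that contain it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DocumentIndexSketch {

    /** Hypothetical domain object stored as a cache value. */
    public static final class Book {
        public final String title;
        public final String author;

        public Book(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    /** Flattens the object into a document-like map of field name to text. */
    public static Map<String, String> toDocument(Book book) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", book.title);
        doc.put("author", book.author);
        return doc;
    }

    // Toy inverted index: "field:term" -> keys of the entries containing that term.
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Splits each field into lower-case terms and records the entry key under each term. */
    public void index(long key, Map<String, String> doc) {
        for (Map.Entry<String, String> field : doc.entrySet()) {
            for (String term : field.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>()).add(key);
                }
            }
        }
    }

    /** Returns the keys of all entries whose given field contains the term. */
    public Set<Long> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

Searching by a property value then becomes a lookup in the index rather than a scan over every cache entry, which is why the key of the entry does not need to be known in advance.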

17.4. Indexing

17.4.1. Indexing

When indexing is set up, the Query module transparently indexes every added, updated, or removed cache entry. Indices improve query performance but add overhead during updates. For index-less querying, see The Infinispan Query DSL.

For data that already exists in the grid, create an initial Lucene index. After relevant properties and annotations are added, trigger an initial batch index as shown in Rebuilding the Index.

17.4.2. Indexing with Transactional and Non-transactional Caches

In Red Hat JBoss Data Grid, the relationship between transactions and indexing is as follows:

  • If the cache is transactional, index updates are applied using a listener after the commit process (after-commit listener). Index update failure does not cause the write to fail.
  • If the cache is not transactional, index updates are applied using a listener that works after the event completes (post-event listener). Index update failure does not cause the write to fail.
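
The behavior described above, where the write completes first and a failing index update does not fail the write, can be sketched in plain Java. This is a toy model under stated assumptions, not Infinispan's listener mechanism: PostEventIndexingSketch, its indexUpdater callback, and the failure counter are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class PostEventIndexingSketch {

    private final Map<String, String> store = new HashMap<>();
    // Hypothetical stand-in for the component that applies index updates.
    private final BiConsumer<String, String> indexUpdater;
    private int indexFailures = 0;

    public PostEventIndexingSketch(BiConsumer<String, String> indexUpdater) {
        this.indexUpdater = indexUpdater;
    }

    public void put(String key, String value) {
        // 1. The write itself completes first.
        store.put(key, value);
        // 2. The index update runs afterwards; a failure is recorded but not propagated,
        //    so the caller still sees a successful write.
        try {
            indexUpdater.accept(key, value);
        } catch (RuntimeException e) {
            indexFailures++;
        }
    }

    public String get(String key) {
        return store.get(key);
    }

    public int indexFailures() {
        return indexFailures;
    }
}
```

One consequence of this ordering is that the data and the index can diverge after an index update failure, which is one situation in which rebuilding the index becomes necessary.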

17.4.3. Configure Indexing Programmatically

Indexing can be configured programmatically, avoiding XML configuration files.

In this example, Red Hat JBoss Data Grid is started programmatically. The configuration also maps an Author object, which is stored in the grid and made searchable through two properties, without annotating the class.

Configure Indexing Programmatically

// Map Author programmatically instead of annotating the class
SearchMapping mapping = new SearchMapping();
mapping.entity(Author.class).indexed().providedId()
        .property("name", ElementType.METHOD).field()
        .property("surname", ElementType.METHOD).field();

// Pass the mapping to Hibernate Search through the indexing properties
Properties properties = new Properties();
properties.put(org.hibernate.search.cfg.Environment.MODEL_MAPPING, mapping);
properties.put("[other.options]", "[...]");

// Enable local indexing with the properties defined above
Configuration infinispanConfiguration = new ConfigurationBuilder()
        .indexing()
        .index(Index.LOCAL)
        .withProperties(properties)
        .build();

DefaultCacheManager cacheManager = new DefaultCacheManager(infinispanConfiguration);

Cache<Long, Author> cache = cacheManager.getCache();
SearchManager sm = Search.getSearchManager(cache);

// Store an entry; it is indexed transparently
Author author = new Author(1, "FirstName", "Surname");
cache.put(author.getId(), author);

// Query by the "name" property and verify that one match is found
QueryBuilder qb = sm.buildQueryBuilderForClass(Author.class).get();
Query q = qb.keyword().onField("name").matching("FirstName").createQuery();
CacheQuery cq = sm.getQuery(q, Author.class);
Assert.assertEquals(cq.getResultSize(), 1);

17.4.4. Rebuilding the Index

You can manually rebuild the Lucene index if required. However, you do not usually need to rebuild the index manually because JBoss Data Grid maintains the index during normal operation.

Rebuilding the index reconstructs the entire index from the data store, which requires JBoss Data Grid to process all data in the grid and can take a very long time to complete. You should only need to rebuild the Lucene index if:

  • The definition of what is indexed in the types has changed.
  • A parameter affecting how the index is defined, such as the Analyzer, changes.
  • The index is destroyed or corrupted, possibly due to a system administration error.

Server Mode

To rebuild the index in remote JBoss Data Grid servers, call the reindexCache() method in the RemoteCacheManagerAdmin HotRod client interface, for example:

remoteCacheManager.administration().reindexCache("MyCache");

Library Mode

To rebuild the index in Library mode, obtain a reference to the MassIndexer and start it as follows:

SearchManager searchManager = Search.getSearchManager(cache);
searchManager.getMassIndexer().start();
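
To show why a rebuild has to process every entry in the grid, the following plain-Java sketch models the operation conceptually. It is not the MassIndexer implementation; ReindexSketch, loseIndex(), and rebuildIndex() are hypothetical names used only for illustration. The rebuild discards the existing index and re-derives it from the data store, so its cost grows with the amount of stored data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ReindexSketch {

    private final Map<Long, String> dataStore = new TreeMap<>();
    // Toy index: term -> keys of the entries containing that term.
    private final Map<String, Set<Long>> index = new HashMap<>();

    /** Normal operation: the index is maintained on every write. */
    public void put(long key, String text) {
        dataStore.put(key, text);
        addToIndex(key, text);
    }

    /** Simulates a destroyed index; the stored data is untouched. */
    public void loseIndex() {
        index.clear();
    }

    /** Rebuilds the index from scratch and returns the number of entries processed. */
    public int rebuildIndex() {
        index.clear();
        int processed = 0;
        for (Map.Entry<Long, String> entry : dataStore.entrySet()) {
            addToIndex(entry.getKey(), entry.getValue());
            processed++;
        }
        return processed;
    }

    public Set<Long> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    private void addToIndex(long key, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(key);
            }
        }
    }
}
```

In this model, queries return no results between loseIndex() and rebuildIndex() even though the data is still present, which mirrors the case where the index is destroyed while the cache contents remain intact.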

17.5. Searching

To execute a search, create a Lucene query (see Building a Lucene Query Using the Lucene-based Query API). Wrap the query in an org.infinispan.query.CacheQuery to get the required functionality from the Lucene-based API. The following code prepares a query against the indexed fields. Executing the code returns a list of Books.

Using Infinispan Query to Create and Execute a Search

QueryBuilder qb = Search.getSearchManager(cache).buildQueryBuilderForClass(Book.class).get();

org.apache.lucene.search.Query query = qb
    .keyword()
    .onFields("title", "author")
    .matching("Java rocks!")
    .createQuery();

// Wrap the Lucene query in an org.infinispan.query.CacheQuery
CacheQuery cacheQuery = Search.getSearchManager(cache).getQuery(query);

List list = cacheQuery.list();