7.3. Filters

Apache Lucene has a powerful feature that allows to filter query results according to a custom filtering process. This is a very powerful way to apply additional data restrictions, especially since filters can be cached and reused. Some interesting use cases are:
  • security
  • temporal data (example, view only last month's data)
  • population filter (example, search limited to a given category)
  • and many more

7.3.1. Using Filters in a Sharded Environment

It is possible, in a sharded environment to execute queries on a subset of the available shards. This can be done in two steps:
  • create a sharding strategy that does select a subset of IndexManagers depending on some filter configuration
  • activate the proper filter at query time
Let's first look at an example of sharding strategy that query on a specific customer shard if the customer filter is activated.
public class CustomerShardingStrategy implements IndexShardingStrategy {

 // stored IndexManagers in a array indexed by customerID
 private IndexManager[] indexManagers;
 
 public void initialize(Properties properties, IndexManager[] indexManagers) {
   this.indexManagers = indexManagers;
 }

 public IndexManager[] getIndexManagersForAllShards() {
   return indexManagers;
 }

 public IndexManager getIndexManagerForAddition(
     Class<?> entity, Serializable id, String idInString, Document document) {
   Integer customerID = Integer.parseInt(document.getFieldable("customerID").stringValue());
   return indexManagers[customerID];
 }

 public IndexManager[] getIndexManagersForDeletion(
     Class<?> entity, Serializable id, String idInString) {
   return getIndexManagersForAllShards();
 }

  /**
  * Optimization; don't search ALL shards and union the results; in this case, we 
  * can be certain that all the data for a particular customer Filter is in a single
  * shard; simply return that shard by customerID.
  */
 public IndexManager[] getIndexManagersForQuery(
     FullTextFilterImplementor[] filters) {
   FullTextFilter filter = getCustomerFilter(filters, "customer");
   if (filter == null) {
     return getIndexManagersForAllShards();
   }
   else {
     return new IndexManager[] { indexManagers[Integer.parseInt(
       filter.getParameter("customerID").toString())] };
   }
 }

 private FullTextFilter getCustomerFilter(FullTextFilterImplementor[] filters, String name) {
   for (FullTextFilterImplementor filter: filters) {
     if (filter.getName().equals(name)) return filter;
   }
   return null;
 }
}
In this example, if the filter named customer is present, we make sure to only use the shard dedicated to this customer. Otherwise, we return all shards. A given Sharding strategy can react to one or more filters and depends on their parameters.
The second step is simply to activate the filter at query time. While the filter can be a regular filter (as defined in Section 7.3, “Filters”) which also filters Lucene results after the query, you can make use of a special filter that will only be passed to the sharding strategy and otherwise ignored for the rest of the query. Simply use the ShardSensitiveOnlyFilter class when declaring your filter.
@Indexed
@FullTextFilterDef(name="customer", impl=ShardSensitiveOnlyFilter.class)
public class Customer {
   ...
}

FullTextQuery query = ftEm.createFullTextQuery(luceneQuery, Customer.class);
query.enableFulltextFilter("customer").setParameter("CustomerID", 5);
@SuppressWarnings("unchecked")
List<Customer> results = query.getResultList();
Note that by using the ShardSensitiveOnlyFilter, you do not have to implement any Lucene filter. Using filters and sharding strategy reacting to these filters is recommended to speed up queries in a sharded environment.