Chapter 14. Mapping Domain Objects to the Index Structure

14.1. Basic Mapping

In Red Hat JBoss Data Grid, the identifier for all @Indexed objects is the key used to store the value. How the key is indexed can still be customized by using a combination of @Transformable, @ProvidedId, custom types and custom FieldBridge implementations.
The @DocumentId identifier does not apply to JBoss Data Grid values.
The Lucene-based Query API uses the following common annotations to map entities:
  • @Indexed
  • @Field
  • @NumericField

14.1.1. @Indexed

The @Indexed annotation declares a cached entry indexable. All entries not annotated with @Indexed are ignored.

Example 14.1. Making a class indexable with @Indexed

@Indexed
public class Essay {
}
Optionally, specify the index attribute of the @Indexed annotation to change the default name of the index.

14.1.2. @Field

Each property or attribute of an entity can be indexed. Properties and attributes are not annotated by default, and therefore are ignored by the indexing process. The @Field annotation declares a property as indexed and allows the configuration of several aspects of the indexing process by setting one or more of the following attributes:
name
The name under which the property will be stored in the Lucene Document. By default, this attribute is the same as the property name, following the JavaBeans convention.
store
Specifies if the property is stored in the Lucene index. When a property is stored it can be retrieved in its original value from the Lucene Document. This is regardless of whether or not the element is indexed. Valid options are:
  • Store.YES: Consumes more index space but allows projection. See Section 15.1.3.4, “Projection”
  • Store.COMPRESS: Stores the property as compressed. This attribute consumes more CPU.
  • Store.NO: No storage. This is the default setting for the store attribute.
index
Describes if property is indexed or not. The following values are applicable:
  • Index.NO: No indexing is applied; cannot be found by querying. This setting is used for properties that are not required to be searchable, but are able to be projected.
  • Index.YES: The element is indexed and is searchable. This is the default setting for the index attribute.
analyze
Determines if the property is analyzed. The analyze attribute allows a property to be searched by its contents. For example, it may be worthwhile to analyze a text field, whereas a date field does not need to be analyzed. Enable or disable the Analyze attribute using the following:
  • Analyze.YES
  • Analyze.NO
The analyze attribute is enabled by default. The Analyze.YES setting requires the property to be indexed via the Index.YES attribute.
The following attributes are used for sorting, and must not be analyzed.
norms
Determines whether or not to store index time boosting information. Valid settings are:
  • Norms.YES
  • Norms.NO
The default for this attribute is Norms.YES. Disabling norms conserves memory, however no index time boosting information will be available.
termVector
Describes collections of term-frequency pairs. This attribute enables the storing of the term vectors within the documents during indexing. The default value is TermVector.NO. Available settings for this attribute are:
  • TermVector.YES: Stores the term vectors of each document. This produces two synchronized arrays, one contains document terms and the other contains the term's frequency.
  • TermVector.NO: Does not store term vectors.
  • TermVector.WITH_OFFSETS: Stores the term vector and token offset information. This is the same as TermVector.YES plus it contains the starting and ending offset position information for the terms.
  • TermVector.WITH_POSITIONS: Stores the term vector and token position information. This is the same as TermVector.YES plus it contains the ordinal positions of each occurrence of a term in a document.
  • TermVector.WITH_POSITION_OFFSETS: Stores the term vector, token position and offset information. This is a combination of the YES, WITH_OFFSETS, and WITH_POSITIONS.
indexNullAs
By default, null values are ignored and not indexed. However, using indexNullAs permits specification of a string to be inserted as token for the null value. When using the indexNullAs parameter, use the same token in the search query to search for null value. Use this feature only with Analyze.NO. Valid settings for this attribute are:
  • Field.DO_NOT_INDEX_NULL: This is the default value for this attribute. This setting indicates that null values will not be indexed.
  • Field.DEFAULT_NULL_TOKEN: Indicates that a default null token is used. This default null token can be specified in the configuration using the default_null_token property. If this property is not set and Field.DEFAULT_NULL_TOKEN is specified, the string "_null_" will be used as default.

Warning

When implementing a custom FieldBridge or TwoWayFieldBridge it is up to the developer to handle the indexing of null values (see JavaDocs of LuceneOptions.indexNullAs()).

14.1.3. @NumericField

The @NumericField annotation can be specified in the same scope as @Field.
The @NumericField annotation can be specified for Integer, Long, Float, and Double properties. At index time the value will be indexed using a Trie structure. When a property is indexed as numeric field, it enables efficient range query and sorting, orders of magnitude faster than doing the same query on standard @Field properties. The @NumericField annotation accept the following optional parameters:
  • forField: Specifies the name of the related @Field that will be indexed as numeric. It is mandatory when a property contains more than a @Field declaration.
  • precisionStep: Changes the way that the Trie structure is stored in the index. Smaller precisionSteps lead to more disk space usage, and faster range and sort queries. Larger values lead to less space used, and range query performance closer to the range query in normal @Fields. The default value for precisionStep is 4.
@NumericField supports only Double, Long, Integer, and Float. It is not possible to take any advantage from a similar functionality in Lucene for the other numeric types, therefore remaining types must use the string encoding via the default or custom TwoWayFieldBridge.
Custom NumericFieldBridge can also be used. Custom configurations require approximation during type transformation. The following is an example defines a custom NumericFieldBridge.

Example 14.2. Defining a custom NumericFieldBridge

public class BigDecimalNumericFieldBridge extends NumericFieldBridge {
    private static final BigDecimal storeFactor = BigDecimal.valueOf(100);

    @Override
    public void set(String name, 
                    Object value, 
                    Document document, 
                    LuceneOptions luceneOptions) {
        if (value != null) {
            BigDecimal decimalValue = (BigDecimal) value;
            Long indexedValue = Long.valueOf(
                decimalValue
                .multiply(storeFactor)
                .longValue());
            luceneOptions.addNumericFieldToDocument(name, indexedValue, document);
        }
    }

    @Override
    public Object get(String name, Document document) {
        String fromLucene = document.get(name);
        BigDecimal storedBigDecimal = new BigDecimal(fromLucene);
        return storedBigDecimal.divide(storeFactor);
    }
}