Chapter 6. Mapping Entities to the Index Structure

6.1. Mapping an Entity

In Section 3.1, “Getting Started” you have already learned that all the metadata information needed to index entities is described through annotations. There is no need for xml mapping files. You can still use Hibernate mapping files for the basic Hibernate configuration, but the Hibernate Search specific configuration has to be expressed via annotations.

6.1.1. Basic Mapping

Lets start with the most commonly used annotations for mapping an entity.
The Lucene-based Query API uses the following common annotations to map entities:
  • @Indexed
  • @Field
  • @NumericField
  • @Id

6.1.1.1. @Indexed

Foremost we must declare a persistent class as indexable. This is done by annotating the class with @Indexed (all entities not annotated with @Indexed will be ignored by the indexing process):

Example 6.1. Making a class indexable with @Indexed

@Entity
@Indexed
public class Essay {
    ...
}
You can optionally specify the index attribute of the @Indexed annotation to change the default name of the index. For more information see Section 5.3, “Directory Configuration”.

6.1.1.2. @Field

For each property (or attribute) of your entity, you have the ability to describe how it will be indexed. The default (no annotation present) means that the property is ignored by the indexing process. @Field does declare a property as indexed and allows to configure several aspects of the indexing process by setting one or more of the following attributes:
  • name : describe under which name, the property should be stored in the Lucene Document. The default value is the property name (following the JavaBeans convention)
  • store : describe whether or not the property is stored in the Lucene index. You can store the value Store.YES (consuming more space in the index but allowing projection, see Section 7.1.10.5, “Projection”), store it in a compressed way Store.COMPRESS (this does consume more CPU), or avoid any storage Store.NO (this is the default value). When a property is stored, you can retrieve its original value from the Lucene Document. This is not related to whether the element is indexed or not.
  • index: describe whether the property is indexed or not. The different values are Index.NO (no indexing, ie cannot be found by a query), Index.YES (the element gets indexed and is searchable). The default value is Index.YES. Index.NO can be useful for cases where a property is not required to be searchable, but should be available for projection.

    Note

    Index.NO in combination with Analyze.YES or Norms.YES is not useful, since analyze and norms require the property to be indexed
  • analyze: determines whether the property is analyzed (Analyze.YES) or not (Analyze.NO). The default value is Analyze.YES.

    Note

    Whether or not you want to analyze a property depends on whether you wish to search the element as is, or by the words it contains. It make sense to analyze a text field, but probably not a date field.

    Note

    Fields used for sorting must not be analyzed.
  • norms: describes whether index time boosting information should be stored (Norms.YES) or not (Norms.NO). Not storing it can save a considerable amount of memory, but there won't be any index time boosting information available. The default value is Norms.YES.
  • termVector: describes collections of term-frequency pairs. This attribute enables the storing of the term vectors within the documents during indexing. The default value is TermVector.NO.
    The different values of this attribute are:
    Value Definition
    TermVector.YES Store the term vectors of each document. This produces two synchronized arrays, one contains document terms and the other contains the term's frequency.
    TermVector.NO Do not store term vectors.
    TermVector.WITH_OFFSETS Store the term vector and token offset information. This is the same as TermVector.YES plus it contains the starting and ending offset position information for the terms.
    TermVector.WITH_POSITIONS Store the term vector and token position information. This is the same as TermVector.YES plus it contains the ordinal positions of each occurrence of a term in a document.
    TermVector.WITH_POSITION_OFFSETS Store the term vector, token position and offset information. This is a combination of the YES, WITH_OFFSETS and WITH_POSITIONS.
  • indexNullAs : Per default null values are ignored and not indexed. However, using indexNullAs you can specify a string which will be inserted as token for the null value. Per default this value is set to Field.DO_NOT_INDEX_NULL indicating that null values should not be indexed. You can set this value to Field.DEFAULT_NULL_TOKENto indicate that a default null token should be used. This default null token can be specified in the configuration using hibernate.search.default_null_token. If this property is not set and you specify Field.DEFAULT_NULL_TOKEN the string "_null_" will be used as default.

    Note

    When the indexNullAs parameter is used it is important to use the same token in the search query (see Chapter 7, Querying) to search for null values. It is also advisable to use this feature only with un-analyzed fields (analyze=Analyze.NO).

    Warning

    When implementing a custom FieldBridge or TwoWayFieldBridge it is up to the developer to handle the indexing of null values (see JavaDocs of LuceneOptions.indexNullAs()).

6.1.1.3. @NumericField

There is a companion annotation to @Field called @NumericField that can be specified in the same scope as @Field or @DocumentId. It can be specified for Integer, Long, Float, and Double properties. At index time the value will be indexed using a Trie structure. When a property is indexed as numeric field, it enables efficient range query and sorting, orders of magnitude faster than doing the same query on standard @Field properties. The @NumericField annotation accept the following parameters:
Value Definition
forField (Optional) Specify the name of the related @Field that will be indexed as numeric. It's only mandatory when the property contains more than a @Field declaration
precisionStep (Optional) Change the way that the Trie structure is stored in the index. Smaller precisionSteps lead to more disk space usage and faster range and sort queries. Larger values lead to less space used and range query performance more close to the range query in normal @Fields. Default value is 4.
@NumericField supports only Double, Long, Integer and Float. It is not possible to take any advantage from a similar functionality in Lucene for the other numeric types, so remaining types should use the string encoding via the default or custom TwoWayFieldBridge.
It is possible to use a custom NumericFieldBridge assuming you can deal with the approximation during type transformation:

Example 6.2. Defining a custom NumericFieldBridge


public class BigDecimalNumericFieldBridge extends NumericFieldBridge {
   private static final BigDecimal storeFactor = BigDecimal.valueOf(100);

   @Override
   public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
      if ( value != null ) {
         BigDecimal decimalValue = (BigDecimal) value;
         Long indexedValue = Long.valueOf( decimalValue.multiply( storeFactor ).longValue() );
         luceneOptions.addNumericFieldToDocument( name, indexedValue, document );
      }
   }

    @Override
    public Object get(String name, Document document) {
        String fromLucene = document.get( name );
        BigDecimal storedBigDecimal = new BigDecimal( fromLucene );
        return storedBigDecimal.divide( storeFactor );
    }

}

6.1.1.4. @Id

Finally, the id property of an entity is a special property used by Hibernate Search to ensure index unicity of a given entity. By design, an id has to be stored and must not be tokenized. To mark a property as index id, use the @DocumentId annotation. If you are using JPA and you have specified @Id you can omit @DocumentId. The chosen entity id will also be used as document id.

Example 6.3. Specifying indexed properties

@Entity
@Indexed
public class Essay {
    ...

    @Id
    @DocumentId
    public Long getId() { return id; }

    @Field(name="Abstract", store=Store.YES)
    public String getSummary() { return summary; }

    @Lob
    @Field
    public String getText() { return text; }

    @Field @NumericField( precisionStep = 6)
    public float getGrade() { return grade; }
}
Example 6.3, “Specifying indexed properties” defines an index with four fields: id , Abstract, text and grade . Note that by default the field name is decapitalized, following the JavaBean specification. The grade field is annotated as Numeric with a slightly larger precision step than the default.

6.1.2. Mapping Properties Multiple Times

Sometimes one has to map a property multiple times per index, with slightly different indexing strategies. For example, sorting a query by field requires the field to be un-analyzed. If one wants to search by words in this property and still sort it, one need to index it twice - once analyzed and once un-analyzed. @Fields allows to achieve this goal.

Example 6.4. Using @Fields to map a property multiple times

@Entity
@Indexed(index = "Book" )
public class Book {
    @Fields( {
            @Field,
            @Field(name = "summary_forSort", analyze = Analyze.NO, store = Store.YES)
            } )
    public String getSummary() {
        return summary;
    }

    ...
}
In Example 6.4, “Using @Fields to map a property multiple times” the field summary is indexed twice, once as summary in a tokenized way, and once as summary_forSort in an untokenized way. @Field supports 2 attributes useful when @Fields is used:
See below for more information about analyzers and field bridges.

6.1.3. Embedded and Associated Objects

Associated objects as well as embedded objects can be indexed as part of the root entity index. This is useful if you expect to search a given entity based on properties of associated objects. In Example 6.5, “Indexing associations” the aim is to return places where the associated city is Atlanta (In the Lucene query parser language, it would translate into address.city:Atlanta). The place fields will be indexed in the Place index. The Place index documents will also contain the fields address.id, address.street, and address.city which you will be able to query.

Example 6.5. Indexing associations

@Entity
@Indexed
public class Place {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field
    private String name;

    @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
    @IndexedEmbedded
    private Address address;
    ....
}

@Entity
public class Address {
    @Id
    @GeneratedValue
    private Long id;

    @Field
    private String street;

    @Field
    private String city;

    @ContainedIn
    @OneToMany(mappedBy="address")
    private Set<Place> places;
    ...
}
Be careful. Because the data is denormalized in the Lucene index when using the @IndexedEmbedded technique, Hibernate Search needs to be aware of any change in the Place object and any change in the Address object to keep the index up to date. To make sure the Place Lucene document is updated when it's Address changes, you need to mark the other side of the bidirectional relationship with @ContainedIn.

Note

@ContainedIn is useful on both associations pointing to entities and on embedded (collection of) objects.

Example 6.6. Nested usage of @IndexedEmbedded and @ContainedIn

@Entity
@Indexed
public class Place {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field
    private String name;

    @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
    @IndexedEmbedded
    private Address address;
    ....
}

@Entity
public class Address {
    @Id
    @GeneratedValue
    private Long id;

    @Field
    private String street;

    @Field
    private String city;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_")
    private Owner ownedBy;

    @ContainedIn
    @OneToMany(mappedBy="address")
    private Set<Place> places;
    ...
}

@Embeddable
public class Owner {
    @Field
    private String name;
   ...
}
As you can see, any @*ToMany, @*ToOne and @Embedded attribute can be annotated with @IndexedEmbedded. The attributes of the associated class will then be added to the main entity index. In Example 6.6, “Nested usage of @IndexedEmbedded and @ContainedIn the index will contain the following fields:
  • id
  • name
  • address.street
  • address.city
  • address.ownedBy_name
The default prefix is propertyName., following the traditional object navigation convention. You can override it using the prefix attribute as it is shown on the ownedBy property.

Note

The prefix cannot be set to the empty string.
The depth property is necessary when the object graph contains a cyclic dependency of classes (not instances). For example, if Owner points to Place. Hibernate Search will stop including Indexed embedded attributes after reaching the expected depth (or the object graph boundaries are reached). A class having a self reference is an example of cyclic dependency. In our example, because depth is set to 1, any @IndexedEmbedded attribute in Owner (if any) will be ignored.
Using @IndexedEmbedded for object associations allows you to express queries (using Lucene's query syntax) such as:
  • Return places where name contains JBoss and where address city is Atlanta. In Lucene query this would be
    +name:jboss +address.city:atlanta
  • Return places where name contains JBoss and where owner's name contain Joe. In Lucene query this would be
    +name:jboss +address.orderBy_name:joe
In a way it mimics the relational join operation in a more efficient way (at the cost of data duplication). Remember that, out of the box, Lucene indexes have no notion of association, the join operation is simply non-existent. It might help to keep the relational model normalized while benefiting from the full text index speed and feature richness.

Note

An associated object can itself (but does not have to) be @Indexed
When @IndexedEmbedded points to an entity, the association has to be directional and the other side has to be annotated @ContainedIn (as seen in the previous example). If not, Hibernate Search has no way to update the root index when the associated entity is updated (in our example, a Place index document has to be updated when the associated Address instance is updated).
Sometimes, the object type annotated by @IndexedEmbedded is not the object type targeted by Hibernate and Hibernate Search. This is especially the case when interfaces are used in lieu of their implementation. For this reason you can override the object type targeted by Hibernate Search using the targetElement parameter.

Example 6.7. Using the targetElement property of @IndexedEmbedded

@Entity
@Indexed
public class Address {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field
    private String street;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_", targetElement = Owner.class)
    @Target(Owner.class)
    private Person ownedBy;


    ...
}

@Embeddable
public class Owner implements Person { ... }

6.1.4. Limiting Object Embedding to Specific Paths

The @IndexedEmbedded annotation provides also an attribute includePaths which can be used as an alternative to depth, or be combined with it.
When using only depth all indexed fields of the embedded type will be added recursively at the same depth; this makes it harder to pick only a specific path without adding all other fields as well, which might not be needed.
To avoid unnecessarily loading and indexing entities you can specify exactly which paths are needed. A typical application might need different depths for different paths, or in other words it might need to specify paths explicitly, as shown in Example 6.8, “Using the includePaths property of @IndexedEmbedded

Example 6.8. Using the includePaths property of @IndexedEmbedded

@Entity
@Indexed
public class Person {

   @Id
   public int getId() {
      return id;
   }

   @Field
   public String getName() {
      return name;
   }

   @Field
   public String getSurname() {
      return surname;
   }

   @OneToMany
   @IndexedEmbedded(includePaths = { "name" })
   public Set<Person> getParents() {
      return parents;
   }

   @ContainedIn
   @ManyToOne
   public Human getChild() {
      return child;
   }

    ...//other fields omitted
Using a mapping as in Example 6.8, “Using the includePaths property of @IndexedEmbedded, you would be able to search on a Person by name and/or surname, and/or the name of the parent. It will not index the surname of the parent, so searching on parent's surnames will not be possible but speeds up indexing, saves space and improve overall performance.
The @IndexedEmbeddedincludePaths will include the specified paths in addition to what you would index normally specifying a limited value for depth. When using includePaths, and leaving depth undefined, behavior is equivalent to setting depth=0: only the included paths are indexed.

Example 6.9. Using the includePaths property of @IndexedEmbedded

@Entity
@Indexed
public class Human {

   @Id
   public int getId() {
      return id;
   }

   @Field
   public String getName() {
      return name;
   }

   @Field
   public String getSurname() {
      return surname;
   }

   @OneToMany
   @IndexedEmbedded(depth = 2, includePaths = { "parents.parents.name" })
   public Set<Human> getParents() {
      return parents;
   }

   @ContainedIn
   @ManyToOne
   public Human getChild() {
      return child;
   }

    ...//other fields omitted
In Example 6.9, “Using the includePaths property of @IndexedEmbedded, every human will have it's name and surname attributes indexed. The name and surname of parents will be indexed too, recursively up to second line because of the depth attribute. It will be possible to search by name or surname, of the person directly, his parents or of his grand parents. Beyond the second level, we will in addition index one more level but only the name, not the surname.
This results in the following fields in the index:
  • id - as primary key
  • _hibernate_class - stores entity type
  • name - as direct field
  • surname - as direct field
  • parents.name - as embedded field at depth 1
  • parents.surname - as embedded field at depth 1
  • parents.parents.name - as embedded field at depth 2
  • parents.parents.surname - as embedded field at depth 2
  • parents.parents.parents.name - as additional path as specified by includePaths. The first parents. is inferred from the field name, the remaining path is the attribute of includePaths
Having explicit control of the indexed paths might be easier if you're designing your application by defining the needed queries first, as at that point you might know exactly which fields you need, and which other fields are unnecessary to implement your use case.