Chapter 5. Mapping Entities to the Index Structure

All the metadata needed to index entities is described through Java annotations. There is no need for XML mapping files or for a list of indexed entities: the list is discovered at startup by scanning the Hibernate mapped entities.

5.1. Mapping an entity

5.1.1. Basic mapping

First, we must declare a persistent class as indexable. This is done by annotating the class with @Indexed (all entities not annotated with @Indexed will be ignored by the indexing process):
@Entity
@Indexed(index="indexes/essays")
public class Essay {
    ...
}
The index attribute tells Hibernate what the Lucene directory name is (usually a directory on your file system). If you wish to define a base directory for all Lucene indexes, you can use the hibernate.search.default.indexBase property in your configuration file. Each entity instance will be represented by a Lucene Document inside the given index (aka Directory).
For each property (or attribute) of your entity, you can describe how it will be indexed. The default (i.e. no annotation) means that the property is completely ignored by the indexing process. @Field declares a property as indexed. When indexing an element to a Lucene document you can specify how it is indexed:
  • name: describes under which name the property is stored in the Lucene Document. The default value is the property name (following the JavaBeans convention).
  • store: describes whether or not the property is stored in the Lucene index. You can store the value with Store.YES (consuming more space in the index but allowing projection, see Section 6.1.2.5, “Projection” for more information), store it in a compressed way with Store.COMPRESS (this consumes more CPU), or avoid any storage with Store.NO (this is the default value). When a property is stored, you can retrieve it from the Lucene Document (note that this is not related to whether the element is indexed or not).
  • index: describes how the element is indexed (i.e. the process used to index the property and the type of information stored). The different values are Index.NO (no indexing, i.e. the property cannot be found by a query), Index.TOKENIZED (use an analyzer to process the property), Index.UN_TOKENIZED (no analyzer pre-processing), Index.NO_NORMS (do not store the normalization data). The default value is TOKENIZED.
These attributes are part of the @Field annotation.
Whether or not you want to store the data depends on how you wish to use the index query result. For a regular Hibernate Search usage, storing is not necessary. However you might want to store some fields to subsequently project them (see Section 6.1.2.5, “Projection” for more information).
Whether or not you want to tokenize a property depends on whether you wish to search the element as is, or by the words it contains. It makes sense to tokenize a text field, but not a date field (or an id field). Note that fields used for sorting must not be tokenized.
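The difference between the two indexing modes can be illustrated without Lucene. The following plain-Java sketch is a deliberate simplification: the class and method names are hypothetical, and splitting on non-alphanumeric characters and lowercasing only approximates what a real analyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified model of the two indexing modes; this is not Lucene code.
final class TokenizationSketch {

    // TOKENIZED-like behavior: the value is split into lowercase terms.
    static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // UN_TOKENIZED-like behavior: the whole value is kept as a single term.
    static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String title = "Hibernate Search in Action";
        System.out.println(tokenized(title));   // prints [hibernate, search, in, action]
        System.out.println(untokenized(title)); // prints [Hibernate Search in Action]
    }
}
```

In this model, a query for the single word "search" can only match the tokenized form, while the untokenized form keeps one value per document, which is what sorting relies on.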
Finally, the id property of an entity is a special property used by Hibernate Search to ensure the uniqueness of a given entity in the index. By design, an id has to be stored and must not be tokenized. To mark a property as the index id, use the @DocumentId annotation.
@Entity
@Indexed(index="indexes/essays")
public class Essay {
    ...

    @Id
    @DocumentId
    public Long getId() { return id; }

    @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
    public String getSummary() { return summary; }

    @Lob
    @Field(index=Index.TOKENIZED)
    public String getText() { return text; }
}
These annotations define an index with three fields: id, Abstract and text. Note that by default the field name is decapitalized, following the JavaBeans specification.
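The decapitalization rule can be tried out with the JDK itself. The sketch below uses java.beans.Introspector.decapitalize to derive a property name from a getter name; the helper class and method are hypothetical, and this illustrates the JavaBeans convention rather than the code Hibernate Search actually runs.

```java
import java.beans.Introspector;

// Illustrates the JavaBeans naming convention behind default field names.
final class DefaultFieldNameSketch {

    // Hypothetical helper: strips a "get"/"is" prefix and decapitalizes the rest.
    static String propertyName(String getterName) {
        String base;
        if (getterName.startsWith("get") && getterName.length() > 3) {
            base = getterName.substring(3);
        } else if (getterName.startsWith("is") && getterName.length() > 2) {
            base = getterName.substring(2);
        } else {
            return getterName; // not a getter: leave the name untouched
        }
        return Introspector.decapitalize(base);
    }

    public static void main(String[] args) {
        System.out.println(propertyName("getId"));      // prints id
        System.out.println(propertyName("getText"));    // prints text
        System.out.println(propertyName("getSummary")); // prints summary
    }
}
```

In the Essay mapping, getSummary() would default to summary, but the explicit name="Abstract" overrides that default and is used verbatim.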

Note

You must specify @DocumentId on the identifier property of your entity class.

5.1.2. Mapping properties multiple times

It is sometimes necessary to map a property multiple times per index, with slightly different indexing strategies. In particular, sorting a query by a field requires that field to be UN_TOKENIZED. If you want to search by words in a property and still sort on it, you need to index it twice, once tokenized and once untokenized. @Fields lets you do this.
@Entity
@Indexed(index = "Book" )
public class Book {
    @Fields( {
        @Field(index = Index.TOKENIZED),
        @Field(name = "summary_forSort", index = Index.UN_TOKENIZED, store = Store.YES)
    } )
    public String getSummary() {
        return summary;
    }

    ...
}
The field summary is indexed twice, once as summary in a tokenized way, and once as summary_forSort in an untokenized way. @Field supports two attributes that are useful when @Fields is used:
  • analyzer: defines a @Analyzer annotation per field rather than per property
  • bridge: defines a @FieldBridge annotation per field rather than per property
See below for more information about analyzers and field bridges.

5.1.3. Embedded and Associated Objects

Associated objects as well as embedded objects can be indexed as part of the root entity index. This is necessary if you expect to search a given entity based on properties of the associated object(s). In the following example, the use case is to return the places whose city is Atlanta (in the Lucene query parser language, this would translate into address.city:Atlanta).
@Entity
@Indexed
public class Place {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field( index = Index.TOKENIZED )
    private String name;

    @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
    @IndexedEmbedded
    private Address address;
    ....
}

@Entity
@Indexed
public class Address {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field(index=Index.TOKENIZED)
    private String street;

    @Field(index=Index.TOKENIZED)
    private String city;

    @ContainedIn
    @OneToMany(mappedBy="address")
    private Set<Place> places;
    ...
}
In this example, the place fields will be indexed in the Place index. The Place index documents will also contain the fields address.id, address.street, and address.city which you will be able to query. This is enabled by the @IndexedEmbedded annotation.
Be careful. Because the data is denormalized in the Lucene index when using the @IndexedEmbedded technique, Hibernate Search needs to be aware of any change in the Place object and any change in the Address object to keep the index up to date. To make sure the Place Lucene document is updated when its Address changes, you need to mark the other side of the bidirectional relationship with @ContainedIn.
@ContainedIn is only useful on associations pointing to entities as opposed to embedded (collection of) objects.
Let's make our example a bit more complex:
@Entity
@Indexed
public class Place {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field( index = Index.TOKENIZED )
    private String name;

    @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
    @IndexedEmbedded
    private Address address;
    ....
}

@Entity
@Indexed
public class Address {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field(index=Index.TOKENIZED)
    private String street;

    @Field(index=Index.TOKENIZED)
    private String city;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_")
    private Owner ownedBy;

    @ContainedIn
    @OneToMany(mappedBy="address")
    private Set<Place> places;
    ...
}

@Embeddable
public class Owner {
    @Field(index = Index.TOKENIZED)
    private String name;
    ...
}
Any @*ToOne and @Embedded attribute can be annotated with @IndexedEmbedded. The attributes of the associated class will then be added to the main entity index. In the previous example, the index will contain the following fields:
  • id
  • name
  • address.street
  • address.city
  • address.ownedBy_name
The default prefix is propertyName., following the traditional object navigation convention. You can override it using the prefix attribute as it is shown on the ownedBy property.
depth is necessary when the object graph contains a cyclic dependency of classes (not instances), for example if Owner points to Place. Hibernate Search will stop including indexed embedded attributes after reaching the expected depth (or when the object graph boundaries are reached). A class with a self reference is an example of a cyclic dependency. In our example, because depth is set to 1, any @IndexedEmbedded attribute in Owner (if any) will be ignored.
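The way prefixes accumulate and depth cuts off traversal can be modeled in a few lines of plain Java. This is a toy model, not Hibernate Search code: the class names, the back-reference from Owner to Place (property name place) and the choice of Integer.MAX_VALUE for an unspecified depth are assumptions made for illustration, and only the fields listed above are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of @IndexedEmbedded field naming; not Hibernate Search code.
final class EmbeddedFieldNamesSketch {

    // One embedded association: target type, field-name prefix, maximum depth.
    static final class Embedded {
        final String target;
        final String prefix;
        final int depth;

        Embedded(String target, String prefix, int depth) {
            this.target = target;
            this.prefix = prefix;
            this.depth = depth;
        }
    }

    static final class Type {
        final List<String> fields = new ArrayList<>();
        final List<Embedded> embedded = new ArrayList<>();
    }

    static final Map<String, Type> MODEL = new LinkedHashMap<>();

    static Type type(String name, String... fields) {
        Type t = new Type();
        t.fields.addAll(Arrays.asList(fields));
        MODEL.put(name, t);
        return t;
    }

    static {
        type("Place", "id", "name")
            .embedded.add(new Embedded("Address", "address.", Integer.MAX_VALUE));
        type("Address", "street", "city")
            .embedded.add(new Embedded("Owner", "ownedBy_", 1));
        // Hypothetical back-reference that creates the class cycle Owner -> Place.
        type("Owner", "name")
            .embedded.add(new Embedded("Place", "place.", Integer.MAX_VALUE));
    }

    static List<String> fieldNames(String root) {
        List<String> out = new ArrayList<>();
        collect(root, "", Integer.MAX_VALUE, out);
        return out;
    }

    // Adds this type's fields, then follows embedded associations while levels remain.
    private static void collect(String typeName, String prefix, int remaining, List<String> out) {
        Type t = MODEL.get(typeName);
        for (String field : t.fields) {
            out.add(prefix + field);
        }
        if (remaining <= 0) {
            return; // depth exhausted: embedded associations of this type are ignored
        }
        for (Embedded e : t.embedded) {
            collect(e.target, prefix + e.prefix, Math.min(remaining, e.depth) - 1, out);
        }
    }

    public static void main(String[] args) {
        // prints [id, name, address.street, address.city, address.ownedBy_name]
        System.out.println(fieldNames("Place"));
    }
}
```

With depth = 1 on ownedBy, the fields of Owner are included but its own embedded association back to Place is not followed, so the traversal terminates despite the cycle.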
Such a feature (@IndexedEmbedded) is very useful to express queries referring to associated objects, such as:
  • Return places where name contains JBoss and where address city is Atlanta. In Lucene query this would be
    +name:jboss +address.city:atlanta
  • Return places where name contains JBoss and where the owner's name contains Joe. In Lucene query this would be
    +name:jboss +address.ownedBy_name:joe
In a way it mimics the relational join operation, but more efficiently (at the cost of data duplication). Remember that, out of the box, Lucene indexes have no notion of association: the join operation simply does not exist. This feature can help you keep the relational model normalized while benefiting from the speed and feature richness of the full text index.

Note

An associated object can itself be (but does not have to be) @Indexed.
When @IndexedEmbedded points to an entity, the association has to be bidirectional and the other side has to be annotated with @ContainedIn (as seen in the previous example). If not, Hibernate Search has no way to update the root index when the associated entity is updated (in our example, a Place index document has to be updated when the associated Address instance is updated).
Sometimes, the object type annotated by @IndexedEmbedded is not the object type targeted by Hibernate and Hibernate Search, especially when interfaces are used in lieu of their implementation. You can override the object type targeted by Hibernate Search using the targetElement parameter.
@Entity
@Indexed
public class Address {
    @Id
    @GeneratedValue
    @DocumentId
    private Long id;

    @Field(index= Index.TOKENIZED)
    private String street;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_", targetElement = Owner.class)
    @Target(Owner.class)
    private Person ownedBy;

    ...
}

@Embeddable
public class Owner implements Person { ... }

5.1.4. Boost factor

Lucene has the notion of a boost factor. It is a way to give more weight to one field or indexed element over another during the indexing process. You can use @Boost at the field or the class level.
@Entity
@Indexed(index="indexes/essays")
@Boost(2)
public class Essay {
    ...

    @Id
    @DocumentId
    public Long getId() { return id; }

    @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
    @Boost(2.5f)
    public String getSummary() { return summary; }

    @Lob
    @Field(index=Index.TOKENIZED)
    public String getText() { return text; }
}
In our example, Essay's probability to reach the top of the search list will be multiplied by 2 and the summary field will be 2.5 times more important than the text field. Note that this explanation is actually wrong, but it is simple and close enough to reality. Please check the Lucene documentation or the excellent Lucene In Action by Otis Gospodnetic and Erik Hatcher.

5.1.5. Analyzer

The default analyzer class used to index the elements is configurable through the hibernate.search.analyzer property. If none is defined, org.apache.lucene.analysis.standard.StandardAnalyzer is used as the default.
You can also define the analyzer class per entity, per property and even per @Field (useful when multiple fields are indexed from a single property).
@Entity
@Indexed
@Analyzer(impl = EntityAnalyzer.class)
public class MyEntity {
    @Id
    @GeneratedValue
    @DocumentId
    private Integer id;

    @Field(index = Index.TOKENIZED)
    private String name;

    @Field(index = Index.TOKENIZED)
    @Analyzer(impl = PropertyAnalyzer.class)
    private String summary;

    @Field(index = Index.TOKENIZED, analyzer = @Analyzer(impl = FieldAnalyzer.class))
    private String body;

    ...
}
In this example, EntityAnalyzer is used to index all tokenized properties (e.g. name), except for summary and body, which are indexed with PropertyAnalyzer and FieldAnalyzer respectively.
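The selection described above amounts to picking the most specific analyzer that is defined. The sketch below models that lookup with plain strings; the class and method are hypothetical, and the relative order of a field-level and a property-level analyzer when both are present is an assumption, since the example never combines them.

```java
// Toy model of analyzer selection; analyzers are represented by their names.
final class AnalyzerResolutionSketch {

    // Returns the first defined choice, from most specific to least specific.
    // A null argument means "not defined at this level".
    static String resolve(String fieldLevel, String propertyLevel,
                          String entityLevel, String globalDefault) {
        if (fieldLevel != null) {
            return fieldLevel;
        }
        if (propertyLevel != null) {
            return propertyLevel;
        }
        if (entityLevel != null) {
            return entityLevel;
        }
        return globalDefault;
    }

    public static void main(String[] args) {
        String global = "StandardAnalyzer";
        System.out.println(resolve(null, null, "EntityAnalyzer", global));               // name
        System.out.println(resolve(null, "PropertyAnalyzer", "EntityAnalyzer", global)); // summary
        System.out.println(resolve("FieldAnalyzer", null, "EntityAnalyzer", global));    // body
    }
}
```

For an entity without any @Analyzer annotation, every level is undefined and the configured default (StandardAnalyzer unless overridden) applies.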

Warning

Mixing different analyzers in the same entity is usually bad practice. It makes query building more complex and results less predictable (for the novice), especially if you are using a QueryParser (which uses the same analyzer for the whole query). As a rule of thumb, the same analyzer should be used for both indexing and querying a given field.