Chapter 6. Mapping Entities to the Index Structure
6.1. Mapping an Entity
In Section 3.1, “Getting Started”, you learned that all the metadata needed to index entities is described through annotations. There is no need for XML mapping files. You can still use Hibernate mapping files for the basic Hibernate configuration, but the Hibernate Search specific configuration has to be expressed via annotations.
6.1.1. Basic Mapping
Let's start with the most commonly used annotations for mapping an entity.
The Lucene-based Query API uses the following common annotations to map entities:
- @Indexed
- @Field
- @NumericField
- @Id
6.1.1.1. @Indexed
First, we must declare a persistent class as indexable. This is done by annotating the class with @Indexed. All entities not annotated with @Indexed are ignored by the indexing process.
You can optionally specify the index attribute of the @Indexed annotation to change the default name of the index. For more information see Section 5.3, “Directory Configuration”.
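The two rules above (unannotated classes are skipped, and the index attribute overrides the index name) can be modelled with plain reflection. This is only a sketch: the Indexed annotation below is a local stand-in declared so the code compiles without Hibernate Search, the AuditLog class is hypothetical, and the fallback to the class name when no index name is given is an assumption rather than something stated in this section.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

final class IndexedSketch {

    // Local stand-in for the real @Indexed annotation; only "index" is modelled.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Indexed {
        String index() default "";
    }

    @Indexed
    static class Essay { }

    @Indexed(index = "Book")
    static class Book { }

    // Hypothetical class without the annotation: the filter below skips it.
    static class AuditLog { }

    // Keeps only the classes that carry the annotation.
    static List<Class<?>> indexable(List<Class<?>> candidates) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            if (candidate.isAnnotationPresent(Indexed.class)) {
                result.add(candidate);
            }
        }
        return result;
    }

    // Assumption: when no explicit name is given, the class name is used.
    static String indexName(Class<?> type) {
        Indexed annotation = type.getAnnotation(Indexed.class);
        if (annotation == null) {
            throw new IllegalArgumentException(type + " is not indexable");
        }
        return annotation.index().isEmpty() ? type.getName() : annotation.index();
    }

    public static void main(String[] args) {
        List<Class<?>> found = indexable(List.of(Essay.class, Book.class, AuditLog.class));
        for (Class<?> type : found) {
            System.out.println(type.getSimpleName() + " -> " + indexName(type));
        }
    }
}
```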
6.1.1.2. @Field
For each property (or attribute) of your entity, you have the ability to describe how it will be indexed. The default (no annotation present) means that the property is ignored by the indexing process.
@Field declares a property as indexed and lets you configure several aspects of the indexing process by setting one or more of the following attributes:
- name: describes under which name the property is stored in the Lucene Document. The default value is the property name (following the JavaBeans convention).
- store: describes whether or not the property is stored in the Lucene index. You can store the value with Store.YES (consuming more space in the index but allowing projection, see Section 7.1.10.5, “Projection”), store it in a compressed way with Store.COMPRESS (this consumes more CPU), or avoid any storage with Store.NO (this is the default value). When a property is stored, you can retrieve its original value from the Lucene Document. This is not related to whether the element is indexed or not.
- index: describes whether the property is indexed or not. The possible values are Index.NO (no indexing, that is, the property cannot be found by a query) and Index.YES (the element is indexed and searchable). The default value is Index.YES. Index.NO can be useful for cases where a property is not required to be searchable, but should be available for projection.
Note
Index.NO in combination with Analyze.YES or Norms.YES is not useful, since analyze and norms require the property to be indexed.
- analyze: determines whether the property is analyzed (Analyze.YES) or not (Analyze.NO). The default value is Analyze.YES.
Note
Whether or not you want to analyze a property depends on whether you wish to search the element as is, or by the words it contains. It makes sense to analyze a text field, but probably not a date field.
Note
Fields used for sorting must not be analyzed.
- norms: describes whether index time boosting information should be stored (Norms.YES) or not (Norms.NO). Not storing it can save a considerable amount of memory, but there won't be any index time boosting information available. The default value is Norms.YES.
- termVector: describes collections of term-frequency pairs. This attribute enables the storing of the term vectors within the documents during indexing. The default value is TermVector.NO. The different values of this attribute are:
| Value | Definition |
|---|---|
| TermVector.YES | Store the term vectors of each document. This produces two synchronized arrays, one contains document terms and the other contains the term's frequency. |
| TermVector.NO | Do not store term vectors. |
| TermVector.WITH_OFFSETS | Store the term vector and token offset information. This is the same as TermVector.YES plus it contains the starting and ending offset position information for the terms. |
| TermVector.WITH_POSITIONS | Store the term vector and token position information. This is the same as TermVector.YES plus it contains the ordinal positions of each occurrence of a term in a document. |
| TermVector.WITH_POSITION_OFFSETS | Store the term vector, token position and offset information. This is a combination of YES, WITH_OFFSETS and WITH_POSITIONS. |
- indexNullAs: by default, null values are ignored and not indexed. However, using indexNullAs you can specify a string which will be inserted as the token for the null value. By default this value is set to Field.DO_NOT_INDEX_NULL, indicating that null values should not be indexed. You can set this value to Field.DEFAULT_NULL_TOKEN to indicate that a default null token should be used. This default null token can be specified in the configuration using hibernate.search.default_null_token. If this property is not set and you specify Field.DEFAULT_NULL_TOKEN, the string "_null_" will be used as default.
Note
When the indexNullAs parameter is used, it is important to use the same token in the search query (see Chapter 7, Querying) to search for null values. It is also advisable to use this feature only with un-analyzed fields (analyze=Analyze.NO).
Warning
When implementing a custom FieldBridge or TwoWayFieldBridge it is up to the developer to handle the indexing of null values (see JavaDocs of LuceneOptions.indexNullAs()).
6.1.1.3. @NumericField
There is a companion annotation to @Field called @NumericField that can be specified in the same scope as @Field or @DocumentId. It can be specified for Integer, Long, Float, and Double properties. At index time the value will be indexed using a Trie structure. When a property is indexed as a numeric field, it enables efficient range queries and sorting, orders of magnitude faster than doing the same query on standard @Field properties. The @NumericField annotation accepts the following parameters:
| Value | Definition |
|---|---|
| forField | (Optional) Specify the name of the related @Field that will be indexed as numeric. It is only mandatory when the property contains more than one @Field declaration. |
| precisionStep | (Optional) Change the way that the Trie structure is stored in the index. Smaller precisionSteps lead to more disk space usage and faster range and sort queries. Larger values lead to less space used and range query performance closer to that of a range query on normal @Fields. Default value is 4. |
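To get a feel for the space side of the precisionStep trade-off, the sketch below counts how many terms one value contributes to the index. It assumes the trie encoding writes one term per shift of 0, precisionStep, 2 × precisionStep and so on, up to the bit width of the value (32 bits for Integer and Float, 64 bits for Long and Double), which is how Lucene's numeric fields are usually described. The class and method names are illustrative only.

```java
final class PrecisionStepSketch {

    // One term per shift 0, step, 2*step, ... while the shift stays below the bit width.
    static int termsPerValue(int bitWidth, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be at least 1");
        }
        return (bitWidth + precisionStep - 1) / precisionStep;
    }

    // The progressively coarser prefixes of a 64-bit value: each term clears
    // the lowest "shift" bits, so coarse terms cover wide numeric ranges.
    static long[] prefixTerms(long value, int precisionStep) {
        int count = termsPerValue(64, precisionStep);
        long[] terms = new long[count];
        for (int i = 0; i < count; i++) {
            int shift = i * precisionStep;
            terms[i] = (value >>> shift) << shift;
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println("long, step 4: " + termsPerValue(64, 4) + " terms");
        System.out.println("long, step 8: " + termsPerValue(64, 8) + " terms");
        System.out.println("float, step 6: " + termsPerValue(32, 6) + " terms");
    }
}
```

Under this assumption a Long indexed with the default step of 4 produces twice as many terms as one indexed with a step of 8, which is the extra disk usage the table refers to.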
@NumericField supports only Double, Long, Integer and Float. Lucene offers no comparable functionality for the other numeric types, so the remaining types should use string encoding via the default or a custom TwoWayFieldBridge.
It is possible to use a custom NumericFieldBridge, assuming you can deal with the approximation during type transformation:
Example 6.2. Defining a custom NumericFieldBridge
    public class BigDecimalNumericFieldBridge extends NumericFieldBridge {
        // Values are scaled by 100 before being indexed as a long, keeping two decimal places
        private static final BigDecimal storeFactor = BigDecimal.valueOf(100);

        @Override
        public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
            if ( value != null ) {
                BigDecimal decimalValue = (BigDecimal) value;
                // longValue() discards any digits beyond the second decimal place
                Long indexedValue = Long.valueOf( decimalValue.multiply( storeFactor ).longValue() );
                luceneOptions.addNumericFieldToDocument( name, indexedValue, document );
            }
        }

        @Override
        public Object get(String name, Document document) {
            // Undo the scaling to restore the (approximated) original value
            String fromLucene = document.get( name );
            BigDecimal storedBigDecimal = new BigDecimal( fromLucene );
            return storedBigDecimal.divide( storeFactor );
        }
    }
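The approximation mentioned above is easy to see when the same arithmetic is run outside of any bridge. The following sketch reuses only the scaling logic from Example 6.2 with the JDK's BigDecimal. The class and method names are illustrative and not part of Hibernate Search.

```java
import java.math.BigDecimal;

final class ScaledDecimalSketch {

    private static final BigDecimal STORE_FACTOR = BigDecimal.valueOf(100);

    // Same arithmetic as set(): scale by 100, then truncate to a long.
    static long toIndexedValue(BigDecimal value) {
        return value.multiply(STORE_FACTOR).longValue();
    }

    // Same arithmetic as get(): divide the stored number by 100.
    static BigDecimal fromIndexedValue(long indexed) {
        return BigDecimal.valueOf(indexed).divide(STORE_FACTOR);
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("12.345");
        long indexed = toIndexedValue(original);
        // The third decimal place is lost on the way through the index.
        System.out.println(original + " -> " + indexed + " -> " + fromIndexedValue(indexed));
    }
}
```

A value with at most two decimal places survives the round trip unchanged, while anything finer is truncated, so the store factor should match the precision your queries actually need.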
6.1.1.4. @Id
Finally, the id property of an entity is a special property used by Hibernate Search to ensure the index uniqueness of a given entity. By design, an id has to be stored and must not be tokenized. To mark a property as the index id, use the @DocumentId annotation. If you are using JPA and you have specified @Id, you can omit @DocumentId. The chosen entity id will also be used as the document id.
Example 6.3. Specifying indexed properties
    @Entity
    @Indexed
    public class Essay {
        ...

        @Id
        @DocumentId
        public Long getId() { return id; }

        @Field(name="Abstract", store=Store.YES)
        public String getSummary() { return summary; }

        @Lob
        @Field
        public String getText() { return text; }

        @Field
        @NumericField( precisionStep = 6)
        public float getGrade() { return grade; }
    }
Example 6.3, “Specifying indexed properties” defines an index with four fields: id, Abstract, text and grade. Note that by default the field name is decapitalized, following the JavaBeans specification. The grade field is annotated as numeric with a slightly larger precision step than the default.
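The naming rule can be reproduced with the JDK's own JavaBeans helper. The sketch below is an illustration, not the library's code: fieldName is a hypothetical helper that prefers an explicit name attribute and otherwise strips the get or is prefix and decapitalizes the remainder with java.beans.Introspector.decapitalize.

```java
import java.beans.Introspector;

final class FieldNameSketch {

    /**
     * Returns the explicit name when one is given, otherwise the property
     * name derived from the getter following the JavaBeans convention.
     */
    static String fieldName(String getterName, String explicitName) {
        if (explicitName != null && !explicitName.isEmpty()) {
            return explicitName;
        }
        String stripped;
        if (getterName.startsWith("get")) {
            stripped = getterName.substring(3);
        } else if (getterName.startsWith("is")) {
            stripped = getterName.substring(2);
        } else {
            stripped = getterName;
        }
        return Introspector.decapitalize(stripped);
    }

    public static void main(String[] args) {
        // The four getters of the Essay example.
        System.out.println(fieldName("getId", ""));
        System.out.println(fieldName("getSummary", "Abstract"));
        System.out.println(fieldName("getText", ""));
        System.out.println(fieldName("getGrade", ""));
    }
}
```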
6.1.2. Mapping Properties Multiple Times
Sometimes you have to map a property multiple times per index, with slightly different indexing strategies. For example, sorting a query by a field requires the field to be un-analyzed. If you want to search by words in this property and still sort by it, you need to index it twice: once analyzed and once un-analyzed. The @Fields annotation makes this possible.
Example 6.4. Using @Fields to map a property multiple times
    @Entity
    @Indexed(index = "Book" )
    public class Book {
        @Fields( {
            @Field,
            @Field(name = "summary_forSort", analyze = Analyze.NO, store = Store.YES)
        } )
        public String getSummary() { return summary; }

        ...
    }
In Example 6.4, “Using @Fields to map a property multiple times” the field summary is indexed twice, once as summary in a tokenized way, and once as summary_forSort in an untokenized way. @Field supports two attributes that are useful when @Fields is used:
- analyzer: defines an analyzer per field rather than per property
- bridge: defines a field bridge per field rather than per property
See below for more information about analyzers and field bridges.
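The reason one property ends up under two field names becomes clearer when you look at the terms each variant would produce. The sketch below uses a deliberately crude tokenizer (lowercase, split on non-word characters) as a stand-in for a real Lucene analyzer, so the exact tokens are an assumption. The point is only that an analyzed field yields several terms while an un-analyzed one yields exactly one, which is what sorting needs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class AnalyzeSketch {

    // Crude stand-in for an analyzer: lowercase, then split on non-word characters.
    static List<String> analyzed(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Un-analyzed: the whole value is kept as a single term.
    static List<String> unanalyzed(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        String summary = "Hibernate Search in Action";
        System.out.println("summary         -> " + analyzed(summary));
        System.out.println("summary_forSort -> " + unanalyzed(summary));
    }
}
```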
6.1.3. Embedded and Associated Objects
Associated objects as well as embedded objects can be indexed as part of the root entity index. This is useful if you expect to search a given entity based on properties of associated objects. In Example 6.5, “Indexing associations” the aim is to return places where the associated city is Atlanta (in the Lucene query parser language, it would translate into address.city:Atlanta). The place fields will be indexed in the Place index. The Place index documents will also contain the fields address.id, address.street, and address.city which you will be able to query.
Example 6.5. Indexing associations
    @Entity
    @Indexed
    public class Place {
        @Id
        @GeneratedValue
        @DocumentId
        private Long id;

        @Field
        private String name;

        @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
        @IndexedEmbedded
        private Address address;
        ....
    }

    @Entity
    public class Address {
        @Id
        @GeneratedValue
        private Long id;

        @Field
        private String street;

        @Field
        private String city;

        @ContainedIn
        @OneToMany(mappedBy="address")
        private Set<Place> places;
        ...
    }
Be careful. Because the data is denormalized in the Lucene index when using the @IndexedEmbedded technique, Hibernate Search needs to be aware of any change in the Place object and any change in the Address object to keep the index up to date. To make sure the Place Lucene document is updated when its Address changes, you need to mark the other side of the bidirectional relationship with @ContainedIn.
Note
@ContainedIn is useful on both associations pointing to entities and on embedded (collection of) objects.
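What the @ContainedIn side provides is a way to get from a changed Address back to every Place document that embeds it. The sketch below models that navigation with plain objects. It illustrates the idea rather than the library's mechanism: placesToReindex is a hypothetical helper and the sample data is made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class ContainedInSketch {

    static final class Address {
        String city;
        // Plays the role of the @ContainedIn side: the places embedding this address.
        final Set<Place> places = new LinkedHashSet<>();

        Address(String city) {
            this.city = city;
        }
    }

    static final class Place {
        final String name;
        final Address address;

        Place(String name, Address address) {
            this.name = name;
            this.address = address;
            address.places.add(this); // keep both sides of the association in sync
        }
    }

    // Names of the Place documents that embed the changed address and
    // therefore hold a stale copy of its fields.
    static Set<String> placesToReindex(Address changed) {
        Set<String> names = new LinkedHashSet<>();
        for (Place place : changed.places) {
            names.add(place.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Address shared = new Address("Atlanta");
        new Place("Headquarters", shared);
        new Place("Training center", shared);
        shared.city = "Raleigh";
        System.out.println("Reindex: " + placesToReindex(shared));
    }
}
```

Without the back reference there would be nothing to iterate over, which is why the index could not be kept up to date.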
Let's make Example 6.5, “Indexing associations” a bit more complex by nesting @IndexedEmbedded as seen in Example 6.6, “Nested usage of @IndexedEmbedded and @ContainedIn”.
Example 6.6. Nested usage of @IndexedEmbedded and @ContainedIn
    @Entity
    @Indexed
    public class Place {
        @Id
        @GeneratedValue
        @DocumentId
        private Long id;

        @Field
        private String name;

        @OneToOne( cascade = { CascadeType.PERSIST, CascadeType.REMOVE } )
        @IndexedEmbedded
        private Address address;
        ....
    }

    @Entity
    public class Address {
        @Id
        @GeneratedValue
        private Long id;

        @Field
        private String street;

        @Field
        private String city;

        @IndexedEmbedded(depth = 1, prefix = "ownedBy_")
        private Owner ownedBy;

        @ContainedIn
        @OneToMany(mappedBy="address")
        private Set<Place> places;
        ...
    }

    @Embeddable
    public class Owner {
        @Field
        private String name;
        ...
    }
As you can see, any @*ToMany, @*ToOne and @Embedded attribute can be annotated with @IndexedEmbedded. The attributes of the associated class will then be added to the main entity index. In Example 6.6, “Nested usage of @IndexedEmbedded and @ContainedIn” the index will contain the following fields:
- id
- name
- address.street
- address.city
- address.ownedBy_name
The default prefix is propertyName., following the traditional object navigation convention. You can override it using the prefix attribute, as shown on the ownedBy property.
Note
The prefix cannot be set to the empty string.
The depth property is necessary when the object graph contains a cyclic dependency of classes (not instances), for example if Owner points to Place. Hibernate Search will stop including indexed embedded attributes after reaching the expected depth (or when the object graph boundaries are reached). A class having a self reference is an example of a cyclic dependency. In our example, because depth is set to 1, any @IndexedEmbedded attribute in Owner (if any) will be ignored.
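The interplay of prefix and depth can be traced with a small model of the class graph from Example 6.6. This is a simplified sketch, not Hibernate Search's algorithm. The Type and Embedded classes are hypothetical, the id is treated as an ordinary root field, and a back reference from Owner to Place is added on purpose to show that depth = 1 cuts the cycle.

```java
import java.util.ArrayList;
import java.util.List;

final class EmbeddedFieldsSketch {

    // Hypothetical model of a mapped class: plain fields plus embedded associations.
    static final class Type {
        final List<String> fields = new ArrayList<>();
        final List<Embedded> embedded = new ArrayList<>();

        Type field(String name) { fields.add(name); return this; }

        Type embed(String property, String prefix, int depth, Type target) {
            embedded.add(new Embedded(property, prefix, depth, target));
            return this;
        }
    }

    static final class Embedded {
        final String property; final String prefix; final int depth; final Type target;

        Embedded(String property, String prefix, int depth, Type target) {
            this.property = property; this.prefix = prefix; this.depth = depth; this.target = target;
        }

        // Default prefix is "propertyName." unless overridden.
        String effectivePrefix() { return prefix != null ? prefix : property + "."; }
    }

    static List<String> fieldNames(Type root) {
        List<String> out = new ArrayList<>();
        collect(root, "", Integer.MAX_VALUE, out);
        return out;
    }

    private static void collect(Type type, String path, int levelsLeft, List<String> out) {
        for (String field : type.fields) out.add(path + field);
        if (levelsLeft == 0) return; // depth exhausted: ignore further embedded attributes
        for (Embedded e : type.embedded) {
            int inner = Math.min(levelsLeft - 1, e.depth - 1);
            if (inner >= 0) collect(e.target, path + e.effectivePrefix(), inner, out);
        }
    }

    // Place -> Address -> Owner, plus a cyclic Owner -> Place reference.
    static Type placeExample() {
        Type owner = new Type().field("name");
        Type address = new Type().field("street").field("city")
                .embed("ownedBy", "ownedBy_", 1, owner);
        Type place = new Type().field("id").field("name")
                .embed("address", null, Integer.MAX_VALUE, address);
        owner.embed("place", null, Integer.MAX_VALUE, place);
        return place;
    }

    public static void main(String[] args) {
        fieldNames(placeExample()).forEach(System.out::println);
    }
}
```

If the depth limit on ownedBy were removed in this model, the cycle would never terminate, which is the situation the depth property exists to prevent.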
Using @IndexedEmbedded for object associations allows you to express queries (using Lucene's query syntax) such as:
- Return places where name contains JBoss and where address city is Atlanta. In Lucene query this would be
+name:jboss +address.city:atlanta
- Return places where name contains JBoss and where owner's name contains Joe. In Lucene query this would be
+name:jboss +address.ownedBy_name:joe
In a way it mimics the relational join operation in a more efficient way (at the cost of data duplication). Remember that, out of the box, Lucene indexes have no notion of association; the join operation simply does not exist. This approach can help keep the relational model normalized while benefiting from the speed and feature richness of the full text index.
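Because the associated values are copied into the root document, both queries above reduce to term lookups on a single flat document. The sketch below models such a document as a map of field names to values and checks that every required term is present. The sample data, the crude lowercase word split and the matchesAll helper are all assumptions made for illustration; this is not how Lucene evaluates queries internally.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class DenormalizedDocSketch {

    // Flattens a place, its address and the owner into one document.
    static Map<String, String> placeDocument(String name, String street, String city, String ownerName) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", name);
        doc.put("address.street", street);
        doc.put("address.city", city);
        doc.put("address.ownedBy_name", ownerName);
        return doc;
    }

    // True when every required term occurs as a word in its field,
    // mirroring a query made only of "+field:term" clauses.
    static boolean matchesAll(Map<String, String> doc, Map<String, String> requiredTerms) {
        for (Map.Entry<String, String> term : requiredTerms.entrySet()) {
            String value = doc.get(term.getKey());
            if (value == null) {
                return false;
            }
            boolean found = false;
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(term.getValue())) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> doc = placeDocument("JBoss Inc", "Peachtree St", "Atlanta", "Joe Smith");
        System.out.println(matchesAll(doc, Map.of("name", "jboss", "address.city", "atlanta")));
        System.out.println(matchesAll(doc, Map.of("name", "jboss", "address.ownedBy_name", "joe")));
    }
}
```

No second document is consulted at query time, which is where the speed comes from, and the copied city and owner name are the data duplication the text mentions.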
Note
An associated object can itself (but does not have to) be @Indexed.
When @IndexedEmbedded points to an entity, the association has to be bidirectional and the other side has to be annotated with @ContainedIn (as seen in the previous example). If not, Hibernate Search has no way to update the root index when the associated entity is updated (in our example, a Place index document has to be updated when the associated Address instance is updated).
Sometimes, the object type annotated by @IndexedEmbedded is not the object type targeted by Hibernate and Hibernate Search. This is especially the case when interfaces are used in lieu of their implementation. For this reason you can override the object type targeted by Hibernate Search using the targetElement parameter.
Example 6.7. Using the targetElement property of @IndexedEmbedded
    @Entity
    @Indexed
    public class Address {
        @Id
        @GeneratedValue
        @DocumentId
        private Long id;

        @Field
        private String street;

        @IndexedEmbedded(depth = 1, prefix = "ownedBy_", targetElement = Owner.class)
        @Target(Owner.class)
        private Person ownedBy;
        ...
    }

    @Embeddable
    public class Owner implements Person { ... }
6.1.4. Limiting Object Embedding to Specific Paths
The @IndexedEmbedded annotation also provides an includePaths attribute, which can be used as an alternative to depth or be combined with it.
When using only depth, all indexed fields of the embedded type are added recursively at the same depth. This makes it harder to pick only a specific path without also adding all the other fields, which might not be needed.
To avoid unnecessarily loading and indexing entities, you can specify exactly which paths are needed. A typical application might need different depths for different paths, or in other words it might need to specify paths explicitly, as shown in Example 6.8, “Using the includePaths property of @IndexedEmbedded”.
Example 6.8. Using the includePaths property of @IndexedEmbedded
    @Entity
    @Indexed
    public class Person {
        @Id
        public int getId() { return id; }

        @Field
        public String getName() { return name; }

        @Field
        public String getSurname() { return surname; }

        @OneToMany
        @IndexedEmbedded(includePaths = { "name" })
        public Set<Person> getParents() { return parents; }

        @ContainedIn
        @ManyToOne
        public Person getChild() { return child; }

        ...//other fields omitted
    }
Using a mapping as in Example 6.8, “Using the includePaths property of @IndexedEmbedded”, you would be able to search for a Person by name and/or surname, and/or the name of a parent. The surname of the parent is not indexed, so searching on parents' surnames is not possible, but this speeds up indexing, saves space and improves overall performance.
The includePaths attribute of @IndexedEmbedded includes the specified paths in addition to what you would normally index by specifying a limited value for depth. When using includePaths and leaving depth undefined, the behavior is equivalent to setting depth=0: only the included paths are indexed.
Example 6.9. Using the includePaths property of @IndexedEmbedded
    @Entity
    @Indexed
    public class Human {
        @Id
        public int getId() { return id; }

        @Field
        public String getName() { return name; }

        @Field
        public String getSurname() { return surname; }

        @OneToMany
        @IndexedEmbedded(depth = 2, includePaths = { "parents.parents.name" })
        public Set<Human> getParents() { return parents; }

        @ContainedIn
        @ManyToOne
        public Human getChild() { return child; }

        ...//other fields omitted
    }
In Example 6.9, “Using the includePaths property of @IndexedEmbedded”, every human will have its name and surname attributes indexed. The name and surname of parents will be indexed too, recursively up to the second level because of the depth attribute. It will be possible to search by name or surname of the person directly, of the parents, or of the grandparents. Beyond the second level, one more level is indexed, but only the name, not the surname.
This results in the following fields in the index:
- id - as primary key
- _hibernate_class - stores entity type
- name - as direct field
- surname - as direct field
- parents.name - as embedded field at depth 1
- parents.surname - as embedded field at depth 1
- parents.parents.name - as embedded field at depth 2
- parents.parents.surname - as embedded field at depth 2
- parents.parents.parents.name - as additional path as specified by includePaths. The first parents. is inferred from the field name, the remaining path is the attribute of includePaths
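The field list above can be derived mechanically for a self-referencing type. The sketch below is a simplified model rather than the library's implementation: indexedFields is a hypothetical helper, it only handles one self-referencing property, and it leaves out the id and _hibernate_class entries. It repeats the property prefix once per depth level and then appends each includePaths entry behind a single property prefix.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class IncludePathsSketch {

    /**
     * Field names for a type that embeds itself through one property.
     *
     * @param ownFields    the @Field properties of the type
     * @param property     the self-referencing embedded property
     * @param depth        the depth attribute (0 when only includePaths is used)
     * @param includePaths paths relative to the embedded property
     */
    static List<String> indexedFields(List<String> ownFields, String property,
                                      int depth, List<String> includePaths) {
        Set<String> out = new LinkedHashSet<>(ownFields);
        String prefix = "";
        for (int level = 1; level <= depth; level++) {
            prefix += property + ".";
            for (String field : ownFields) {
                out.add(prefix + field);
            }
        }
        // The first "property." is inferred from the annotated property itself.
        for (String path : includePaths) {
            out.add(property + "." + path);
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        List<String> own = List.of("name", "surname");
        // Person example: includePaths only, so depth behaves like 0.
        System.out.println(indexedFields(own, "parents", 0, List.of("name")));
        // Human example: depth 2 plus one extra path.
        System.out.println(indexedFields(own, "parents", 2, List.of("parents.parents.name")));
    }
}
```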
Having explicit control of the indexed paths might be easier if you're designing your application by defining the needed queries first, as at that point you might know exactly which fields you need, and which other fields are unnecessary to implement your use case.