Chapter 16. Mapping Domain Objects to the Index Structure
16.1. Basic Mapping
16.1.1. Basic Mapping
In Red Hat JBoss Data Grid, the identifier for all @Indexed objects is the key used to store the value. How the key is indexed can still be customized by using a combination of @Transformable, @ProvidedId, custom types and custom FieldBridge implementations.
The @DocumentId identifier does not apply to JBoss Data Grid values.
The Lucene-based Query API uses the following common annotations to map entities:
- @Indexed
- @Field
- @NumericField
16.1.2. @Indexed
The @Indexed annotation declares a cached entry indexable. All entries not annotated with @Indexed are ignored.
Making a class indexable with @Indexed
@Indexed
public class Essay {
}
Optionally, specify the index attribute of the @Indexed annotation to change the default name of the index.
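For example, the following sketch stores the entity under a custom index name (the name "essays" is illustrative):

Specifying a custom index name
@Indexed(index = "essays")
public class Essay {
}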
16.1.3. @Field
Each property or attribute of an entity can be indexed. Properties and attributes are not annotated by default, and therefore are ignored by the indexing process. The @Field annotation declares a property as indexed and allows the configuration of several aspects of the indexing process by setting one or more of the following attributes:
- name: The name under which the property is stored in the Lucene Document. By default, this attribute is the same as the property name, following the JavaBeans convention.
- store: Specifies whether the property is stored in the Lucene index. When a property is stored, it can be retrieved in its original value from the Lucene Document, regardless of whether the element is indexed. Valid options are:
  - Store.YES: Consumes more index space but allows projection. See Projection.
  - Store.COMPRESS: Stores the property as compressed. This attribute consumes more CPU.
  - Store.NO: No storage. This is the default setting for the store attribute.
- index: Describes whether the property is indexed. The following values are applicable:
  - Index.NO: No indexing is applied; the property cannot be found by querying. This setting is used for properties that do not need to be searchable but can still be projected.
  - Index.YES: The element is indexed and is searchable. This is the default setting for the index attribute.
- analyze: Determines whether the property is analyzed. Analyzing a property allows it to be searched by its contents. For example, it may be worthwhile to analyze a text field, whereas a date field does not need to be analyzed. Enable or disable analysis using:
  - Analyze.YES
  - Analyze.NO

  The analyze attribute is enabled by default. The Analyze.YES setting requires the property to be indexed via Index.YES. Fields used for sorting must not be analyzed.
- norms: Determines whether index-time boosting information is stored. Valid settings are:
  - Norms.YES
  - Norms.NO

  The default for this attribute is Norms.YES. Disabling norms conserves memory, but no index-time boosting information will be available.
- termVector: Describes collections of term-frequency pairs. This attribute enables storing the term vectors within the documents during indexing. The default value is TermVector.NO. Available settings for this attribute are:
  - TermVector.YES: Stores the term vectors of each document. This produces two synchronized arrays: one contains the document terms and the other contains each term's frequency.
  - TermVector.NO: Does not store term vectors.
  - TermVector.WITH_OFFSETS: Stores the term vector and token offset information. This is the same as TermVector.YES plus the starting and ending offset position information of the terms.
  - TermVector.WITH_POSITIONS: Stores the term vector and token position information. This is the same as TermVector.YES plus the ordinal positions of each occurrence of a term in a document.
  - TermVector.WITH_POSITION_OFFSETS: Stores the term vector, token position, and offset information. This is a combination of YES, WITH_OFFSETS, and WITH_POSITIONS.
- indexNullAs: By default, null values are ignored and not indexed. However, indexNullAs permits specifying a string to be inserted as the token for the null value. When using the indexNullAs parameter, use the same token in the search query to search for null values. Use this feature only with Analyze.NO. Valid settings for this attribute are:
  - Field.DO_NOT_INDEX_NULL: The default value for this attribute; null values are not indexed.
  - Field.DEFAULT_NULL_TOKEN: Indicates that a default null token is used. This default null token can be specified in the configuration using the default_null_token property. If this property is not set and Field.DEFAULT_NULL_TOKEN is specified, the string "null" is used as the default.

When implementing a custom FieldBridge or TwoWayFieldBridge, it is up to the developer to handle the indexing of null values (see the JavaDocs of LuceneOptions.indexNullAs()).
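As an illustration, the following sketch combines several of these attributes on one entity (field names and attribute choices are illustrative, not prescriptive):

Combining @Field attributes
@Indexed
public class Essay {

    // analyzed and stored: searchable by content and available for projection
    @Field(store = Store.YES, analyze = Analyze.YES)
    private String summary;

    // not analyzed: usable for exact matching and sorting; null indexed as the default token
    @Field(analyze = Analyze.NO, indexNullAs = Field.DEFAULT_NULL_TOKEN)
    private String isbn;
}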
16.1.4. @NumericField
The @NumericField annotation can be specified in the same scope as @Field.
The @NumericField annotation can be specified for Integer, Long, Float, and Double properties. At index time the value is indexed using a Trie structure. When a property is indexed as a numeric field, it enables efficient range queries and sorting, orders of magnitude faster than running the same query on standard @Field properties. The @NumericField annotation accepts the following optional parameters:

- forField: Specifies the name of the related @Field that will be indexed as numeric. It is mandatory when a property contains more than one @Field declaration.
- precisionStep: Changes the way that the Trie structure is stored in the index. Smaller precisionSteps lead to more disk space usage, and faster range and sort queries. Larger values lead to less space used, and range query performance closer to the range query in normal @Fields. The default value for precisionStep is 4.

@NumericField supports only Double, Long, Integer, and Float. It is not possible to take advantage of similar functionality in Lucene for the other numeric types, so the remaining types must use string encoding via the default or a custom TwoWayFieldBridge.
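As a sketch, a numeric property might be mapped as follows (the precisionStep value is illustrative):

Using @NumericField
@Indexed
public class Essay {

    @Field
    @NumericField(precisionStep = 6)
    private Long length;
}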
A custom NumericFieldBridge can also be used. Custom configurations may require approximation during type transformation. The following example defines a custom NumericFieldBridge.
Defining a custom NumericFieldBridge
public class BigDecimalNumericFieldBridge extends NumericFieldBridge {

    private static final BigDecimal storeFactor = BigDecimal.valueOf(100);

    @Override
    public void set(String name,
                    Object value,
                    Document document,
                    LuceneOptions luceneOptions) {
        if (value != null) {
            // approximate the BigDecimal as a scaled Long for the numeric field
            BigDecimal decimalValue = (BigDecimal) value;
            Long indexedValue = Long.valueOf(
                decimalValue
                    .multiply(storeFactor)
                    .longValue());
            luceneOptions.addNumericFieldToDocument(name, indexedValue, document);
        }
    }

    @Override
    public Object get(String name, Document document) {
        // reverse the scaling applied in set()
        String fromLucene = document.get(name);
        BigDecimal storedBigDecimal = new BigDecimal(fromLucene);
        return storedBigDecimal.divide(storeFactor);
    }
}
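The custom bridge can then be applied to a BigDecimal property; a minimal sketch (the entity and property names are illustrative):

Applying the custom NumericFieldBridge
@Indexed
public class Item {

    @Field
    @FieldBridge(impl = BigDecimalNumericFieldBridge.class)
    private BigDecimal price;
}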
16.2. Mapping Properties Multiple Times
Properties may need to be mapped multiple times per index, using different indexing strategies. For example, sorting a query by field requires that the field is not analyzed. To search by words in this property and also sort it, the property will need to be indexed twice - once analyzed and once un-analyzed. @Fields can be used to perform this mapping. For example:
Using @Fields to map a property multiple times
@Indexed(index = "Book")
public class Book {
@Fields( {
@Field,
@Field(name = "summary_forSort", analyze = Analyze.NO, store = Store.YES)
})
public String getSummary() {
return summary;
}
}
In the example above, the field summary is indexed twice - once as summary in a tokenized way, and once as summary_forSort in an untokenized way. @Field supports two attributes that are useful when @Fields is used:
- analyzer: defines an @Analyzer annotation per field rather than per property
- bridge: defines a @FieldBridge annotation per field rather than per property
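A query can then search the tokenized field while sorting on the untokenized copy. The following sketch assumes the sort(Sort) method of the Infinispan CacheQuery API; entity and field names are illustrative:

Sorting on the untokenized field
SearchManager manager = Search.getSearchManager(cache);
QueryParser parser = new QueryParser("summary", manager.getAnalyzer(Book.class));
org.apache.lucene.search.Query luceneQuery = parser.parse("summary:lucene");

// search on the analyzed field, sort on the unanalyzed copy
CacheQuery cacheQuery = manager.getQuery(luceneQuery, Book.class)
    .sort(new Sort(new SortField("summary_forSort", SortField.Type.STRING)));
List result = cacheQuery.list();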
16.3. Embedded and Associated Objects
16.3.1. Embedded and Associated Objects
Associated objects and embedded objects can be indexed as part of the root entity index. This allows searches of an entity based on properties of associated objects.
16.3.2. Indexing Associated Objects
The aim of the following example is to return places where the associated city is Atlanta via the Lucene query address.city:Atlanta. The place fields are indexed in the Place index. The Place index documents also contain the following fields:
- address.street
- address.city
These fields are also able to be queried.
Indexing associations
@Indexed
public class Place {

    @Field
    private String name;

    @IndexedEmbedded
    @ManyToOne(cascade = {CascadeType.PERSIST, CascadeType.REMOVE})
    private Address address;
}

public class Address {

    @Field
    private String street;

    @Field
    private String city;

    @ContainedIn
    @OneToMany(mappedBy = "address")
    private Set<Place> places;
}
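Fields of the embedded Address are then queried through the root Place index; a minimal sketch:

Querying an embedded field
SearchManager manager = Search.getSearchManager(cache);
QueryParser parser = new QueryParser("name", manager.getAnalyzer(Place.class));
org.apache.lucene.search.Query luceneQuery = parser.parse("address.city:Atlanta");
List places = manager.getQuery(luceneQuery, Place.class).list();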
16.3.3. @IndexedEmbedded
When using the @IndexedEmbedded technique, data is denormalized in the Lucene index. As a result, the Lucene-based Query API must be updated with any changes in the Place and Address objects to keep the index up to date. Ensure the Place Lucene document is updated when its Address changes by marking the other side of the bidirectional relationship with @ContainedIn. @ContainedIn can be used for both associations pointing to entities and on embedded objects.
The @IndexedEmbedded annotation can be nested. Attributes can be annotated with @IndexedEmbedded. The attributes of the associated class are then added to the main entity index. In the following example, the index will contain the following fields:
- name
- address.street
- address.city
- address.ownedBy_name
Nested usage of @IndexedEmbedded and @ContainedIn
@Indexed
public class Place {

    @Field
    private String name;

    @IndexedEmbedded
    @ManyToOne(cascade = {CascadeType.PERSIST, CascadeType.REMOVE})
    private Address address;
}

public class Address {

    @Field
    private String street;

    @Field
    private String city;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_")
    private Owner ownedBy;

    @ContainedIn
    @OneToMany(mappedBy = "address")
    private Set<Place> places;
}

public class Owner {

    @Field
    private String name;
}
The default prefix is the property name followed by a dot (address. in this example), following the traditional object navigation convention. This can be overridden using the prefix attribute, as shown on the ownedBy property.
The prefix cannot be set to the empty string.
The depth property is used when the object graph contains a cyclic dependency of classes. For example, if Owner points to Place, the Query Module stops including attributes after reaching the expected depth, or object graph boundaries. A self-referential class is an example of cyclic dependency. In the provided example, because depth is set to 1, any @IndexedEmbedded attribute in Owner is ignored.
Using @IndexedEmbedded for object associations allows queries to be expressed using Lucene’s query syntax. For example:
Return places where name contains JBoss and where address city is Atlanta. In Lucene query this is:
+name:jboss +address.city:atlanta
Return places where name contains JBoss and where owner’s name contain Joe. In Lucene query this is:
+name:jboss +address.ownedBy_name:joe
This operation is similar to the relational join operation, without data duplication. Out of the box, Lucene indexes have no notion of association; the join operation does not exist. It may be beneficial to maintain the normalized relational model while benefiting from the full text index speed and feature richness.
An associated object can be also be @Indexed. When @IndexedEmbedded points to an entity, the association must be directional and the other side must be annotated using @ContainedIn. If not, the Lucene-based Query API cannot update the root index when the associated entity is updated. In the provided example, a Place index document is updated when the associated Address instance updates.
16.3.4. The targetElement Property
It is possible to override the object type targeted using the targetElement parameter. This method can be used when the object type annotated by @IndexedEmbedded is not the object type targeted by the data grid and the Lucene-based Query API. This occurs when interfaces are used instead of their implementation.
Using the targetElement property of @IndexedEmbedded
@Indexed
public class Address {

    @Field
    private String street;

    @IndexedEmbedded(depth = 1, prefix = "ownedBy_", targetElement = Owner.class)
    private Person ownedBy;
    ...
}

public class Owner implements Person { ... }
16.4. Boosting
16.4.1. Boosting
Lucene uses boosting to attach more importance to specific fields or documents over others. Lucene differentiates between index and search-time boosting.
16.4.2. Static Index Time Boosting
The @Boost annotation is used to define a static boost value for an indexed class or property. This annotation can be used within @Field, or can be specified directly on the method or class level.
In the following example:
- The probability of Essay reaching the top of the search list will be multiplied by 1.7.
- @Field.boost and @Boost on a property are cumulative, therefore the summary field will be boosted by 3.0 (2 x 1.5), and will be more important than the ISBN field.
- The text field is 1.2 times more important than the ISBN field.
Different ways of using @Boost
@Indexed
@Boost(1.7f)
public class Essay {

    @Field(name = "Abstract", store = Store.YES, boost = @Boost(2f))
    @Boost(1.5f)
    public String getSummary() { return summary; }

    @Field(boost = @Boost(1.2f))
    public String getText() { return text; }

    @Field
    public String getISBN() { return isbn; }
}
16.4.3. Dynamic Index Time Boosting
The @Boost annotation defines a static boost factor that is independent of the state of the indexed entity at runtime. However, in some cases the boost factor may depend on the actual state of the entity. In this case, use the @DynamicBoost annotation together with an accompanying custom BoostStrategy.
@Boost and @DynamicBoost annotations can both be used in relation to an entity, and all defined boost factors are cumulative. The @DynamicBoost can be placed at either class or field level.
In the following example, a dynamic boost is defined on class level specifying VIPBoostStrategy as implementation of the BoostStrategy interface used at indexing time. Depending on the annotation placement, either the whole entity is passed to the defineBoost method or only the annotated field/property value. The passed object must be cast to the correct type.
Dynamic boost example
public enum PersonType {
    NORMAL,
    VIP
}

@Indexed
@DynamicBoost(impl = VIPBoostStrategy.class)
public class Person {
    private PersonType type;
}

public class VIPBoostStrategy implements BoostStrategy {
    public float defineBoost(Object value) {
        Person person = (Person) value;
        if (person.getType().equals(PersonType.VIP)) {
            return 2.0f;
        }
        else {
            return 1.0f;
        }
    }
}
In the provided example all indexed values of a VIP would be twice the importance of the values of a non-VIP.
The specified BoostStrategy implementation must define a public no argument constructor.
16.5. Analysis
16.5.1. Analysis
16.5.2. Default Analyzer and Analyzer by Class
The default analyzer class is used to index tokenized fields, and is configurable through the default.analyzer property. The default value for this property is org.apache.lucene.analysis.standard.StandardAnalyzer.
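The property can be set when enabling indexing on a cache. The following minimal sketch assumes programmatic configuration of an embedded indexed cache; the analyzer class shown is illustrative:

Setting the default analyzer
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.Index;

ConfigurationBuilder builder = new ConfigurationBuilder();
builder.indexing()
    .index(Index.ALL)
    // illustrative: any Lucene Analyzer implementation can be named here
    .addProperty("default.analyzer", "org.apache.lucene.analysis.core.KeywordAnalyzer");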
The analyzer class can be defined per entity, property, and per @Field, which is useful when multiple fields are indexed from a single property.
In the following example, EntityAnalyzer is used to index all tokenized properties, such as name, except for summary and body, which are indexed with PropertyAnalyzer and FieldAnalyzer respectively.
Different ways of using @Analyzer
@Indexed
@Analyzer(impl = EntityAnalyzer.class)
public class MyEntity {

    @Field
    private String name;

    @Field
    @Analyzer(impl = PropertyAnalyzer.class)
    private String summary;

    @Field(analyzer = @Analyzer(impl = FieldAnalyzer.class))
    private String body;
}
Avoid using different analyzers on a single entity. Doing so can create complications in building queries, and make results less predictable, particularly if using a QueryParser. Use the same analyzer for indexing and querying on any field.
16.5.3. Named Analyzers
The Query Module uses analyzer definitions to deal with the complexity of the Analyzer function. Analyzer definitions are reusable by multiple @Analyzer declarations and include the following:
- a name: the unique string used to refer to the definition.
- a list of CharFilters: each CharFilter is responsible for pre-processing input characters before tokenization. CharFilters can add, change, or remove characters. One common usage is character normalization.
- a Tokenizer: responsible for tokenizing the input stream into individual words.
- a list of filters: each filter is responsible for removing, modifying, or sometimes adding words to the stream provided by the Tokenizer.
The Analyzer separates these components into multiple tasks, allowing individual components to be reused and components to be built with flexibility, using the following procedure:

The Analyzer Process
1. The CharFilters process the character input.
2. The Tokenizer converts the character input into tokens.
3. The tokens are then processed by the TokenFilters.
The Lucene-based Query API supports this infrastructure by utilizing the Solr analyzer framework.
16.5.4. Analyzer Definitions
Once defined, an analyzer definition can be reused by an @Analyzer annotation.
Referencing an analyzer by name
@Indexed
@AnalyzerDef(name = "customanalyzer")
public class Team {

    @Field
    private String name;

    @Field
    private String location;

    @Field
    @Analyzer(definition = "customanalyzer")
    private String description;
}
Analyzer instances declared by @AnalyzerDef are also available by their name in the SearchFactory, which is useful when building queries.
Analyzer analyzer = Search.getSearchManager(cache).getAnalyzer("customanalyzer");

When querying, fields must use the same analyzer that was used to index them; the same tokens are then reused between the query and the indexing process.
16.5.5. @AnalyzerDef for Solr
When using Maven, all required Apache Solr dependencies are defined as dependencies of the artifact org.hibernate:hibernate-search-analyzers. Add the following dependency:
<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-analyzers</artifactId>
   <version>${version.hibernate.search}</version>
</dependency>
@AnalyzerDef and the Solr framework

Configure the CharFilter
Define a CharFilter by factory. In this example, a mapping CharFilter is used, which will replace characters in the input based on the rules specified in the mapping file.

@AnalyzerDef(name = "customanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping",
                value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
        })
    },

Define the Tokenizer
A Tokenizer is then defined using the StandardTokenizerFactory.class.

@AnalyzerDef(name = "customanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping",
                value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
        })
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),

List of Filters
Define a list of filters by their factories. In this example, the StopFilter filter is built reading the dedicated words property file. The filter will ignore case.

@AnalyzerDef(name = "customanalyzer",
    charFilters = {
        @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
            @Parameter(name = "mapping",
                value = "org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
        })
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class, params = {
            @Parameter(name = "words",
                value = "org/hibernate/search/test/analyzer/solr/stoplist.properties"),
            @Parameter(name = "ignoreCase", value = "true")
        })
    })
public class Team {
}
Filters and CharFilters are applied in the order they are defined in the @AnalyzerDef annotation.
16.5.6. Loading Analyzer Resources
Tokenizers, TokenFilters, and CharFilters can load resources such as configuration or metadata files; this is the case for the StopFilterFactory and the synonym filter, for example. If the resource file does not use the virtual machine's default charset, the charset can be explicitly specified by adding a resource_charset parameter.
Use a specific charset to load the property file
@AnalyzerDef(name = "customanalyzer",
charFilters = {
@CharFilterDef(factory = MappingCharFilterFactory.class, params = {
@Parameter(name = "mapping",
value =
"org/hibernate/search/test/analyzer/solr/mapping-chars.properties")
})
},
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = ISOLatin1AccentFilterFactory.class),
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = StopFilterFactory.class, params = {
@Parameter(name="words",
value= "org/hibernate/search/test/analyzer/solr/stoplist.properties"),
@Parameter(name = "resource_charset", value = "UTF-16BE"),
@Parameter(name = "ignoreCase", value = "true")
})
})
public class Team {
}
16.5.7. Dynamic Analyzer Selection
The Query Module uses the @AnalyzerDiscriminator annotation to enable the dynamic analyzer selection.
An analyzer can be selected based on the current state of an entity that is to be indexed. This is particularly useful in multilingual applications. For example, when using the BlogEntry class, the analyzer can depend on the language property of the entry. Depending on this property, the correct language-specific stemmer can then be chosen to index the text.
An implementation of the Discriminator interface must return the name of an existing analyzer definition, or null if the default analyzer should not be overridden.
The following example assumes that the language parameter is either 'de' or 'en', which is specified in the @AnalyzerDefs.
Configure the @AnalyzerDiscriminator
Predefine Dynamic Analyzers
The @AnalyzerDiscriminator requires that all analyzers that are to be used dynamically are predefined via @AnalyzerDef. The @AnalyzerDiscriminator annotation can then be placed either on the class or on a specific property of the entity in order to dynamically select an analyzer. An implementation of the Discriminator interface can be specified using the @AnalyzerDiscriminator impl parameter.

@Indexed
@AnalyzerDefs({
    @AnalyzerDef(name = "en",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = EnglishPorterFilterFactory.class)
        }),
    @AnalyzerDef(name = "de",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = GermanStemFilterFactory.class)
        })
})
public class BlogEntry {

    @Field
    @AnalyzerDiscriminator(impl = LanguageDiscriminator.class)
    private String language;

    @Field
    private String text;

    private Set<BlogEntry> references;

    // standard getter/setter
}

Implement the Discriminator Interface
Implement the getAnalyzerDefinitionName() method, which is called for each field added to the Lucene document. The entity being indexed is also passed to the interface method. The value parameter is set if the @AnalyzerDiscriminator is placed on the property level instead of the class level; in this example, the value represents the current value of the language property.

public class LanguageDiscriminator implements Discriminator {
    public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
        if (value == null || !(entity instanceof BlogEntry)) {
            return null;
        }
        return (String) value;
    }
}
16.5.8. Retrieving an Analyzer
Retrieving an analyzer is useful when multiple analyzers have been used in a domain model, in order to benefit from stemming, phonetic approximation, and so on. In this case, use the same analyzers to build the query. Alternatively, use the Lucene-based Query API, which selects the correct analyzer automatically. See Building a Lucene Query.
The scoped analyzer for a given entity can be retrieved using either the Lucene programmatic API or the Lucene query parser. A scoped analyzer applies the right analyzers depending on the field indexed. Multiple analyzers can be defined on a given entity, each working on an individual field. A scoped analyzer unifies these analyzers into a context-aware analyzer.
In the following example, the song title is indexed in two fields:
- Standard analyzer: used in the title field.
- Stemming analyzer: used in the title_stemmed field.
Using the analyzer provided by the search factory, the query uses the appropriate analyzer depending on the field targeted.
Using the scoped analyzer when building a full-text query
SearchManager manager = Search.getSearchManager(cache);

org.apache.lucene.queryparser.classic.QueryParser parser = new QueryParser(
    "title",
    manager.getAnalyzer(Song.class)
);

org.apache.lucene.search.Query luceneQuery =
    parser.parse("title:sky OR title_stemmed:diamond");

// wrap the Lucene query in an org.infinispan.query.CacheQuery
CacheQuery cacheQuery = manager.getQuery(luceneQuery, Song.class);

// return the list of matching objects
List result = cacheQuery.list();
Analyzers defined via @AnalyzerDef can also be retrieved by their definition name using searchManager.getAnalyzer(String).
16.5.9. Available Analyzers
Apache Solr and Lucene ship with a number of default CharFilters, tokenizers, and filters. A complete list of CharFilter, tokenizer, and filter factories is available at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters. The following tables provide some example CharFilters, tokenizers, and filters.
Table 16.1. Example of available CharFilters
| Factory | Description | Parameters | Additional dependencies |
|---|---|---|---|
| MappingCharFilterFactory | Replaces one or more characters with one or more characters, based on mappings specified in the resource file | mapping: points to a resource file containing the mappings, using the format: "á" => "a"; "ñ" => "n"; "ø" => "o" | none |
| HTMLStripCharFilterFactory | Remove HTML standard tags, keeping the text | none | none |
Table 16.2. Example of available tokenizers
| Factory | Description | Parameters | Additional dependencies |
|---|---|---|---|
| StandardTokenizerFactory | Use the Lucene StandardTokenizer | none | none |
| HTMLStripCharFilterFactory | Remove HTML tags, keep the text and pass it to a StandardTokenizer | none | solr-core |
| PatternTokenizerFactory | Breaks text at the specified regular expression pattern | pattern: the regular expression to use for tokenizing; group: says which pattern group to extract into tokens | solr-core |
Table 16.3. Examples of available filters
| Factory | Description | Parameters | Additional dependencies |
|---|---|---|---|
| StandardFilterFactory | Remove dots from acronyms and 's from words | none | solr-core |
| LowerCaseFilterFactory | Lowercases all words | none | solr-core |
| StopFilterFactory | Remove words (tokens) matching a list of stop words | words: points to a resource file containing the stop words; ignoreCase: true if the case should be ignored when comparing stop words, false otherwise | solr-core |
| SnowballPorterFilterFactory | Reduces a word to its root in a given language (for example: protect, protects, and protection share the same root). Using such a filter allows searches matching related words | language: Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish, and so on | solr-core |
| ISOLatin1AccentFilterFactory | Remove accents for languages like French | none | solr-core |
| PhoneticFilterFactory | Inserts phonetically similar tokens into the token stream | encoder: one of DoubleMetaphone, Metaphone, Soundex, or RefinedSoundex; inject: true adds tokens to the stream, false replaces the existing token | solr-core and commons-codec |
| CollationKeyFilterFactory | Converts each token into its java.text.CollationKey, encoded so that it can be stored as an index term | custom, language, country, variant, strength, decomposition (see the Lucene CollationKeyFilter javadocs for more information) | solr-core and commons-io |
16.6. Bridge
16.6.1. Bridges
When mapping entities, Lucene represents all index fields as strings. All entity properties annotated with @Field are converted to strings to be indexed. Built-in bridges automatically translate properties for the Lucene-based Query API. The bridges can be customized to gain control over the translation process.
16.6.2. Built-in Bridges
The Lucene-based Query API includes a set of built-in bridges between a Java property type and its full text representation.
- null
By default, null elements are not indexed. Lucene does not support null elements. However, in some situations it can be useful to insert a custom token representing the null value. See @Field for more information.
- java.lang.String
Strings are indexed, as are the following numeric types:
  - short, Short
  - int, Integer
  - long, Long
  - float, Float
  - double, Double
  - BigInteger
  - BigDecimal

Numbers are converted into their string representation. Note that numbers cannot be compared by Lucene, or used in ranged queries, out of the box; they must be padded. Using a Range query has disadvantages; an alternative approach is to use a Filter query, which will filter the result query to the appropriate range. The Query Module supports using a custom StringBridge. See Custom Bridges.
- java.util.Date
Dates are stored as yyyyMMddHHmmssSSS in GMT time (200611072203012 for Nov 7th of 2006, 4:03PM and 12ms EST). When using a TermRangeQuery, dates are expressed in GMT. @DateBridge defines the appropriate resolution to store in the index, for example @DateBridge(resolution=Resolution.DAY); the date pattern is then truncated accordingly.

@Indexed
public class Meeting {
    @Field(analyze = Analyze.NO)
    @DateBridge(resolution = Resolution.MINUTE)
    private Date date;
}

The default Date bridge uses Lucene's DateTools to convert from and to String. All dates are expressed in GMT time. Implement a custom date bridge in order to store dates in a fixed time zone.
- java.net.URI, java.net.URL
URI and URL are converted to their string representation.
- java.lang.Class
Classes are converted to their fully qualified class name. The thread context classloader is used when the class is rehydrated.
16.6.3. Custom Bridges
16.6.3.1. Custom Bridges
Custom bridges are available in situations where built-in bridges, or the bridge’s String representation, do not sufficiently address the required property types.
16.6.3.2. FieldBridge
For improved flexibility, a bridge can be implemented as a FieldBridge. The FieldBridge interface provides a property value, which can then be mapped in the Lucene Document. For example, a property can be stored in two different document fields.
Implementing the FieldBridge Interface
public class DateSplitBridge implements FieldBridge {
    private final static TimeZone GMT = TimeZone.getTimeZone("GMT");

    public void set(String name,
                    Object value,
                    Document document,
                    LuceneOptions luceneOptions) {
        Date date = (Date) value;
        Calendar cal = GregorianCalendar.getInstance(GMT);
        cal.setTime(date);
        int year = cal.get(Calendar.YEAR);
        int month = cal.get(Calendar.MONTH) + 1;
        int day = cal.get(Calendar.DAY_OF_MONTH);

        // set year
        luceneOptions.addFieldToDocument(
            name + ".year",
            String.valueOf(year),
            document);

        // set month and pad it if needed
        luceneOptions.addFieldToDocument(
            name + ".month",
            (month < 10 ? "0" : "") + String.valueOf(month),
            document);

        // set day and pad it if needed
        luceneOptions.addFieldToDocument(
            name + ".day",
            (day < 10 ? "0" : "") + String.valueOf(day),
            document);
    }
}

//property
@FieldBridge(impl = DateSplitBridge.class)
private Date date;
In the example above, the fields are not added directly to the Lucene Document. Instead, the addition is delegated to the LuceneOptions helper. The helper applies the options selected on @Field, such as Store or TermVector, and applies the chosen @Boost value.
It is recommended to delegate the addition of fields to LuceneOptions; however, the Document can also be edited directly, ignoring the LuceneOptions.
LuceneOptions shields the application from changes in Lucene API and simplifies the code.
16.6.3.3. StringBridge
Use the org.infinispan.query.bridge.StringBridge interface to provide the Lucene-based Query API with an implementation of the expected Object to String bridge, or StringBridge. All implementations are used concurrently, and therefore must be thread-safe.
Custom StringBridge implementation
/**
 * Padding Integer bridge.
 * All numbers will be padded with 0 to match 5 digits
 *
 * @author Emmanuel Bernard
 */
public class PaddedIntegerBridge implements StringBridge {

    private int PADDING = 5;

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > PADDING)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length(); padIndex < PADDING; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }
}
The @FieldBridge annotation allows any property or field to use the bridge:

@FieldBridge(impl = PaddedIntegerBridge.class)
private Integer length;
16.6.3.4. Two-Way Bridge
A TwoWayStringBridge is an extended version of a StringBridge, which can be used when the bridge implementation is used on an ID property. The Lucene-based Query API reads the string representation of the identifier and uses it to generate an object. The @FieldBridge annotation is used in the same way.
Implementing a TwoWayStringBridge for ID Properties
public class PaddedIntegerBridge implements TwoWayStringBridge, ParameterizedBridge {

    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; //default

    public void setParameterValues(Map<String, String> parameters) {
        String padding = parameters.get(PADDING_PROPERTY);
        if (padding != null) this.padding = Integer.parseInt(padding);
    }

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > padding)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length(); padIndex < padding; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }

    public Object stringToObject(String stringValue) {
        return Integer.valueOf(stringValue);
    }
}

@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name = "padding", value = "10"))
private Integer id;
The two-way process must be idempotent (i.e., object = stringToObject(objectToString(object))).
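The following sketch (illustrative, using the bridge's default padding of 5) demonstrates the idempotence requirement:

Checking idempotence
PaddedIntegerBridge bridge = new PaddedIntegerBridge();
Integer original = Integer.valueOf(42);
String indexed = bridge.objectToString(original);    // "00042"
assert original.equals(bridge.stringToObject(indexed));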
16.6.3.5. Parameterized Bridge
The ParameterizedBridge interface passes parameters to the bridge implementation, making it more flexible. The ParameterizedBridge interface can be implemented by StringBridge, TwoWayStringBridge, and FieldBridge implementations. All implementations must be thread-safe.
The following example implements a ParameterizedBridge interface, with parameters passed through the @FieldBridge annotation.
Configure the ParameterizedBridge Interface
public class PaddedIntegerBridge implements StringBridge, ParameterizedBridge {

    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; //default

    public void setParameterValues(Map<String, String> parameters) {
        String padding = parameters.get(PADDING_PROPERTY);
        if (padding != null) this.padding = Integer.parseInt(padding);
    }

    public String objectToString(Object object) {
        String rawInteger = ((Integer) object).toString();
        if (rawInteger.length() > padding)
            throw new IllegalArgumentException("Try to pad on a number too big");
        StringBuilder paddedInteger = new StringBuilder();
        for (int padIndex = rawInteger.length(); padIndex < padding; padIndex++) {
            paddedInteger.append('0');
        }
        return paddedInteger.append(rawInteger).toString();
    }
}

//property
@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name = "padding", value = "10"))
private Integer length;
16.6.3.6. Type Aware Bridge
Any bridge implementing AppliedOnTypeAwareBridge will have the type on which the bridge is applied injected. For example:
- the return type of the property for field/getter-level bridges.
- the class type for class-level bridges.
The type injected does not have any specific thread-safety requirements.
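A minimal sketch of a type-aware string bridge follows; the setAppliedOnType() callback is the injection point, and the prefixing behavior shown is purely illustrative:

Implementing a type-aware bridge
public class TypePrefixingBridge implements StringBridge, AppliedOnTypeAwareBridge {

    private Class<?> appliedOnType;

    @Override
    public void setAppliedOnType(Class<?> returnType) {
        // the type the bridge is applied on is injected here
        this.appliedOnType = returnType;
    }

    @Override
    public String objectToString(Object object) {
        if (object == null) return null;
        // illustrative: prefix the indexed value with the simple type name
        return appliedOnType.getSimpleName() + ":" + object;
    }
}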
16.6.3.7. ClassBridge
Using the @ClassBridge annotation, more than one property of an entity can be combined and indexed into the Lucene index in a specific way. @ClassBridge can be defined at the class level and supports the termVector attribute.
In the following example, the custom FieldBridge implementation receives the entity instance as the value parameter, rather than a particular property. The CatFieldsClassBridge is applied to the Department instance; the FieldBridge then concatenates both branch and network and indexes the concatenation.
Implementing a ClassBridge
@Indexed
@ClassBridge(name = "branchnetwork",
             store = Store.YES,
             impl = CatFieldsClassBridge.class,
             params = @Parameter(name = "sepChar", value = " "))
public class Department {
    private int id;
    private String network;
    private String branchHead;
    private String branch;
    private Integer maxEmployees;
}

public class CatFieldsClassBridge implements FieldBridge, ParameterizedBridge {

    private String sepChar;

    public void setParameterValues(Map parameters) {
        this.sepChar = (String) parameters.get("sepChar");
    }

    public void set(String name,
                    Object value,
                    Document document,
                    LuceneOptions luceneOptions) {
        // the entity instance is passed as the value parameter
        Department dep = (Department) value;
        String fieldValue1 = dep.getBranch();
        if (fieldValue1 == null) {
            fieldValue1 = "";
        }
        String fieldValue2 = dep.getNetwork();
        if (fieldValue2 == null) {
            fieldValue2 = "";
        }
        // concatenate branch and network and index the result
        String fieldValue = fieldValue1 + sepChar + fieldValue2;
        Field field = new Field(name, fieldValue, luceneOptions.getStore(),
            luceneOptions.getIndex(), luceneOptions.getTermVector());
        field.setBoost(luceneOptions.getBoost());
        document.add(field);
    }
}