3.7. Analyzer

Assume that the title of an indexed book entity is Refactoring: Improving the Design of Existing Code and that hits are required for the following queries: refactor, refactors, refactored, and refactoring. To get these hits, select a Lucene analyzer class that applies word stemming during both indexing and searching. Hibernate Search offers several ways to configure the analyzer (see Section 6.3.1, “Default Analyzer and Analyzer by Class” for more information):
  • Set the analyzer property in the configuration file. The specified class becomes the default analyzer.
  • Set the @Analyzer annotation at the entity level.
  • Set the @Analyzer annotation at the field level.
Specify the fully qualified class name of the analyzer to use, or use the @Analyzer annotation to reference an analyzer defined by the @AnalyzerDef annotation. The latter option uses the Solr analyzer framework and its factories. For more information about factory classes, see the Solr JavaDoc or read the corresponding section on the Solr Wiki (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters).
In Example 3.8, a StandardTokenizerFactory is followed by two filter factories: LowerCaseFilterFactory and SnowballPorterFilterFactory. The tokenizer splits words at punctuation characters and hyphens but keeps email addresses and internet hostnames intact, which makes it a good choice for this and other general-purpose operations. The lowercase filter converts all letters in each token to lowercase, and the snowball filter applies language-specific stemming.
With the Solr framework, an analyzer definition consists of one tokenizer followed by an arbitrary number of filters. The filters are applied in the order in which they are declared.
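To see why a tokenizer followed by a lowercase filter and a stemming filter produces hits for all four queries, the chain can be imitated with plain Java. The following is a minimal sketch and does not use Lucene, Solr, or Hibernate Search: the class name AnalyzerChainSketch and its methods are hypothetical, the tokenizer is a simple split on non-alphanumeric characters, and the stemmer is a naive suffix stripper standing in for the Snowball algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerChainSketch {

    // Stand-in for a tokenizer: splits on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stand-in for a lowercase filter.
    public static String lowerCase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Stand-in for a stemming filter: naive suffix stripping, NOT the Snowball algorithm.
    public static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            // Keep at least three characters so short words such as "red" are left alone.
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Runs the full chain: tokenizer first, then each filter in order.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            terms.add(stem(lowerCase(token)));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Refactoring: Improving the Design of Existing Code");
        for (String query : new String[] {"refactor", "refactors", "refactored", "refactoring"}) {
            // The query goes through the same chain as the indexed text.
            String term = analyze(query).get(0);
            System.out.println(query + " -> " + term + ", hit: " + indexed.contains(term));
        }
    }
}
```

In this sketch, the title and each of the four queries reduce to the same term, refactor, so every query matches. This only works because the same chain runs at indexing time and at search time. A real stemmer handles many more cases than this suffix list.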

Example 3.8. Using @AnalyzerDef and the Solr Framework to Define and Use an Analyzer

@Indexed
@AnalyzerDef(
   name = "customanalyzer",
   tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
   filters = {
      @TokenFilterDef(factory = LowerCaseFilterFactory.class), 
      @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
         params = { @Parameter(name = "language", value = "English") })
 })
public class Book implements Serializable {

  @Field
  @Analyzer(definition = "customanalyzer")
  private String title;
  
  @Field
  @Analyzer(definition = "customanalyzer")
  private String subtitle; 

  @IndexedEmbedded
  private Set<Author> authors = new HashSet<Author>();

  @Field(index = Index.YES, analyze = Analyze.NO, store = Store.YES)
  @DateBridge(resolution = Resolution.DAY)
  private Date publicationDate;
  
  public Book() {
  } 
  
  // standard getters/setters follow here
  ... 
}
Use @AnalyzerDef to define an analyzer, then apply it to entities and properties using @Analyzer. In the example, customanalyzer is defined on the Book entity but not applied at the entity level; it is applied only to the title and subtitle properties. An analyzer definition is global: define it once on any entity and reuse it by name in other entities as required.
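As an illustration of reusing a definition, the following sketch applies customanalyzer at the entity level of a second class. The Magazine entity and its field are hypothetical and not part of the original example. The sketch assumes that the @AnalyzerDef on Book is part of the same configuration and that the Hibernate Search annotations are on the classpath.

```java
import java.io.Serializable;

import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

// Hypothetical entity used only for illustration.
// "customanalyzer" refers to the definition declared with @AnalyzerDef on Book.
@Indexed
@Analyzer(definition = "customanalyzer")
public class Magazine implements Serializable {

   // No field-level @Analyzer here, so the entity-level analyzer is expected to apply.
   @Field
   private String title;

   public Magazine() {
   }

   // standard getters/setters follow here
}
```

Placing @Analyzer on the class instead of on each field keeps the mapping short when most fields share one analyzer. A field-level @Analyzer can still be added where a single property needs a different one.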