11.7. Customizing Lucene's Scoring Formula
Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.Similarity. The abstract methods defined in this class match the factors of the following formula, which calculates the score of query q for document d:
$$\text{score}(q,d) = \text{coord}(q,d) \cdot \text{queryNorm}(q) \cdot \sum_{t \in q} \Big( \text{tf}(t \text{ in } d) \cdot \text{idf}(t)^2 \cdot t.\text{getBoost}() \cdot \text{norm}(t,d) \Big)$$
| Factor | Description |
|---|---|
| tf(t in d) | Term frequency factor for the term (t) in the document (d). |
| idf(t) | Inverse document frequency of the term. |
| coord(q,d) | Score factor based on how many of the query terms are found in the specified document. |
| queryNorm(q) | Normalizing factor used to make scores between queries comparable. |
| t.getBoost() | Search-time boost of the term (t), as specified in the query. |
| norm(t,d) | Encapsulates the index-time factors: document boost, field boost and field length normalization. |
It is beyond the scope of this manual to explain this formula in more detail. Please refer to
Similarity's Javadocs for more information.
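To make the interplay of the factors more concrete, the following dependency-free Java sketch evaluates the formula for a list of query terms against one document. It is not Lucene code: the factor definitions are assumed to mirror the classic DefaultSimilarity (tf as the square root of the frequency, idf as 1 + ln(numDocs / (docFreq + 1)), coord as the fraction of matched query terms, queryNorm as 1 / sqrt of the summed squared term weights). The class ScoreSketch and its Term holder are hypothetical, and norm(t,d) is passed in precomputed instead of being decoded from the index.

```java
import java.util.List;

/** Simplified model of the scoring formula; not the actual Lucene implementation. */
public class ScoreSketch {

    /** Statistics for one query term against one document (hypothetical holder). */
    public static final class Term {
        final int freqInDoc;  // occurrences of the term in the document (0 = absent)
        final int docFreq;    // number of documents containing the term
        final double boost;   // t.getBoost()
        final double norm;    // norm(t,d), assumed already computed

        public Term(int freqInDoc, int docFreq, double boost, double norm) {
            this.freqInDoc = freqInDoc;
            this.docFreq = docFreq;
            this.boost = boost;
            this.norm = norm;
        }
    }

    // tf(t in d): square root of the raw frequency
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // idf(t): 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // coord(q,d): fraction of the query terms found in the document
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    // queryNorm(q): 1 / sqrt(sum of squared query term weights)
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static double score(List<Term> queryTerms, int numDocs) {
        double sumOfSquaredWeights = 0.0;
        double sum = 0.0;
        int overlap = 0;
        for (Term t : queryTerms) {
            double idf = idf(t.docFreq, numDocs);
            // queryNorm takes every query term into account, matched or not
            sumOfSquaredWeights += (idf * t.boost) * (idf * t.boost);
            if (t.freqInDoc > 0) {
                overlap++;
                // one summand of the formula: tf * idf^2 * boost * norm
                sum += tf(t.freqInDoc) * idf * idf * t.boost * t.norm;
            }
        }
        return coord(overlap, queryTerms.size()) * queryNorm(sumOfSquaredWeights) * sum;
    }
}
```

For a single-term query against 10 documents, where the term appears in 1 document and occurs 4 times in it (boost and norm 1), coord is 1 and queryNorm cancels one idf factor, so the score reduces to tf · idf = 2 · (1 + ln 5) ≈ 5.22.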
Hibernate Search provides three ways to modify Lucene's similarity calculation.
First, you can set the default similarity by specifying the fully qualified class name of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.DefaultSimilarity.
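You can also override the similarity used for a specific index by setting its similarity property:
hibernate.search.default.similarity = my.custom.Similarity
The default prefix applies the setting to all indexes; to target a single index, replace default with the index name, as in hibernate.search.<indexname>.similarity.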
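When the configuration is assembled programmatically, both property-based options are ordinary key/value entries. The sketch below only collects them in a java.util.Properties object; how these properties reach Hibernate depends on your bootstrap code. The class SimilarityConfigSketch, the index name Books and the class my.custom.BookSimilarity are hypothetical placeholders, and the per-index key assumes the hibernate.search.<indexname>.<property> naming convention.

```java
import java.util.Properties;

public class SimilarityConfigSketch {

    /** Collects the similarity settings; class and index names are placeholders. */
    public static Properties similaritySettings() {
        Properties props = new Properties();
        // Option 1: default Similarity used by every index
        props.setProperty("hibernate.search.similarity", "my.custom.Similarity");
        // Option 2: override for a single index, here a hypothetical index named "Books"
        props.setProperty("hibernate.search.Books.similarity", "my.custom.BookSimilarity");
        return props;
    }
}
```

The index name must match the one the entity is indexed under, for example one declared with @Indexed(index = "Books").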
Finally, you can override the default similarity at class level using the @Similarity annotation:
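```java
import javax.persistence.Entity;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Similarity;

@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
    // ... identifier, fields and mappings
}
```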
As an example, let's assume it is not important how often a term appears in a document. Documents with a single occurrence of the term should be scored the same as documents with multiple occurrences. In this case your custom implementation of the method
tf(float freq) should return 1.0.
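A minimal sketch of such an implementation, assuming Lucene 3.x (where DefaultSimilarity lives in org.apache.lucene.search) is on the classpath. It extends DefaultSimilarity rather than the abstract Similarity so that every other factor keeps its default behavior, and it reuses the DummySimilarity name from the @Similarity example above purely for illustration.

```java
import org.apache.lucene.search.DefaultSimilarity;

/**
 * Ignores how often a term occurs in a document: one occurrence and many
 * occurrences produce the same tf factor. Written against the Lucene 3.x API.
 */
public class DummySimilarity extends DefaultSimilarity {

    @Override
    public float tf(float freq) {
        // Ignore freq entirely: every matching document gets the same tf factor.
        return 1.0f;
    }
}
```

Because only tf(float freq) is overridden, idf, coord, queryNorm and the norms are unchanged, so two documents still score differently if, for example, their field lengths differ.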
Warning
When two entities share the same index, they must declare the same Similarity implementation. Classes in the same class hierarchy always share the index, so overriding the Similarity implementation in a subtype is not allowed.
Likewise, it does not make sense to define the similarity via both the index setting and the class-level setting, as they would conflict. Such a configuration will be rejected.