3.2. Sharding indexes
In some extreme cases involving very large indexes, it is necessary to split (shard) the indexing data of a given entity type into several Lucene indexes. This solution is not recommended until the index has reached a significant size and index update times are slowing the application down. The main drawback of index sharding is that searches become slower, since more files have to be opened for a single search. In other words, do not do it until you have problems :)
Despite this strong warning, Hibernate Search allows you to index a given entity type into several sub indexes. Data is sharded into the different sub indexes by an IndexShardingStrategy. By default, no sharding strategy is enabled unless the number of shards is configured. To configure the number of shards, use the following property:
Example 3.3. Enabling index sharding by specifying nbr_of_shards for a specific index
hibernate.search.<indexName>.sharding_strategy.nbr_of_shards 5
This will use 5 different shards.
The default sharding strategy, when shards are set up, splits the data according to the hash value of the id's string representation (as generated by the Field Bridge). This ensures a fairly balanced sharding. You can replace the strategy by implementing IndexShardingStrategy and setting the following property:
Example 3.4. Specifying a custom sharding strategy
hibernate.search.<indexName>.sharding_strategy my.shardingstrategy.Implementation
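To make the default behaviour more concrete, here is a minimal Java sketch of hash-based shard selection. It is not the Hibernate Search implementation: the class ShardSketch and the method shardFor are hypothetical names, and String.hashCode() is only an assumed stand-in for whatever hash function the library actually applies. It illustrates nothing more than the idea of mapping the id's string form to one of nbr_of_shards shards.

```java
/**
 * Illustrative sketch only: picks a shard from the string form of an id.
 * ShardSketch and shardFor are hypothetical names, not Hibernate Search API.
 */
final class ShardSketch {
    private ShardSketch() {}

    /** Maps an id's string representation to a shard number in [0, nbrOfShards). */
    static int shardFor(String idInString, int nbrOfShards) {
        if (nbrOfShards <= 0) {
            throw new IllegalArgumentException("nbrOfShards must be positive");
        }
        // Assumption: String.hashCode() stands in for the real hash function.
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(idInString.hashCode(), nbrOfShards);
    }

    public static void main(String[] args) {
        // Print the shard chosen for a few sample ids with 5 shards.
        for (String id : new String[] {"1", "2", "42", "animal-7"}) {
            System.out.println(id + " -> shard " + shardFor(id, 5));
        }
    }
}
```

Because the shard number in this sketch depends only on the id string and the shard count, the same id always maps to the same shard as long as the number of shards does not change.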
Each shard has an independent directory provider configuration as described in Section 3.1, “Directory configuration”. The default DirectoryProvider names for the previous example are <indexName>.0 to <indexName>.4. In other words, each shard has the name of its owning index followed by . (dot) and its index number.
Example 3.5. Sharding configuration for an example entity Animal
hibernate.search.default.indexBase /usr/lucene/indexes
hibernate.search.Animal.sharding_strategy.nbr_of_shards 5
hibernate.search.Animal.directory_provider org.hibernate.search.store.FSDirectoryProvider
hibernate.search.Animal.0.indexName Animal00
hibernate.search.Animal.3.indexBase /usr/lucene/sharded
hibernate.search.Animal.3.indexName Animal03
This configuration uses the default id string hashing strategy and shards the Animal index into 5 subindexes. All subindexes are FSDirectoryProvider instances, and the directory where each subindex is stored is as follows:
- for subindex 0: /usr/lucene/indexes/Animal00 (shared indexBase but overridden indexName)
- for subindex 1: /usr/lucene/indexes/Animal.1 (shared indexBase, default indexName)
- for subindex 2: /usr/lucene/indexes/Animal.2 (shared indexBase, default indexName)
- for subindex 3: /usr/lucene/sharded/Animal03 (overridden indexBase, overridden indexName)
- for subindex 4: /usr/lucene/indexes/Animal.4 (shared indexBase, default indexName)
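The way these directories follow from the Animal configuration can be sketched in a few lines of Java. This is an illustration of the lookup order implied by the example, not the library's own resolution code: ShardDirectorySketch, exampleConfig and directoryFor are hypothetical names, and the fallback order (shard-specific value, then index-level value, then the default scope) is an assumption made for the sketch.

```java
import java.util.Properties;

/**
 * Illustrative sketch only: derives each shard's directory from the
 * Animal example properties. Not Hibernate Search's own resolution code.
 */
final class ShardDirectorySketch {
    private ShardDirectorySketch() {}

    /** Builds the properties of the Animal example (values copied from it). */
    static Properties exampleConfig() {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.indexBase", "/usr/lucene/indexes");
        p.setProperty("hibernate.search.Animal.sharding_strategy.nbr_of_shards", "5");
        p.setProperty("hibernate.search.Animal.directory_provider",
                "org.hibernate.search.store.FSDirectoryProvider");
        p.setProperty("hibernate.search.Animal.0.indexName", "Animal00");
        p.setProperty("hibernate.search.Animal.3.indexBase", "/usr/lucene/sharded");
        p.setProperty("hibernate.search.Animal.3.indexName", "Animal03");
        return p;
    }

    /** Returns the first value found for a setting, trying the given scopes in order. */
    private static String lookup(Properties p, String setting, String... scopes) {
        for (String scope : scopes) {
            String value = p.getProperty("hibernate.search." + scope + "." + setting);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Resolves the directory of one shard (assumed order: shard, index, default). */
    static String directoryFor(Properties p, String index, int shard) {
        String shardScope = index + "." + shard;
        String base = lookup(p, "indexBase", shardScope, index, "default");
        if (base == null) {
            throw new IllegalStateException("no indexBase configured for " + shardScope);
        }
        String name = lookup(p, "indexName", shardScope);
        if (name == null) {
            name = shardScope; // default shard name: index name, dot, shard number
        }
        return base + "/" + name;
    }

    public static void main(String[] args) {
        Properties p = exampleConfig();
        int shards = Integer.parseInt(
                p.getProperty("hibernate.search.Animal.sharding_strategy.nbr_of_shards"));
        for (int i = 0; i < shards; i++) {
            System.out.println("subindex " + i + ": " + directoryFor(p, "Animal", i));
        }
    }
}
```

Running main prints one line per subindex, so the effect of overriding indexBase or indexName for a single shard can be checked by editing exampleConfig.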