3.2. Sharding indexes

In extreme cases involving very large indexes, it can be necessary to split (shard) the index data of a given entity type across several Lucene indexes. This is not recommended until the index has grown large enough that index update times are slowing the application down. The main drawback of index sharding is that searches become slower, since more files have to be opened for a single search. In other words, do not do it until you have problems :)
Despite this strong warning, Hibernate Search allows you to index a given entity type into several subindexes. Data is distributed across the subindexes by an IndexShardingStrategy. By default, no sharding strategy is enabled unless the number of shards is configured. To configure the number of shards, use the following property:

Example 3.3. Enabling index sharding by specifying nbr_of_shards for a specific index

hibernate.search.<indexName>.sharding_strategy.nbr_of_shards 5
This will use 5 different shards.
The default sharding strategy, used when shards are set up, splits the data according to the hash value of the string representation of the id (generated by the Field Bridge). This ensures a fairly balanced distribution across shards. You can replace the strategy by implementing IndexShardingStrategy and setting the following property:

Example 3.4. Specifying a custom sharding strategy

hibernate.search.<indexName>.sharding_strategy my.shardingstrategy.Implementation
Each shard has an independent directory provider configuration, as described in Section 3.1, “Directory configuration”. The default DirectoryProvider names for the five-shard example above are <indexName>.0 to <indexName>.4. In other words, each shard has the name of its owning index followed by . (dot) and its index number.
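
To make the id-hashing idea described above concrete, here is a minimal standalone sketch. It is not Hibernate Search's implementation and does not implement the IndexShardingStrategy interface. The class name IdHashShardSelector and the use of String.hashCode() are assumptions made for illustration, and the hash function Hibernate Search actually uses may differ.

```java
// Illustrative sketch only: IdHashShardSelector is a hypothetical class,
// not part of Hibernate Search.
final class IdHashShardSelector {

    private final int numberOfShards;

    IdHashShardSelector(int numberOfShards) {
        if (numberOfShards < 1) {
            throw new IllegalArgumentException("numberOfShards must be at least 1");
        }
        this.numberOfShards = numberOfShards;
    }

    /**
     * Maps the string representation of an entity id to a shard number
     * in the range [0, numberOfShards).
     */
    int shardFor(String idInString) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(idInString.hashCode(), numberOfShards);
    }
}
```

With five shards, new IdHashShardSelector(5).shardFor("42") always returns the same shard number for the same id, which is what allows a document to be found again for update or deletion. Because the mapping depends on the number of shards, changing nbr_of_shards on an existing index would generally assign ids to different shards.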

Example 3.5. Configuring sharding for an example entity Animal

hibernate.search.default.indexBase /usr/lucene/indexes

hibernate.search.Animal.sharding_strategy.nbr_of_shards 5
hibernate.search.Animal.directory_provider org.hibernate.search.store.FSDirectoryProvider
hibernate.search.Animal.0.indexName Animal00
hibernate.search.Animal.3.indexBase /usr/lucene/sharded
hibernate.search.Animal.3.indexName Animal03
This configuration uses the default id string hashing strategy and shards the Animal index into 5 subindexes. All subindexes are FSDirectoryProvider instances, and the directory where each subindex is stored is as follows:
  • for subindex 0: /usr/lucene/indexes/Animal00 (shared indexBase but overridden indexName)
  • for subindex 1: /usr/lucene/indexes/Animal.1 (shared indexBase, default indexName)
  • for subindex 2: /usr/lucene/indexes/Animal.2 (shared indexBase, default indexName)
  • for subindex 3: /usr/lucene/sharded/Animal03 (overridden indexBase, overridden indexName)
  • for subindex 4: /usr/lucene/indexes/Animal.4 (shared indexBase, default indexName)
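
The fallback order implied by Example 3.5 can be sketched in plain Java. This is a simplified illustration, not Hibernate Search code: the class ShardDirectoryResolver and its methods are hypothetical, and it assumes that a setting is looked up first for the shard, then for the index, then under default, and that a shard without its own indexName uses <indexName>.<shard number>.

```java
import java.util.Properties;

// Illustrative sketch only: ShardDirectoryResolver is a hypothetical class
// that mimics the lookup order implied by Example 3.5.
final class ShardDirectoryResolver {

    private static final String PREFIX = "hibernate.search.";

    private final Properties config;

    ShardDirectoryResolver(Properties config) {
        this.config = config;
    }

    /** Looks a setting up for the shard, then for the index, then under "default". */
    private String lookup(String indexName, int shard, String setting) {
        String[] scopes = { indexName + "." + shard, indexName, "default" };
        for (String scope : scopes) {
            String value = config.getProperty(PREFIX + scope + "." + setting);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Returns the directory of one shard as indexBase + "/" + indexName. */
    String directoryFor(String indexName, int shard) {
        String base = lookup(indexName, shard, "indexBase");
        if (base == null) {
            throw new IllegalStateException("no indexBase configured for " + indexName);
        }
        // Without a shard-level indexName, fall back to "<indexName>.<shard>".
        String name = config.getProperty(
                PREFIX + indexName + "." + shard + ".indexName",
                indexName + "." + shard);
        return base + "/" + name;
    }

    /** Builds the path-related properties from Example 3.5. */
    static Properties exampleConfig() {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.indexBase", "/usr/lucene/indexes");
        p.setProperty("hibernate.search.Animal.sharding_strategy.nbr_of_shards", "5");
        p.setProperty("hibernate.search.Animal.0.indexName", "Animal00");
        p.setProperty("hibernate.search.Animal.3.indexBase", "/usr/lucene/sharded");
        p.setProperty("hibernate.search.Animal.3.indexName", "Animal03");
        return p;
    }
}
```

Calling new ShardDirectoryResolver(ShardDirectoryResolver.exampleConfig()).directoryFor("Animal", n) for n from 0 to 4 resolves each shard's directory from the properties in Example 3.5.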