Chapter 5. Set Up Distribution Mode
5.1. About Distribution Mode
In distribution mode, Red Hat JBoss Data Grid stores cache entries across a subset of nodes in the cluster instead of replicating entries on each node. This improves JBoss Data Grid scalability.
5.2. Consistent Hashing in Distribution Mode
Red Hat JBoss Data Grid uses an algorithm based on consistent hashing to distribute cache entries on nodes across clusters. JBoss Data Grid splits keys in distributed caches into fixed numbers of hash space segments, using
MurmurHash3 by default.
Segments are distributed across the cluster to nodes that act as primary and backup owners. Primary owners coordinate locking and write operations for the keys in each segment. Backup owners provide redundancy in the event the primary owner becomes unavailable.
You configure the number of owners with the
owners attribute. This attribute defines how many copies of each entry are available across the cluster. The default value is
2, a primary owner and one backup owner.
You can configure the number of hash space segments with the
segments attribute. This attribute defines the hash space segments for the named cache across the cluster. The cache always has the configured number of hash segments available across the JBoss Data Grid cluster, no matter how many nodes join or leave.
Additionally, the key-to-segment mapping is fixed. In other words, keys always map to the same segments, regardless of changes to the cluster topology.
The default number of segments is
256, which is suitable for JBoss Data Grid clusters of 25 nodes or less. The recommended value is
20 * the number of nodes for each cluster, which allows you to add nodes and still have capacity.
However, any value within the range of
10 * the number of nodes and
100 * the number of nodes per cluster is fine.
With a perfect segment-to-node mapping, nodes are:
primary owner for segments calculated as
number of segments / number of nodes
any kind of owner for segments calculated as
number of owners * number of segments / number of nodes
However, JBoss Data Grid does not always distribute segments evenly and can map more segments to some nodes than others.
Consider a scenario where a cluster has 10 nodes and there are 20 segments per node. If segments are distributed evenly across the cluster, each node is the primary owner for 2 segments. If segments are not distributed evenly, some nodes are primary owners for 3 segments, which represents an increase of 50% for the planned capacity.
Likewise, if the number of owners is 2, each node should own 4 segments. However it could be the case that some nodes are owners for 5 segments, which represents a 25% increase for the planned capacity.
You must restart the JBoss Data Grid cluster for changes to the number of segments to take effect.
5.3. Locating Entries in Distribution Mode
The consistent hash algorithm used in Red Hat JBoss Data Grid’s distribution mode can locate entries deterministically, without multicasting a request or maintaining expensive metadata.
PUT operation can result in as many remote calls as specified by the
owners parameter, while a
GET operation executed on any node in the cluster results in a single remote call. In the background, the
GET operation results in the same number of remote calls as a
PUT operation (specifically the value of the
owners parameter), but these occur in parallel and the returned entry is passed to the caller as soon as one returns.
5.4. Return Values in Distribution Mode
In Red Hat JBoss Data Grid’s distribution mode, a synchronous request is used to retrieve the previous return value if it cannot be found locally. A synchronous request is used for this task irrespective of whether distribution mode is using asynchronous or synchronous processes.
5.5. Configure Distribution Mode
Distribution mode is a clustered mode in Red Hat JBoss Data Grid. Distribution mode can be added to any cache container, in both Library Mode and Remote Client-Server Mode, using the following procedure:
<cache-container name="clustered" default-cache="default" statistics="true"> <!-- Additional configuration information here --> <distributed-cache name="default" statistics="true"> <!-- Additional configuration information here --> </distributed-cache> </cache-container>
distributed-cache element configures settings for the distributed cache using the following parameters:
nameparameter provides a unique identifier for the cache.
statisticsare enabled at the container level, per-cache statistics can be selectively disabled for caches that do not require monitoring by setting the
JGroups must be appropriately configured for clustered mode before attempting to load this configuration.
5.6. Synchronous and Asynchronous Distribution
To elicit meaningful return values from certain public
API methods, it is essential to use synchronized communication when using distribution mode.
Communication Mode example
For example, with three nodes in a cluster, node
C, and a key
K that maps nodes
B. Perform an operation on node
C that requires a return value, for example
Cache.remove(K). To execute successfully, the operation must first synchronously forward the call to both node
B, and then wait for a result returned from either node
B. If asynchronous communication was used, the usefulness of the returned values cannot be guaranteed, despite the operation behaving as expected.