Chapter 6. Set Up Distribution Mode
6.1. About Distribution Mode
When enabled, Red Hat JBoss Data Grid’s distribution mode stores each entry on a subset of the nodes in the grid instead of replicating each entry on every node. Typically, each entry is stored on more than one node for redundancy and fault tolerance.
As a result of storing entries on selected nodes across the cluster, distribution mode provides improved scalability compared to other clustered modes.
A cache using distribution mode can transparently locate keys across a cluster using the consistent hash algorithm.
6.2. Distribution Mode’s Consistent Hash Algorithm
The hashing algorithm in Red Hat JBoss Data Grid is based on consistent hashing. The term consistent hashing is still used for this implementation, despite some divergence from a traditional consistent hash.
Distribution mode uses a consistent hash algorithm to select a node from the cluster to store entries upon. The consistent hash algorithm is configured with the number of copies of each cache entry to be maintained within the cluster. Unlike generic consistent hashing, the implementation used in JBoss Data Grid splits the key space into fixed segments. The number of segments is configurable using
numSegments and cannot be changed without restarting the cluster. The mapping of keys to segments is also fixed — a key maps to the same segment, regardless of how the topology of the cluster changes.
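The fixed key-to-segment mapping can be illustrated with a short sketch. The hash function and modulo arithmetic below are simplified placeholders (JBoss Data Grid internally uses its own hash function and segment boundaries), but the invariant is the same: a key's segment depends only on the key's hash and numSegments, never on the current cluster topology.

```java
public class SegmentMappingSketch {
    // Illustrative only: maps a key to one of numSegments fixed segments.
    // The segment depends solely on the key's hash and numSegments,
    // so it never changes when nodes join or leave the cluster.
    static int segmentOf(Object key, int numSegments) {
        int hash = key.hashCode();
        return (hash & Integer.MAX_VALUE) % numSegments;
    }

    public static void main(String[] args) {
        int numSegments = 60; // hypothetical numSegments value
        int s1 = segmentOf("user:42", numSegments);
        int s2 = segmentOf("user:42", numSegments);
        // Same key, same segment, regardless of topology changes.
        System.out.println(s1 == s2 && s1 >= 0 && s1 < numSegments);
    }
}
```

Because the mapping ignores topology, only segment ownership (not segment membership) needs to be recalculated when nodes join or leave.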
Setting the number of copies for each data item requires balancing performance against fault tolerance: too many copies of each entry can impair performance, while too few can result in data loss if a node fails.
Each hash segment is mapped to a list of nodes called owners. The order is important because the first owner (also known as the primary owner) has a special role in many cache operations (for example, locking). The other owners are called backup owners. There is no rule about mapping segments to owners, although the hashing algorithms simultaneously balance the number of segments allocated to each node and minimize the number of segments that have to move after a node joins or leaves the cluster.
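A minimal sketch of the segment-to-owners mapping, assuming a naive round-robin assignment (the real algorithm additionally balances segments per node and minimizes segment movement on topology changes):

```java
import java.util.ArrayList;
import java.util.List;

public class OwnersSketch {
    // Illustrative only: gives each segment an ordered owner list by
    // walking the node list round-robin. The first owner in each list
    // is the primary owner; the remaining owners are backup owners.
    static List<List<String>> assignOwners(List<String> nodes,
                                           int numSegments, int numOwners) {
        List<List<String>> owners = new ArrayList<>();
        for (int seg = 0; seg < numSegments; seg++) {
            List<String> list = new ArrayList<>();
            for (int i = 0; i < numOwners; i++) {
                list.add(nodes.get((seg + i) % nodes.size()));
            }
            owners.add(list);
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("A", "B", "C");
        List<List<String>> owners = assignOwners(nodes, 6, 2);
        // Segment 0 is owned by [A, B]: A is the primary owner, B a backup.
        System.out.println(owners.get(0));
    }
}
```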
6.3. Locating Entries in Distribution Mode
The consistent hash algorithm used in Red Hat JBoss Data Grid’s distribution mode can locate entries deterministically, without multicasting a request or maintaining expensive metadata.
A PUT operation can result in as many remote calls as specified by the owners parameter, while a GET operation executed on any node in the cluster results in a single remote call. In the background, the GET operation results in the same number of remote calls as a PUT operation (specifically, the value of the owners parameter), but these occur in parallel and the returned entry is passed to the caller as soon as one returns.
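The parallel, first-response-wins behavior of the background GET can be sketched with plain CompletableFutures. This is a simplification under assumed owner names and delays; the actual remote calls go through JGroups.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ParallelGetSketch {
    // Simulates a remote GET to one owner with an artificial delay.
    static CompletableFuture<String> remoteGet(String owner, long delayMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-from-" + owner;
        });
    }

    public static void main(String[] args) throws Exception {
        // One call per owner (here owners = 2), issued in parallel;
        // the caller gets the entry as soon as the first owner replies.
        CompletableFuture<Object> first = CompletableFuture.anyOf(
                remoteGet("A", 200),  // slower owner
                remoteGet("B", 10));  // faster owner
        System.out.println(first.get());
    }
}
```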
6.4. Return Values in Distribution Mode
In Red Hat JBoss Data Grid’s distribution mode, a synchronous request is used to retrieve the previous return value if it cannot be found locally. A synchronous request is used for this task irrespective of whether distribution mode is using asynchronous or synchronous processes.
6.5. Configure Distribution Mode
Distribution mode is a clustered mode in Red Hat JBoss Data Grid. Distribution mode can be added to any cache container, in both Library Mode and Remote Client-Server Mode, using the following procedure:
<cache-container name="clustered" default-cache="default" statistics="true">
    <!-- Additional configuration information here -->
    <distributed-cache name="default" statistics="true">
        <!-- Additional configuration information here -->
    </distributed-cache>
</cache-container>
The distributed-cache element configures settings for the distributed cache using the following parameters:
The name parameter provides a unique identifier for the cache.
If statistics are enabled at the container level, per-cache statistics can be selectively disabled for caches that do not require monitoring by setting the statistics attribute to false.
JGroups must be appropriately configured for clustered mode before attempting to load this configuration.
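In Library Mode, a distributed cache can also be configured programmatically. The following is a sketch using the Infinispan ConfigurationBuilder API that JBoss Data Grid is based on; the attribute values shown are illustrative, not recommendations.

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

// Distribution mode with two copies of each entry (illustrative values).
Configuration config = new ConfigurationBuilder()
        .clustering()
            .cacheMode(CacheMode.DIST_SYNC)
            .hash().numOwners(2)
        .build();
```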
6.6. Synchronous and Asynchronous Distribution
To elicit meaningful return values from certain public API methods, it is essential to use synchronous communication when using distribution mode.
Communication Mode example
For example, take a cluster with three nodes A, B, and C, and a key K that maps to nodes A and B. Perform an operation on node C that requires a return value, for example Cache.remove(K). To execute successfully, the operation must first synchronously forward the call to both nodes A and B, and then wait for a result returned from either node A or B. If asynchronous communication was used, the usefulness of the returned values cannot be guaranteed, despite the operation behaving as expected.