Chapter 5. Set Up Distribution Mode

5.1. About Distribution Mode

In distribution mode, Red Hat JBoss Data Grid stores cache entries across a subset of nodes in the cluster instead of replicating entries on each node. This improves JBoss Data Grid scalability.

5.2. Consistent Hashing in Distribution Mode

Red Hat JBoss Data Grid uses an algorithm based on consistent hashing to distribute cache entries on nodes across clusters. JBoss Data Grid splits keys in distributed caches into fixed numbers of hash space segments, using MurmurHash3 by default.

Segments are distributed across the cluster to nodes that act as primary and backup owners. Primary owners coordinate locking and write operations for the keys in each segment. Backup owners provide redundancy in the event the primary owner becomes unavailable.

You configure the number of owners with the owners attribute. This attribute defines how many copies of each entry are available across the cluster. The default value is 2, a primary owner and one backup owner.

You can configure the number of hash space segments with the segments attribute. This attribute defines the hash space segments for the named cache across the cluster. The cache always has the configured number of hash segments available across the JBoss Data Grid cluster, no matter how many nodes join or leave.

Additionally, the key-to-segment mapping is fixed. In other words, keys always map to the same segments, regardless of changes to the cluster topology.

The default number of segments is 256, which is suitable for JBoss Data Grid clusters of 25 nodes or less. The recommended value is 20 * the number of nodes for each cluster, which allows you to add nodes and still have capacity.

However, any value within the range of 10 * the number of nodes and 100 * the number of nodes per cluster is fine.

With a perfect segment-to-node mapping, nodes are:

  • primary owner for segments calculated as number of segments / number of nodes
  • any kind of owner for segments calculated as number of owners * number of segments / number of nodes

However, JBoss Data Grid does not always distribute segments evenly and can map more segments to some nodes than others.

Consider a scenario where a cluster has 10 nodes and there are 20 segments per node. If segments are distributed evenly across the cluster, each node is the primary owner for 2 segments. If segments are not distributed evenly, some nodes are primary owners for 3 segments, which represents an increase of 50% for the planned capacity.

Likewise, if the number of owners is 2, each node should own 4 segments. However it could be the case that some nodes are owners for 5 segments, which represents a 25% increase for the planned capacity.

Note

You must restart the JBoss Data Grid cluster for changes to the number of segments to take effect.

5.3. Locating Entries in Distribution Mode

The consistent hash algorithm used in Red Hat JBoss Data Grid’s distribution mode can locate entries deterministically, without multicasting a request or maintaining expensive metadata.

A PUT operation can result in as many remote calls as specified by the owners parameter, while a GET operation executed on any node in the cluster results in a single remote call. In the background, the GET operation results in the same number of remote calls as a PUT operation (specifically the value of the owners parameter), but these occur in parallel and the returned entry is passed to the caller as soon as one returns.

5.4. Return Values in Distribution Mode

In Red Hat JBoss Data Grid’s distribution mode, a synchronous request is used to retrieve the previous return value if it cannot be found locally. A synchronous request is used for this task irrespective of whether distribution mode is using asynchronous or synchronous processes.

5.5. Configure Distribution Mode

Distribution mode is a clustered mode in Red Hat JBoss Data Grid. Distribution mode can be added to any cache container, in both Library Mode and Remote Client-Server Mode, using the following procedure:

The distributed-cache Element

<cache-container name="clustered"
    default-cache="default"
    statistics="true">
  <!-- Additional configuration information here -->
  <distributed-cache name="default"
      statistics="true">
    <!-- Additional configuration information here -->
  </distributed-cache>
</cache-container>

The distributed-cache element configures settings for the distributed cache using the following parameters:

  1. The name parameter provides a unique identifier for the cache.
  2. If statistics are enabled at the container level, per-cache statistics can be selectively disabled for caches that do not require monitoring by setting the statistics attribute to false.
Important

JGroups must be appropriately configured for clustered mode before attempting to load this configuration.

5.6. Synchronous and Asynchronous Distribution

To elicit meaningful return values from certain public API methods, it is essential to use synchronized communication when using distribution mode.

Communication Mode example

For example, with three nodes in a cluster, node A, B and C, and a key K that maps nodes A and B. Perform an operation on node C that requires a return value, for example Cache.remove(K). To execute successfully, the operation must first synchronously forward the call to both node A and B, and then wait for a result returned from either node A or B. If asynchronous communication was used, the usefulness of the returned values cannot be guaranteed, despite the operation behaving as expected.