Chapter 29. Distributed Execution

Red Hat JBoss Data Grid provides distributed execution through the standard JDK ExecutorService interface. Submitted tasks run across the entire cluster of JBoss Data Grid nodes rather than only in the local JVM.
JBoss Data Grid's distributed task executors can use data stored on JBoss Data Grid cache nodes as input for execution tasks. As a result, there is no need to configure a cache store for intermediate or final results. Because input data in JBoss Data Grid is already load balanced, tasks are automatically balanced as well, so there is no need to assign tasks to specific nodes explicitly.
In JBoss Data Grid's distributed execution framework:
  • Each DistributedExecutorService is bound to a single cache. Submitted tasks have access to key/value pairs from that cache if they are instances of DistributedCallable.
  • Every Callable, Runnable, and DistributedCallable submitted must be either Serializable or Externalizable so that it can be migrated to, and executed on, other nodes. The value returned from a Callable must also be Serializable or Externalizable, because it is sent back to the node that submitted the task.
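To illustrate why the serialization requirement exists, the following JDK-only sketch serializes a task to bytes and rebuilds it, which is roughly what must happen before a task can run on another node, and then does the same for the result. It does not use JBoss Data Grid: plain Java serialization stands in for the grid's actual marshalling (an assumption), and all class and method names (SerializableTaskDemo, WordLengthTask, NotShippableTask, isShippable, simulateRemoteCall) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;

public class SerializableTaskDemo {

    // Shippable: the task class is Serializable, and so is its result type (Integer).
    public static class WordLengthTask implements Callable<Integer>, Serializable {
        private static final long serialVersionUID = 1L;
        private final String word;

        public WordLengthTask(String word) {
            this.word = word;
        }

        @Override
        public Integer call() {
            return word.length();
        }
    }

    // Not shippable: this class does not implement Serializable.
    public static class NotShippableTask implements Callable<Integer> {
        @Override
        public Integer call() {
            return 0;
        }
    }

    // Turn an object into bytes, as a transport would before sending it.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Rebuild an object from bytes, as the receiving side would.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // True if the task survives serialization; false if its class is not Serializable.
    public static boolean isShippable(Object task) {
        try {
            toBytes(task);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Ship the task out, run the copy, and ship the result back.
    public static <T> T simulateRemoteCall(Callable<T> task) {
        try {
            @SuppressWarnings("unchecked")
            Callable<T> received = (Callable<T>) fromBytes(toBytes(task));
            T result = received.call();
            @SuppressWarnings("unchecked")
            T returned = (T) fromBytes(toBytes(result));
            return returned;
        } catch (Exception e) {
            throw new IllegalStateException("task could not be shipped", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isShippable(new WordLengthTask("x")));   // true
        System.out.println(isShippable(new NotShippableTask()));    // false
        System.out.println(simulateRemoteCall(new WordLengthTask("distributed"))); // 11
    }
}
```

Attempting to ship NotShippableTask fails with a NotSerializableException, which is the kind of failure the requirement above is meant to prevent.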

29.1. Distributed Executor Service

A DistributedExecutorService controls the execution of DistributedCallable, Callable, and Runnable instances on the cluster. Each DistributedExecutorService is tied to the specific cache that is passed in when it is instantiated:
DistributedExecutorService des = new DefaultExecutorService(cache);
A DistributedTask can be executed against a subset of keys only if its task implements the DistributedCallable interface, as discussed in Section 29.2, “DistributedCallable API”. If such a task is submitted to a single node, JBoss Data Grid locates a node that contains the indicated keys, migrates the DistributedCallable to that node, and returns a CompletableFuture. If the task is instead submitted to all available nodes, only the nodes that contain the indicated keys receive the task.
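The idea behind a key-aware task can be sketched without JBoss Data Grid on the classpath. The sketch below assumes a hypothetical KeyedCallable interface, loosely modeled on DistributedCallable, with a plain Map standing in for the cache; it is not the DistributedCallable API (see Section 29.2 for that), and KeyedTaskSketch, SumTask, and runOnNode are illustrative names only. It shows the executing node handing the task its local data and its input keys before calling it.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;

public class KeyedTaskSketch {

    // Hypothetical stand-in for a key-aware task. A Map replaces the cache
    // so that the sketch runs with the JDK alone.
    public interface KeyedCallable<K, V, T> extends Callable<T>, Serializable {
        // Invoked on the executing node before call().
        void setEnvironment(Map<K, V> localData, Set<K> inputKeys);
    }

    // Sums the values stored under the keys the task was submitted with.
    public static class SumTask implements KeyedCallable<String, Integer, Integer> {
        private static final long serialVersionUID = 1L;
        // transient: the environment is injected on the executing node,
        // not shipped along with the task.
        private transient Map<String, Integer> localData;
        private transient Set<String> inputKeys;

        @Override
        public void setEnvironment(Map<String, Integer> localData, Set<String> inputKeys) {
            this.localData = localData;
            this.inputKeys = inputKeys;
        }

        @Override
        public Integer call() {
            int sum = 0;
            for (String key : inputKeys) {
                // Keys absent from this map are skipped in this toy; a real
                // cache would retrieve them from the cluster instead.
                Integer value = localData.get(key);
                if (value != null) {
                    sum += value;
                }
            }
            return sum;
        }
    }

    // What the executing node does: inject the environment, then run the task.
    public static <K, V, T> T runOnNode(KeyedCallable<K, V, T> task,
                                        Map<K, V> localData, Set<K> inputKeys) {
        task.setEnvironment(localData, inputKeys);
        try {
            return task.call();
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> nodeData = new HashMap<>();
        nodeData.put("a", 1);
        nodeData.put("b", 2);
        nodeData.put("c", 3);
        Set<String> keys = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(runOnNode(new SumTask(), nodeData, keys)); // 4
    }
}
```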
Once a DistributedTask has been created, it can be submitted to the cluster using any of the following methods:
  • The task can be submitted to all available nodes and key/value pairs on the cluster using the submitEverywhere method:
    des.submitEverywhere(task)
  • The submitEverywhere method can also take a set of keys as an argument. Passing in keys in this manner will submit the task only to available nodes that contain the indicated keys:
    des.submitEverywhere(task, $KEY)
  • When the submit method is given one or more keys, the task is executed on a single node that contains at least one of the specified keys. Any keys not present locally on that node are retrieved from the cluster. This version of the submit method accepts one or more keys, as seen in the following examples:
    des.submit(task, $KEY)
    des.submit(task, $KEY1, $KEY2, $KEY3)
  • A specific node can be instructed to execute the task by passing the node's Address to the submit method. The following example executes the task only on the cluster's coordinator:
    des.submit(cache.getCacheManager().getCoordinator(), task)
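The routing rules in the list above can be summarized with a toy model. This sketch only computes which nodes would receive a task and does not execute anything. The hard-coded ownership table, the one-owner-per-key layout, and the method names (targetsForAllNodes, targetsForKeys, singleTargetForKeys) are hypothetical. This section does not say how JBoss Data Grid chooses the single node for submit(task, keys), so the sketch simply takes the first owner it finds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class SubmitRoutingSketch {

    // Toy cluster: node name -> keys owned by that node. In JBoss Data Grid
    // ownership is managed by the cache; here it is simply hard-coded.
    private final Map<String, Set<String>> owners = new LinkedHashMap<>();

    public SubmitRoutingSketch addNode(String node, String... ownedKeys) {
        owners.put(node, new HashSet<>(Arrays.asList(ownedKeys)));
        return this;
    }

    // Like submitEverywhere(task): every node receives the task.
    public List<String> targetsForAllNodes() {
        return new ArrayList<>(owners.keySet());
    }

    // Like submitEverywhere(task, keys): only nodes owning at least one key.
    public List<String> targetsForKeys(String... keys) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, Set<String>> node : owners.entrySet()) {
            for (String key : keys) {
                if (node.getValue().contains(key)) {
                    targets.add(node.getKey());
                    break; // one matching key is enough to select this node
                }
            }
        }
        return targets;
    }

    // Like submit(task, keys): a single node owning at least one key. Keys
    // that node does not own would be fetched from the rest of the cluster.
    public Optional<String> singleTargetForKeys(String... keys) {
        List<String> candidates = targetsForKeys(keys);
        return candidates.isEmpty() ? Optional.empty() : Optional.of(candidates.get(0));
    }

    public static void main(String[] args) {
        SubmitRoutingSketch cluster = new SubmitRoutingSketch()
                .addNode("node1", "k1", "k2")
                .addNode("node2", "k3")
                .addNode("node3", "k4");
        System.out.println(cluster.targetsForAllNodes());             // [node1, node2, node3]
        System.out.println(cluster.targetsForKeys("k2", "k3"));       // [node1, node2]
        System.out.println(cluster.singleTargetForKeys("k3", "k4"));  // Optional[node2]
    }
}
```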

    Note

    By default, tasks are automatically balanced, so there is typically no need to specify a node to execute against.