16.2.2. MapReduceTask

In JBoss Data Grid, MapReduceTask is a distributed task, which unifies the Mapper, Combiner, Reducer, and Collator components into a cohesive computation, which can be parallelized and executed across a large-scale cluster.
These components can be specified with a fluent API. However,as most of them are serialized and executed on other nodes, using inner classes is not recommended.
For example:
new MapReduceTask(cache).mappedWith(new MyMapper()).combinedWith(new 
MyCombiner().reducedWith(new MyReducer()).execute(new MyCollator()).
MapReduceTask requires a cache containing data that will be used as input for the task. The JBoss Data Grid execution environment will instantiate and migrate instances of provided Mappers and Reducers seamlessly across the nodes.
By default, all available key/value pairs of a specified cache will be used as input data for the task. This can be modified by using the onKeys method as an input key filter.
There are two MapReduceTask constructor parameters that determine how the intermediate values are processed:
  • distributedReducePhase - When set to "false", the default setting, the reducers are only executed on the master node. If set to "true", the reducers are executed on every node in the cluster.
  • useIntermediateSharedCache - Only important if distributedReducePhase is set to "true". If "true", which is the default setting, this task will share intermediate value cache with other executing MapReduceTasks on the grid. If set to "false", this task will use its own dedicated cache for intermediate values.