27.3. Using the Hadoop Connector
InfinispanInputFormat and InfinispanOutputFormat
In Hadoop, InputFormat defines how a specific data source is partitioned and how data is read from each partition, while OutputFormat specifies how data is written.
InputFormat declares two methods:
List<InputSplit> getSplits(JobContext context);
RecordReader<K,V> createRecordReader(InputSplit split, TaskAttemptContext context);
The getSplits method acts as a data partitioner: it returns one or more InputSplit instances, each describing a particular section of the data. Each InputSplit is then passed to createRecordReader to obtain a RecordReader, which iterates over the records in that section. Because every split can be read independently of the others, Hadoop can process the splits in parallel across multiple nodes, which is what gives it high throughput on large datasets.
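The split-and-read pattern can be illustrated without Hadoop on the classpath. The following plain-Java sketch is hypothetical: Split, getSplits, and readSplit are stand-ins invented for this example, not the Hadoop InputSplit or RecordReader API. It divides an in-memory data set into index ranges and reads each range on its own thread, mirroring how independent splits can be processed in parallel. It assumes Java 16 or later (for records).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitSketch {

    // Hypothetical stand-in for an InputSplit: a half-open index range [start, end).
    record Split(int start, int end) {}

    // Hypothetical stand-in for getSplits(): divide the data into roughly equal ranges.
    static List<Split> getSplits(int size, int numSplits) {
        List<Split> splits = new ArrayList<>();
        int chunk = (size + numSplits - 1) / numSplits; // ceiling division
        for (int start = 0; start < size; start += chunk) {
            splits.add(new Split(start, Math.min(start + chunk, size)));
        }
        return splits;
    }

    // Hypothetical stand-in for a RecordReader: returns only the entries covered by one split.
    static List<Map.Entry<Integer, String>> readSplit(List<Map.Entry<Integer, String>> data, Split split) {
        return data.subList(split.start(), split.end());
    }

    public static void main(String[] args) throws Exception {
        Map<Integer, String> source = new TreeMap<>();
        for (int i = 0; i < 10; i++) source.put(i, "value-" + i);
        List<Map.Entry<Integer, String>> entries = new ArrayList<>(source.entrySet());

        List<Split> splits = getSplits(entries.size(), 3);
        ExecutorService pool = Executors.newFixedThreadPool(splits.size());
        List<Future<Integer>> results = new ArrayList<>();
        for (Split split : splits) {
            // Each split is read independently, as separate Hadoop tasks would read theirs.
            results.add(pool.submit(() -> readSplit(entries, split).size()));
        }
        int total = 0;
        for (Future<Integer> f : results) total += f.get();
        pool.shutdown();
        System.out.println(splits.size() + " splits, " + total + " records");
    }
}
```

Running main prints "3 splits, 10 records": ten entries divided into ranges of four, four, and two.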
Example of configuring a MapReduce job that targets a JBoss Data Grid cluster:
import org.infinispan.hadoop.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
[...]
Configuration configuration = new Configuration();
// Read input from the "map-reduce-in" cache on the server at localhost:11222
configuration.set(InfinispanConfiguration.INPUT_REMOTE_CACHE_SERVER_LIST, "localhost:11222");
configuration.set(InfinispanConfiguration.INPUT_REMOTE_CACHE_NAME, "map-reduce-in");
// Write output to the "map-reduce-out" cache on the same server
configuration.set(InfinispanConfiguration.OUTPUT_REMOTE_CACHE_SERVER_LIST, "localhost:11222");
configuration.set(InfinispanConfiguration.OUTPUT_REMOTE_CACHE_NAME, "map-reduce-out");
Job job = Job.getInstance(configuration, "Infinispan Integration");
[...]
Example of configuring the job to use the InfinispanInputFormat and InfinispanOutputFormat classes:
[...]
// Define the Map and Reduce classes
job.setMapperClass(MapClass.class);
job.setReducerClass(ReduceClass.class);
// Use the JBoss Data Grid implementations of InputFormat and OutputFormat
job.setInputFormatClass(InfinispanInputFormat.class);
job.setOutputFormatClass(InfinispanOutputFormat.class);
[...]
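The source does not show MapClass or ReduceClass, so the end-to-end data flow of such a job can only be sketched under assumptions. The plain-Java simulation below assumes a word-count job and replaces both caches with in-memory maps; map, reduce, and runJob are hypothetical names, and no Hadoop or Infinispan classes are used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JobFlowSketch {

    // Assumed map logic (MapClass is not shown in the source): emit (word, 1) per word.
    static List<Map.Entry<String, Integer>> map(Integer key, String value) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : value.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Assumed reduce logic (ReduceClass is not shown in the source): sum the counts per word.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    static Map<String, Integer> runJob(Map<Integer, String> inputCache) {
        // Map every input entry, then group the emitted values by key (the shuffle step).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<Integer, String> e : inputCache.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(e.getKey(), e.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce each group and store the result, standing in for the output cache.
        Map<String, Integer> outputCache = new TreeMap<>();
        grouped.forEach((word, counts) -> outputCache.put(word, reduce(word, counts)));
        return outputCache;
    }

    public static void main(String[] args) {
        Map<Integer, String> in = new HashMap<>();  // stands in for "map-reduce-in"
        in.put(1, "the quick brown fox");
        in.put(2, "the lazy dog");
        Map<String, Integer> out = runJob(in);      // stands in for "map-reduce-out"
        System.out.println(out);
    }
}
```

Running main prints {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}. In the configured job, Hadoop would instead run the map and reduce phases as distributed tasks, with InfinispanInputFormat supplying entries from map-reduce-in and InfinispanOutputFormat writing the results to map-reduce-out.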
