Chapter 1. Executive Summary

This reference architecture sets up two clusters: a Red Hat JBoss Data Grid (JDG) 7 cluster and an Apache Spark 1.6 cluster. Apache Spark processes data in memory and draws on JDG 7's in-memory data replication, eliminating a bottleneck that exists in many current enterprise applications.

This reference architecture also includes and deploys a sample Internet of Things (IoT) sensor application, which is based on one of the JDG 7 quickstart examples.

Because typical IoT workloads require low-latency reads and writes and the capacity to scale, the JDG 7 cluster here not only acts as a highly scalable, high-performance data source for the Apache Spark cluster, but also ingests and serves data from IoT sensors. Spark tasks can operate on data stored in JDG caches using the full range of Spark batch and stream operators.

The goal is to describe in detail the steps required to use Apache Spark with JDG 7 through the new Spark connector, and to show how to set up a cluster environment for JDG 7 and Spark.