Chapter 1. Overview

The OpenStack Data Processing service (sahara) provides an interface for provisioning and scaling Hadoop clusters. You can then use these clusters to run resource-intensive jobs, typically to process large data sets. As an OpenStack component, OpenStack Data Processing is fully integrated into the OpenStack ecosystem. For example, users can administer the entire Hadoop data processing workflow through the OpenStack dashboard, from configuring clusters to launching and running jobs on them.

For more information about Hadoop, see http://hadoop.apache.org/.

Note

The Data Processing service (sahara) is deprecated in OpenStack Platform version 15 and is targeted for removal in version 16.

OpenStack Data Processing uses a separate plug-in to provision clusters for each Hadoop distribution. The Hadoop parameters available for configuring a cluster depend on the Hadoop distribution and, by extension, on the plug-in used. As of this release (Red Hat OpenStack Platform 15), OpenStack Data Processing supports the following plug-ins:

OpenStack Data Processing also includes a Guides tab. This tab features wizards that help you create the templates needed to launch clusters and run jobs on them. However, you must still register the components that OpenStack Data Processing requires, such as Hadoop images and job binaries. If you intend to use the Guides feature, we recommend that you first read Chapter 5, Register the Required Components.