OpenStack Data Processing
Red Hat Enterprise Linux OpenStack Platform 7
Manually provisioning and scaling Hadoop clusters in Red Hat OpenStack
Abstract
The OpenStack Data Processing feature allows you to easily provision and scale Hadoop clusters to process large datasets. This guide walks you through the entire OpenStack Data Processing workflow, which includes registering the Data Processing requirements (image, input data, job binaries), configuring templates used to provision clusters, processing data on those clusters, and scaling those clusters as necessary.
This release of OpenStack Data Processing includes a Guides tab, which features wizards that help you create the templates needed to launch clusters and run jobs on them. The objective of this guide is to provide a more in-depth look at the OpenStack Data Processing workflow; it therefore walks you through template creation and component registration without using the Guides tab feature.
Using the OpenStack Data Processing feature requires basic knowledge of data processing within the Hadoop framework, as well as familiarity with the particulars of your chosen Hadoop plug-in. This release supports both the Hortonworks Data Platform 2.0 and Cloudera 5.3.0 plug-ins.
1. Overview
OpenStack Data Processing is provided by the OpenStack Sahara project, which offers a robust interface for easily provisioning and scaling Hadoop clusters. Such clusters can then be used to run resource-intensive jobs, typically for processing large data sets. As an OpenStack component, OpenStack Data Processing is fully integrated into the OpenStack ecosystem; for example, users can administer the entire Hadoop data processing workflow through the OpenStack dashboard, from configuring clusters all the way to launching and running jobs on them.
For more information about Hadoop, see http://hadoop.apache.org/.
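If you prefer the command line to the dashboard, the same workflow can be driven with the sahara client shipped in the python-saharaclient package. The following is a minimal sketch only: the JSON files (ng-master.json, ng-worker.json, cluster-template.json, and my-cluster.json) are hypothetical files you would write for your chosen plug-in, and exact subcommand names and flags can vary between client versions, so verify them with sahara help <subcommand> on your system.
    # Create node group templates and a cluster template from JSON
    # definitions written for your chosen plug-in.
    sahara node-group-template-create --json ng-master.json
    sahara node-group-template-create --json ng-worker.json
    sahara cluster-template-create --json cluster-template.json

    # Launch a cluster from the cluster template and check its status.
    sahara cluster-create --json my-cluster.json
    sahara cluster-list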
OpenStack Data Processing uses a different plug-in to provision clusters for each supported Hadoop distribution. The Hadoop parameters available for configuring clusters differ depending on the Hadoop distribution (and, by extension, the plug-in used). As of this release (Red Hat Enterprise Linux OpenStack Platform 7), OpenStack Data Processing supports the following plug-ins:
Hortonworks Data Platform (HDP) 2.0
Cloudera (CDH) 5.3.0
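Before building templates, you can confirm which plug-ins your deployment has enabled. This is a minimal sketch using the sahara command line client; the subcommands shown are assumed from python-saharaclient, so check sahara help on your system if they differ.
    # List the enabled plug-ins and the versions each one supports.
    sahara plugin-list

    # Show the details of a specific plug-in, for example HDP.
    sahara plugin-show --name hdp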
Note
This guide assumes that you have already installed and configured the OpenStack Data Processing service. For instructions on how to do so, see Install the Data Processing Service.
If you want to deploy OpenStack Data Processing quickly as part of an evaluation or test installation, see Use Packstack to Deploy a Proof-of-Concept Data Processing Service.
This release of OpenStack Data Processing includes a Guides tab, which features wizards that help you create the templates needed to launch clusters and run jobs on them. However, you will still need to register the components required by OpenStack Data Processing, such as Hadoop images and job binaries. As such, if you intend to use the Guides feature, we recommend that you read Section 4, “Register the Required Components” first.
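For reference, component registration can also be performed from the command line. The sketch below assumes an image already uploaded to the Image service (glance) and a job binary stored in the Object Storage service (swift); the image ID, username, tag values, and container and object names are placeholders, and flag names may differ between python-saharaclient versions, so confirm them with sahara help <subcommand>.
    # Register the Hadoop image with the Data Processing service and tag it
    # so your chosen plug-in will recognise it (the plug-in may also require
    # a version tag).
    sahara image-register --id $IMAGE_ID --username cloud-user
    sahara image-add-tag --id $IMAGE_ID --tag hdp

    # Register a job binary kept in Object Storage (swift).
    sahara job-binary-create --name wordcount.jar \
      --url "swift://mycontainer/wordcount.jar" \
      --username $OS_USERNAME --password $OS_PASSWORD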
