2015 - Deploying OpenStack Sahara on Red Hat Enterprise Linux OpenStack Platform 6

Big data and analytics infrastructure spending continues to grow as businesses become data driven. Apache Hadoop and other open-source frameworks for large-scale distributed data storage and processing are attractive to enterprise customers because they run on commodity clusters. However, the cost to design, deploy, test, and maintain Hadoop clusters can quickly erode their favorable economics.

The Sahara project provides a simple means to provision Hadoop clusters in OpenStack. Sahara users can deploy Hadoop clusters in minutes by specifying simple parameters such as Hadoop version, cluster topology, and node count. The pre-configured clusters are deployed with all required software and libraries. Clusters provisioned by Sahara can be scaled on demand or removed when no longer needed. The Sahara database stores reusable cluster and job templates.
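As a sketch of what those "simple parameters" look like, the following is a hypothetical cluster-create request body of the kind accepted by the Sahara REST API. The plugin name, Hadoop version, and placeholder IDs are illustrative assumptions, not values from this reference architecture:

```json
{
  "name": "demo-hadoop-cluster",
  "plugin_name": "vanilla",
  "hadoop_version": "2.4.1",
  "cluster_template_id": "<cluster-template-uuid>",
  "default_image_id": "<sahara-image-uuid>",
  "user_keypair_id": "demo-keypair"
}
```

The cluster template referenced by `cluster_template_id` captures the topology (node groups and counts), so reprovisioning an identical cluster later requires only this small request.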

Typical use cases for OpenStack Sahara include:
- Rapid provisioning of Hadoop clusters for Dev and QA
- Utilization of unused compute power from a general purpose OpenStack cloud
- “Analytics-as-a-Service” for bursty or ad-hoc workloads

This reference architecture describes how to install and configure Sahara on Red Hat Enterprise Linux OpenStack Platform 6. It focuses on the single-user or small-group use case of deploying a Hadoop cluster for Dev or QA. It also demonstrates two methods for running Hadoop jobs within Sahara. First, it shows how to run an interactive MapReduce job to validate cluster functionality. It concludes with an example Pig job submitted through Sahara's Elastic Data Processing (EDP) engine that uses OpenStack Swift for input and output data storage. Although Sahara natively benefits from many of the fault-tolerant features of both Hadoop and OpenStack, deploying Sahara for production is beyond the scope of this document.
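To illustrate how EDP jobs reference Swift, here is a hypothetical data-source definition of the kind registered with Sahara before submitting a Pig job. Sahara addresses Swift objects with `swift://` URLs; the container name, object path, and credentials below are illustrative assumptions:

```json
{
  "name": "pig-input",
  "description": "Input data for the example Pig job",
  "type": "swift",
  "url": "swift://demo-container.sahara/input/data.txt",
  "credentials": {
    "user": "demo-user",
    "password": "<swift-password>"
  }
}
```

A matching output data source points at a second Swift path, and the EDP job execution references the two data sources by ID, so the job itself stays decoupled from the storage locations.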
