Chapter 11. OpenStack Sahara Installation

11.1. OpenStack Sahara Service Overview

Note

The Red Hat Enterprise Linux OpenStack Platform 5 release includes OpenStack Sahara as a Technology Preview. For more information on the support scope for features marked as technology previews, refer to https://access.redhat.com/support/offerings/techpreview/
OpenStack Sahara enables the fast provisioning and easy management of Hadoop clusters on OpenStack. Hadoop stores and analyzes large amounts of data, which is usually unstructured but can also be a mix of complex and structured data:
  • Hadoop clusters are groups of servers that act both as storage servers, running the Hadoop Distributed File System (HDFS), and as compute servers, running Hadoop's MapReduce (MR) framework. Cluster servers do not necessarily share memory or disks; they usually share only the network that connects them. This means that servers can easily be added to or removed from a cluster as needed.
  • Hadoop enables the fast analysis of its data because computation and storage are co-located, and work is divided across its servers, each of which offers local computation and storage services.
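The co-location idea in the two points above can be illustrated with a toy word count. This is a minimal sketch in plain Python, not Hadoop code: the `blocks` dictionary stands in for HDFS blocks held on hypothetical nodes, and `map_local` and `reduce_counts` are illustrative names chosen for this example.

```python
from collections import Counter

# Hypothetical layout: each node holds its own block of the data set,
# standing in for HDFS blocks stored on that server's local disks.
blocks = {
    "node-1": ["error disk full", "info job started"],
    "node-2": ["error network down", "info job finished"],
}


def map_local(lines):
    """Map step: count words using only the lines stored on one node."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-node partial counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Each node processes only its local block; only the small partial
# results would need to cross the network.
partial_counts = [map_local(lines) for lines in blocks.values()]
word_counts = reduce_counts(partial_counts)

print(word_counts["error"], word_counts["info"], word_counts["job"])
```

Because each partial count is computed where the data lives, adding a node to this model only means adding another entry to `blocks`; the map and reduce functions stay unchanged.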
In OpenStack Sahara:
  • The Identity service authenticates users and provides user security.
  • The Compute service provisions cluster VMs.
  • The Image service stores cluster VM images (each contains an operating system plus Hadoop).
  • The Object Storage service can be used to store data that is processed by Hadoop jobs.
  • Templates are used for cluster configuration. Nodes are grouped together using a Node Group template; Cluster templates are used to combine Node Groups.
  • Jobs are used to execute tasks on Hadoop clusters. Job binaries store executable code; data sources store input or output locations as well as any necessary credentials.
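The relationship between Node Group templates and Cluster templates described above can be sketched as plain data. The following is an illustrative Python model only, not the Sahara API or client library: the class names, field names, plug-in name, flavor names, and process names are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroupTemplate:
    """Describes one kind of node: its size and the Hadoop processes it runs."""
    name: str
    flavor_id: str          # hypothetical Compute flavor name
    node_processes: list    # hypothetical Hadoop process names


@dataclass
class ClusterTemplate:
    """Combines node groups, each with an instance count, into one cluster."""
    name: str
    plugin_name: str        # hypothetical plug-in (Hadoop distribution) name
    node_groups: list = field(default_factory=list)

    def add_group(self, template, count):
        """Attach a node group template with the number of instances wanted."""
        self.node_groups.append((template, count))

    def total_instances(self):
        """Return how many VMs the Compute service would need to provision."""
        return sum(count for _, count in self.node_groups)


# Two node group templates: one for master processes, one for workers.
master = NodeGroupTemplate("master", "m1.medium", ["namenode", "resourcemanager"])
worker = NodeGroupTemplate("worker", "m1.large", ["datanode", "nodemanager"])

# The cluster template combines the node groups.
cluster = ClusterTemplate(name="demo-cluster", plugin_name="vanilla")
cluster.add_group(master, 1)
cluster.add_group(worker, 3)

print(cluster.total_instances())
```

Keeping the node description separate from the cluster description means that, in this model, the same worker template can be reused in several cluster templates with different instance counts.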
Sahara supports different Hadoop distributions as well as vendor-specific management tools (for example, Apache Ambari). Either the OpenStack dashboard or the command-line tool can be used for cluster provisioning and management.

Table 11.1. Sahara Service components

Component             Description
openstack-sahara-api  API service. Handles cluster requests and data delivery.
sahara                CLI client for Sahara tasks. Most actions available in the OpenStack dashboard can also be executed using the CLI; the exception is scaling clusters up and down.
sahara-db-manage      CLI client for database management.
sahara-dashboard      Plug-in for the OpenStack dashboard.