Chapter 26. Administering the Hortonworks Data Platform on Red Hat Gluster Storage
Important
- Dispersed Volumes and Distributed Dispersed Volumes
- Red Hat Enterprise Linux 7
The following are the advantages of Hadoop Compatible Storage with Red Hat Gluster Storage:
- Gives Hadoop file-based access to Red Hat Gluster Storage volumes while the volumes continue to support POSIX features such as NFS mounts, FUSE mounts, snapshots, and geo-replication.
- Eliminates the need for a centralized metadata server (the HDFS primary and redundant NameNodes) by replacing HDFS with Red Hat Gluster Storage.
- Provides compatibility with MapReduce and Hadoop Ecosystem applications with no code rewrite required.
- Provides a fault-tolerant file system.
- Allows co-location of compute and data and the ability to run Hadoop jobs across multiple namespaces using multiple Red Hat Gluster Storage volumes.
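The first advantage above, POSIX access to the same volume that Hadoop uses, can be illustrated with a minimal sketch. It assumes the volume is already FUSE-mounted on the host; the `GLUSTER_MOUNT` environment variable, the `user/demo` directory, and the `write_and_list` helper are hypothetical names chosen for illustration and are not part of the product. When no mount is configured, the sketch falls back to a temporary directory so that it still runs.

```python
import os
import tempfile


def write_and_list(mount_point, relative_dir, filename, text):
    """Write a file under a mounted volume using ordinary POSIX file calls
    and return the sorted listing of the directory it was written to."""
    target_dir = os.path.join(mount_point, relative_dir)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, filename)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return sorted(os.listdir(target_dir))


if __name__ == "__main__":
    # GLUSTER_MOUNT is a hypothetical variable pointing at a FUSE mount of the
    # volume; a temporary directory stands in for it when the variable is unset.
    mount_point = os.environ.get("GLUSTER_MOUNT") or tempfile.mkdtemp()
    print(write_and_list(mount_point, "user/demo", "input.txt", "hello\n"))
```

Nothing in the sketch is specific to Hadoop or Gluster: any POSIX tool can place data on the mounted volume in the same way.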
26.1. Deployment Scenarios
Table 26.1. Component Overview
Component | Description |
---|---|
Ambari | Management Console for the Hortonworks Data Platform |
Red Hat Gluster Storage Console | (Optional) Management Console for Red Hat Gluster Storage |
YARN Resource Manager | Scheduler for the YARN Cluster |
YARN Node Manager | Worker for the YARN Cluster on a specific server |
Job History Server | Logs the history of submitted YARN jobs |
glusterd | The Red Hat Gluster Storage process on a given server |
26.1.1. Red Hat Gluster Storage Trusted Storage Pool with Two Additional Servers

Figure 26.1. Recommended Deployment Topology for Large Clusters
26.1.2. Red Hat Gluster Storage Trusted Storage Pool with One Additional Server

Figure 26.2. Recommended Deployment Topology for Smaller Clusters
26.1.3. Red Hat Gluster Storage Trusted Storage Pool only

Figure 26.3. Evaluation Deployment Topology Using the Minimum Number of Servers
26.1.4. Deploying Hadoop on an existing Red Hat Gluster Storage Trusted Storage Pool
26.1.5. Deploying Hadoop on a New Red Hat Gluster Storage Trusted Storage Pool
The setup_cluster.sh script can build the storage pool for you. The rest of the installation instructions describe how to create and enable volumes for use with Hadoop.
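For orientation, the following sketch lists the standard gluster CLI commands that form a trusted storage pool by hand. It is not a description of what setup_cluster.sh does internally, which this section does not show; it is only an assumed manual equivalent. The hostnames and the `peer_probe_commands` helper are hypothetical, and the sketch prints the commands rather than running them.

```python
def peer_probe_commands(hostnames):
    """Return the gluster CLI commands, to be run on the first host, that add
    every other host to the trusted storage pool and then show peer status."""
    if not hostnames:
        raise ValueError("at least one hostname is required")
    # The first host issues the probes, so it is not probed itself.
    _first, *others = hostnames
    commands = [["gluster", "peer", "probe", host] for host in others]
    commands.append(["gluster", "peer", "status"])
    return commands


if __name__ == "__main__":
    # Hypothetical hostnames, used only for illustration.
    servers = ["server1.example.com", "server2.example.com", "server3.example.com"]
    for command in peer_probe_commands(servers):
        print(" ".join(command))
```

Keeping the commands as data rather than executing them lets the list be reviewed before anything touches the servers.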