Chapter 26. Administering the Hortonworks Data Platform on Red Hat Gluster Storage
Warning
Support for Hortonworks Data Platform (HDP) on Red Hat Gluster Storage integrated using the Hadoop Plug-In is deprecated as of Red Hat Gluster Storage 3.1 Update 2, and is unlikely to be supported in the next major release. Red Hat discourages further use of this plug-in for deployments where Red Hat Gluster Storage directly holds analytics data for in-place analytics. However, Red Hat Gluster Storage can be used as a general-purpose repository for analytics data, and as a companion store that holds the bulk of the data, which is then moved to Hadoop clusters for analysis when necessary.
Red Hat Gluster Storage provides file system compatibility for Apache Hadoop and uses the standard file system APIs available in Hadoop to provide a new storage option for Hadoop deployments. Existing Hadoop ecosystem applications can use Red Hat Gluster Storage seamlessly.
Important
The following features of Red Hat Gluster Storage are not supported with Hadoop:
- Dispersed Volumes and Distributed Dispersed Volumes
- Red Hat Enterprise Linux 7
Advantages
The following are the advantages of Hadoop Compatible Storage with Red Hat Gluster Storage:
- Provides file-based access to Red Hat Gluster Storage volumes by Hadoop while simultaneously supporting POSIX features for the volumes, such as NFS Mounts, FUSE Mounts, Snapshotting, and Geo-Replication.
- Eliminates the need for a centralized metadata server (HDFS Primary and Redundant Namenodes) by replacing HDFS with Red Hat Gluster Storage.
- Provides compatibility with MapReduce and Hadoop Ecosystem applications with no code rewrite required.
- Provides a fault-tolerant file system.
- Allows co-location of compute and data and the ability to run Hadoop jobs across multiple namespaces using multiple Red Hat Gluster Storage volumes.
26.1. Deployment Scenarios
Before you begin, ensure that the prerequisites are met by establishing the basic infrastructure required for Hadoop distributions to run on Red Hat Gluster Storage. For the prerequisites and the installation procedure, see the Deploying the Hortonworks Data Platform on Red Hat Gluster Storage chapter in the Red Hat Gluster Storage 3.1 Installation Guide.
The supported volume configuration for Hadoop is Distributed Replicated volume with replica count 2 or 3.
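As a rough illustration of this rule (and of the exclusion of dispersed volumes noted earlier), the following Python sketch checks the text that `gluster volume info <VOLNAME>` prints for a single volume. It assumes the `Type:` and `Number of Bricks: <distribute> x <replica> = <total>` field layout shown in the hypothetical `SAMPLE_INFO` string; that layout may vary between releases (for example, arbiter volumes print a different brick expression, which this sketch simply rejects). The names `parse_volume_info` and `is_supported_for_hadoop` are illustrative only and are not part of any Red Hat tool.

```python
import re

# Hypothetical `gluster volume info hadoopvol` output; the field layout is an
# assumption and may differ between Red Hat Gluster Storage releases.
SAMPLE_INFO = """\
Volume Name: hadoopvol
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/brick1
Brick2: server2:/rhgs/brick1
Brick3: server3:/rhgs/brick1
Brick4: server4:/rhgs/brick1
"""

SUPPORTED_TYPE = "Distributed-Replicate"
SUPPORTED_REPLICA_COUNTS = {2, 3}
# "<distribute count> x <replica count> = <total bricks>"
BRICK_LAYOUT = re.compile(r"^(\d+)\s*x\s*(\d+)\s*=\s*(\d+)$")


def parse_volume_info(text: str) -> dict[str, str]:
    """Collect 'Key: Value' pairs, splitting each line on its first colon only."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_supported_for_hadoop(text: str) -> bool:
    """True if the volume is Distributed Replicated with replica count 2 or 3."""
    fields = parse_volume_info(text)
    if fields.get("Type") != SUPPORTED_TYPE:
        # Rejects Disperse, Distributed-Disperse, Replicate, Distribute, etc.
        return False
    match = BRICK_LAYOUT.match(fields.get("Number of Bricks", ""))
    if match is None:
        # Unrecognized brick expression (for example, an arbiter layout): reject.
        return False
    return int(match.group(2)) in SUPPORTED_REPLICA_COUNTS
```

For example, `is_supported_for_hadoop(SAMPLE_INFO)` returns `True`, while the same text with `Type: Distributed-Disperse` returns `False`. Splitting each line on its first colon keeps brick paths such as `server1:/rhgs/brick1` intact.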
The following table provides an overview of the components of the integrated environment.
Table 26.1. Component Overview
Component | Description |
---|---|
Ambari | Management Console for the Hortonworks Data Platform |
Red Hat Gluster Storage Console | (Optional) Management Console for Red Hat Gluster Storage |
YARN Resource Manager | Scheduler for the YARN Cluster |
YARN Node Manager | Worker for the YARN Cluster on a specific server |
Job History Server | Logs the history of submitted YARN jobs |
glusterd | The Red Hat Gluster Storage management daemon on a given server |
26.1.1. Red Hat Gluster Storage Trusted Storage Pool with Two Additional Servers
The recommended approach to deploying the Hortonworks Data Platform on Red Hat Gluster Storage is to add two additional servers to your trusted storage pool. One server acts as the Management Server and hosts the management components, such as Hortonworks Ambari and Red Hat Gluster Storage Console (optional). The other server acts as the YARN Master Server and hosts the YARN Resource Manager and Job History Server components. This design ensures that the YARN Master processes do not compete for resources with the YARN NodeManager processes. It also allows the Management Server to be multi-homed on both the Hadoop Network and the User Network, which is useful for giving users limited visibility into the cluster.

Figure 26.1. Recommended Deployment Topology for Large Clusters
26.1.2. Red Hat Gluster Storage Trusted Storage Pool with One Additional Server
If two servers are not available, you can install the YARN Master Server and the Management Server on a single server. This is also an option if you have a server with abundant CPU and memory available. Carefully monitor utilization on this server to ensure that sufficient resources are available to all the processes. If resources are over-utilized, move to the deployment topology for large clusters described in Section 26.1.1, “Red Hat Gluster Storage Trusted Storage Pool with Two Additional Servers”. Ambari supports relocating the YARN Resource Manager to another server after it is deployed, and Ambari itself can also be moved to another server after it is installed.

Figure 26.2. Recommended Deployment Topology for Smaller Clusters
26.1.3. Red Hat Gluster Storage Trusted Storage Pool Only
If no additional servers are available, you can consolidate the YARN Master Server and Management Server processes onto a server within the trusted storage pool. This option is recommended only for evaluation environments with workloads that do not utilize the servers heavily. Carefully monitor utilization on this server to ensure that sufficient resources are available for all the processes. If resources become over-utilized, move to the deployment topology detailed in Section 26.1.1, “Red Hat Gluster Storage Trusted Storage Pool with Two Additional Servers”. Ambari supports relocating the YARN Resource Manager to another server after it is deployed, and Ambari itself can also be moved to another server after it is installed.

Figure 26.3. Evaluation Deployment Topology Using the Minimum Number of Servers
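The choice among the three topologies in Sections 26.1.1 to 26.1.3 depends mainly on how many servers are available outside the trusted storage pool. The following Python sketch encodes that decision as a mapping from host roles to the components listed in Table 26.1. The function name `recommend_layout` and the role labels are hypothetical, and the sketch assumes that every storage pool server runs glusterd and a YARN Node Manager, based on the co-location of compute and data listed under Advantages.

```python
# Components from Table 26.1, grouped by the role that hosts them.
MANAGEMENT = ("Ambari", "Red Hat Gluster Storage Console (optional)")
YARN_MASTER = ("YARN Resource Manager", "Job History Server")
# Assumption: every storage pool server runs these in all three topologies,
# because compute is co-located with data.
POOL_WORKER = ("glusterd", "YARN Node Manager")


def recommend_layout(additional_servers: int) -> dict[str, tuple[str, ...]]:
    """Map host roles to components for the topologies in Sections 26.1.1 to 26.1.3.

    additional_servers is the number of servers available outside the
    trusted storage pool for the Management and YARN Master roles.
    """
    if additional_servers < 0:
        raise ValueError("additional_servers must be non-negative")
    if additional_servers >= 2:
        # 26.1.1: separate servers, so YARN Master processes do not compete
        # with the Node Managers on the storage pool servers.
        return {
            "management server": MANAGEMENT,
            "YARN master server": YARN_MASTER,
            "each storage pool server": POOL_WORKER,
        }
    if additional_servers == 1:
        # 26.1.2: one extra server hosts both roles; monitor its utilization.
        return {
            "combined management and YARN master server": MANAGEMENT + YARN_MASTER,
            "each storage pool server": POOL_WORKER,
        }
    # 26.1.3: evaluation only; one storage pool server also takes on both roles.
    return {
        "one storage pool server": POOL_WORKER + MANAGEMENT + YARN_MASTER,
        "each remaining storage pool server": POOL_WORKER,
    }
```

Any count of two or more additional servers maps to the large-cluster layout, because the text only distinguishes zero, one, and two extra servers. Tuples are used so that callers cannot accidentally mutate the shared component groups.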
26.1.4. Deploying Hadoop on an Existing Red Hat Gluster Storage Trusted Storage Pool
If you have an existing Red Hat Gluster Storage Trusted Storage Pool, you need to procure two additional servers for the YARN Master Server and the Ambari Management Server, as depicted in the deployment topology detailed in Section 26.1.1, “Red Hat Gluster Storage Trusted Storage Pool with Two Additional Servers”. If the trusted storage pool has no existing volumes, follow the instructions in the Red Hat Gluster Storage 3.1 Installation Guide to create and enable volumes for Hadoop. If it has existing volumes, follow the instructions in the same guide to enable them for Hadoop.
The supported volume configuration for Hadoop is Distributed Replicated volume with replica count 2 or 3.
26.1.5. Deploying Hadoop on a New Red Hat Gluster Storage Trusted Storage Pool
If you do not have an existing Red Hat Gluster Storage Trusted Storage Pool, you must procure all the servers listed in the deployment topology detailed in Section 26.1.1, “Red Hat Gluster Storage Trusted Storage Pool with Two Additional Servers”. Then follow the installation instructions in the Red Hat Gluster Storage 3.1 Installation Guide so that the setup_cluster.sh script can build the storage pool for you. The remaining installation instructions describe how to create and enable volumes for use with Hadoop.
The supported volume configuration for Hadoop is Distributed Replicated volume with replica count 2 or 3.
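Sections 26.1.4 and 26.1.5 differ only in whether a trusted storage pool and volumes already exist. The following Python sketch summarizes the resulting high-level steps as a checklist. `plan_hadoop_deployment` is a hypothetical helper that only returns descriptive strings and runs no installation commands; the detailed procedures remain those in the Red Hat Gluster Storage 3.1 Installation Guide.

```python
def plan_hadoop_deployment(existing_pool: bool, has_volumes: bool = False) -> list[str]:
    """Return the high-level steps from Sections 26.1.4 and 26.1.5 as text.

    Hypothetical helper: it only builds a checklist and runs nothing.
    has_volumes is ignored when there is no existing pool.
    """
    volume_step = (
        "Create and enable Distributed Replicated volumes (replica 2 or 3) for Hadoop"
    )
    if not existing_pool:
        # 26.1.5: new pool, so every server in the Section 26.1.1 topology is needed.
        return [
            "Procure all servers in the Section 26.1.1 topology",
            "Follow the Installation Guide so that setup_cluster.sh builds the storage pool",
            volume_step,
        ]
    # 26.1.4: existing pool, so only the YARN Master and Ambari servers are added.
    steps = ["Procure two additional servers for the YARN Master and Ambari Management Server"]
    if has_volumes:
        steps.append(
            "Enable the existing volumes for Hadoop (Distributed Replicated, replica 2 or 3)"
        )
    else:
        steps.append(volume_step)
    return steps
```

In both cases the volume step carries the same constraint: only Distributed Replicated volumes with replica count 2 or 3 are supported.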