Chapter 8. Deploying the Hortonworks Data Platform 2.1 on Red Hat Storage

Red Hat Storage is compatible with Apache Hadoop: it uses the standard file system APIs available in Hadoop to provide a new storage option for Hadoop deployments. Red Hat has created a Hadoop file system plug-in that enables Hadoop distributions to run on Red Hat Storage.

8.1. Prerequisites

Before you begin installation, you must establish the basic infrastructure required to enable Hadoop to run on Red Hat Storage.

8.1.1. Supported Versions

The following table lists the supported versions of HDP and Ambari with Red Hat Storage Server.

Table 8.1. Red Hat Storage Server Support Matrix

Red Hat Storage Server version    HDP version    Ambari version
3.0.4                             2.1            1.6.1
3.0.2                             2.1            1.6.1
3.0.0                             2.0.6          1.4.4

8.1.2. Software and Hardware Requirements

You must ensure that all the servers used in this environment meet the following requirements:
  • Must have at least the following hardware specification:
    • 2 x 2 GHz 4-core processors
    • 32 GB RAM
    • 500 GB of storage capacity
    • 1 x 1 GbE NIC
  • Must have iptables disabled.
  • Must use fully qualified domain names (FQDN). For example, rhs-1.server.com is acceptable, but rhs-1 is not.
  • Either all servers must be configured to use a DNS server that can resolve their FQDNs, or all storage nodes must list the FQDNs of all servers in the cluster in their /etc/hosts file.
  • Must have the following users and groups available on all the servers.
    User        Group
    yarn        hadoop
    mapred      hadoop
    hive        hadoop
    hcat        hadoop
    ambari-qa   hadoop
    hbase       hadoop
    tez         hadoop
    zookeeper   hadoop
    oozie       hadoop
    falcon      hadoop
    The specific UIDs and GIDs for these users and groups are at the discretion of the administrator of the trusted storage pool, but they must be consistent across the pool. For example, if the mapred user has UID 591 on one server, it must have UID 591 on all other servers. Managing this with local authentication can be laborious, so it is common and acceptable to install a central authentication solution such as LDAP or Active Directory for the cluster, so that users and groups are managed in one place. To use local authentication instead, run the following script on each server to create the users and groups with consistent IDs across the cluster:
    groupadd -g 590 hadoop
    useradd -u 591 -g hadoop mapred
    useradd -u 592 -g hadoop yarn
    useradd -u 594 -g hadoop hcat
    useradd -u 595 -g hadoop hive
    useradd -u 590 -g hadoop ambari-qa
    useradd -u 593 -g hadoop tez
    useradd -u 596 -g hadoop oozie
    useradd -u 597 -g hadoop zookeeper
    useradd -u 598 -g hadoop falcon
    useradd -u 599 -g hadoop hbase
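Because the IDs must match on every server, it can help to compare each server's account database against the intended plan. The following is a minimal POSIX shell sketch, not a supported tool: the function name check_hadoop_ids and the variables HADOOP_GID and HADOOP_USERS are hypothetical, and the expected values are assumed to be the ones used by the local-authentication script in this section (GID 590 for hadoop, UIDs 590 to 599 for the users). It reads files in /etc/passwd and /etc/group format so that it can be fed the output of getent from any server.

```shell
# Hypothetical helper: check that the Hadoop accounts follow the UID/GID plan
# used by the local-authentication script in this section.
HADOOP_GID=590
# user:UID pairs copied from that script
HADOOP_USERS="ambari-qa:590 mapred:591 yarn:592 tez:593 hcat:594 hive:595 oozie:596 zookeeper:597 falcon:598 hbase:599"

# check_hadoop_ids PASSWD_FILE GROUP_FILE
# Arguments are files in /etc/passwd and /etc/group format. Prints one line
# per problem and returns 1 if anything differs from the plan, 0 otherwise.
check_hadoop_ids() {
  passwd_file=$1
  group_file=$2
  status=0

  # Field 3 of a group entry is the GID.
  found_gid=$(awk -F: '$1 == "hadoop" { print $3 }' "$group_file")
  if [ "$found_gid" != "$HADOOP_GID" ]; then
    echo "group hadoop: expected GID $HADOOP_GID, found ${found_gid:-none}"
    status=1
  fi

  for pair in $HADOOP_USERS; do
    user=${pair%%:*}
    want_uid=${pair#*:}
    # Fields 3 and 4 of a passwd entry are the UID and the primary GID.
    entry=$(awk -F: -v u="$user" '$1 == u { print $3 ":" $4 }' "$passwd_file")
    if [ -z "$entry" ]; then
      echo "user $user: missing"
      status=1
      continue
    fi
    uid=${entry%%:*}
    gid=${entry#*:}
    if [ "$uid" != "$want_uid" ]; then
      echo "user $user: expected UID $want_uid, found $uid"
      status=1
    fi
    if [ "$gid" != "$HADOOP_GID" ]; then
      echo "user $user: expected primary GID $HADOOP_GID, found $gid"
      status=1
    fi
  done
  return "$status"
}
```

For example, on each server you might run getent passwd > passwd.txt and getent group > group.txt, then check_hadoop_ids passwd.txt group.txt; no output and a zero exit status mean that server matches the plan.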

8.1.3. Existing Red Hat Storage Trusted Storage Pool

If you have an existing Red Hat Storage trusted storage pool, you need to add two additional servers to run the Hortonworks Ambari Management Services and the YARN Master Services, respectively. For more information on recommended deployment topologies, see the Administering the Hortonworks Data Platform on Red Hat Storage chapter in the Red Hat Storage Administration Guide.
In addition, every node in the Red Hat Storage trusted storage pool that hosts volumes to be used with Hadoop must have a local glusterfs-fuse mount of each such volume. The mount path for each volume must be the same on every node in the cluster.
For information on expanding your trusted storage pool by adding servers, see the Expanding Volumes section in the Red Hat Storage 3.0 Administration Guide.

Note

The supported volume configuration for Hadoop is a distributed replicated volume with a replica count of 2 or 3.

Important

New Red Hat Storage and Hadoop clusters use the naming convention of /mnt/brick1 as the mount point for Red Hat Storage bricks and /mnt/glusterfs/volname as the mount point for Red Hat Storage volumes. An existing Red Hat Storage volume may have been created with different mount points for its bricks and volumes. If your mount points differ from this convention, replace the prefixes used in this installation guide with your own.
For information on mounting and configuring bricks and volumes with the required parameters, and a description of the required local mount of the gluster volume, see Section 8.2.5, “Enabling Existing Volumes for use with Hadoop”.
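To confirm that a node has the required local glusterfs-fuse mount at the conventional path, you can inspect /proc/mounts, where FUSE-based GlusterFS mounts are listed with the file system type fuse.glusterfs. The sketch below is illustrative and not a Red Hat tool: the function name check_gluster_mount is hypothetical, it assumes the /mnt/glusterfs/volname convention by default (pass another prefix if your mounts differ), and it takes the mounts file as an optional argument so it can be tried against sample data.

```shell
# check_gluster_mount VOLNAME [MOUNT_PREFIX] [MOUNTS_FILE]
# Hypothetical helper. Returns 0 if VOLNAME is mounted through glusterfs-fuse
# at MOUNT_PREFIX/VOLNAME (default prefix /mnt/glusterfs), 1 otherwise.
check_gluster_mount() {
  volname=$1
  prefix=${2:-/mnt/glusterfs}
  mounts_file=${3:-/proc/mounts}
  target="$prefix/$volname"

  # /proc/mounts fields: source, mount point, fs type, options, dump, pass.
  # A glusterfs-fuse source looks like host:/volname (or host:volname).
  awk -v vol="$volname" -v dir="$target" '
    $2 == dir && $3 == "fuse.glusterfs" {
      src = $1
      sub(/^[^:]*:\/?/, "", src)   # drop "host:" and an optional leading "/"
      if (src == vol) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$mounts_file"
}
```

Running the same check on every node, for example check_gluster_mount HadoopVol || echo "not mounted" (HadoopVol being a placeholder volume name), helps confirm that the mount path is consistent across the cluster.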

8.1.4. New Red Hat Storage Trusted Storage Pool

You must create a Red Hat Storage trusted storage pool with at least four bricks for two-way replication or at least six bricks for three-way replication. The servers on which these bricks reside must have Red Hat Storage installed. The number of bricks in a distributed replicated volume must be a multiple of the replica count.
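The brick-count rules above can be checked mechanically before a volume is created. The sketch below is illustrative only: check_brick_layout and print_volume_create are hypothetical helper names, and print_volume_create only prints a gluster volume create command rather than running it. In a distributed replicated volume, each group of consecutive bricks (as many as the replica count) forms a replica set, so list bricks from different servers next to each other to keep every replica set spread across servers.

```shell
# check_brick_layout REPLICA BRICK_COUNT
# Returns 0 if the layout fits the supported Hadoop configuration:
# replica 2 or 3, at least two replica sets, and a brick count that is a
# multiple of the replica count. Explains any failure on stderr.
check_brick_layout() {
  replica=$1
  bricks=$2
  case $replica in
    2|3) ;;
    *) echo "unsupported replica count: $replica" >&2; return 1 ;;
  esac
  if [ "$bricks" -lt $((replica * 2)) ]; then
    echo "need at least $((replica * 2)) bricks for replica $replica" >&2
    return 1
  fi
  if [ $((bricks % replica)) -ne 0 ]; then
    echo "brick count $bricks is not a multiple of replica $replica" >&2
    return 1
  fi
  return 0
}

# print_volume_create VOLNAME REPLICA BRICK...
# Validates the layout, then prints (does not run) the create command.
print_volume_create() {
  volname=$1
  replica=$2
  shift 2
  check_brick_layout "$replica" "$#" || return 1
  echo "gluster volume create $volname replica $replica $*"
}
```

For example, print_volume_create HadoopVol 2 rhs-1.server.com:/mnt/brick1/HadoopVol rhs-2.server.com:/mnt/brick1/HadoopVol rhs-3.server.com:/mnt/brick1/HadoopVol rhs-4.server.com:/mnt/brick1/HadoopVol prints a replica 2 command in which rhs-1/rhs-2 and rhs-3/rhs-4 form the two replica sets. The volume name, the rhs-2 to rhs-4 host names, and the brick subdirectory are placeholders.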
For information on installing Red Hat Storage, see Chapter 4, Installing Red Hat Storage. For information on upgrading to Red Hat Storage 3.0, see Chapter 6, Upgrading Red Hat Storage.
Red Hat recommends that you set aside two additional servers to run the Hortonworks Ambari Management Services and the YARN Master Services, respectively. Alternative deployment topologies are also possible. For more information on the supported deployment topologies, see the Administering the Hortonworks Data Platform on Red Hat Storage chapter in the Red Hat Storage Administration Guide.
For information on expanding your trusted storage pool by adding servers, see the Expanding Volumes section in the Red Hat Storage 3.0 Administration Guide.

Note

The supported volume configuration for Hadoop is a distributed replicated volume with a replica count of 2 or 3.

8.1.5. Red Hat Storage Server Requirements

You must install Red Hat Storage Server on this server. When installing the server, you must specify a fully qualified domain name (FQDN). A hostname alone does not meet the requirements of the Hortonworks Data Platform Ambari deployment tool.
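A quick way to catch a short hostname before running the Ambari deployment tool is to check what the server reports for itself. The sketch below is an assumption-based helper, not part of Red Hat Storage or Ambari: is_fqdn and check_local_fqdn are hypothetical names, and the syntax rule in is_fqdn is a deliberately simple approximation. The resolution step uses getent hosts, which consults the configured name services, so it covers both the DNS and the /etc/hosts options described in the requirements.

```shell
# is_fqdn NAME
# Rough syntax check (an assumption, stricter than some resolvers): at least
# two dot-separated labels, each 1-63 letters, digits, or hyphens that do not
# start or end with a hyphen. A trailing dot is rejected.
is_fqdn() {
  printf '%s\n' "$1" |
    grep -Eq '^([A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$'
}

# check_local_fqdn
# Checks that this server reports an FQDN and that the name resolves through
# the configured name services (DNS or /etc/hosts). Prints the FQDN on success.
check_local_fqdn() {
  fqdn=$(hostname -f 2>/dev/null)
  if ! is_fqdn "$fqdn"; then
    echo "hostname -f returned '$fqdn', which is not fully qualified" >&2
    return 1
  fi
  if ! getent hosts "$fqdn" >/dev/null; then
    echo "$fqdn does not resolve" >&2
    return 1
  fi
  echo "$fqdn"
}
```

With this check, rhs-1.server.com passes and rhs-1 fails, matching the example given earlier in this chapter.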
You must also enable the rhs-big-data-3-for-rhel-6-server-rpms channel on this server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhs-big-data-3-for-rhel-6-server-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel rhel-x86_64-server-6-rhs-bigdata-3

8.1.6. Hortonworks Ambari Server Requirements

You must install Red Hat Enterprise Linux 6.6 on this server. Optionally, you can also install Red Hat Storage Console on this server, which allows all aspects of the Red Hat Storage trusted storage pool to be managed from a single server. When installing the server, you must specify a fully qualified domain name (FQDN). A hostname alone does not meet the requirements of the Hortonworks Data Platform Ambari deployment tool. You must set up passwordless SSH from the Ambari Server to all other servers in the trusted storage pool. Instructions for installing and configuring Hortonworks Ambari are provided in later sections of this chapter.
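One common way to set up passwordless SSH is to generate a key pair on the Ambari Server with ssh-keygen and install the public key on each server with ssh-copy-id. The sketch below rests on several assumptions: root is assumed to be the SSH user, the host list is hypothetical apart from the rhs-1.server.com example used earlier in this chapter, and the helper names maybe_run and setup_passwordless_ssh are illustrative. It defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Hypothetical host list: replace with the FQDNs of every other server in
# the trusted storage pool (and the YARN Master server, if separate).
HOSTS=${HOSTS:-"rhs-1.server.com rhs-2.server.com yarn-master.server.com"}
SSH_USER=${SSH_USER:-root}   # assumption: Ambari will connect as root
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode; otherwise execute it.
maybe_run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_passwordless_ssh() {
  # Create an RSA key pair with an empty passphrase, only if none exists yet.
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    maybe_run mkdir -p "$HOME/.ssh"
    maybe_run chmod 700 "$HOME/.ssh"
    maybe_run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  for host in $HOSTS; do
    # Append the public key to the remote user's authorized_keys file.
    maybe_run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$SSH_USER@$host"
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so this command succeeds only if key-based login works.
    maybe_run ssh -o BatchMode=yes "$SSH_USER@$host" true
  done
}
```

Review the printed commands first, then run the function again with DRY_RUN=0 once the host list is correct.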
If the Hortonworks Ambari server is installed on a different node than Red Hat Storage Server, you must also enable the rhel-6-server-rh-common-rpms channel on this server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhel-6-server-rh-common-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel rhel-x86_64-server-rh-common-6

Warning

Red Hat Storage Console enables Nagios alerting for Red Hat Storage. The Nagios client libraries ship with Red Hat Storage and are installed on each Red Hat Storage Server. This conflicts with the Nagios system that is bundled with the Hortonworks Data Platform (HDP). Hence, using Ambari to deploy and manage HDP Nagios is not supported.

Note

If you are using one of the condensed deployment topologies listed in the Administration Guide and you have elected to place the Ambari Management server on the same node as a Red Hat Storage Server, you need to enable only the rhs-big-data-3-for-rhel-6-server-rpms channel on that server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhs-big-data-3-for-rhel-6-server-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel rhel-x86_64-server-6-rhs-bigdata-3

8.1.7. YARN Master Server Requirements

You must install Red Hat Enterprise Linux 6.6 on this server. When installing the server, you must specify a fully qualified domain name (FQDN). A hostname alone does not meet the requirements of the Hortonworks Data Platform Ambari deployment tool.
If the YARN Master server is installed on a different node than Red Hat Storage Server, you must also enable the rhel-6-server-rh-common-rpms and rhel-6-server-rhs-client-1-rpms channels on the YARN server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channels by running the following command:
    # subscription-manager repos --enable=rhel-6-server-rh-common-rpms --enable=rhel-6-server-rhs-client-1-rpms
  • If you have registered your machine using Satellite server, enable the channels by running the following commands:
    # rhn-channel --add --channel rhel-x86_64-server-rh-common-6
    # rhn-channel --add --channel rhel-x86_64-server-rhsclient-6
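After enabling the channels, you can confirm that the corresponding repositories are active by checking the output of yum repolist enabled. The sketch below is a hypothetical helper, not part of the product: the name missing_repos is illustrative, and it assumes that the repository ID is the first whitespace-separated field of each repolist line, possibly prefixed by ! or * and possibly followed by a /release/arch suffix, both of which it strips. It reads the repolist text on standard input so that it can be tested without a subscribed system.

```shell
# missing_repos REPO_ID...
# Reads `yum repolist enabled` output on stdin and prints every REPO_ID that
# does not appear in it. Returns 1 if any are missing, 0 otherwise.
missing_repos() {
  # Assumed format: the repo ID is field 1, optionally prefixed by "!" or "*"
  # and optionally suffixed by "/release/arch".
  ids=$(awk 'NF { id = $1; sub(/^[!*]+/, "", id); sub(/\/.*/, "", id); print id }')
  status=0
  for repo in "$@"; do
    if ! printf '%s\n' "$ids" | grep -qxF "$repo"; then
      echo "$repo"
      status=1
    fi
  done
  return "$status"
}
```

On the YARN Master server you might run yum repolist enabled | missing_repos rhel-6-server-rh-common-rpms rhel-6-server-rhs-client-1-rpms; any name it prints still needs to be enabled. On systems registered with Satellite, rhn-channel --list shows the subscribed channels instead.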

Note

If you are using one of the condensed deployment topologies listed in the Administration Guide and you have elected to place the YARN Master server on the same node as a Red Hat Storage Server, you need to enable only the rhs-big-data-3-for-rhel-6-server-rpms channel on that server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhs-big-data-3-for-rhel-6-server-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel rhel-x86_64-server-6-rhs-bigdata-3