Chapter 12. Deploying the Hortonworks Data Platform 2.0.6 on Red Hat Storage

Red Hat Storage is compatible with Apache Hadoop. It uses the standard file system APIs available in Hadoop to provide a new storage option for Hadoop deployments. Red Hat has created a Hadoop File System plug-in that enables Hadoop distributions to run on Red Hat Storage.

12.1. Prerequisites

Before you begin installation, you must establish the basic infrastructure required to enable Hadoop to run on Red Hat Storage.

12.1.1. Supported Versions

Red Hat Storage 3.0 can be integrated successfully with Hortonworks Data Platform (HDP) version 2.0.6.

12.1.2. Software and Hardware Requirements

You must ensure that all the servers used in this environment meet the following requirements:
  • Must have at least the following hardware specification:
    • 2 x 2GHz 4 core processors
    • 32 GB RAM
    • 500 GB of storage capacity
    • 1 x 1GbE NIC
  • Must have iptables disabled.
  • Must use fully qualified domain names (FQDN). For example, rhs-1.server.com is acceptable, but rhs-1 is not.
  • Must have access to a DNS server and must be able to use DNS for FQDN resolution.
  • Must have the following users and groups available on all the servers.
    User         Group
    yarn         hadoop
    mapred       hadoop
    hive         hadoop
    hcat         hadoop
    ambari-qa    hadoop
    The specific UIDs and GIDs for these users and groups are at the discretion of the administrator of the trusted storage pool, but they must be consistent across the trusted storage pool. For example, if the mapred user has UID 591 on one server, the mapred user must have UID 591 on all other servers. Managing this with local authentication can be laborious, so it is common and acceptable to install a central authentication solution such as LDAP or Active Directory for your cluster, so that users and groups can be managed in one place. However, if you use local authentication, you can run the following on each server to create the users and groups with consistent IDs across the cluster:
    groupadd hadoop -g 590
    useradd -u 591 mapred -g hadoop
    useradd -u 592 yarn -g hadoop
    useradd -u 594 hcat -g hadoop
    useradd -u 595 hive -g hadoop
    useradd -u 596 ambari-qa -g hadoop
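
The FQDN and ID-consistency requirements above lend themselves to a quick check before installation. The following is a minimal sketch and not part of the product: the function names is_fqdn and find_uid_mismatches are hypothetical, and the FQDN test simply requires at least one dot with no empty labels, which is a simplification.

```shell
# is_fqdn NAME
# Succeeds only if NAME contains at least one dot and has no empty
# labels (no leading dot, trailing dot, or consecutive dots).
is_fqdn() {
    case "$1" in
        .*|*.|*..*) return 1 ;;
        *.*)        return 0 ;;
        *)          return 1 ;;
    esac
}

# find_uid_mismatches
# Reads lines of the form "HOST USER UID" on standard input and prints
# every user whose UID is not identical on all hosts, one per line.
find_uid_mismatches() {
    awk '
        ($2 in uid) && uid[$2] != $3 { bad[$2] = 1 }
        !($2 in uid)                 { uid[$2] = $3 }
        END { for (u in bad) print u }
    ' | sort
}
```

As a usage example, is_fqdn "$(hostname -f)" checks the local server. To compare UIDs, collect one "HOST USER UID" line per user and server (for instance by running id -u over SSH from an administration host, which is an assumption about your environment) and pipe the lines into find_uid_mismatches. No output means the UIDs agree.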

12.1.3. Existing Red Hat Storage Trusted Storage Pool

If you have an existing Red Hat Storage trusted storage pool, you need to add two additional servers to run the Hortonworks Ambari Management Services and the YARN Master Services, respectively. For more information on recommended deployment topologies, see the Administering the Hortonworks Data Platform on Red Hat Storage chapter in the Red Hat Storage Administration Guide.
For information on expanding your trusted storage pool by adding servers, see the section Expanding Volumes in the Red Hat Storage 3.0 Administration Guide.

Note

The supported volume configuration for Hadoop is a Distributed Replicated volume with a replica count of 2.

12.1.4. New Red Hat Storage Trusted Storage Pool

You must create a Red Hat Storage trusted storage pool with at least four bricks. The servers on which these bricks reside must have Red Hat Storage 3.0 installed and must have at least one RAID 6 block device per server. With these bricks, you must create a Distributed Replicated volume.
For more information on installing Red Hat Storage 3.0, see Chapter 4, Installing Red Hat Storage. For information on upgrading to Red Hat Storage 3.0, see Chapter 6, Upgrading Red Hat Storage.
Red Hat recommends that you set aside two additional servers to run the Hortonworks Ambari Management Services and the YARN Master Services, respectively. Alternate deployment topologies are also possible. For more information on the supported deployment topologies, see the Administering the Hortonworks Data Platform on Red Hat Storage chapter in the Red Hat Storage Administration Guide.
For information on expanding your trusted storage pool by adding servers, see the section Expanding Volumes in the Red Hat Storage 3.0 Administration Guide.

Note

The supported volume configuration for Hadoop is a Distributed Replicated volume with a replica count of 2.
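
To illustrate the brick-count constraint for a Distributed Replicated volume with replica count 2, the following sketch builds (but does not run) a gluster volume create command. The function name print_volume_create is hypothetical, as are any volume name and brick paths you pass to it, and the installation procedure you follow may create the volume by other means.

```shell
# print_volume_create VOLNAME BRICK...
# Prints a "gluster volume create" command for a replica 2 volume.
# Requires an even number of bricks, and at least four, so that the
# bricks form two or more replica pairs. Gluster pairs bricks in the
# order given, so consecutive bricks should be on different servers.
print_volume_create() {
    vol=$1
    shift
    if [ "$#" -lt 4 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "need an even number of bricks, at least four" >&2
        return 1
    fi
    echo "gluster volume create $vol replica 2 $*"
}
```

Because the function only prints the command, you can review the brick order before running the command yourself on a server in the trusted storage pool.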

12.1.5. Red Hat Storage Server Requirements

You must install Red Hat Storage Server 3 on this server. While installing the server, ensure that you specify a fully qualified domain name (FQDN). A hostname alone does not meet the requirements of the Hortonworks Data Platform Ambari deployment tool.
You must also enable the rhs-big-data-3-for-rhel-6-server-rpms channel on this server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhs-big-data-3-for-rhel-6-server-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel rhel-x86_64-server-6-rhs-bigdata-3
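
After enabling the channel, you may want to confirm that it is listed. The following is a minimal sketch: the helper name repo_listed is hypothetical, and it assumes that the repository or channel ID appears untruncated in the listing you pipe into it.

```shell
# repo_listed ID
# Reads a repository or channel listing on standard input and succeeds
# if ID appears anywhere in it as a literal string.
repo_listed() {
    grep -qF -- "$1"
}
```

For example, on a server registered with Red Hat Subscription Manager you could run yum repolist enabled | repo_listed rhs-big-data-3-for-rhel-6-server-rpms, and on a server registered with Satellite you could run rhn-channel --list | repo_listed rhel-x86_64-server-6-rhs-bigdata-3. A zero exit status means the ID was found.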

12.1.6. Hortonworks Ambari Server Requirements

You must install Red Hat Enterprise Linux 6.5 on this server. Optionally, you can also install Red Hat Storage Console on this server, which allows all aspects of the Red Hat Storage trusted storage pool to be managed from a single server. While installing the server, ensure that you specify a fully qualified domain name (FQDN). A hostname alone does not meet the requirements of the Hortonworks Data Platform Ambari deployment tool. You must set up a passwordless SSH connection from the Ambari server to all other servers within the trusted storage pool. Instructions for installing and configuring Hortonworks Ambari are provided in later sections of this chapter.
If the Hortonworks Ambari server is installed on a different node than Red Hat Storage Server, you must also enable the rhel-6-server-rh-common-rpms channel on this server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhel-6-server-rh-common-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel rhel-x86_64-server-rh-common-6
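
The passwordless SSH requirement for the Ambari server can be sketched as follows. This is an illustration under assumptions, not the documented procedure: the function names run and setup_passwordless_ssh are hypothetical, the connection is assumed to be made as root, and by default the commands are only printed (DRY_RUN=1) so that you can review them first.

```shell
# run COMMAND [ARG...]
# Prints the command when DRY_RUN is 1 (the default in this sketch);
# otherwise executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_passwordless_ssh KEYFILE HOST...
# Creates an RSA key pair without a passphrase if KEYFILE does not
# exist, then copies the public key to root on each HOST.
setup_passwordless_ssh() {
    key=$1
    shift
    if [ ! -f "$key" ]; then
        run ssh-keygen -t rsa -N "" -f "$key"
    fi
    for host in "$@"; do
        run ssh-copy-id -i "$key.pub" "root@$host"
    done
}
```

Run on the Ambari server, a call such as setup_passwordless_ssh /root/.ssh/id_rsa rhs-1.server.com rhs-2.server.com (hypothetical host names) prints one ssh-copy-id command per server. Setting DRY_RUN=0 executes the commands instead, and each ssh-copy-id call then prompts once for that server's password.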

Warning

Red Hat Storage Console enables Nagios Alerting for Red Hat Storage. The Nagios Client libraries are shipped with Red Hat Storage and are on each Red Hat Storage Server. This causes a conflict with the Nagios System that is bundled with the Hortonworks Data Platform (HDP). As such, using HDP 2.0.6 to deploy and manage Nagios is not supported.

Note

If you are using one of the condensed deployment topologies listed in the Administration Guide and you have elected to place the Ambari Management server on the same node as a Red Hat Storage Server, you must enable only the rhs-big-data-3-for-rhel-6-server-rpms channel on that server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhs-big-data-3-for-rhel-6-server-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel rhel-x86_64-server-6-rhs-bigdata-3

12.1.7. YARN Master Server Requirements

You must install Red Hat Enterprise Linux 6.5 on this server. While installing the server, ensure that you specify a fully qualified domain name (FQDN). A hostname alone does not meet the requirements of the Hortonworks Data Platform Ambari deployment tool.
If the YARN Master server is installed on a different node than Red Hat Storage Server, you must also enable the rhel-6-server-rh-common-rpms and rhel-6-server-rhs-client-1-rpms channels on the YARN server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channels by running the following command:
    # subscription-manager repos --enable=rhel-6-server-rh-common-rpms \
    --enable=rhel-6-server-rhs-client-1-rpms
  • If you have registered your machine using Satellite server, enable the channels by running the following commands:
    # rhn-channel --add --channel rhel-x86_64-server-rh-common-6
    # rhn-channel --add --channel rhel-x86_64-server-rhsclient-6

Note

If you are using one of the condensed deployment topologies listed in the Administration Guide and you have elected to place the YARN Master server on the same node as a Red Hat Storage Server, you must enable only the rhs-big-data-3-for-rhel-6-server-rpms channel on that server.
  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhs-big-data-3-for-rhel-6-server-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel rhel-x86_64-server-6-rhs-bigdata-3
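
The channel requirements in sections 12.1.5 to 12.1.7 can be summarized per node role. The following sketch covers the Red Hat Subscription Manager case only. The role labels storage, ambari, and yarn and the function names are hypothetical; storage also stands for a condensed topology in which the Ambari or YARN Master server shares a node with a Red Hat Storage Server.

```shell
# repos_for_role ROLE
# Prints the repository IDs that this chapter requires for ROLE.
repos_for_role() {
    case "$1" in
        storage) echo "rhs-big-data-3-for-rhel-6-server-rpms" ;;
        ambari)  echo "rhel-6-server-rh-common-rpms" ;;
        yarn)    echo "rhel-6-server-rh-common-rpms rhel-6-server-rhs-client-1-rpms" ;;
        *)       echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# enable_command ROLE
# Prints (but does not run) the subscription-manager command for ROLE.
enable_command() {
    repos=$(repos_for_role "$1") || return 1
    cmd="subscription-manager repos"
    for r in $repos; do
        cmd="$cmd --enable=$r"
    done
    echo "$cmd"
}
```

For example, enable_command yarn prints a single subscription-manager command that enables both channels required on a dedicated YARN Master server.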