Chapter 3. Creating the Environment

3.1. Prerequisites

Prerequisites for creating this reference architecture include a supported operating system and JDK. Refer to the Red Hat documentation on JBoss Data Grid 7.0 Supported Configurations.

3.2. Downloads

Download the attachments to this document. This application code and these files will be used in configuring the reference architecture environment:

https://access.redhat.com/node/2640031/40/0

If you do not have access to the Red Hat Customer Portal, see the Comments and Feedback section to contact us for alternative methods of accessing these files.

Download the Red Hat JBoss Data Grid 7.0.0 Server from Red Hat's Customer Support Portal.

Download Apache Spark 1.6 from the download page of the Apache Spark website.

This reference architecture uses the spark-1.6.0-bin-hadoop2.6 build.

3.3. Installation

3.3.1. Apache Spark

Installing Apache Spark is straightforward and mainly involves extracting the downloaded archive file on each node.

# tar xvf spark-1.6.0-bin-hadoop2.6.tgz

3.3.2. JBoss Data Grid 7

JBoss Data Grid 7 does not require an installer; the downloaded archive simply needs to be extracted. This reference architecture requires JBoss Data Grid 7.0.0 Server to be installed on each node.

# unzip jboss-datagrid-7.0.0-server.zip -d /opt/

3.4. Configuration

3.4.1. Overview

Various other types of configuration may be required for UDP and TCP communication. For example, Linux operating systems typically configure a maximum socket buffer size that is lower than the default JGroups buffer size used by the cluster, and JDG logs a warning when this is the case. It is important to correct any such warnings observed in the JDG logs, as illustrated below. For more information, please refer to the Administration and Configuration Guide for JDG 7.
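As a minimal sketch, assuming the warnings relate to the UDP send and receive buffers, the kernel limits can be raised with sysctl; the values shown are illustrative and should be at least as large as the JGroups buffer sizes reported in the warning (add the same settings to /etc/sysctl.conf to make them persistent across reboots):

# sysctl -w net.core.rmem_max=26214400
# sysctl -w net.core.wmem_max=1048576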

3.4.2. JDG 7 configuration

This reference architecture installs and configures a three-node cluster on separate machines. The names node1, node2 and node3 are used in this paper to refer to both the machines and the JDG 7 nodes on them.

Figure 3.1. Deployment Clusters


3.4.2.1. Adding Users

The first important step in configuring the JDG 7 cluster is to add the required users: admin users and node users.

1) Admin User

An administrator user is required for each domain. Assuming a user ID of admin and a password of password1! for this admin user:

On node1:

# /opt/jboss-datagrid-7.0.0-server/bin/add-user.sh admin password1!

This uses the non-interactive mode of the add-user script to add a management user with the given username and password.

2) Node Users

The next step is to add a user for each node that will connect to the cluster. That means creating two users called node2 and node3 (since node1 hosts the domain controller and does not need to use a password to authenticate against itself). This time, provide no argument to the add-user script and instead follow the interactive setup.

The first step is to specify that a management user is being added. The interactive process is as follows:

# /opt/jboss-datagrid-7.0.0-server/bin/add-user.sh

What type of user do you wish to add?
a) Management User (mgmt-users.properties)
b) Application User (application-users.properties)
(a): a

  • Type a, or simply press enter to accept the default selection of a

Enter the details of the new user to add.
Realm (ManagementRealm) :

  • Once again simply press enter to continue

Username : node2

  • Enter the username (node2 or node3) and press enter

Password : password1!

  • Enter password1! as the password, and press enter

Re-enter Password : password1!

  • Enter password1! again to confirm, and press enter

About to add user 'nodeX' for realm 'ManagementRealm'
Is this correct yes/no? yes

  • Type yes and press enter

Then continue:

Is this new user going to be used for one AS process to connect to another AS process?
e.g. for a slave host controller connecting to the master or for a Remoting connection for server to server EJB calls.
yes/no?

  • Type yes and press enter

To represent the user add the following to the server-identities definition <secret value="cGFzc3dvcmQxIQ==" />

This concludes the setup of the management users required to administer the domain and connect the slave machines.

3.4.3. JDG 7 Cache configuration

On node1, execute the following CLI commands to add two new JDG 7 distributed caches, sensor-data and sensor-avg-data. These two caches will be used by the sample IoT sensor application.

# /opt/jboss-datagrid-7.0.0-server/bin/cli.sh
# embed-host-controller --domain-config=domain.xml --host-config=host.xml --std-out=echo
# /profile=clustered/subsystem=datagrid-infinispan/cache-container=clustered/configurations=CONFIGURATIONS/distributed-cache-configuration=sensor-data:add(start=EAGER,template=false,mode=SYNC)
# /profile=clustered/subsystem=datagrid-infinispan/cache-container=clustered/distributed-cache=sensor-data:add(configuration=sensor-data)
# /profile=clustered/subsystem=datagrid-infinispan/cache-container=clustered/configurations=CONFIGURATIONS/distributed-cache-configuration=sensor-avg-data:add(start=EAGER,template=false,mode=SYNC)
# /profile=clustered/subsystem=datagrid-infinispan/cache-container=clustered/distributed-cache=sensor-avg-data:add(configuration=sensor-avg-data)
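Optionally, while still in the same CLI session, the new cache definitions can be listed to confirm that they were created; this check assumes the same resource paths used above:

# ls /profile=clustered/subsystem=datagrid-infinispan/cache-container=clustered/distributed-cache

The output should include sensor-data and sensor-avg-data.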

3.4.4. JDG 7 Cluster configuration

1) Update /opt/jboss-datagrid-7.0.0-server/domain/configuration/host-slave.xml on both node2 and node3, so that these two nodes can form a JDG cluster with node1.

For node2, update the first line to add the host name:

<host name="node2" xmlns="urn:jboss:domain:4.0">

For node3, update the first line to add the host name:

<host name="node3" xmlns="urn:jboss:domain:4.0">

2) In host-slave.xml on node2 and node3, change the server-identities value from the default sample value to the value below, which is the Base64-encoded password of the node user from the previous section.

<server-identities>
  <secret value="cGFzc3dvcmQxIQ=="/>
</server-identities>
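The secret value is simply the Base64 encoding of the node user's password, as produced by the add-user script; it can be regenerated for a different password from the command line:

# echo -n 'password1!' | base64
cGFzc3dvcmQxIQ==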

3) Update host.xml on node1 by deleting the following server-two tag, so that node1 runs only server-one:

<server name="server-two" group="cluster" auto-start="true">

In host-slave.xml on node2, update the server name to server-two:

<server name="server-two" group="cluster"/>

In host-slave.xml on node3, update the server name to server-three:

<server name="server-three" group="cluster"/>

After these changes, the cluster will have three members: server-one on node1, server-two on node2, and server-three on node3.

3.5. Startup

To start the active domain, assuming that 10.19.137.34 is the IP address for the node1 machine, 10.19.137.35 for node2 and 10.19.137.36 for node3:

3.5.1. Start JDG 7.0 cluster

Log on to the three machines where JDG 7 is installed and navigate to the bin directory:

# cd /opt/jboss-datagrid-7.0.0-server/bin

To start the first node:

# ./domain.sh -bmanagement=10.19.137.34 -b=10.19.137.34

To start the second node:

# ./domain.sh -b=10.19.137.35 -bprivate=10.19.137.35 --master-address=10.19.137.34 --host-config=host-slave.xml

To start the third node:

# ./domain.sh -b=10.19.137.36 -bprivate=10.19.137.36 --master-address=10.19.137.34 --host-config=host-slave.xml
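Once all three nodes are up, it is possible to verify that node2 and node3 have registered with the domain controller by connecting the management CLI to node1; this check assumes the default management port of 9990 and the admin user created earlier:

# /opt/jboss-datagrid-7.0.0-server/bin/cli.sh --connect --controller=10.19.137.34:9990 --user=admin --password=password1!
[domain@10.19.137.34:9990 /] ls /host

The output should list node2 and node3 alongside the domain controller host.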

3.5.2. Stop JDG 7.0 cluster

To stop the JDG 7 cluster, press ctrl-c or use "kill -9 PID" to stop each domain.sh process.
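Alternatively, the slave hosts can be shut down gracefully from the management CLI; the commands below are a sketch, assuming the default management port of 9990, the admin user created earlier, and the host names configured in this reference architecture:

# /opt/jboss-datagrid-7.0.0-server/bin/cli.sh --connect --controller=10.19.137.34:9990 --user=admin --password=password1!
[domain@10.19.137.34:9990 /] /host=node3:shutdown
[domain@10.19.137.34:9990 /] /host=node2:shutdown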

3.5.3. Start Apache Spark cluster

Apache Spark currently supports three types of cluster managers:

  • Standalone – a simple cluster manager included with Spark that makes it easy to set up a cluster.
  • Apache Mesos – a general cluster manager that can schedule short-lived tasks and long-running services on shared compute resources.
  • Hadoop YARN – the resource manager in Hadoop 2.

This reference architecture uses the standalone cluster mode.

Note

Each streaming receiver will use a CPU core / thread from the processors allocated to Apache Spark. Ensure that the Spark application always has a higher number of CPU cores than receivers. Failure to allocate at least one extra processing core can result in receivers running but no data being processed by Spark.
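For example, assuming the analysis application uses a single streaming receiver, at least two executor cores should be available to it. In standalone mode, the total number of cores an application may use across the cluster can be capped with the --total-executor-cores option of spark-submit; the value below is purely illustrative:

# /opt/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --total-executor-cores 2 <remaining options>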

3.5.3.1. Start Apache Spark standalone cluster

By default, Apache Spark uses port 8080 for its Web UI, which is coincidentally the same port used by JBoss Data Grid, as configured in its domain.xml:

<socket-binding name="rest" port="8080"/>

Therefore, an attempt to start Apache Spark on the same host as JDG 7 may result in the following exception due to a port conflict:

ERROR [org.jboss.msc.service.fail] (MSC service thread 1-3) MSC000001: Failed to start service jboss.datagrid-infinispan-endpoint.rest.rest-connector: org.jboss.msc.service.StartException in service jboss.datagrid-infinispan-endpoint.rest.rest-connector: DGENDPT10016: Could not start the web context for the REST Server
[Server:server-one]     at org.infinispan.server.endpoint.subsystem.RestService.start(RestService.java:110)

To avoid such a port conflict, please start Apache Spark with the --webui-port argument to use a different port.

On Node 1 (10.19.137.34), start both master and worker.

# cd /opt/spark-1.6.0-bin-hadoop2.6/sbin
# ./start-master.sh --webui-port 9080 -h 10.19.137.34
# ./start-slave.sh spark://10.19.137.34:7077 --webui-port 9081

On Node 2 (10.19.137.35), start one worker.

# cd /opt/spark-1.6.0-bin-hadoop2.6/sbin
# ./start-slave.sh spark://10.19.137.34:7077 --webui-port 9081

On Node 3 (10.19.137.36), start one worker.

# cd /opt/spark-1.6.0-bin-hadoop2.6/sbin
# ./start-slave.sh spark://10.19.137.34:7077 --webui-port 9081
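At this point, the Spark master web UI (http://10.19.137.34:9080, given the --webui-port value used above) should list all three workers in the ALIVE state.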

3.5.4. Stop Apache Spark cluster

On Node1, stop both master and worker.

# cd /opt/spark-1.6.0-bin-hadoop2.6/sbin
# ./stop-slave.sh
# ./stop-master.sh

On Node2 and Node3, only the worker needs to be stopped.

# cd /opt/spark-1.6.0-bin-hadoop2.6/sbin
# ./stop-slave.sh
Note

The whole Apache Spark cluster can also be started and stopped using the launch scripts, such as sbin/start-all.sh and sbin/stop-all.sh, which require additional configuration. For details, please refer to the Cluster Launch Scripts documentation.
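As a sketch, assuming passwordless SSH access from the master machine to each worker machine, the launch scripts read the list of worker hosts from the conf/slaves file on the master, one host per line. For the topology in this reference architecture, the file would contain:

10.19.137.34
10.19.137.35
10.19.137.36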

3.5.5. Start IoT sensor application

3.5.5.1. Start Spark analysis application

# /opt/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --master spark://10.19.137.34:7077 --deploy-mode cluster --supervise --class org.Analyzer target/ref-analysis-jar-with-dependencies.jar "10.19.137.34:11222;10.19.137.35:11222;10.19.137.36:11222"

The arguments provided to spark-submit are as follows:

  • master: the master URL for the cluster
  • deploy-mode cluster: deploy the driver on the worker nodes (cluster mode)
  • supervise: ensures that the driver is automatically restarted if it fails with a non-zero exit code
  • class: the entry point of the application
  • The last argument is the list of JDG 7 cluster addresses used by the Spark connector; it is quoted so that the shell does not interpret the semicolons as command separators.

For more information on how to use spark-submit, please refer to the Apache Spark documentation on submitting applications.

3.5.5.2. Start Client application

# java -jar target/temperature-client-jar-with-dependencies.jar 10.19.137.34 shipment1 shipment5 shipment9 

The arguments to this application include:

  • The first argument is the address used by the Hot Rod Java Client to connect to JDG 7, in this example 10.19.137.34. Since the JDG cluster has three nodes (10.19.137.34, 10.19.137.35 and 10.19.137.36), any one of the IP addresses will work.
  • The remaining arguments are the shipment ID strings that the client will listen for. They do not have to be exact matches; for example, "shipment1" will bring back all shipments with an ID starting with "shipment1", such as shipment101 or shipment18.

3.5.5.3. Start Sensor application

# java -jar target/temperature-sensor-jar-with-dependencies.jar 10.19.137.34 

The first argument is the address used by the Hot Rod Java Client to connect to JDG 7, in this example 10.19.137.34. Since the JDG cluster has three nodes (10.19.137.34, 10.19.137.35 and 10.19.137.36), any one of the IP addresses is fine.

3.5.6. Stop IoT sensor application

To stop the IoT sensor applications, press ctrl-c or use "kill -9 PID" to stop each process. Otherwise, all three applications are set up to run for 24 hours.