Install SAP Data Hub 1.X Distributed Runtime on OpenShift Container Platform

SAP Data Hub consists of several components, and one of them is the SAP Data Hub Distributed Runtime, a.k.a. SAP Vora, which should be installed on a Kubernetes cluster. Red Hat OpenShift Container Platform has been validated for running SAP Vora.

From now on, we will refer to SAP Data Hub Distributed Runtime by the abbreviation SDH.

Please note that in SDH 1.3 and 1.4 the version of the Vora component is 2.2; don't be confused by the differing version numbers.

In general, the installation of SDH follows these steps:

  • Install Red Hat OpenShift Container Platform
  • Configure the prerequisites for SAP Data Hub Distributed Runtime
  • Download SAP Data Hub Distributed Runtime installation binaries and run installer
  • Install SAP Data Hub Flow Agent

1. OpenShift Container Platform validation version matrix

The following version combinations of SDH, OCP and RHEL have been validated:

SAP Data Hub   OpenShift Container Platform   RHEL
1.3            3.7                            7.4
1.4            3.9                            7.5

Although not validated, other version combinations are supported and listed below in the compatibility matrix.

2. Hardware/VM Requirements

2.1. Persistent Volumes

SDH requires persistent storage. It is recommended to use storage that can be provisioned dynamically. You can find more information in this document: Dynamic Provisioning and Creating Storage Classes

The amount of storage required by SAP Vora on OpenShift depends on the storage type.
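
For illustration only, assuming an AWS-based cluster: a StorageClass for dynamic EBS provisioning could be created as sketched below. The name gp2 and the parameters are examples, not taken from the SDH documentation.

# oc create -f - <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2    # EBS volume type; adjust to your environment
EOF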

2.2. Compatibility Matrix

Later versions of SAP Data Hub support newer versions of Kubernetes and OpenShift Container Platform. Even if not listed in the OCP validation version matrix above, the following version combinations are fully supported and expected to work:

SAP Data Hub   OpenShift Container Platform   RHEL
1.3            3.7                            7.5, 7.4 or 7.3
1.4            3.7 or 3.9                     7.5, 7.4 or 7.3

2.3. OpenShift Cluster

The following are the minimum requirements for the OpenShift Cluster Nodes:

  • OS: Red Hat Enterprise Linux 7.5, 7.4 or 7.3
  • CPU: 4 cores
  • Memory: 16GB
  • Disk space:
    • /var - used for the docker configuration:
      • 50 GB if you are using statically provisioned storage
      • 20 GB if you are using dynamically provisioned storage
    • /var/local - used by the Vora installation (/var/local/vora and /var/local/db):
      • 50 GB if you are using statically provisioned storage
      • no minimum requirement if you are using dynamically provisioned storage
    • 100 GB of free LVM storage for the docker-pool
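
On RHEL 7 the docker-pool is typically created with docker-storage-setup. A minimal sketch, assuming a spare block device is available (/dev/vdb is only an example name):

# cat >/etc/sysconfig/docker-storage-setup <<EOF
DEVS=/dev/vdb
VG=docker-vg
EOF
# docker-storage-setup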

2.4. Jump Server

For the installation of SAP Data Hub Distributed Runtime, it is highly recommended to run the installer from an external Jump Server rather than from within the OpenShift cluster, because the docker images for the SAP Data Hub Distributed Runtime are built locally on the host where the installer runs.

On OpenShift, you need to set up an external registry to install SDH; otherwise the installer might fail due to permission problems or wrong certificates. The Jump Server can also host the external docker registry.

The minimum hardware requirements for the Jump Server are:

  • OS: Red Hat Enterprise Linux 7.5, 7.4 or 7.3
  • CPU: 2 cores
  • Memory: 16GB
  • Disk space:
    • / - 15 GB for the work directory and the installation binaries of SAP Vora and SAP Data Hub Flow Agent
    • 50 GB of free LVM storage for the docker-pool

2.5. Hadoop (Optional)

Installing the extensions to the Spark environment on Hadoop is optional. Please refer to the Installation Guide for SAP Data Hub - System Landscapes for details. This document does not cover the Hadoop part.

3. Install Red Hat OpenShift Container Platform

3.1. Prepare the Subscription and Packages

  1. On each host of the OpenShift cluster, register the system using subscription-manager. Look up and attach to the pool that provides the OpenShift Container Platform subscription.

    # subscription-manager register --username=UserName --password=Password
    your system is registered with ID: XXXXXXXXXXXXXXXX
    # subscription-manager list --available
    # subscription-manager attach --pool=Pool_Id_Identified_From_Previous_Command
    
  2. Subscribe each host only to the following repositories.

    # subscription-manager repos --disable='*'
    # subscription-manager repos --enable='rhel-7-fast-datapath-rpms' \
        --enable='rhel-7-server-extras-rpms' --enable='rhel-7-server-optional-rpms' \
        --enable='rhel-7-server-rpms'
    
  3. Enable the channel for OpenShift 3.7 or 3.9 on each host.

    # # for OCP 3.7
    # subscription-manager repos --enable='rhel-7-server-ose-3.7-rpms'
    # # for OCP 3.9
    # subscription-manager repos --enable='rhel-7-server-ose-3.9-rpms'
    
  4. Install the following packages on each host.

    # yum -y install curl git net-tools bind-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct
    # yum -y install atomic-openshift-utils ansible openshift-ansible-playbooks docker
    

3.2. Install OpenShift

Install OpenShift Container Platform on your desired cluster hosts. Follow the OpenShift installation guide or use the playbooks for a cloud reference architecture.

IMPORTANT: Make sure to add the feature gate to the kubelet arguments with the following inventory line:

openshift_node_kubelet_args={'feature-gates':['ReadOnlyAPIDataVolumes=false']}

This causes all secret and configMap volumes to be mounted in read-write directories in containers. SAP Vora diagnostic pods expect these directories to be writable and fail to deploy otherwise.
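
For the advanced (Ansible) installation, the line goes into the [OSEv3:vars] section of the inventory file, for example:

[OSEv3:vars]
...
openshift_node_kubelet_args={'feature-gates':['ReadOnlyAPIDataVolumes=false']}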

For other installation methods, please make sure to add the following to the /etc/origin/node/node-config.yaml file on all schedulable nodes:

kubeletArguments:
  feature-gates:
  - ReadOnlyAPIDataVolumes=false

IMPORTANT: Make sure not to set the default node selector. Otherwise, daemon sets will fail to deploy all their pods, which will cause the installation to fail. For new advanced installations, comment out lines with osm_default_node_selector. For existing clusters, unset the node selector in /etc/origin/master/master-config.yaml with the following lines:

    projectConfig:
      defaultNodeSelector: ''
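
Note that changes to /etc/origin/master/master-config.yaml take effect only after the master services are restarted. Depending on your version and HA setup, this is typically one of the following:

# systemctl restart atomic-openshift-master
# systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers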

NOTE: On AWS you have to label all nodes according to Labeling Clusters for AWS with: openshift_clusterid="Key=kubernetes.io/cluster/,Value=ocp37"

4. Configure the Prerequisites for SAP Data Hub Distributed Runtime (SAP Vora)

4.1. Set up an External Docker Registry

NOTE: On OpenShift, you need to use an external registry, because SAP Vora currently cannot work with the OpenShift internal secured registry.

  1. On a separate host from the OpenShift cluster, set up an external registry for building and delivering the SAP Vora containers. Please follow the steps in the article How do I setup/install a Docker registry?. You can install the docker registry on the Jump Server.

    After the setup, you should have an external docker registry reachable at the following URL: My_Docker_Registry_FQDN:PORT
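
    If no registry exists yet, a minimal (insecure) registry can be started from the upstream registry image as a quick sketch; port 5000 and the storage path are examples, refer to the linked article for a supported setup:

    # docker run -d --name registry --restart=always -p 5000:5000 \
        -v /var/lib/registry:/var/lib/registry registry:2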

  2. Configure docker on the Jump Server

    The SAP Vora installer builds the containers locally on the Jump Server and pushes them to the registry that is later used for installation on OpenShift. In order to push images to the docker registry you need to add the registry to your docker config in /etc/sysconfig/docker and /etc/containers/registries.conf:

    # vi /etc/sysconfig/docker
    ...
    OPTIONS='--selinux-enabled --log-driver=journald --signature-verification=false --insecure-registry=My_Docker_Registry_FQDN:PORT'
    ...
    
    # vi /etc/containers/registries.conf
    ...
    registries:
      - My_Docker_Registry_FQDN:PORT
      - registry.access.redhat.com
    ...
    

    NOTE: The docker registry must be added as an insecure registry using option --insecure-registry.

    Restart the docker daemon

    # systemctl restart docker
    

    NOTE: If you configure docker on the Jump Server as a non-root user, please check section 7.1 for instructions.
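
    To verify that the registry is reachable from the Jump Server, you can query the catalog endpoint of the registry's v2 API; an empty registry returns {"repositories":[]}:

    # curl http://My_Docker_Registry_FQDN:PORT/v2/_catalog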

4.2. Configure the OpenShift Cluster for Vora

NOTE: Many commands below require cluster admin privileges. To become a cluster-admin, you can do one of the following:

  • Log in to any master node as the root user and execute the following command:

    # oc login -u system:admin
    
  • Make any existing user a cluster admin by doing the previous step followed by:

    # oc adm policy add-cluster-role-to-user cluster-admin $USER
    
  • Copy the admin kubeconfig file from a remote master node to a local host and use that:

    # scp master.node:/etc/origin/master/admin.kubeconfig .
    # export KUBECONFIG=$(pwd)/admin.kubeconfig
    # oc login -u system:admin
    

NOTE: For testing purposes, you may set SELinux to permissive mode (setenforce 0) in case additional SELinux configuration is needed. In a production system, please check carefully and add appropriate rules according to your required setup. Steps 1 and 2 below have been tested to work with the validated versions of SDH and OCP.

  1. On every (scheduled) Node of the OpenShift cluster, create the following directories and add the proper SELinux fcontext container_file_t to them.

    # mkdir -p /var/local/vora /var/local/db 
    # semanage fcontext -a -t container_file_t /var/local/vora
    # semanage fcontext -a -t container_file_t /var/local/db
    # restorecon -v /var/local/vora
    # restorecon -v /var/local/db
    # ls -Z /var/local  
    

    In the output, verify that the SELinux fcontext has been correctly set on sub-directories db and vora.

  2. On every (scheduled) node of the OpenShift cluster, make sure containers can mount NFS volumes:

    # setsebool virt_use_nfs true
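
    Note that setsebool without -P does not persist across reboots. To make the boolean permanent:

    # setsebool -P virt_use_nfs true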
    
  3. On every (scheduled) node of the OpenShift cluster, change the SELinux security context of file /var/run/docker.sock.

    # semanage fcontext -m -t svirt_sandbox_file_t -f s "/var/run/docker\.sock"
    # restorecon -v /var/run/docker.sock
    

    To make the change permanent, execute the following on all the nodes:

    # cat >/etc/systemd/system/docker.service.d/socket-context.conf <<EOF
    [Service]
    ExecStartPost=/sbin/restorecon /var/run/docker.sock
    EOF
    
  4. Create an OpenShift user for the SAP Vora installation, using the authentication method of your choice. For example, dhadmin.
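
    For example, assuming the cluster uses the HTPasswd identity provider with an htpasswd file already referenced in master-config.yaml (the path below is just a common default from the advanced installation), the user could be created like this:

    # htpasswd -b /etc/origin/master/htpasswd dhadmin MyPassword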

  5. Create a project in OpenShift. The name of the project will be the namespace for the SAP Vora installation, for example, vora. Log in to OpenShift as a cluster-admin, and perform the following configurations for the installation:

    # oc new-project vora
    # oc create sa tiller
    # oc adm policy add-cluster-role-to-user cluster-admin -z tiller
    # oc adm policy add-scc-to-group anyuid "system:serviceaccounts:$(oc project -q)"
    # oc adm policy add-scc-to-group hostmount-anyuid "system:serviceaccounts:$(oc project -q)"
    # oc adm policy add-scc-to-user privileged -z default
    # oc adm policy add-role-to-user admin dhadmin
    # oc adm policy add-cluster-role-to-user system:node-reader dhadmin
    
  6. Verify the service account of tiller:

    # oc get serviceaccounts -n vora
    tiller     2         1    7s    
    

    NOTE: The output should contain a tiller account. Otherwise review the previous step and fix the issue. You need the tiller account for the Vora installation.

  7. As a cluster-admin, allow the project admin to manage SAP Vora custom resources.

    3.7 only! On OCP 3.7, the admin cluster role can be modified with the following command.

    # oc patch --type=json clusterrole admin \
        -p '[{"op":"add", "path":"/rules/-", "value":{
          "apiGroups":["sap.com"],
          "resources":["voraclusters","voracluster","vc"],
          "verbs":["create","delete","get","list","update","watch","patch"]
        }}]'
    

    3.9 only! On OCP 3.9, aggregated cluster roles need to be created. They indirectly update the corresponding default cluster roles:

    # oc create -f - <<EOF
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: aggregate-sapvc-admin-edit
      labels:
        rbac.authorization.k8s.io/aggregate-to-admin: "true"
        rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rules:
    - apiGroups: ["sap.com"]
      resources: ["voraclusters"]
      verbs: ["get", "list", "watch", "create", "update", "patch", "delete", "deletecollection"]
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: aggregate-sapvc-view
      labels:
        # Add these permissions to the "view" default role.
        rbac.authorization.k8s.io/aggregate-to-view: "true"
    rules:
    - apiGroups: ["sap.com"]
      resources: ["voraclusters"]
      verbs: ["get", "list", "watch"]
    EOF
    

4.3. Prepare the Jump Server

  1. Install a helm client on the Jump Server.

    • Download the helm client from https://github.com/kubernetes/helm
    • Unpack the archive and copy the binary to a directory in your PATH, or use the get_helm.sh script as shown below:

      # curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > get_helm.sh
       % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                      Dload  Upload   Total   Spent    Left  Speed
      100  6329  100  6329    0     0  28539      0 --:--:-- --:--:-- --:--:-- 28638
      # chmod 700 get_helm.sh
      # ./get_helm.sh
      Downloading https://kubernetes-helm.storage.googleapis.com/helm-v2.7.0-linux-amd64.tar.gz
      Preparing to install into /usr/local/bin
      helm installed into /usr/local/bin/helm
      Run 'helm init' to configure helm.
      

    See the blog Getting started with Helm on OpenShift for more information.

  2. Download and install kubectl

    # curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
    # chmod +x ./kubectl
    # mv ./kubectl /usr/local/bin/kubectl
    
  3. Set up helm/tiller for the deployment; in the following example, the namespace is vora:

    # export TILLER_NAMESPACE=vora
    # helm init --service-account=tiller
    

    Wait a short time until the tiller pod is deployed:

    # oc get pods
    NAME                            READY     STATUS    RESTARTS   AGE
    tiller-deploy-551988758-dzjx5   1/1       Running   0          1m
    # helm ls
    [There should be no error in the output. If there is no output at all, that is fine; it simply means no releases have been deployed yet.]
    

5. Install SAP Vora on OpenShift

5.1. Download and unpack the SAP VORA binaries

Download and unpack SAP Vora installation binary onto the Jump Server.

  1. Go to the SAP Software Download Center, log in with your SAP account and search for SAP DATA HUB SP04 or SAP DATA HUB SP03 for version 1.4 or 1.3, respectively.

  2. Download the SAP Data Hub Distributed Runtime file, for example: DHDISTRUNTIM04_0-80003052.ZIP (SP04 Patch0 for SAP DATA HUB DISTRIB RUNTM 1.0).

    NOTE: The Data Hub Spark Extension is not covered here, because it is not installed on OpenShift. It has to be installed on your Hadoop Cluster.

  3. Unpack the installer file. For example, when you unpack the DHDISTRUNTIM04_0-80003052.ZIP package, it will create the installation folder SAPVora-2.2.48-DistributedRuntime.

    # unzip DHDISTRUNTIM04_0-80003052.ZIP
    

5.1.1. Installation on a cluster with 3 nodes in total

IMPORTANT: This note applies only to small PoCs, not to production deployments.

Vora's dlog pod expects at least 3 schedulable nodes without the role=infra label. This requirement can be relaxed by reducing the replication factor of the dlog pod with the following patch applied to the runtime directory:

--- SAPVora-2.2.42-DistributedRuntime.orig/deployment/helm/vora-cluster/values.yaml
+++ SAPVora-2.2.42-DistributedRuntime/deployment/helm/vora-cluster/values.yaml
@@ -43,7 +43,7 @@ components:
   dlog:
     storageSize: 50Gi
     bufferSize: 4g
-    replicationFactor: 2
+    replicationFactor: 1
     standbyFactor: 1
     useHostPath: false
     hostPath:
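
Instead of applying the patch by hand, the same change can be made with a one-liner like the following; it assumes replicationFactor appears only in the dlog section of the file and that the runtime directory name matches your version:

# sed -i 's/replicationFactor: 2/replicationFactor: 1/' \
    SAPVora-2.2.48-DistributedRuntime/deployment/helm/vora-cluster/values.yaml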

5.2. Install SAP Vora

  1. Run the SAP Vora installer as described in Installing SAP Vora and SAP Data Hub Pipeline Engine.

  2. Best practice examples of Installer Parameters

    The installer parameters can be found in Command Line Parameters (Kubernetes). Depending on the deployment type and storage type, the usage of the parameters may vary.

    --deployment-type=cloud: utilizes dynamic storage provider by default
    --deployment-type=onpremise: utilizes either NFS persistent volumes or hostPath

    Below are some of the best practice examples; a combined invocation is sketched after this list:

    • Deploying in cloud using dynamic storage provider:

      --deployment-type=cloud
      

      NOTE: If you are using GCE or AWS, you can change the default dynamically provisioned storage by following Changing the Default StorageClass.

    • Deploying in cloud using static storage provider:

      --deployment-type=cloud
      --use-hostpath-for-consul=yes
      --use-hostpath-for-dqp=yes
      
    • Deploying on-premise with static storage (--use-hostpath-for-consul=no and --use-hostpath-for-dqp=no are the defaults):

      --deployment-type=onpremise
      
    • Deploying on-premise with only one dynamic storage provisioner:

      --deployment-type=onpremise
      --use-hostpath-for-consul=no
      --use-hostpath-for-dqp=no
      
    • Deploying on-premise when you have multiple dynamic storage provisioners but no default is defined; in this case, specify the storage class using --vsystem-storage-class:

      --deployment-type=onpremise
      --use-hostpath-for-consul=no
      --use-hostpath-for-dqp=no
      --vsystem-storage-class=StorageClass In Use
      

      NOTE: If the default dynamic storage provisioner has been defined, the parameter --vsystem-storage-class can be omitted. To define the default dynamic storage provisioner, check this document Changing the Default StorageClass.

    • NFS Persistent Volumes

      The SAP Vora installer can provision NFS persistent volumes regardless of the deployment type, whether it is cloud or onpremise. This is useful when you wish to use persistent volumes but have no dynamic storage provisioner. If you have an NFS server and want the installer to provision persistent NFS volumes, use the following parameters:

      --provision-persistent-volumes=yes
      --nfs-address=Address of the NFS server
      --nfs-path=Path on NFS
      --local-nfs-path=Local path where NFS is mounted
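
    Putting it together, an illustrative on-premise invocation with a dynamic storage provisioner might look like this (the namespace, registry and storage class values are placeholders):

      # ./install.sh --namespace=vora \
          --docker-registry=My_Docker_Registry_FQDN:PORT \
          --deployment-type=onpremise \
          --use-hostpath-for-consul=no --use-hostpath-for-dqp=no \
          --vsystem-storage-class=StorageClass_In_Use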
      
  3. After a successful installation, create a route for the SAP Vora service. You can find more information in OpenShift documentation Using Wildcard Routes (for a Subdomain).

    • Look up the service, for example, the namespace is vora:

      # oc get services -n vora
      vsystem      172.30.81.230    <nodes>       10002:31753/TCP,10000:32322/TCP     1d
      
    • Create the route:

      # oc create route passthrough --service=vsystem -n vora
      # oc get route -n vora
      NAME      HOST/PORT                      PATH     SERVICES  PORT         TERMINATION   WILDCARD
      vsystem   vsystem-vora.wildcard-domain   vsystem  vsystem   passthrough   None
      
  4. Access the SAP Vora Tools web console at https://vsystem-vora.wildcard-domain.

  5. Validate SAP Vora Installation on OpenShift
    It helps to validate the SAP Vora installation before moving forward. Please follow the instructions in Validate the SAP Vora Installation.

6. Install SAP Data Hub Flow Agent

The SAP Data Hub Flow Agent can be installed before, during or after the Vora installation. This document installs the Flow Agent after the Vora installation.

  1. Download the SAP Data Hub - Data Integration package (a.k.a. Flow Agent) from the SAP Software Download Center, for example: DHFLOWAGENT04_0-80003551.ZIP (SP04 Patch0 for SAP DATA HUB FLOWAGENT 1.0). Then upload the file to the Jump Server.
  2. Unpack the package on the Jump Server and prepare the deployment package.

    # unzip DHFLOWAGENT04_0-80003551.ZIP
    # cd bdh-assembly-vsystem
    # ./prepare.sh 
    

    NOTE: The package is extracted in the same directory as the SDH runtime zip file, so the bdh-assembly-vsystem folder sits next to the runtime directory.

  3. Import the vsolution from the SDH runtime directory:

    # cd ~/SAPVora-2.2.48-DistributedRuntime
    # oc login -u dhadmin
    # oc project vora
    # ./install.sh --import-vsolution --vsolution-import-path=../bdh-assembly-vsystem
    

NOTE: If you don't have the following parameters set, the installer may ask for their values. You can also include them in the installer command, as shown in the example below.

--namespace=
--docker-registry=
--vora-admin-username=
--vora-admin-password=
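
For example, a non-interactive import could be invoked like this (all values are placeholders):

# ./install.sh --import-vsolution --vsolution-import-path=../bdh-assembly-vsystem \
    --namespace=vora --docker-registry=My_Docker_Registry_FQDN:PORT \
    --vora-admin-username=dhadmin --vora-admin-password=Password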

7. Troubleshooting Tips

7.1. Configure Docker on Jump Server as a non-root user

  • Append -G dockerroot to OPTIONS= in /etc/sysconfig/docker file on your Jump Server.

    # vi /etc/sysconfig/docker
    ...
    OPTIONS='--selinux-enabled --log-driver=journald --signature-verification=false --insecure-registry=My_Docker_Registry_FQDN:PORT -G dockerroot'
    ...
    
  • Run the following commands on the Jump Server, after you modify the /etc/sysconfig/docker file.

    # sudo usermod -a -G dockerroot InstallUserName
    # sudo chown root:dockerroot /var/run/docker.sock
    
  • Log out and log back in to the Jump Server for the changes to take effect.

7.2. How to check whether the tiller service account was created

There should be a tiller service account in the output.

# oc get sa --all-namespaces
NAMESPACE         NAME                                      SECRETS   AGE
...
vora              builder                                   2         22h
vora              default                                   2         22h
vora              deployer                                  2         22h
vora              tiller                                    2         22h

7.3. SAP Vora Installation Error: render error in "vora-consul/templates/consul.yaml"

Vora Installation Error: render error in "vora-consul/templates/consul.yaml": template: vora-consul/templates/consul.yaml:98:34: executing "vora-consul/templates/consul.yaml" at <index $global.Values...>: error calling index: index of untyped nil

Solution: run the SAP Vora installer with the parameter "--assign-nodes", for example, in namespace vora.

# install.sh --namespace=vora --docker-registry=My_Docker_Registry_FQDN:PORT --assign-nodes
[... there will be output showing that the installer is doing node assignment ...]
Node assignment is done!

Now run the SAP Vora installer again.

7.4. Vora Installation Error: timeout at “Deploying vora-consul”

Vora Installation Error: timeout at “Deploying vora-consul with: helm install --namespace vora -f values.yaml …”

To view the log messages, you can log in to the OpenShift web console, navigate to Applications -> Pods, select the failing pod, e.g. vora-consul-2-0, and check the events under the Events tab and the log under the Logs tab.
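
The same information is also available from the command line, for example:

# oc describe pod vora-consul-2-0 -n vora
# oc get events -n vora
# oc logs vora-consul-2-0 -n vora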

A common error: if the external docker registry is an insecure registry but the OpenShift cluster is configured to pull only from secure registries, you will see errors in the log. If a secure registry is not feasible, follow the commands below to configure every (scheduled) node of the OpenShift cluster to use the insecure registry. Please note that an insecure registry is not recommended for production environments.

# vi /etc/sysconfig/docker
INSECURE_REGISTRY='--insecure-registry My_Docker_Registry_FQDN:PORT'
# systemctl daemon-reload; systemctl restart docker

You can now test pulling the image from the docker registry; if it succeeds, retry the installation.

# docker pull My_Docker_Registry_FQDN:PORT/vora/consul:0.9.0-sap10

7.5. Clean up a Failed Installation, for example, in namespace `vora`

# install.sh --purge --force-deletion --namespace=vora --docker-registry=My_Docker_Registry_FQDN:PORT

7.6. Uninstall Helm

In the following example, the namespace is vora:

# export TILLER_NAMESPACE=vora
# helm reset

8. Additional resources
