SAP Data Hub 2 on OpenShift Container Platform 3

In general, the installation of SAP Data Hub Foundation (SDH) follows these steps:

  • Install Red Hat OpenShift Container Platform
  • Configure the prerequisites for SAP Data Hub Foundation
  • Install SAP Data Hub Foundation on OpenShift Container Platform

The last step offers three different approaches, listed below. Each approach is compatible with this guide. Please refer to SAP's documentation for more information (2.7) / (2.6) / (2.5) / (2.4) / (2.3).

  • mpsl - installation using the Maintenance Planner and SL Plugin (recommended by SAP)
  • mpfree - installation using SL Plugin without Maintenance Planner
  • manual - manual installation using an installation script

If you're interested in the installation of older SDH or SAP Vora releases, please refer to the other installation guides.

1. OpenShift Container Platform validation version matrix

The following version combinations of SDH 2.X, OCP, RHEL have been validated:

SAP Data Hub OpenShift Container Platform RHEL for OCP Installation Infrastructure (and Storage)
2.3 3.10, 3.9 7.6, 7.5 AWS (EBS), RHV (Ceph RBD)
2.4 3.10 7.6 AWS (EBS)
2.4 3.11 7.6 AWS (EBS, Ceph RBD, OCS 3.11 ), VMware vSphere (Ceph RBD)
2.5 3.10 7.6 AWS (Ceph RBD, EBS)
2.5 3.11 7.6 AWS (EBS), VMware vSphere (Ceph RBD)
2.5 Patch 1 3.11 7.6 AWS (EBS), Bare metal (Ceph RBD), VMware vSphere (Ceph RBD)
2.6 3.11 7.7 AWS (EBS), Bare metal (Ceph RBD)
2.6 Patch 1 3.11 7.7 AWS (EBS), Bare metal (Ceph RBD)
2.7 3.11 7.7 KVM/libvirt (Ceph RBD)
2.7 Patch 1 3.11 7.7 KVM/libvirt (Ceph RBD)
2.7 Patch 3 3.11 7.7 IBM Cloud™ (IBM Cloud Block Storage), KVM/libvirt (OCS 3.11 )

For the OCS 3.11 entries, Gluster Block Storage with the gluster.org/glusterblock-glusterfs PV provisioner was used.

Please refer to the compatibility matrix for version combinations that are considered working.
If you are looking for OCP releases 4.1 or higher, please refer to SAP Data Hub 2 on OpenShift Container Platform 4.

For more information on OCP on IBM Cloud™, please refer to Getting started with Red Hat OpenShift on IBM Cloud. If using this platform, you may jump directly to the chapter Install Red Hat OpenShift Container Platform; you don't need to install OpenShift yourself, and that chapter points you to the IBM Cloud-specific deployment instructions.

2. Requirements

2.1. Hardware/VM and OS Requirements

2.1.1. OpenShift Cluster

Make sure to consult the following official cluster requirements corresponding to your release:

2.1.1.1. Worker Nodes

The following are the minimum requirements for the OpenShift Worker Nodes for proof-of-concept deployments for the latest validated SDH and OCP 3.X releases:

  • OS: Red Hat Enterprise Linux 7.7, 7.6, 7.5, 7.4 or 7.3
  • CPU: 4 virtual cores
  • Memory: 32GB
  • Diskspace:
    • 230GiB for / including at least
      • 90 GiB for /var/lib/docker
      • 100 GiB for /var/lib/origin/openshift.local.volumes (ephemeral volume storage for pods)
    • at least 500GiB for persistent volumes if hosting persistent storage

The minimum number of Worker Nodes is 3. With additional installation parameters, SDH can also be deployed on a cluster with just 2 Worker Nodes.

2.1.1.2. Master Nodes

The following are the minimum requirements for the OpenShift Master nodes for proof-of-concept deployments for the latest validated SDH and OCP 3.X releases:

  • OS: Red Hat Enterprise Linux 7.7, 7.6, 7.5, 7.4 or 7.3
  • CPU: 4 virtual cores
  • Memory: 16GB
  • Diskspace:
    • 100GiB for / including at least
      • 50 GiB for /var/lib/docker

Under the following assumptions:

The minimum number of Master/Infra Nodes is 1.

2.1.2. Jump host

It is recommended to do the installation of SAP Data Hub Foundation from an external Jump host and not from within the OpenShift Cluster.

The Jump host is used among other things for:

The hardware requirements for the Jump host can be:

  • OS: Red Hat Enterprise Linux 7.7, 7.6, 7.5, 7.4 or 7.3
  • CPU: 2 cores
  • Memory: 4GB
  • Diskspace:
    • 75GiB for /:
      • to store the work directory and the installation binaries of SAP Data Hub Foundation
      • including at least 50 GiB for /var/lib/docker
    • Additional 50 GiB for registry's storage if hosting image registry (by default at /var/lib/registry).

NOTE: It is of course possible not to have a dedicated Jump host and instead, run the installation from one of the OCP cluster hosts - ideally from one of the master nodes. In that case please run all the commands meant for the Jump host on the host of your choice.

2.2. Software Requirements

2.2.1. Compatibility Matrix

Later versions of SAP Data Hub support newer versions of Kubernetes and OpenShift Container Platform. Even if not listed in the OCP validation version matrix above, the following version combinations are considered fully working:

SAP Data Hub OpenShift Container Platform RHEL for OCP Installation Storage
2.3 3.9 7.6, 7.5, 7.4, 7.3 Ceph RBD, OCS*, cloud
2.3 3.10 7.6, 7.5, 7.4 Ceph RBD, OCS*, cloud
2.5, 2.4 3.10 7.6, 7.5, 7.4 Ceph RBD, OCS*, cloud
2.7, 2.6, 2.5, 2.4 3.11 7.8, 7.7, 7.6, 7.5, 7.4 Ceph RBD, OCS, cloud

Storage options marked with * are compatible, however, not supported. Please refer to the OCS and OCP interoperability matrix for more information. The only OCS version supported for SDH is 3.11.

Unless stated otherwise, the compatibility of a listed SDH version covers all its patch releases as well.

2.2.2. Prepare the Subscription and Packages

  • On each host of the OpenShift cluster, register the system using subscription-manager. Look up and then attach to the pool that provides the OpenShift Container Platform subscription.

    # subscription-manager register --username=UserName --password=Password
    your system is registered with ID: XXXXXXXXXXXXXXXX
    # subscription-manager list --available
    # subscription-manager attach --pool=Pool_Id_Identified_From_Previous_Command
    
  • Subscribe each host only to the following repositories. For OCP 3.9, use the following commands:

    # subscription-manager repos --disable='*' \
        --enable=rhel-7-server-rpms \
        --enable=rhel-7-server-extras-rpms \
        --enable=rhel-7-fast-datapath-rpms \
        --enable=rhel-7-server-ose-3.9-rpms \
        --enable=rhel-7-server-ansible-2.4-rpms
    

    For OCP 3.10, please use the following commands:

    # subscription-manager repos --disable='*' \
        --enable=rhel-7-server-rpms \
        --enable=rhel-7-server-extras-rpms \
        --enable=rhel-7-server-ose-3.10-rpms \
        --enable=rhel-7-server-ansible-2.4-rpms
    

    For OCP 3.11, please use the following commands:

    # subscription-manager repos --disable='*' \
        --enable=rhel-7-server-rpms \
        --enable=rhel-7-server-extras-rpms \
        --enable=rhel-7-server-ose-3.11-rpms \
        --enable=rhel-7-server-ansible-2.6-rpms
    
  • Additionally, if you plan to use Ceph RBD as storage, make sure to also enable the following repository so that the Ceph client tools installed on the OCP cluster are of the same version as the Ceph packages on the server.

    # subscription-manager repos --enable=rhel-7-server-rhceph-3-tools-rpms
    
  • Please follow the host preparation documentation (3.11) / (3.10) / (3.9) corresponding to your cluster version.

2.2.3. Persistent Volumes

Persistent storage is needed for SDH. It’s required to use storage that can be created dynamically. You can find more information in the Dynamic Provisioning and Creating Storage Classes document.

Recommended and tested storage options for on-premise installations are:

For installations in cloud, please use the recommended provider for the particular cloud platform.

2.2.3.1. Ceph RBD

To make use of an existing Ceph cluster in OCP, please follow the Using Ceph RBD for dynamic provisioning guide (3.11) / (3.10) / (3.9).

Please refer to Installing the Ceph Object Gateway for enabling the S3 object interface needed for Checkpoint store enablement.

For production environments, see the supported configurations.

For small-to-medium sized test and pre-production deployments, the Civetweb interface can be utilized. The S3 API endpoint must either be secured by a signed certificate (not self-signed) or be insecure. In the latter case, the scheme must be present in the endpoint URL specified during the installation (e.g. http://s3.172.18.0.85.nip.io:8080).

NOTE: Wildcard DNS does not need to be configured for the checkpoint store.

The following is an example configuration of a rados gateway residing in a Ceph configuration file (e.g. /etc/ceph/ceph.conf) for small-to-medium sized test and pre-production deployments.

[client.rgw.ip-172-18-0-85]
host = 0.0.0.0
keyring = /var/lib/ceph/radosgw/ceph-rgw.ip-172-18-0-85/keyring
log file = /var/log/ceph/ceph-rgw-ip-172-18-0-85.log
#rgw frontends = civetweb port=443s ssl_certificate=/etc/ceph/radosgw-172.18.0.85.nip.io.pem num_threads=100
rgw frontends = civetweb port=8080 num_threads=100
debug rgw = 1
rgw_enable_apis = s3
rgw_dns_name = ip-172-18-0-85.internal
rgw_resolve_cname = false

In this case the Amazon S3 Host input parameter shall be set to http://ip-172-18-0-85.internal:8080.
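
To quickly check that the gateway responds on the configured endpoint, an anonymous request can be issued from any host that can resolve the name; the XML output below is shortened and only illustrative:

# curl http://ip-172-18-0-85.internal:8080
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult ...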

Once the Ceph Storage and OCP are deployed, please continue to Ceph integration.

2.2.3.2. OCS (OpenShift Container Storage) / Gluster

OCS can be deployed into the OpenShift cluster at the time of OCP installation or afterwards, using either the converged or the independent mode. An existing standalone Gluster Storage deployed outside of the OpenShift cluster can be utilized as well. For the various installation options, please refer to the OCP documentation.

For SDH, the only supported provisioners are gluster.org/glusterblock and gluster.org/glusterblock-glusterfs that provision gluster-block volumes. If the openshift-ansible scripts are used to deploy OCS and the openshift_storage_glusterfs_name parameter is not overridden, it is represented by the glusterfs-storage-block storage class.

The regular glusterfs volumes provisioned by kubernetes.io/glusterfs do not perform well for Data Hub. Their use will most probably result in the following problems described in SAP Note #2755247:

Due to a known slow performance issue with GlusterFS storage on OpenShift 3.9 and 3.10, the vflow pod fails to start up after exceeding the 3-minute liveness probe timeout, or, if the vflow pod is running, it spends a long time retrieving objects from the underlying storage.

openshift-ansible Inventory Configuration

Below are several parameters for the inventory file used during the advanced installation. The inventory can also be used after the OCP installation for the deployment of OCS alone. See Persistent Storage Using Red Hat Gluster Storage for instructions and examples.

The following is the minimum set of inventory arguments needed for the OCS installation (converged mode):

[OSEv3:children]
...
glusterfs

[glusterfs]
# the node names must be listed in the [nodes] section as well
# make sure to check for the right device names representing bare block devices with no partitions and no LVM PVs
node11.example.com glusterfs_devices='[ "/dev/xvdc", "/dev/xvdd" ]'
node12.example.com glusterfs_devices='[ "/dev/xvdc", "/dev/xvdd" ]'
node13.example.com glusterfs_devices='[ "/dev/xvdc", "/dev/xvdd" ]'

[OSEv3:vars]
...
openshift_storage_glusterfs_image="registry.access.redhat.com/rhgs3/rhgs-server-rhel7:v3.11"
openshift_storage_glusterfs_block_image="registry.access.redhat.com/rhgs3/rhgs-gluster-block-prov-rhel7:v3.11"
openshift_storage_glusterfs_heketi_image="registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7:v3.11"
openshift_storage_glusterfs_block_deploy=true
openshift_storage_glusterfs_block_host_vol_size=135
openshift_storage_glusterfs_block_storageclass=true

The *_image variables are mandatory starting from OCP 3.10. They must contain a specific :<version> tag. The :latest tag cannot be used. To see the complete list of image tags, execute the following command.

# # make sure to install the following packages first: skopeo, jq
# sudo skopeo inspect 'docker://registry.redhat.io/rhgs3/rhgs-server-rhel7' | \
    jq -r '.RepoTags[] | sub("^(?<a>[^.]+\\.[^.]+).*"; "\(.a)")' | sort -uV
3.1
...
v3.11

The parameter openshift_storage_glusterfs_block_host_vol_size specifies the size in GiB of the glusterfs volumes hosting the block volumes. It effectively determines the maximum size of a block volume that can be provisioned. Since SAP HANA requires a volume of at least 128 GiB, it is recommended to set this parameter to at least 132.

Optionally, make the block storage class the default with:

openshift_storage_glusterfs_block_storageclass_default=true

Please refer to the Gluster Role Variables for more options.

SELinux Enablement

If a standalone Gluster Storage is being used, additional permissions need to be granted by enabling the following SELinux booleans on all the schedulable nodes:

# setsebool -P virt_sandbox_use_fusefs=on virt_use_fusefs=on

IMPORTANT: Gluster's S3 API cannot currently be used for the checkpoint store. We are working on resolving the issues. If you require this feature, please file an RFE.

2.2.4. Checkpoint store enablement

In order to enable SAP Vora Database streaming tables, the checkpoint store needs to be enabled. The store is an object storage on a particular storage back-end. Several back-end types covering most of the cloud storage providers are supported by the SDH installer. For IBM Cloud™, please follow Preparing IBM Cloud Object Storage for SAP Vora Checkpoint. For on-premise deployments, Ceph can be utilized via its S3 interface.

The enablement is strongly recommended for production clusters. Clusters having this feature disabled are suitable for development or as PoCs.

Make sure to create a desired bucket before the SDH Installation. If the checkpoint store shall reside in a directory on a bucket, the directory needs to exist as well.
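
As an example, against an S3-compatible endpoint such as the RADOS Gateway configured above, a bucket can be created with the AWS CLI; the endpoint URL, credentials and bucket name below are placeholders to be replaced with your own values:

# export AWS_ACCESS_KEY_ID=Access_Key AWS_SECRET_ACCESS_KEY=Secret_Key
# aws s3api create-bucket --bucket sdh-checkpoints \
    --endpoint-url http://ip-172-18-0-85.internal:8080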

2.2.5. External Image Registry

The SDH installation requires an image registry where images are first mirrored from the SAP registry and then delivered to the OCP cluster nodes. The integrated OpenShift Container Registry is not appropriate for this purpose or may require further analysis. For now, an external image registry needs to be set up instead.

On AWS, it is recommended to utilize Amazon Elastic Container Registry. Please refer to Using AWS ECR Registry for the Modeler for a post-configuration step to enable the registry for the Modeler.

On IBM Cloud™, you can utilize the image container registry provided by the platform.

If an external registry is not provided by your platform or not feasible, it needs to be deployed manually. As an example, the Jump host can be used to host the registry. Please follow the steps in the article How do I setup/install a Docker registry?

After the setup you should have an external image registry up and running at the URL My_Image_Registry_FQDN:5000. You can verify that with the following command.

# curl http://My_Image_Registry_FQDN:5000/v2/
{}

Make sure to mark the address as insecure.

Additionally, if using Kaniko Image Builder, make sure to mark the registry as insecure within the Pipeline Modeler.

2.2.5.1. Update the list of insecure registries
  • Since the external image registry deployed above is insecure by default, in order to push images to the image registry and pull them on the nodes, it must be listed as insecure in the /etc/containers/registries.conf file on all the hosts, including the Jump host:

    # vi /etc/containers/registries.conf
    ...
    [registries.insecure]
    registries = [
      "My_Image_Registry_FQDN:5000"
    ]
    ...
    
  • For the changes to become effective, restart the docker daemon:

    # systemctl restart docker
    

If you plan to run the installation as a non-root user, please check the instructions below for additional steps.

During the advanced installation of the OCP, make sure to include the My_Image_Registry_FQDN:5000 among openshift_docker_insecure_registries.

NOTE These settings have no effect on the Kaniko Image Builder, which also needs to be aware of the insecure registry. Please refer to Marking the vflow registry as insecure for more information.

2.2.5.2. Update proxy settings

If there's a mandatory proxy in the cluster's network, make sure to include the My_Image_Registry_FQDN in the NO_PROXY settings in addition to the recommended NO_PROXY addresses (3.11) / (3.10) / (3.9).

Additionally, during the advanced installation (3.11) / (3.10) / (3.9) of the OCP, you should include My_Image_Registry_FQDN in openshift_no_proxy variable.
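
A minimal inventory sketch with placeholder values could look like this:

[OSEv3:vars]
...
openshift_http_proxy=http://proxy.example.com:3128
openshift_https_proxy=http://proxy.example.com:3128
openshift_no_proxy='.cluster.local,.example.com,My_Image_Registry_FQDN'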

2.2.6. (Optional) Hadoop

It's optional to install the extensions to the Spark environment on Hadoop. Please refer to SAP Data Hub Spark Extensions on a Hadoop Cluster (2.7) / (2.6) / (2.5) / (2.4) / (2.3) for details. This document doesn't cover the Hadoop part.

3. Install Red Hat OpenShift Container Platform

If installing SAP Data Hub on IBM Cloud™, please follow the instructions that you find in Deploying your OpenShift cluster and jump host.

3.1. Prepare the Jump host

  1. Ideally, subscribe to the same repositories as on the cluster hosts.
  2. Install a helm client on the Jump host.

    • Download a script from https://github.com/helm/helm and execute it with the desired version set to the latest major release of helm for your SDH release. That is v2.11.0 for SDH 2.7, 2.6, 2.5 and 2.4 and v2.9.1 for SDH 2.3.

      # DESIRED_VERSION=v2.11.0   # or v2.9.1 for SDH releases 2.3.*
      # curl --silent https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | \
          DESIRED_VERSION="${DESIRED_VERSION:-v2.11.0}" bash
      
    • See the blog Getting started with Helm on OpenShift for more information.

  3. Download and install kubectl.

    • Either via standard repositories by installing atomic-openshift-clients:

      # sudo yum install -y atomic-openshift-clients
      

      NOTE: rhel-7-server-ose-X.Y-rpms repositories corresponding to the same major release version (e.g. 3.10) as on the cluster nodes need to be enabled.

    • Or by downloading and installing the binary manually after determining the right version (e.g. the latest v1.11 for OCP 3.11 cluster):

      # curl -LO https://dl.k8s.io/release/v1.11.10/bin/linux/amd64/kubectl
      # chmod +x ./kubectl
      # sudo mv ./kubectl /usr/local/bin/kubectl
      
  4. In case of mpsl and mpfree SDH installation methods, make sure to install and run the SAP Host Agent (2.7) / (2.6) / (2.5) / (2.4) / (2.3) as well.

    • However, in step 4, instead of downloading a *.SAR archive, as suggested by the guide, on RHEL it is recommended to download the latest RPM package (e.g. saphostagentrpm_40-20009394.rpm) and install it on the Jump host using a command like:

      # yum install saphostagentrpm_40-20009394.rpm
      

      NOTE (2.4 only): SAP Host Agent Patch Level (PL) 40 or higher with a self-signed SSL certificate is required.

    • This way, the installation of SAPCAR listed in prerequisites is not needed.

    • Step 6 (SAR archive extraction) can then be skipped.
    • In step 7, the command then needs to be modified to:

      # cd /usr/sap/hostctrl/exe
      # ./saphostexec -setup slplugin -passwd
      
    • Additionally, make sure to set the password for the sapadm user. You will be prompted for the username and password by the maintenance planner.

      # passwd sapadm
      

3.2. Install OpenShift Container Platform

This section can be skipped if using managed OpenShift platform or the cluster is already deployed.

Install OpenShift Container Platform on your desired cluster hosts. Follow the OpenShift installation guide (3.11) / (3.10) / (3.9) or use the playbooks for a cloud reference architecture.

NOTE: On AWS you have to label all nodes according to Labeling Clusters for AWS with openshift_clusterid="<clusterid>" where <clusterid> part matches the same part in the tag kubernetes.io/cluster/<clusterid>,Value=(owned|shared) of resources belonging to the cluster.

Important Advanced Installation Variables

Variable Description
openshift_release must contain one of 3.11, 3.10, 3.9
openshift_deployment_type must be set to openshift-enterprise
openshift_docker_insecure_registries shall contain the URL of the external image registry (My_Image_Registry_FQDN:5000)
openshift_https_proxy, openshift_http_proxy shall be set up according to internal network policies
openshift_no_proxy if the proxy settings are set and the registry is deployed in the internal network, it must contain My_Image_Registry_FQDN
openshift_cloudprovider_kind the name of the target cloud provider if deploying in cloud (e.g. aws, azure, openstack or vsphere)
openshift_clusterid needs to be set only for AWS unless using IAM profiles
openshift_master_default_subdomain the subdomain used for exposed routes
oreg_auth_user and oreg_auth_password mandatory since 3.11 for the default registry.redhat.io registry, older releases continue to pull from registry.access.redhat.com
os_sdn_network_plugin_name set to redhat/openshift-ovs-multitenant if the cluster shall run multiple instances of SDH or other workloads
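
The following is a hypothetical [OSEv3:vars] fragment combining several of the variables above for an OCP 3.11 installation with an insecure external registry; all values are placeholders to be adapted to your environment:

[OSEv3:vars]
...
openshift_release=3.11
openshift_deployment_type=openshift-enterprise
openshift_docker_insecure_registries=My_Image_Registry_FQDN:5000
openshift_master_default_subdomain=apps.example.com
oreg_auth_user=Red_Hat_Portal_User
oreg_auth_password=Red_Hat_Portal_Password
os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant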

Please refer to OCP / Gluster section for additional parameters related to OCS if you plan on deploying it.

3.2.1. (OCP 3.11 only) Verify access to the Red Hat Registry

If using the default registry.redhat.io registry, verify you have access to it before launching the installation like this:

# sudo docker login -u $REDHAT_PORTAL_ACCOUNT registry.redhat.io
# sudo skopeo inspect docker://registry.redhat.io/openshift3/ose-pod
{
        "Name": "registry.redhat.io/openshift3/ose-pod",
        ...
}

3.3. (Optional) Validate the OpenShift cluster

Before continuing with the installation, you may want to double-check that the OpenShift cluster is healthy to rule out any possible issues resulting from misbehaving or misconfigured OpenShift cluster.

Please follow one of the health-check guides corresponding to your cluster version:

  1. Environment Health Checks for 3.11
  2. Environment Health Checks for 3.10
  3. Environment Health Checks for 3.9

3.4. OCP Post Installation Steps

3.4.1. Configure Dynamic Storage Provider

For cloud deployments, the default dynamic storage provisioner should already be in place. For example, on AWS, gp2 will most probably be configured as the default storage class:

# oc get sc
NAME            PROVISIONER                AGE
gp2 (default)   kubernetes.io/aws-ebs      7d

For IBM Cloud™ specifics, please refer to Choosing the Dynamic Storage Provisioner.

For on-premise installations, a suitable storage provisioner needs to be considered and deployed. Please refer to the validated provisioners listed above.

In case of OCS / Gluster, the provisioner and storage class can be deployed during OCP installation. In that case, you can skip this step.

3.4.2. Set up an External Image Registry

If you haven't done so already, please follow the External Image Registry prerequisite.

If installing SAP Data Hub on IBM Cloud™, please follow the steps Setting up the IBM Cloud Container Registry.

3.4.3. Configure the OpenShift Cluster for SDH

3.4.3.1. Becoming a cluster-admin

Many commands below require cluster admin privileges. To become a cluster-admin, you can do one of the following:

  • Copy the admin.kubeconfig file from a remote master node to a local host and use that:

    # scp master.node:/etc/origin/master/admin.kubeconfig .
    # export KUBECONFIG=$(pwd)/admin.kubeconfig
    # oc login -u system:admin
    

    This is recommended for mpsl and mpfree installation methods when using the Jump host.

    NOTE: the same file is used as the KUBECONFIG File input parameter for mpsl and mpfree installation methods.

  • Log-in to any master node as the root user, execute the following command and continue the installation as system:admin user. In this case, the master node becomes a Jump host.

    # oc login -u system:admin
    

    Possible for all installation methods without a Jump host.

  • (manual installation method only) Make any existing user (dhadmin in this example) a cluster admin by doing the previous step followed by:

    # oc adm policy add-cluster-role-to-user cluster-admin dhadmin
    

You can learn more about the cluster-admin role in Cluster Roles and Local Roles article.

3.4.3.2. Project setup
3.4.3.2.1. Enable NFS in containers

On every schedulable node of the OpenShift cluster, make sure the NFS filesystem is ready for use by loading the nfsd kernel module and enabling the virt_use_nfs SELinux boolean:

# setsebool virt_use_nfs true
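
The nfsd module itself can be loaded right away and on subsequent boots as follows; this is a minimal sketch, see also the appendix Load nfsd kernel modules for the complete set of commands:

# modprobe nfsd
# echo "nfsd" > /etc/modules-load.d/nfsd.conf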
3.4.3.2.2. Pre-load kernel modules

On every schedulable node, make sure to load the ipt_REDIRECT kernel module.

# modprobe ipt_REDIRECT
# echo "ipt_REDIRECT" > /etc/modules-load.d/ipt_redirect.conf
3.4.3.2.3. Permit access to docker socket

(OCP 3.9 or older) Unless using kaniko builds, on every schedulable node of the OpenShift cluster, permit the vflow pod to access /var/run/docker.sock.
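
One possible way to grant that access (a sketch only, not a procedure mandated by SAP) is to relabel the socket on each such node so that containers may access it; svirt_sandbox_file_t is the RHEL 7 SELinux type assumed here, and the label has to be reapplied whenever the Docker daemon recreates the socket:

# chcon -t svirt_sandbox_file_t /var/run/docker.sock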

3.4.3.2.4. Allow administrator to manage SDH resources

As a cluster-admin, allow the project administrator to manage SDH custom resources.

# oc create -f - <<EOF
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aggregate-sapvc-admin-edit
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
rules:
- apiGroups: ["sap.com"]
  resources: ["voraclusters"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete", "deletecollection"]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aggregate-sapvc-view
  labels:
    # Add these permissions to the "view" default role.
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: ["sap.com"]
  resources: ["voraclusters"]
  verbs: ["get", "list", "watch"]
EOF
3.4.3.2.5. Create privileged tiller service account

As a cluster-admin, create a tiller service account in the kube-system project (aka namespace) and grant it the necessary permissions:

# oc create sa -n kube-system tiller
# oc adm policy add-cluster-role-to-user cluster-admin -n kube-system -z tiller
3.4.3.2.6. Initialize helm

Set up helm and tiller for the deployment:

# helm init --service-account=tiller --upgrade --wait

Upon successful initialization, you should be able to see a tiller pod in the kube-system namespace:

# oc get pods -n kube-system
NAME                            READY     STATUS    RESTARTS   AGE
tiller-deploy-551988758-dzjx5   1/1       Running   0          1m
# helm ls
[There should be no error in the output. No output at all is also fine; it means no releases have been deployed yet.]
3.4.3.2.7. Create sdh project

Create a dedicated project in OpenShift for the SDH deployment, for example sdh. Log in to OpenShift as a cluster-admin and perform the following configuration for the installation:

# oc new-project sdh
# oc adm policy add-scc-to-group anyuid "system:serviceaccounts:$(oc project -q)"
# oc adm policy add-scc-to-group hostmount-anyuid "system:serviceaccounts:$(oc project -q)"
# oc adm policy add-scc-to-user privileged -z "vora-vsystem-$(oc project -q)"
# oc adm policy add-scc-to-user privileged -z "vora-vsystem-$(oc project -q)-vrep"
# oc adm policy add-scc-to-user privileged -z "$(oc project -q)-elasticsearch"
# oc adm policy add-scc-to-user privileged -z "$(oc project -q)-fluentd"
# oc adm policy add-scc-to-user privileged -z "default"
# oc adm policy add-scc-to-user privileged -z "vora-vflow-server"
# oc adm policy add-scc-to-user hostaccess -z "$(oc project -q)-nodeexporter"
# oc patch namespace "$(oc project -q)" -p '{"metadata":{"annotations":{"openshift.io/node-selector":""}}}'
3.4.3.2.8. Granting privileges to sdh admin

(Optional) (Before SDH 2.5) In case of the manual installation, you may want to execute the installation as a regular user (not as a cluster-admin). You can either create a new OpenShift user or delegate the installation procedure to an existing OCP user. In any case, the user needs to be granted the following roles in the chosen target project (in this case dhadmin is the user name and sdh is the target project):

# oc adm policy add-role-to-user -n sdh admin dhadmin
# oc adm policy add-cluster-role-to-user system:node-reader dhadmin

NOTE: Starting from SDH 2.5, the regular user cannot perform the installation anymore. It needs to be performed by a cluster-admin.

3.4.3.2.9. Deploy SDH Observer

(OCP 3.10 or higher) Deploy sdh-observer in the sdh namespace. Please follow the appendix Deploy SDH Observer.

4. Install SDH on OpenShift

4.1. Required Input Parameters

A few important installation parameters are described below. Please refer to the official documentation (2.7) / (2.6) / (2.5) / (2.4) / (2.3) for their full description. Most of the parameters must be provided for the mpsl and mpfree installation methods. The Command line argument column describes corresponding options for the manual installation.

Name Condition Recommendation Command line argument
Kubernetes Namespace Always Must match the project name chosen in the Project Setup (e.g. sdh) -n sdh
Installation Type Installation or Update Choose Advanced Installation if you need to specify proxy settings or you want to choose particular storage class or there is no default storage class set. None
KUBECONFIG File Always The path to the kubeconfig file on the Jump host. It is the same file as described in Becoming a cluster-admin. If the SAP Host Agent is running on the master host, it can be set to /root/.kube/config. None
Container Registry Installation Must be set to the external image registry. -r My_Image_Registry_FQDN:5000
Certificate Domain Installation Shall be set either to 1. the FQDN of the vsystem route, 2. the wildcard domain matching the master default subdomain, or 3. the external FQDN of the OpenShift node used to access the vsystem service if using NodePort. 1. --cert-domain vsystem-sdh.wildcard-domain 2. --cert-domain "*.wildcard-domain" 3. --cert-domain master.example.com
Cluster Proxy Settings Advanced Installation or Advanced Updates Make sure to configure this if the traffic to internet needs to be routed through a proxy. None
Cluster HTTP Proxy Advanced Installation or Advanced Updates Make sure to set this if the traffic to internet needs to be routed through a proxy. --cluster-http-proxy "$HTTP_PROXY"
Cluster HTTPS Proxy Advanced Installation or Advanced Updates Make sure to set this if the traffic to internet needs to be routed through a proxy. --cluster-https-proxy "$HTTPS_PROXY"
Cluster No Proxy Advanced Installation or Advanced Updates Make sure to set this if the traffic to internet needs to be routed through a proxy. --cluster-no-proxy "$NO_PROXY"
StorageClass Configuration Advanced Installation Configure this if you want to choose different dynamic storage provisioners for different SDH components or if there's no default storage class set or you want to choose non-default storage class for the SDH components. None
Default StorageClass Advanced Installation and if storage classes are configured Set this if there's no default storage class set or you want to choose non-default storage class for the SDH components. --pv-storage-class ceph-rbd
Additional Installer Parameters Advanced Installation Useful for making the SDH deploy to just 2 worker nodes, reducing the minimum memory requirements of the HANA pod, etc. (Starting from 2.5) Can be used to enable the Kaniko Image Builder. -e vora-cluster.components.dlog.replicationFactor=2, --enable-kaniko=yes
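
For the manual method, a hypothetical install.sh invocation combining several of the command line arguments above might look like the following; the exact set of required options depends on your environment and SDH release:

# ./install.sh -n sdh \
    -r My_Image_Registry_FQDN:5000 \
    --cert-domain "*.wildcard-domain" \
    --pv-storage-class ceph-rbd \
    --enable-kaniko=yes \
    -e vora-cluster.components.dlog.replicationFactor=2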

4.2. Kaniko Image Builder

NOTE Available starting from SDH 2.5.

By default, the Pipeline Modeler (vflow) pod uses the Docker daemon on the node where it runs to build container images before they are run as graphs. This poses a security threat because in order for the vflow pod to mount the Docker socket, one of the following must hold true:

  • the pod is run as privileged or super privileged
  • (OCP 3.9 or older) the SELinux context of the Docker socket must be modified, otherwise the mount will be denied by the enforcing SELinux policy

Additionally, Docker itself becomes a mandatory dependency, preventing the use of another container runtime (e.g. CRI-O).

To make the platform secure and independent of the underlying container runtime, it is possible to configure the SDH to use the Kaniko Image Builder. This is enabled with the --enable-kaniko=yes parameter passed to the install.sh script during the manual installation. For the other installation methods, it can be enabled by appending --enable-kaniko=yes to SLP_EXTRA_PARAMETERS (Additional Installation Parameters).

4.2.1. Registry requirements for the Kaniko Image Builder

The Kaniko Image Builder supports out-of-the-box only connections to secure image registries with a certificate signed by a trusted certificate authority.

In order to use an insecure image registry (e.g. the proposed external image registry) in combination with the builder, the registry must be whitelisted in Pipeline Modeler by marking it as insecure.

4.3. Installation using the Maintenance Planner and SL Plugin (mpsl)

This is a web-based installation method recommended by SAP, offering you an option to send analytics data and feedback to SAP. All the necessary prerequisites have been satisfied by applying the steps described above. The Installation using the Maintenance Planner and SL Plugin (2.7) / (2.6) / (2.5) / (2.4) / (2.3) documentation will guide you through the process.

4.4. Installation using SL Plugin without Maintenance Planner (mpfree)

This is an alternative command-line-based installation method. Please refer to the SAP Data Hub documentation (2.7) / (2.6) / (2.5) / (2.4) / (2.3) for more information and the exact procedure.

SDH 2.3 Note

If you have prepared the installation host according to the prior instructions, you will have the sapcar binary available at /usr/sap/hostctrl/exe/SAPCAR. Thus the extraction step 2.b) in the procedure will look like:

# /usr/sap/hostctrl/exe/SAPCAR -xvf /tmp/download/SLPLUGIN<latest SP-version>.SAR /tmp/slplugin/bin

4.5. Manual Installation using an installation script (manual)

4.5.1. Download and unpack the SDH binaries

Download and unpack SDH installation binary onto the Jump host.

  1. Go to SAP Software Download Center, login with your SAP account and search for SAP DATA HUB 2 or access this link.

  2. Download the SAP Data Hub Foundation file, for example: DHFOUNDATION03_3-80004015.ZIP (SAP DATA HUB - FOUNDATION 2.3) or DHFOUNDATION04_0-80004015.ZIP (SAP DATA HUB - FOUNDATION 2.4).

  3. Unpack the installer file. For example, when you unpack the DHFOUNDATION04_0-80004015.ZIP package, it will create the installation folder SAPDataHub-2.4.63-Foundation.

    # unzip DHFOUNDATION04_0-80004015.ZIP
    
4.5.1.1. Note on Installation on cluster with 3 nodes in total

This note is useful for just a small proof-of-concept, not for production deployment.

SDH's dlog pod expects at least 3 schedulable compute nodes that are neither master nor infra nodes. This requirement can be mitigated by reducing the replication factor of the dlog pod, for example by passing -e vora-cluster.components.dlog.replicationFactor=2 as an additional installer parameter (see Required Input Parameters above), which makes the installer cope with just 2 compute nodes.

4.5.1.2. Patch the fluentd daemonset deployment files

In order to allow diagnostics-fluentd pods to access /var/log directories on the nodes, the diagnostics-fluentd daemonset needs to be patched to run as privileged.

To achieve that, the easiest solution is to deploy SDH Observer for OCP releases 3.10 or higher.
For older OCP releases, you can patch the deployment files now, before the Data Hub installation. In order to do so, please change to the unzipped foundation directory and follow the instructions in Grant fluentd pods permissions to logs.

4.5.2. Install SAP Data Hub

4.5.2.1. Remarks on the installation options
Feature Installation method Parameter Name
Storage Class Manual --pv-storage-class="$storage_class_name"
Storage Class Advanced mpsl or mpfree installations Default storage class
Kaniko Image Builder Manual --enable-kaniko=yes
Kaniko Image Builder Advanced mpsl or mpfree installations Additional Installation Parameters or SLP_EXTRA_PARAMETERS

Storage Class

When there is no default dynamic storage provisioner defined, the preferred one needs to be specified explicitly.
If the default dynamic storage provisioner has been defined, the parameter can be omitted. To define the default dynamic storage provisioner, please follow the document Changing the Default StorageClass.
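
For example, an existing storage class can be marked as the default by setting the well-known annotation; ceph-rbd is used here as the class name:

# oc annotate sc ceph-rbd storageclass.kubernetes.io/is-default-class=true --overwrite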

Kaniko Image Builder

Can be enabled starting from SDH 2.5. During mpsl and mpfree installation methods, one needs to append --enable-kaniko=yes to the list of Additional Installation Parameters or SLP_EXTRA_PARAMETERS.
See Kaniko Image Builder for more information.

4.5.2.2. Executing the installation script

Run the SDH installer as described in Manually Installing SAP Data Hub on the Kubernetes Cluster (2.7) / (2.6) / (2.5) / (2.4) / (2.3).

For IBM Cloud™ specific installation parameters, please visit Running the installation shell script.

4.6. SDH Post installation steps

4.6.1. (Optional) Expose SDH services externally

There are multiple possibilities how to make SDH services accessible outside of the cluster. Compared to plain Kubernetes, OpenShift offers an additional method, which is recommended for most scenarios including SDH services. It is based on the OpenShift Router and routes. The other methods documented in the official SAP Data Hub documentation are still available.

4.6.1.1. Using OpenShift Router and routes

OpenShift allows you to access the Data Hub services via routes as opposed to regular NodePorts. For example, instead of accessing the vsystem service via https://master-node.example.com:32322, after the service exposure you will be able to access it at https://vsystem-sdh.wildcard-domain. This is an alternative to the procedure Expose the Service From Outside the Network in the official guide.

NOTE: For this to work, a wildcard domain needs to be preconfigured in the local DNS server to resolve the desired wildcard-domain and all its subdomains (e.g. vsystem-sdh.wildcard-domain) to the node, where OpenShift Router (or its load balancer) runs. Please follow Using Wildcard Routes (for a Subdomain) for more information.

There are two kinds of routes. The reencrypt kind allows for a custom signed or self-signed certificate to be used. The other is the passthrough kind, which uses the pre-installed certificate generated by the installer or passed to the installer.

4.6.1.1.1. Export services with a reencrypt route

With this kind of route, different certificates are used on client and service sides of the route. The router stands in the middle and re-encrypts the communication coming from either side using a certificate corresponding to the opposite side. In this case, the client side is secured by a provided certificate and the service side is encrypted with the original certificate generated or passed to the SAP Data Hub installer.

The reencrypt route allows for securing the client connection with a proper signed certificate.

  1. Look up the vsystem service:

    # oc project sdh            # switch to the Data Hub project
    # oc get services | grep "vsystem "
    vsystem   ClusterIP   172.30.227.186   <none>   8797/TCP   19h
    

    When exported, the resulting hostname will look like vsystem-${SDH_NAMESPACE}.wildcard-domain. However, an arbitrary hostname can be chosen instead as long as it resolves correctly to the IP of the router.

  2. Get or generate the certificates. In the following example, a self-signed certificate is created.

    # openssl genpkey -algorithm RSA -out sdhroute-privkey.pem \
            -pkeyopt rsa_keygen_bits:2048 -pkeyopt rsa_keygen_pubexp:3
    # openssl req -new -key sdhroute-privkey.pem -out sdhroute.csr \
            -subj "/C=DE/ST=BW/L=Walldorf/O=SAP SE/CN=vsystem-$(oc project -q).wildcard-domain"
    # openssl x509 -req -days 365 -in sdhroute.csr -signkey sdhroute-privkey.pem -out sdhroute.crt
    

    Please refer to Generating Certificates for more information.
    If you want to export more SDH services without using multiple certificates, the Common Name (CN) attribute could be set to the *.wildcard-domain instead, which will match all its possible subdomains.
    The sdhroute.crt self-signed certificate can be imported to a web browser or passed to any other vsystem client.

  3. Obtain the SDH's root certificate authority bundle generated at the SDH's installation time. The generated bundle is available at SAPDataHub-*-Foundation/deployment/vsolutions/certs/vrep/ca/ca.crt when installing manually. But it is also available in the ca-bundle.pem secret in the sdh namespace.

    # # in case of manual installation
    # cp path/to/SAPDataHub-*-Foundation/deployment/vsolutions/certs/vrep/ca/ca.crt sdh-service-ca-bundle.pem
    # # otherwise get it from the ca-bundle.pem secret
    # oc get -o go-template='{{index .data "ca-bundle.pem"}}' secret/ca-bundle.pem | base64 -d >sdh-service-ca-bundle.pem
    
  4. Create the reencrypt route for the vsystem service like this:

    # oc create route reencrypt --cert=sdhroute.crt --key=sdhroute-privkey.pem \
        --dest-ca-cert=sdh-service-ca-bundle.pem --service=vsystem
    # oc get route
    NAME      HOST/PORT                     PATH  SERVICES  PORT      TERMINATION  WILDCARD
    vsystem   vsystem-sdh.wildcard-domain         vsystem   vsystem   reencrypt    None
    
  5. Verify you can access the SDH web console at https://vsystem-sdh.wildcard-domain. If you generated a self-signed certificate in step 2, import it into your web browser and refresh the page.

4.6.1.1.2. Export services with a passthrough route

With the passthrough route, the communication is encrypted by the SDH service's certificate all the way to the client. It can be treated as secure by the clients as long as the SDH installer has been given a proper Certificate Domain to generate a certificate whose Common Name matches the route's hostname, and the certificate is imported or passed to the client.

  1. Obtain the SDH's root certificate as documented in step 3 of Export services with a reencrypt route.
  2. Print its attributes and make sure its Common Name (CN) matches the expected hostname or wildcard domain.

    # openssl x509 -noout -subject -in sdh-service-ca-bundle.pem
    subject= /C=DE/ST=BW/L=Walldorf/O=SAP SE/CN=*.wildcard-domain
    

    In this case, the certificate will be valid for any subdomain of the .wildcard-domain.

  3. Look up the vsystem service:

    # oc project sdh            # switch to the Data Hub project
    # oc get services | grep "vsystem "
    vsystem   ClusterIP   172.30.227.186   <none>   8797/TCP   19h
    
  4. Create the route:

    # oc create route passthrough --service=vsystem
    # oc get route
    NAME      HOST/PORT                     PATH  SERVICES  PORT      TERMINATION  WILDCARD
    vsystem   vsystem-sdh.wildcard-domain         vsystem   vsystem   passthrough  None
    

    Verify that it matches the Common Name (CN) attribute from step 2. You can modify the hostname with --hostname parameter. Make sure it resolves to the router's IP.

  5. Import the self-signed certificate to your web browser and access the SDH web console at https://vsystem-sdh.wildcard-domain to verify.

4.6.1.2. Using NodePorts

Until SDH 2.5, the SAP Data Hub services were exposed implicitly by the installer. From this version onward, the services need to be exposed manually if desired.

Exposing SAP Data Hub vsystem

NOTE For OpenShift, an exposure using routes is preferred.

  • Either with an auto-generated node port:

    # oc expose service vsystem --type NodePort --name=vsystem-nodeport --generator=service/v2
    # oc get -o jsonpath=$'{.spec.ports[0].nodePort}\n' services vsystem-nodeport
    30617
    
  • Or with a specific node port (e.g. 32123):

    # oc expose service vsystem --type NodePort --name=vsystem-nodeport --generator=service/v2 --dry-run -o yaml | \
        oc patch -p '{"spec":{"ports":[{"port":8797, "nodePort": 32123}]}}' --local -f - -o yaml | oc create -f -
    

The original service remains accessible on the same ClusterIP:Port as before. Additionally, it is now accessible from outside of the cluster under the node port.

Exposing SAP Vora Transaction Coordinator and HANA Wire

NOTE Routes cannot be used for exposing these services, therefore please use either NodePorts (as documented here) or an alternative method for Getting Traffic into a Cluster.

# oc expose service vora-tx-coordinator-ext --type NodePort --name=vora-tx-coordinator-nodeport --generator=service/v2
# oc get -o jsonpath=$'tx-coordinator:\t{.spec.ports[0].nodePort}\nhana-wire:\t{.spec.ports[1].nodePort}\n' \
    services vora-tx-coordinator-nodeport
tx-coordinator: 32445
hana-wire:      32192

The output shows the generated node ports for the newly exposed services.

4.6.2. (AWS or IBM Cloud™ only) Configure registry secret for the Modeler

If installing SAP Data Hub on AWS, please follow Using AWS ECR Registry for the Modeler if this registry shall be used for the Pipeline Modeler.

If installing SAP Data Hub on IBM Cloud™, please refer to Provide the Modeler's Access Credentials for the IBM Cloud Container Registry for a post-configuration step to enable the registry for the SAP Data Hub Modeler.

4.6.3. SDH Validation

Validate SDH installation on OCP to make sure everything works as expected. Please follow the instructions in Testing Your Installation (2.7) / (2.6) / (2.5) / (2.4) / (2.3).

5. Upgrade of SDH to a newer release

This section will guide you through the SAP Data Hub upgrade to a newer release. The upgrade also involves an upgrade of the OpenShift cluster if you run SDH 2.3 on an OCP 3.9 cluster.

The following steps must be performed in the given order. Unless an OCP upgrade is needed, the steps marked with (ocp-upgrade) can be skipped.

  1. Make sure to get familiar with the official SAP Upgrade guide (2.7) / (2.6) / (2.5) / (2.4).
  2. (ocp-upgrade) Make yourself familiar with the OpenShift's upgrade guide (3.11) / (3.10).
  3. Plan for a downtime.
  4. Follow and execute the SAP Pre-Upgrade Procedures (2.7) / (2.6) / (2.5) / (2.4).

    • If you exposed the vsystem service using routes, delete the route:

      # oc get route vsystem -o yaml >route-vsystem.bak.yaml   # make a backup
      # oc delete route vsystem
      
  5. (ocp-upgrade) Choose one of the OCP's upgrade methods (3.11) / (3.10) and execute it.

  6. (SDH 2.3 to 2.4) Execute the following items from the Project setup that became necessary since SDH 2.4:

    1. Load the ipt_REDIRECT kernel module on every schedulable node:

      # modprobe ipt_REDIRECT
      # echo "ipt_REDIRECT" > /etc/modules-load.d/ipt_redirect.conf
      
    2. Execute the following as the cluster-admin in the SDH's namespace (e.g. sdh):

      # oc project sdh
      # oc adm policy add-scc-to-user privileged -z "vora-vflow-server"
      # oc patch namespace "$(oc project -q)" -p '{"metadata":{"annotations":{"openshift.io/node-selector":""}}}'
      
    3. Set up helm according to the instructions in the Project setup.

  7. (SDH 2.4 to 2.5) Execute the following items from the Project setup that became necessary since SDH 2.5:

    1. Deploy SDH Observer in the Data Hub's namespace.
    2. Make sure the OpenShift user performing the upgrade has been granted cluster-admin role. See becoming a cluster-admin for details.
    3. If an insecure external registry is used and kaniko shall be enabled, make sure to mark the registry as insecure.
    4. Execute the following as the cluster-admin in the SDH's namespace (e.g. sdh):

      # oc adm policy add-scc-to-user hostaccess -z "$(oc project -q)-nodeexporter"
      
  8. (SDH 2.5 to 2.6) Execute the following items from the Project setup that became necessary since SDH 2.6:

    1. Execute the following as the cluster-admin in the SDH's namespace (e.g. sdh):

      # oc adm policy add-scc-to-user hostaccess -z "$(oc project -q)-nodeexporter"
      
  9. (SDH 2.6 to 2.7) Execute the following items from the Project setup that became necessary since SDH 2.7:

    1. Execute the following as the cluster-admin in the SDH's namespace (e.g. sdh):

      # oc adm policy add-scc-to-user privileged -z "vora-vsystem-$(oc project -q)-vrep"
      
  10. Execute the SDH upgrade according to the official instructions. You may again choose between the different upgrade methods.

  11. Execute the Post-Upgrade Procedures for the SDH (2.7) / (2.6) / (2.5) / (2.4).

    • If you exposed the vsystem service using routes, re-create the route. If no backup is available, please follow Using OpenShift Router and routes.

      # oc create -f route-vsystem.bak.yaml
      

6. Appendix

6.1. Ceph and OCP integration

In order to dynamically provision Ceph volumes in OCP cluster, several steps need to be performed to make it aware of the storage back-end.

6.1.1. Create a dedicated pool and user in Ceph cluster for OCP

Please follow the instructions on Creating a pool for dynamic volumes in the OCP documentation.

This will create the kube pool and the kube user in the Ceph cluster.
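
For reference, the commands executed on a Ceph administration node correspond roughly to the following sketch; the placement group count of 128 is illustrative and should be chosen according to your cluster:

# ceph osd pool create kube 128
# ceph auth get-or-create client.kube mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=kube' \
    -o ceph.client.kube.keyring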

6.1.2. Install ceph-common package on OCP cluster hosts

Make sure to enable the following repository on OCP cluster hosts and install the ceph-common package:

# subscription-manager repos --enable=rhel-7-server-rhceph-3-tools-rpms
# yum install -y ceph-common

This allows OCP nodes to interact with Ceph cluster.

6.1.3. Configure the ceph secrets

The Ceph RBD storage class requires at least one secret (admin). However, for the sake of security, rather than using the Ceph administrator's credentials for all interaction, it is recommended to use a dedicated user (e.g. kube) for authentication against a dedicated pool for OCP (e.g. kube).

  • admin secret - needs to be present in an arbitrary namespace (in this example we stick to kube-system)
  • user secret - needs to be present in all namespaces that need to provision Ceph RBD persistent volumes; if not specified in storage class (see below), the admin secret will be used instead
  1. Get the base64-encoded keys for both users with the following commands executed from a Ceph administrator or MON node:

    # ceph auth get-key client.admin | base64
    YXFhOU5xRENueTJib2JhYS82cDF4NGVxQjdBVUw2dnZWdDZsWVc9PQ==
    # ceph auth get-key client.kube  | base64
    YXFhdk5xOUNCMkZlYUhhYXllWDhNaHlWTUFidDZWSC8vV0FyY1c9PQ==
    
  2. As the cluster-admin, create the admin secret like this while making sure to use your own $ADMIN_KEY:

    # ADMIN_KEY="YXFhOU5xRENueTJib2JhYS82cDF4NGVxQjdBVUw2dnZWdDZsWVc9PQ=="
    # oc create -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: ceph-admin-secret
      namespace: kube-system
    data:
      key: ${ADMIN_KEY}
    type: kubernetes.io/rbd
    EOF
    
  3. As the cluster-admin, create a custom project template that will provision user secret in each newly created project/namespace. Again make sure to use your own $USER_KEY.

    # USER_KEY="YXFhdk5xOUNCMkZlYUhhYXllWDhNaHlWTUFidDZWSC8vV0FyY1c9PQ=="
    # oc adm create-bootstrap-project-template -o yaml | oc patch --local -f - -o yaml --type=json -p '[{
      "path": "/objects/0",
      "op": "add",
      "value": {
        "apiVersion": "v1",
        "kind": "Secret",
        "data": { "key": "'"$USER_KEY"'" },
        "metadata": { "name": "ceph-user-secret" },
        "type": "kubernetes.io/rbd"
      }
    }, {
      "path": "/metadata/namespace",
      "op": "add",
      "value": "default"
    }, {
      "path": "/metadata/name",
      "op": "add",
      "value": "ceph-project"
    }]' | oc create -f - -n default
    

    This creates a template called ceph-project in the default namespace. If another non-default project template is already being used, you can modify it to contain the following object instead:

    apiVersion: v1
    kind: Secret
    metadata:
      name: ceph-user-secret
    data:
      key: ${USER_KEY}
    type: kubernetes.io/rbd
    
  4. As the root user on all the OCP master nodes, modify the master's configuration file to instantiate the newly created template for all new projects:

    # cp /etc/origin/master/master-config{,.bak}.yaml
    # oc ex config patch -p '{
      "projectConfig": {
        "projectRequestTemplate": "default/ceph-project"
      }
    }' /etc/origin/master/master-config.bak.yaml >/etc/origin/master/master-config.yaml
    
  5. Restart the master api and controllers for the change to take effect:

    # # for OCP 3.9 only
    # systemctl restart atomic-openshift-master-{api,controllers}
    
    # # for OCP 3.10 or higher
    # for component in api controllers; do /usr/local/bin/master-restart $component $component; done
    

Now if a new project is created, there will be the ceph-user-secret auto-generated. To verify, execute the following:

# PROJECT="test-$(cat /dev/urandom | tr -dc 'a-z0-9' | fold -w 6 | head -n 1)"
# oc new-project $PROJECT
Now using project "test-zj1nq9" on server "https://ip-172-18-0-251.ec2.internal:8443".
...
# oc get secret ceph-user-secret
NAME               TYPE                DATA      AGE
ceph-user-secret   kubernetes.io/rbd   1         47s
# oc delete project $PROJECT
project.project.openshift.io "test-zj1nq9" deleted

6.1.4. Create Ceph RBD storage class

The following will create the storage class named ceph-rbd usable in the whole OCP cluster and make it the default storage class for new PVCs. Make sure to use MONITORS corresponding to your cluster.

# MONITORS="192.168.1.11:6789,192.168.1.12:6789,192.168.1.13:6789"
# oc create -f - <<EOF
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/rbd
parameters:
  monitors: ${MONITORS}
  adminId: admin
  adminSecretName: ceph-admin-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-user-secret
EOF

If it is not desirable to make the ceph-rbd storage class the default, remove the annotations section from the yaml above, or remove the annotation with the following command as a post-creation step:

# oc annotate sc ceph-rbd storageclass.kubernetes.io/is-default-class-
storageclass.storage.k8s.io/ceph-rbd annotated

Otherwise, if there is another storage class already marked as default, make sure to remove its annotation:

# oc get sc
NAME                      PROVISIONER                AGE
ceph-rbd (default)        kubernetes.io/rbd          1d
glusterfs-storage         kubernetes.io/glusterfs    1d
gp2 (default)             kubernetes.io/aws-ebs      1d
# oc annotate sc/gp2 storageclass.kubernetes.io/is-default-class- \
                        storageclass.beta.kubernetes.io/is-default-class-
storageclass.storage.k8s.io/gp2 annotated
# oc get sc
NAME                      PROVISIONER                AGE
ceph-rbd (default)        kubernetes.io/rbd          1d
glusterfs-storage         kubernetes.io/glusterfs    1d
gp2                       kubernetes.io/aws-ebs      1d

6.1.5. (Optional) Test the Ceph RBD storage class

Please follow the OCP documentation Using an existing Ceph cluster for dynamic persistent storage from step 6 (Create the PVC object definition) onward.

NOTE If your ceph-rbd class is not the default, you need to specify it in the claim like this:

kind: PersistentVolumeClaim
...
spec:
  ...
  resources:
    requests:
      storage: 2Gi
  storageClassName: ceph-rbd
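
For convenience, a complete claim created against the ceph-rbd class might look like this; the claim name and size are illustrative:

# oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: ceph-rbd
EOF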

6.2. SDH uninstallation

Choose one of the uninstallation methods based on what method has been used for the installation.

When done, you may continue with a new installation round in the same or another namespace.

6.2.1. Using the SL Plugin

Please follow the SAP documentation Uninstall SAP Data Hub Using the SL Plugin (2.7) / (2.6) / (2.5) / (2.4) / (2.3) if you have installed SDH using either mpsl or mpfree methods.

6.2.2. Manual uninstallation

The installation script allows for a project clean-up where all the deployed pods, persistent volume claims, secrets, etc. are deleted. If the manual installation method was used, SDH can be uninstalled using the same script with a different set of parameters. The snippet below is an example where the SDH installation resides in the project sdh. In addition to running this script, the project needs to be deleted as well if the same project shall host the new installation.

# ./install.sh --delete --purge --force-deletion --namespace=sdh \
    --docker-registry=registry.local.example:5000
# oc delete project sdh
# # start the new installation

The deletion of the project often takes quite a while. Until fully uninstalled, the project will be listed as Terminating in the output of oc get project. You may speed the process up with the following command. Again please mind the namespace.

# oc delete pods --all --grace-period=0 --force --namespace sdh

NOTE: Make sure not to run the same installation script more than once at the same time even when working with different OpenShift projects.

6.3. Uninstall Helm

# helm reset

6.4. Allow a non-root user to interact with Docker on Jump host

  • Append -G dockerroot to OPTIONS= in /etc/sysconfig/docker file on your Jump host.

  • Run the following commands on the Jump host, after you modify the /etc/sysconfig/docker file. Make sure to replace alice with your user name.

    # sudo usermod -a -G dockerroot alice
    # sudo chown root:dockerroot /var/run/docker.sock
    
  • Log out and re-log-in to the Jump host for the changes to become effective.

If the Jump host is part of the OCP cluster, make sure to add -G dockerroot to openshift_docker_options in the inventory file before the advanced installation.
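
For illustration, the relevant line in the [OSEv3:vars] section of the inventory could look like this; the --log-driver option is only a placeholder for whatever docker options you already have configured:

openshift_docker_options="--log-driver=json-file -G dockerroot"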

6.5. Load nfsd kernel modules

Execute the following in bash on all the schedulable nodes:

# sudo mount -t nfsd nfsd /proc/fs/nfsd
# sudo modprobe nfsv4
# sudo tee /etc/modules-load.d/nfsd.conf <<<$'nfsd\nnfsv4'
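
You can verify on each node that the modules are loaded and will be loaded again after a reboot, for example:

# lsmod | egrep '^nfsd|^nfsv4'
# cat /etc/modules-load.d/nfsd.conf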

6.6. Grant fluentd pods permissions to logs

The diagnostics-fluentd-* pods need access to the /var/log directories on the nodes. For this to work, the pods need to run as privileged. Two steps are necessary to make that happen:

  1. the ${SDH_PROJECT_NAME}-fluentd service account needs to be added to the privileged SCC with the following commands copied from the project setup:

    # oc project "${SDH_PROJECT_NAME}"
    # oc adm policy add-scc-to-user privileged -z "$(oc project -q)-fluentd"
    
  2. the daemonset diagnostics-fluentd needs to be patched to request the privileged security context.

The recommended way to execute the second step is to deploy the SDH Observer. Alternatively, the daemonset can be patched either before the SDH installation (only applicable to the manual installation) or afterwards.

6.6.1. Before the installation

For SDH 2.5 or newer, the recommended approach is to deploy SDH observer that will patch the diagnostics-fluentd daemonset as soon as it appears.

Nevertheless, it is still possible to patch the helm template directly when installing SDH manually:

# patch -p1 -B patchbaks/ -r - <<EOF
Index: SAPDataHub-2.5.114-Foundation/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
===================================================================
--- SAPDataHub-2.5.114-Foundation.orig/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
+++ SAPDataHub-2.5.114-Foundation/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
@@ -41,6 +41,7 @@ spec:
         - name: "FLUENT_ELASTICSEARCH_SCHEME"
           value: "http"
         securityContext:
+          privileged: true
           runAsUser: 0
           runAsNonRoot: false
         volumeMounts:
EOF

For SDH 2.4+ during manual installation only:

# patch -p1 -B patchbaks/ -r - <<EOF
Index: SAPDataHub-2.4.83-Foundation/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
===================================================================
--- SAPDataHub-2.4.83-Foundation.orig/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
+++ SAPDataHub-2.4.83-Foundation/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
@@ -38,6 +38,8 @@ spec:
           value: {{ .Context.elasticsearch.service.port | quote }}
         - name: "FLUENT_ELASTICSEARCH_SCHEME"
           value: "http"
+        securityContext:
+          privileged: true
         volumeMounts:
         - name: "settings"
           mountPath: "/etc/fluent"
EOF

For SDH 2.3 during manual installation only:

# patch -p1 -B patchbaks/ -r - <<EOF
Index: SAPDataHub-2.3.173-Foundation/deployment/helm/vora-diagnostic/templates/logging/fluentd-kubernetes-ds.yaml
===================================================================
--- SAPDataHub-2.3.173-Foundation.orig/deployment/helm/vora-diagnostic/templates/logging/fluentd-kubernetes-ds.yaml
+++ SAPDataHub-2.3.173-Foundation/deployment/helm/vora-diagnostic/templates/logging/fluentd-kubernetes-ds.yaml
@@ -63,6 +63,8 @@ spec:
         - name: {{ .Values.docker.imagePullSecret }}
       {{- end }}
       terminationGracePeriodSeconds: 30
+      securityContext:
+        privileged: true
       volumes:
       - name: config-volume
         configMap:
EOF

6.6.2. After the SDH installation

For OCP cluster releases 3.10 or higher, the recommended approach is to deploy SDH observer.

For older releases, once the SDH installation is finished, execute the following command in the Data Hub's namespace.

# oc patch ds/diagnostics-fluentd  -p '{ "spec": { "template": { "spec": {
      "containers": [{ "name": "diagnostics-fluentd", "securityContext": { "privileged": true }}]
    }}}}'

The fluentd pods will get restarted automatically.
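
To watch the restart progress, you can list the fluentd pods using the label selector that also appears later in the troubleshooting section:

# oc get pods -l datahub.sap.com/app-component=fluentd -w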

6.7. Make the installer cope with just 3 nodes in cluster

IMPORTANT: This hint is useful for just small PoCs, not for production deployment. For the latter, please increase the number of schedulable compute nodes.

SDH's dlog pod expects at least 3 schedulable compute nodes that are neither master nor infra nodes. This requirement can be relaxed by reducing the replication factor of the dlog pod with the following parameters, passed either to the installation script (when installing manually) or as Additional Installation Parameters during the mpsl or mpfree installation methods:

-e=vora-cluster.components.dlog.standbyFactor=0 -e=vora-cluster.components.dlog.replicationFactor=2

Alternatively, you may choose 1 for both standby and replication factors. The parameters are documented in the Installation Guide (2.7) / (2.6) / (2.5) / (2.4) / (2.3).
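
For a manual installation, the parameters are simply appended to the install script invocation. A minimal sketch, with the namespace being illustrative and any other options of your installation left out:

# ./install.sh --namespace=sdh \
    -e=vora-cluster.components.dlog.standbyFactor=0 \
    -e=vora-cluster.components.dlog.replicationFactor=2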

6.8. Unset the default node selector on Data Hub's project

The daemonsets deployed by the SDH installer are expected to be deployed to all the schedulable nodes, including infra nodes. If there is a default node selector set that is not present on all the schedulable nodes, the pods will get restricted to it. This will result in fewer available pods than desired:

# oc get daemonset -n sdh
NAME                                   DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
diagnostics-fluentd                    4         3         3         3            3           <none>          16h
diagnostics-prometheus-node-exporter   4         3         3         3            3           <none>          16h
vsystem-module-loader                  4         3         3         3            3           <none>          16h

For example, on OCP 3.10, the default node selector is set to node-role.kubernetes.io/compute=true, so unless overridden, all the pods will be scheduled to compute nodes. As the following output shows, only 3 nodes out of 4 have the compute role.

# oc get nodes
NAME                  STATUS ROLES        AGE VERSION
master.example.com    Ready  infra,master 2d  v1.10.0+b81c8f8
node1.example.com     Ready  compute      2d  v1.10.0+b81c8f8
node2.example.com     Ready  compute      2d  v1.10.0+b81c8f8
node3.example.com     Ready  compute      2d  v1.10.0+b81c8f8

To address this, as a cluster-admin, set the Data Hub project's node selector to an empty string, which overrides the global setting:

# oc patch namespace sdh -p '{"metadata":{"annotations":{"openshift.io/node-selector":""}}}'

To actually trigger the deployment of the missing pods, re-create the daemonsets:

# oc get ds -o yaml | oc replace --force -f -

6.9. Deploy SDH Observer

The SDH Observer observes the SDH namespace and applies fixes to deployments as they appear. It does the following for OCP releases prior to 4.1:

  • modifies the Pipeline Modeler (aka vflow) to run as Super Privileged Container to enable it to access /var/run/docker.sock socket on the host if kaniko builds are disabled
  • enables the Pipeline Modeler (aka vflow) to talk to an insecure registry - needed only if kaniko builds are enabled and the registry is insecure
  • makes the SDH's diagnostic-fluentd pods privileged to allow them to access log files on the hosts

Apart from accessing resources in the SDH namespace, it also requires the node-reader cluster role.

To deploy it, as a cluster-admin, execute the following commands in the SDH namespace before, during, or after the SDH installation:

# OCPVER=v3.11               # this must match OCP minor release
# INSECURE_REGISTRY=false    # set to true if the registry is insecure
# oc process -f https://raw.githubusercontent.com/redhat-sap/sap-datahub/master/sdh-observer.yaml \
        NAMESPACE="$(oc project -q)" \
        BASE_IMAGE_TAG="${OCPVER:-3.11}" \
        MARK_REGISTRY_INSECURE=${INSECURE_REGISTRY:-0} | oc create -f -

NOTE: the BASE_IMAGE_TAG must match one of the tags available in the quay.io/openshift/origin-cli repository. The difference between the client's minor release and the OCP server's minor release must not exceed 1.

For IBM Cloud™ specifics, please visit Deploying Red Hat's SAP Data Hub (SDH) Observer.
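
Once created, you can verify that the observer is up and watch what it patches by tailing the logs of its deployment config (named sdh-observer, as referenced later in this guide):

# oc logs -f dc/sdh-observer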

6.10. Permit Pipeline Modeler to access Docker socket

NOTE: applicable to OCP cluster release 3.9 or older. For newer releases, please follow Deploy SDH Observer instead.

The SDH Pipeline Modeler running in the vflow pod needs access to the /var/run/docker.sock socket in order to build images. The default SELinux policy denies such access at runtime unless it is explicitly allowed. To allow it, run the following commands as a user with root permissions on all schedulable nodes:

# semanage fcontext -m -t container_file_t -f s "/var/run/docker\.sock"
# restorecon -v /var/run/docker.sock

To make the change permanent, execute the following on all the nodes:

# cat >/etc/systemd/system/docker.service.d/socket-context.conf <<EOF
[Service]
ExecStartPost=/sbin/restorecon /var/run/docker.sock
EOF
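
Afterwards, you can check that the socket carries the expected SELinux context on each node, for example:

# ls -Z /var/run/docker.sock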

6.11. Marking the vflow registry as insecure

NOTE: applicable only when kaniko image builds are enabled.
NOTE: applicable before, during, or after the SDH installation.

In the SAP Data Hub 2.5.x and 2.6.x releases, it is not possible to configure an insecure registry for the Pipeline Modeler (aka vflow pod), neither via the installer nor in the UI.

The insecure registry needs to be set if the container registry listens on an insecure port (HTTP) or the communication is encrypted using a self-signed certificate.

Without the insecure registry set, the kaniko builder cannot push built images into the registry configured for the Pipeline Modeler (see the "Container Registry for Pipeline Modeler" input parameter in the official SAP Data Hub documentation).

To mark the configured vflow registry as insecure, the SDH Observer needs to be deployed with the MARK_REGISTRY_INSECURE=true parameter. If it is already deployed, it can be re-configured to take care of insecure registries by executing the following command in the sdh namespace:

# oc set env dc/sdh-observer MARK_REGISTRY_INSECURE=true

Once deployed, all the existing Pipeline Modeler pods will be patched. It may take a few tens of seconds until all the modified pods become available.

For more information, take a look at SAP Data Hub RHT CoP repo.

6.12. Running SDH pods on particular nodes

Due to shortcomings in SDH's installer, the validation of the SDH installation fails if its daemonsets are not deployed to all the nodes in the cluster.
Therefore, the installation should be executed without any restriction on nodes. After the installation is done, the pods can be re-scheduled to the desired nodes like this:

  1. choose a label to apply to the SAP Data Hub project and the desired nodes (e.g. run-sdh-project=sdhblue)

  2. label the desired nodes (in this example worker1, worker2, worker3 and worker4)

    # for node in worker{1,2,3,4}; do oc label node/$node run-sdh-project=sdhblue; done
    
  3. set the project node selector of the sdhblue namespace to match the label

    # oc patch namespace sdhblue -p '{"metadata":{"annotations":{"openshift.io/node-selector":"run-sdh-project=sdhblue"}}}'
    
  4. evacuate the pods from all the other nodes by killing them (requires jq utility installed)

    # oc project sdhblue                    # switch to the SDH project
    # label="run-sdh-project=sdhblue"       # set the chosen label
    # nodeNames="$(oc get nodes -o json | jq -c '[.items[] |
        select(.metadata.labels["'"${label%=*}"'"] == "'"${label#*=}"'") | .metadata.name]')"
    # oc get pods -o json | jq -r '.items[] | . as $pod |
        select(('"$nodeNames"' | all(. != $pod.spec.nodeName))) | "pod/\(.metadata.name)"' | xargs -r oc delete
    

NOTE: Please make sure the Data Hub instance is not being used because killing its pods will cause a downtime.

The pods will be re-launched on the nodes labeled with run-sdh-project=sdhblue. It may take several minutes before the SDH becomes available again.
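
If a brief full restart of the instance is acceptable, a simpler alternative (our suggestion, not part of the documented procedure) is to delete all the pods in the namespace and let the project node selector reschedule them onto the labeled nodes:

# oc delete pods --all --namespace sdhblue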

6.13. Running multiple SDH instances on a single OCP cluster

Two instances of SAP Data Hub running in parallel on a single OCP cluster have been validated. Running more instances is possible, but most probably needs an extra support statement from SAP.

Please consider the following before deploying more than one SDH instance to a cluster:

  • Each SAP Data Hub instance must run in its own namespace/project.
  • It is recommended to dedicate particular nodes to each SDH instance.
  • It is recommended to use ovs-multitenant network plug-in for project-level network isolation and improved security. This, however, cannot be changed post OCP installation.
  • If running the production and test (aka blue-green) SDH deployments on a single OCP cluster, mind also the following:
    • There is no way to test an upgrade of OCP cluster before an SDH upgrade.
    • The idle (non-productive) landscape should have the same network security as the live (productive) one.

To deploy a new SDH instance to OCP cluster, please repeat the steps from project setup starting from point 6 with a new project name and continue with SDH Installation.

6.14. Using AWS ECR Registry for the Modeler

Post-sdh-installation step

The SAP Data Hub installer allows specifying an "AWS IAM Role for Pipeline Modeler" when the AWS ECR registry is used as the external registry. However, due to a bug in Data Hub, the Modeler cannot use it. In order to use the AWS ECR registry with Data Hub, follow the instructions at Provide Access Credentials for a Password Protected Container Registry with the following modification.

# cat >/tmp/vsystem-registry-secret.txt <<EOF
username: "AWS_ACCESS_KEY_ID"
password: "AWS_SECRET_ACCESS_KEY"
EOF

The AWS_* credentials must belong to a user that has power-user access to the ECR registry, as provided by the AmazonEC2ContainerRegistryPowerUser managed policy. Please refer to Amazon ECR Repository Policies when you need fine-grained access control.
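
For example, the managed policy can be attached to the IAM user with the AWS CLI; the user name sdh-modeler below is purely illustrative:

# aws iam attach-user-policy \
    --user-name sdh-modeler \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser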

7. Troubleshooting Tips

7.1. SDH Installation or Upgrade problems

7.1.1. HANA, consul and UAA pods keep restarting

If the mentioned pods keep restarting, as the following output illustrates, you may be using a buggy version of docker.

# oc get pods
NAME                                                  READY     STATUS     RESTARTS   AGE
auditlog-759889fc5-d2sbm                              1/1       Running    3          25m
hana-0                                                1/1       Running    5          29m
tiller-deploy-5778c7768-96jbc                         1/1       Running    0          1d
uaa-7df64bf-wnssm                                     0/2       Init:0/1   0          24m
vora-consul-0                                         1/1       Running    5          29m
vora-consul-1                                         1/1       Running    5          29m
vora-consul-2                                         1/1       Running    5          29m
vora-deployment-operator-7dffb56687-pxwkz             1/1       Running    0          29m
vora-security-operator-7b8ffcfb59-tb6n4               1/1       Running    0          24m
vora-spark-resource-staging-server-849c68f457-dpxst   1/1       Running    0          29m

Make sure not to install docker-1.13.1-84.git07f3374.el7.x86_64 on schedulable nodes. Either update to a newer version or downgrade to an earlier one and exclude the broken version from future updates.

# yum downgrade -y docker-1.13.1-75.git8633870.el7_5.x86_64
# sed -i "s/^exclude.*/\0 docker*-1.13.1-84.git07f3374.el7/" /etc/yum.conf

7.1.2. Vsystem-vrep pod not starting

Upon inspection, it appears that the node's kernel lacks NFS server support:

# oc get pods -l vora-component=vsystem-vrep
NAME             READY     STATUS             RESTARTS   AGE
vsystem-vrep-0   0/1       CrashLoopBackOff   6          8m
# oc logs $(oc get pods -o name -l vora-component=vsystem-vrep)
2018-08-30 10:41:51.935459|+0000|INFO |Starting Kernel NFS Server||vrep|1|Start|server.go(53)
2018-08-30 10:41:52.014885|+0000|ERROR|service nfs-kernel-server start: Not starting NFS kernel daemon: no support in current kernel.||vrep|1|Start|server.go(80)
2018-08-30 10:41:52.014953|+0000|ERROR|error starting nfs-kernel-server: exit status 1
||vrep|1|Start|server.go(82)
2018-08-30 10:41:52.014976|+0000|FATAL|Error starting NFS server: NFSD error||vsystem|1|fail|server.go(145)

This means that the proper NFS kernel module hasn't been loaded yet. Make sure to load it permanently.

7.1.3. Vora Installation Error: timeout at “Deploying vora-consul”

Vora Installation Error: timeout at "Deploying vora-consul with: helm install --namespace vora -f values.yaml ..."

To view the log messages, you can login to the OpenShift web console, navigate to Applications -> Pods, select the failing pod e.g. vora-consul-2-0, and check the log under the Events tab.

A common cause: if the external image registry is insecure but the OpenShift cluster is configured to pull only from secure registries, you will see image pull errors in the log. If a secure registry is not feasible, follow the instructions on configuring the registry as insecure.

7.1.4. Too few worker nodes

If you see the installation failing with the following error, there are too few schedulable non-infra nodes in the cluster.

Status:
  Message:  Less available workers than Distributed Log requirements
  State:    Failed
Events:
  Type  Reason               Age   From                      Message
  ----  ------               ----  ----                      -------
        New Vora Cluster     11m   vora-deployment-operator  Started processing
        Update Vora Cluster  11m   vora-deployment-operator  Processing failed: less available workers than Distributeed Log requirements
2018-09-14T16:16:15+0200 [ERROR] Timeout waiting for vora cluster! Please check the status of the cluster from above logs and kubernetes dashboard...

If you have at least 2 schedulable non-infra nodes, you may still make the installation succeed by reducing the dlog's replication factor. After adjusting the installation parameters, make sure to uninstall the failed installation and start it anew.

7.1.5. Privileged security context unassigned

If there are pods, replicasets, or statefulsets not coming up and you can see an event similar to the one below, you need to add the privileged security context constraint to the corresponding service account.

# oc get events | grep securityContext
1m          32m          23        diagnostics-elasticsearch-5b5465ffb.156926cccbf56887                          ReplicaSet                                                                            Warning   FailedCreate             replicaset-controller                  Error creating: pods "diagnostics-elasticsearch-5b5465ffb-" is forbidden: unable to validate against any security context constraint: [spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]

Copy the name in the fourth column (the event name - diagnostics-elasticsearch-5b5465ffb.156926cccbf56887) and determine its corresponding service account name.

# eventname="diagnostics-elasticsearch-5b5465ffb.156926cccbf56887"
# oc get -o go-template=$'{{with .spec.template.spec.serviceAccountName}}{{.}}{{else}}default{{end}}\n' \
    "$(oc get events "${eventname}" -o jsonpath=$'{.involvedObject.kind}/{.involvedObject.name}\n')"
sdh-elasticsearch

The obtained service account name (sdh-elasticsearch) now needs to be assigned privileged scc:

# oc adm policy add-scc-to-user privileged -z sdh-elasticsearch

The pod shall then come up on its own, provided this was the only problem.

7.1.6. No Default Storage Class set

If pods are failing because of PVCs not being bound, the problem may be that no default storage class has been set and no storage class was specified to the installer.

# oc get pods
NAME                                                  READY     STATUS    RESTARTS   AGE
hana-0                                                0/1       Pending   0          45m
vora-consul-0                                         0/1       Pending   0          45m
vora-consul-1                                         0/1       Pending   0          45m
vora-consul-2                                         0/1       Pending   0          45m

# oc describe pvc data-hana-0
Name:          data-hana-0
Namespace:     sdh
StorageClass:
Status:        Pending
Volume:
Labels:        app=vora
               datahub.sap.com/app=hana
               vora-component=hana
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
  Type    Reason         Age                  From                         Message
  ----    ------         ----                 ----                         -------
  Normal  FailedBinding  47s (x126 over 30m)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set

To fix this, either make sure to set the Default StorageClass or provide the storage class name to the installer. For the manual installation, that would be ./install.sh --pv-storage-class STORAGECLASS.
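
For example, to mark an existing storage class such as ceph-rbd as the default after the fact, the same annotation used earlier in this guide can be applied:

# oc annotate sc ceph-rbd storageclass.kubernetes.io/is-default-class=true --overwrite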

7.1.7. vsystem-app pods not coming up

If you have SELinux in enforcing mode, you may see the pods launched by vsystem crash-looping because of the container named vsystem-iptables, like this:

# oc get pods
NAME                                                          READY     STATUS             RESTARTS   AGE
auditlog-59b4757cb9-ccgwh                                     1/1       Running            0          40m
datahub-app-db-gzmtb-67cd6c56b8-9sm2v                         2/3       CrashLoopBackOff   11         34m
datahub-app-db-tlwkg-5b5b54955b-bb67k                         2/3       CrashLoopBackOff   10         30m
...
internal-comm-secret-gen-nd7d2                                0/1       Completed          0          36m
license-management-gjh4r-749f4bd745-wdtpr                     2/3       CrashLoopBackOff   11         35m
shared-k98sh-7b8f4bf547-2j5gr                                 2/3       CrashLoopBackOff   4          2m
...
vora-tx-lock-manager-7c57965d6c-rlhhn                         2/2       Running            3          40m
voraadapter-lsvhq-94cc5c564-57cx2                             2/3       CrashLoopBackOff   11         32m
voraadapter-qkzrx-7575dcf977-8x9bt                            2/3       CrashLoopBackOff   11         35m
vsystem-5898b475dc-s6dnt                                      2/2       Running            0          37m

When you inspect one of those pods, you can see an error message similar to the one below:

# oc logs voraadapter-lsvhq-94cc5c564-57cx2 -c vsystem-iptables
2018-12-06 11:45:16.463220|+0000|INFO |Execute: iptables -N VSYSTEM-AGENT-PREROUTING -t nat||vsystem|1|execRule|iptables.go(56)
2018-12-06 11:45:16.465087|+0000|INFO |Output: iptables: Chain already exists.||vsystem|1|execRule|iptables.go(62)
Error: exited with status: 1
Usage:
  vsystem iptables [flags]

Flags:
  -h, --help               help for iptables
      --no-wait            Exit immediately after applying the rules and don't wait for SIGTERM/SIGINT.
      --rule stringSlice   IPTables rule which should be applied. All rules must be specified as string and without the iptables command.

In the audit log on the node where the pod got scheduled, you should be able to find an AVC denial similar to:

# grep 'denied.*iptab' /var/log/audit/audit.log
type=AVC msg=audit(1544115868.568:15632): avc:  denied  { module_request } for  pid=54200 comm="iptables" kmod="ipt_REDIRECT" scontext=system_u:system_r:container_t:s0:c826,c909 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
...

To fix this, the ipt_REDIRECT kernel module needs to be loaded with the following commands executed on all the schedulable nodes:

# modprobe ipt_REDIRECT
# echo "ipt_REDIRECT" > /etc/modules-load.d/ipt_redirect.conf

7.1.8. Fluentd pods cannot access /var/log

If you see errors like the ones shown below in the logs of the fluentd pods, make sure to follow Grant fluentd pods permissions to logs to fix the problem.

# oc logs $(oc get pods -o name -l datahub.sap.com/app-component=fluentd | head -n 1) | tail -n 20
2019-04-15 18:53:24 +0000 [error]: unexpected error error="Permission denied @ rb_sysopen - /var/log/es-containers-sdh25-mortal-garfish.log.pos"
  2019-04-15 18:53:24 +0000 [error]: suppressed same stacktrace
  2019-04-15 18:53:25 +0000 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
  2019-04-15 18:53:26 +0000 [error]: unexpected error error_class=Errno::EACCES error="Permission denied @ rb_sysopen - /var/log/es-containers-sdh25-mortal-garfish.log.pos"
  2019-04-15 18:53:26 +0000 [error]: /usr/lib64/ruby/gems/2.5.0/gems/fluentd-0.14.8/lib/fluent/plugin/in_tail.rb:151:in `initialize'
  2019-04-15 18:53:26 +0000 [error]: /usr/lib64/ruby/gems/2.5.0/gems/fluentd-0.14.8/lib/fluent/plugin/in_tail.rb:151:in `open'
...

7.2. Validation errors

These may happen during the validation phase initiated by running the SDH installation script with the --validate flag:

# ./install.sh --validate --namespace=sdh

7.2.1. Services not installed

Failure description

vora-vsystem validation complains about services not installed even though deployed:

2018-08-28T13:32:19+0200 [INFO] Validating...
2018-08-28T13:32:19+0200 [INFO] Running validation for vora-cluster...OK!
2018-08-28T13:33:14+0200 [INFO] Running validation for vora-sparkonk8s...OK!
2018-08-28T13:34:56+0200 [INFO] Running validation for vora-vsystem...2018-08-28T13:35:01+0200 [ERROR] Failed! Please see the validation logs -> /home/miminar/SAPDataHub-2.3.144-Foundation/logs/20180828_133214/vora-vsystem_validation_log.txt
2018-08-28T13:35:01+0200 [INFO] Running validation for datahub-app-base-db...OK!
2018-08-28T13:35:01+0200 [ERROR] There is a failed validation. Exiting...


# cat /home/miminar/SAPDataHub-2.3.144-Foundation/logs/20180828_133214/vora-vsystem_validation_log.txt
2018-08-28T13:34:56+0200 [INFO] Connecting to vSystem ...
2018-08-28T13:34:57+0200 [INFO] Wait until pod vsystem-vrep-0 is running...
2018-08-28T13:34:57+0200 [INFO] Wait until containers in the pod vsystem-vrep-0 are ready...
2018-08-28T13:34:58+0200 [INFO] Wait until pod vsystem-7d7ffdd649-8mcxv is running...
2018-08-28T13:34:58+0200 [INFO] Wait until containers in the pod vsystem-7d7ffdd649-8mcxv are ready...
2018-08-28T13:35:01+0200 [INFO] Logged in!
2018-08-28T13:35:01+0200 [INFO] Installed services:

/home/miminar/SAPDataHub-2.3.144-Foundation/validation/vora-vsystem/vsystem-validation.sh: line 19: error: command not found

Alternatively, the following message may appear at the end of vora-vsystem_validation_log.txt:

2018-08-31T08:59:52+0200 [ERROR] Connection Management is not installed

Resolution

If there are proxy environment variables set, make sure to include 127.0.0.1 and localhost in no_proxy and NO_PROXY environment variables. You may find Setting Proxy Overrides and Working with HTTP Proxies helpful. Apart from the recommended settings, do not forget to include My_Image_Registry_FQDN in the NO_PROXY settings if the registry is hosted inside the proxied network.
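
For example, before running the installation script, the overrides could be extended like this (My_Image_Registry_FQDN is the placeholder used throughout this guide):

# export NO_PROXY="${NO_PROXY},127.0.0.1,localhost,My_Image_Registry_FQDN"
# export no_proxy="${NO_PROXY}"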

7.2.2. Less than desired daemonset pods deployed

If the diagnostics validation fails because fewer daemonset pods are available than desired, there is most probably a default node selector set in the master config.

Start diagnostics readiness checks for namespace dh24
2018-12-07T11:00:13+0100 [INFO] Check readiness of daemonset diagnostics-fluentd ............. failed
2018-12-07T11:05:23+0100 [ERROR] daemonset diagnostics-fluentd not ready: found 3/4 ready pods

NAME                                                DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE   SELECTOR AGE
daemonset.apps/diagnostics-fluentd                  4       3       3     3          3         <none> 16d
daemonset.apps/diagnostics-prometheus-node-exporter 4       3       3     3          3         <none> 16d
daemonset.apps/vsystem-module-loader                4       3       3     0          3         <none> 16d

Make sure to unset the default node selector on the Data Hub project to fix this.

7.2.3. Diagnostics Prometheus Node Exporter pods not starting

During an installation or upgrade, it may happen that the Node Exporter pods keep restarting:

# oc get pods  | grep node-exporter
diagnostics-prometheus-node-exporter-5rkm8                        0/1       CrashLoopBackOff   6          8m
diagnostics-prometheus-node-exporter-hsww5                        0/1       CrashLoopBackOff   6          8m
diagnostics-prometheus-node-exporter-jxxpn                        0/1       CrashLoopBackOff   6          8m
diagnostics-prometheus-node-exporter-rbw82                        0/1       CrashLoopBackOff   7          8m
diagnostics-prometheus-node-exporter-s2jsz                        0/1       CrashLoopBackOff   6          8m

The validation will fail like this:

2019-08-15T15:05:01+0200 [INFO] Validating...
2019-08-15T15:05:01+0200 [INFO] Running validation for vora-cluster...OK!
2019-08-15T15:05:51+0200 [INFO] Running validation for vora-vsystem...OK!
2019-08-15T15:05:57+0200 [INFO] Running validation for vora-diagnostic...2019-08-15T15:11:06+0200 [ERROR] Failed! Please see the validation logs -> /root/wsp/clust/foundation/logs/20190815_150455/vora-diagnostic_validation_log.txt
...
2019-08-15T15:11:35+0200 [ERROR] There is a failed validation. Exiting...

# cat /root/wsp/clust/foundation/logs/20190815_150455/vora-diagnostic_validation_log.txt
2019-08-15T15:05:57+0200 [INFO] Start diagnostics readiness checks for namespace sdhup
2019-08-15T15:05:57+0200 [INFO] Check readiness of daemonset diagnostics-fluentd ... ok
2019-08-15T15:05:58+0200 [INFO] Check readiness of daemonset diagnostics-prometheus-node-exporter ............. failed
2019-08-15T15:11:06+0200 [ERROR] daemonset diagnostics-prometheus-node-exporter not ready: found 2/5 ready pods

A possible reason is that the resource limits set on the pods are too low. To address this post-installation, you can patch the daemonset like this (in the SDH namespace):

# oc patch -p '{"spec": {"template": {"spec": {"containers": [
    { "name": "diagnostics-prometheus-node-exporter",
      "resources": {"limits": {"cpu": "200m", "memory": "100M"}}
    }]}}}}' ds/diagnostics-prometheus-node-exporter

To address this during the installation (using any installation method), add the following parameters:

-e=vora-diagnostics.resources.prometheusNodeExporter.resources.limits.cpu=200m
-e=vora-diagnostics.resources.prometheusNodeExporter.resources.limits.memory=100M

And then restart the validation (using the manual method) like this:

# ./install.sh --validate -n=sdh

7.2.4. Checkpoint store validation

If you see the following error during the installation for the checkpoint store validation, it means the bucket or the given directory does not exist. Make sure to create them first.

2018-12-05T06:30:17-0500 [INFO] Validating checkpoint store...
2018-12-05T06:30:17-0500 [INFO] Checking connection...
2018-12-05T06:30:42-0500 [INFO] AFSI CLI ouput:
2018-12-05T06:30:42-0500 [INFO] Unknown error when executing operation: Couldn't open URL : Cannot open connection: file/directory 'bucket1/dir' does not exist.
pod sdh/checkpoint-store-administration terminated (Error)
2018-12-05T06:30:42-0500 [ERROR] Connection check failed!
2018-12-05T06:30:42-0500 [ERROR] Checkpoint store validation failed!

2018-12-05T06:30:42-0500 [ERROR] Please reconfigure your checkpoint store connection...
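
If the checkpoint store is backed by AWS S3, the missing bucket and directory from the log above could be created with the AWS CLI, for instance (bucket1 and dir are the placeholders from the log):

# aws s3 mb s3://bucket1
# aws s3api put-object --bucket bucket1 --key dir/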

7.2.5. Node goes down when new tenants are created or new users added to SDH

If a node running the vsystem-vrep-0 pod goes down when a new SDH tenant or user is created or a new launchpad is accessed, the nfsv4 kernel module is probably not loaded.

Diagnosis

  1. Run oc get pods -o wide in sdh namespace and look for vsystem-vrep-0, which is either not running or restarted.
  2. Determine its corresponding node from the output.
  3. Login to the node via ssh.
  4. Run systemctl and watch it hang.
  5. Run docker ps and watch it hang.

Resolution

  1. Make sure to follow Load nfsd kernel modules to load the necessary kernel modules.
  2. Reboot the hanging node.

7.3. Pipeline Modeler troubleshooting

7.3.1. Graphs cannot be run in the Pipeline Modeler

If the log of the vflow pod shows problems with reaching outside of the private network, as in the output below, verify your proxy settings and make sure that the installation script is run with the following parameters:

# ./install.sh --cluster-http-proxy="${HTTP_PROXY}" --cluster-https-proxy="${HTTPS_PROXY}" --cluster-no-proxy="${NO_PROXY}"

The vflow log can be displayed with a command like oc logs $(oc get pods -o name -l vora-component=vflow | head -n 1):

W: Failed to fetch http://deb.debian.org/debian/dists/stretch/InRelease  Could not connect to deb.debian.org:80 (5.153.231.4), connection timed out [IP: 5.153.231.4 80]
W: Failed to fetch http://security.debian.org/debian-security/dists/stretch/updates/InRelease  Could not connect to security.debian.org:80 (217.196.149.233), connection timed out [IP: 217.196.149.233 80]
W: Failed to fetch http://deb.debian.org/debian/dists/stretch-updates/InRelease  Unable to connect to deb.debian.org:http: [IP: 5.153.231.4 80]

7.3.2. Graphs cannot be built by the Pipeline Modeler

If an attempt to run a pipeline fails with a message like the one below, the most probable reason is that SELinux prevents the modeler from accessing the docker socket.

failed to prepare graph description: failed to prepare image: error building docker image. Docker daemon error: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.23/build?buildargs=%7B%22HTTPS_PROXY%22%3A%22%22%2C%22HTTP_PROXY%22%3A%22%22%2C%22NO_PROXY%22%3A%22%22%2C%22http_proxy%22%3A%22%22%2C%22https_proxy%22%3A%22%22%2C%22no_proxy%22%3A%22%22%7D&cachefrom=null&cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&forcerm=1&labels=null&memory=0&memswap=0&networkmode=&rm=1&shmsize=0&t=ip-172-18-11-229.ec2.internal%3A5000%2Fvora%2Fvflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a%3A2.5.29-com.sap.debian&ulimits=null: dial unix /var/run/docker.sock: connect: permission denied

To verify it is indeed an SELinux problem, you can inspect the logs as a root user on a node where the vflow pod is running like this:

# ausearch --input-logs -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -i  | grep docker | tail -n 10
type=AVC msg=audit(05/09/2019 12:59:00.741:170409) : avc:  denied  { connectto } for  pid=119617 comm=tmp.IojHz2WXIo path=/run/docker.sock scontext=system_u:system_r:container_t:s0:c9,c12 tcontext=system_u:system_r:container_runtime_t:s0 tclass=unix_stream_socket permissive=0

The output says that the connection to docker.sock has been denied to an application with PID 119617 running in a container. This confirms there is an SELinux issue.

7.3.2.1. Determine the Pipeline Modeler's node

To determine the node where the vflow pod runs, and thus where to run the ausearch command above, you can run the following:

# oc get -o wide pods -l datahub.sap.com/app-component=vflow
NAME                                                              READY     STATUS    RESTARTS   AGE       IP            NODE
vflow-9761d22d34f4e7fa22ff797e1e10e22aea9a0771qh2pn-76f84dxrqgm   1/1       Running   0          18m       10.129.0.63   ip-172-18-4-77.ec2.internal

If there are multiple vflow pods, you can filter further by tenant name and tenant user. In the example below it is default and sdhadmin respectively:

# oc get -o wide pods -l datahub.sap.com/app-component=vflow,vsystem.datahub.sap.com/tenant=default,vsystem.datahub.sap.com/user=sdhadmin
NAME                                                              READY     STATUS    RESTARTS   AGE       IP            NODE
vflow-9761d22d34f4e7fa22ff797e1e10e22aea9a0771qh2pn-76f84dxrqgm   1/1       Running   0          18m       10.129.0.63   ip-172-18-4-77.ec2.internal

7.3.2.2. Fix the SELinux issue

Please follow Deploy SDH Observer to automatically patch all recent and future vflow pods if your OCP cluster is 3.10 or newer. Otherwise, please follow Permit Pipeline Modeler to access Docker socket.

7.3.3. Pipeline Modeler cannot push images to the registry

If the SDH is configured to build images with kaniko and the vflow registry is not configured with a certificate signed by a trusted certificate authority, the builder will not be able to push the built images there. The Pipeline Modeler will then label the graphs as dead with a message like the following:

failed to prepare graph description: failed to prepare image: build failed for image: internal-registry.example.org:5000/vora/vflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a:com.sap.debian

To determine the cause, the log of the vflow pod needs to be inspected. There, you can spot the root issue: in this case, the insecure registry internal-registry.example.org:5000 is accessible only via the HTTP protocol.

# oc logs $(oc get pods -o name -l vora-component=vflow | head -n 1)
...
INFO[0019] Using files from context: [/workspace/vflow]
INFO[0019] COPY /vflow /vflow
INFO[0019] Taking snapshot of files...
INFO[0023] ENTRYPOINT ["/vflow"]
error pushing image: failed to push to destination internal-registry.example.org:5000/vora/vflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a:com.sap.debian: Get https://internal-registry.example.org:5000/v2/: http: server gave HTTP response to HTTPS client |vflow|container|192|getPodLogs|build.go(126)
...

To resolve it, either secure the registry with a certificate signed by a trusted certificate authority or mark the vflow registry as insecure as described in Marking the vflow registry as insecure.

7.3.4. Modeler does not run when AWS ECR registry is used

If the initialization of the vflow pod fails with a message like the one below, your SDH deployment suffers from a bug that prevents it from using the AWS IAM Role for authentication against the AWS ECR Registry.

# oc logs $(oc get pods -o name -l vora-component=vflow | head -n 1)
....
2019-07-15 12:23:03.147231|+0000|INFO |Statistics Publisher started with publication interval 30s ms|vflow|statistic|38|loop|statistics_monitor.go(89)
2019-07-15 12:23:30.446482|+0000|INFO |connecting to vrep at vsystem-vrep.sdh:8738|vflow|container|1|NewImageFactory|factory.go(131)
2019-07-15 12:23:30.446993|+0000|INFO |Creating AWS ECR Repository 'sdh26/vora/vflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a'|vflow|container|1|assertRepositoryExists|ecr.go(106)
2019-07-15 12:23:35.001030|+0000|ERROR|API node execution is failed: cannot instantiate docker registry client: failed to assert repository existance: Error creating AWS ECR repository 'sdh26/vora/vflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a': NoCredentialProviders: no valid providers in chain. Deprecated.
        For verbose messaging see aws.Config.CredentialsChainVerboseErrors
failed to create image factory
main.runMaster
        /data/xmake/prod-build7010/w/velocity/.../vflow/src/cmd/vflow/main.go:386
main.main
        /data/xmake/prod-build7010/w/velocity/.../vflow/src/cmd/vflow/main.go:357
runtime.main
        /data/xmake/tools/xmake-tools/FA/org.golang.download.go/go/1.11.4-bin/go/src/runtime/proc.go:201
runtime.goexit
        /data/xmake/tools/xmake-tools/FA/org.golang.download.go/go/1.11.4-bin/go/src/runtime/asm_amd64.s:1333|vflow|vflow|1|main|main.go(359)

The work-around is to use a registry pull secret as described in Using AWS ECR Registry for the Modeler.


  1. 7.3 is applicable only for OCP 3.9. ↩︎

  2. any storage provisioned by your cloud provider listed in OCP docs (3.11) / (3.10) / 3.9 ↩︎

  3. For example: openshift_docker_insecure_registries=['172.30.0.0/16', 'docker-registry.default.svc:5000', 'My_Image_Registry_FQDN:5000'] ↩︎

  4. See Configuring OpenShift Container Platform for AWS with Ansible for more details. ↩︎

  5. The subdomain is a wildcard domain, which resolves to the OpenShift Router's IP. The router routes requests to the exposed OCP and SDH services based on the target hostname. The domain is needed to Expose SDH services externally↩︎

  6. The environment variable $KUBECONFIG shall be set instead. ↩︎

  7. This setting assumes that all Data Hub services are accessed under the same name using NodePort. However, using OpenShift Routes, each service will be assigned a different hostname. Therefore, for a production environment, it is necessary to provide signed certificates for these routes. You may consider configuring a custom wildcard certificate for master default subdomain. ↩︎
