SAP Data Hub 2 on OpenShift Container Platform 4

Updated -

Table of Contents

In general, the installation of SAP Data Hub Foundation (SDH) follows these steps:

  • Install Red Hat OpenShift Container Platform
  • Configure the prerequisites for SAP Data Hub Foundation
  • Install SAP Data Hub Foundation on OpenShift Container Platform

The last step has three different approaches listed below. Each approach is compatible with this guide. Please refer to the SAP's Documentation for more information (2.7) / (2.6).

  • mpsl - installation using the Maintenance Planner and SL Plugin (recommended by SAP)
  • mpfree - installation using SL Plugin without Maintenance Planner
  • manual - manual installation using an installation script

If you're interested in installation of SAP Data Intelligence, older SDH or SAP Vora releases, please refer to the other installation guides:

1. OpenShift Container Platform validation version matrix

The following version combinations of SDH 2.X, OCP, RHEL or RHCOS have been validated:

SAP Data Hub OpenShift Container Platform Operating System Infrastructure and (Storage)
2.6 4.1 RHCOS (nodes), RHEL 7.6 (Jump) AWS (EBS)
2.6 Patch 1 4.1 RHCOS (nodes), RHEL 7.6 (Jump) AWS (EBS)
2.7 4.1 RHCOS (nodes), RHEL 7.7 (Jump) AWS (EBS)
2.7 Patch 3 4.2 RHCOS (nodes), RHEL 7.8 (Jump) VMware vSphere (OCS 4.2)
2.7 Patch 4 4.2 RHCOS (nodes), RHEL 7.8 (Jump) VMware vSphere (OCS 4.2), (NetApp Trident 20.04)

Please refer to the compatibility matrix for version combinations that are considered as working.

2. Requirements

2.1. Hardware/VM and OS Requirements

2.1.1. OpenShift Cluster

Make sure to consult the following official cluster requirements:

There are 4 kind of nodes:

  • Bootstrap Node - A temporary bootstrap node needed for the OCP deployment. The node can be either destroyed by the installer (using infrastructure-provisioned-installation -- aka IPI) or can be deleted manually by the administrator. Alternatively, it can be re-used as a worker node. Please refer to the Installation process for more information.
  • Master Nodes - The control plane manages the OpenShift Container Platform cluster.
  • Compute Nodes - Run the actual workload - all SDH pods among others.
  • OCS Nodes - Run OpenShift Container Storage (aka OCS) -- currently supported only on AWS and VMware vSphere. The nodes can be divided into starting (running both OSDs and monitors) and additional nodes (running only OSDs). Needed only when OCS shall be used as the backing storage provider.
  • Jump host - The Jump host is used among other things for:

The hardware/software requirements for the Jump host can be:

  • OS: Red Hat Enterprise Linux 7.8, 7.7 or 7.6
  • Diskspace:
    • 70GiB for /:
      • to store the work directory and the installation binaries of SAP Data Hub Foundation
      • including at least 50 GiB for /var/lib/docker
    • Additional 50 GiB for registry's storage if hosting image registry (by default at /var/lib/registry).
2.1.1.1. Minimum Hardware Requirements

The table below lists the minimum requirements and the minimum number of instances for each node type for the latest validated SDH and OCP 4.X releases. This is sufficient of a PoC (Proof of Concept) environments.

Type Count Operating System vCPU RAM (GB) Storage (GB) AWS Instance Type
Bootstrap 1 RHCOS 4 16 120 m4.xlarge
Master 3+ RHCOS 4 16 120 m4.xlarge
Compute 3+ RHEL 7.6 or RHCOS 8 32 120 m4.2xlarge
Jump host 1 RHEL 7.8, 7.7 or 7.6 2 4 120 t2.medium

If using OCS 4, at least additional 3 (starting) nodes are recommended. Alternatively, the Compute nodes outlined above can also run OCS pods. In that case, the hardware specifications need to be extended accordingly. The following table lists the minimum requirements for each additional node:

Type Count Operating System vCPU RAM (GB) Storage (GB) AWS Instance Type
OCS starting (OSD+MON) 3 RHCOS 16 64 120 + 2048 + 10 m5.4xlarge
2.1.1.2. Minimum Production Hardware Requirements

The minimum production requirements for production systems for the latest validated SDH and OCP 4 are the following:

Type Count Operating System vCPU RAM (GB) Storage (GB) AWS Instance Type
Bootstrap 1 RHCOS 4 16 120 m4.xlarge
Master 3+ RHCOS 8 16 120 c5.xlarge
Compute 4+ RHEL 7.6 or RHCOS 16 64 120 m4.4xlarge
Jump host 1 RHEL 7.8, 7.7 or 7.6 2 4 120 t2.medium

If using OCS 4, at least additional 3 (starting) nodes are needed. The following table lists the minimum requirements for each node:

Type Count Operating System vCPU RAM (GB) Storage (GB) AWS Instance Type
OCS starting (OSD+MON) 3 RHCOS 16 64 120 + 3×2048 + 10 m5.4xlarge
OCS additional (OSD) 1 RHCOS 16 64 120 + 3×2048 m5.4xlarge

Please refer to OCS Node Requirements and OCS Sizing and scaling recommendations for more information.

2.2. Software Requirements

2.2.1. Compatibility Matrix

Later versions of SAP Data Hub support newer versions of Kubernetes and OpenShift Container Platform. Even if not listed in the OCP validation version matrix above, the following version combinations are considered fully working:

SAP Data Hub OpenShift Container Platform Worker Node Jump host Infrastructure and (Storage)
2.6 4.1 RHCOS or RHEL 7.6 RHEL 7.8, 7.7 or 7.6 Cloud , VMware vSphere (vSphere volumes )
2.7 4.2 RHCOS or RHEL 7.6 RHEL 7.8, 7.7 or 7.6 Cloud , VMware vSphere (OCS 4) or (NetApp Trident (iSCSI LUNs) ) or (vSphere volumes )

Cloud means any cloud provider supported by OpenShift Container Platform. For a complete list of tested and supported infrastructure platforms, please refer to OpenShift Container Platform 4.x Tested Integrations. The persistent storage in this case must be provided by the cloud provider. Please see refer to Understanding persistent storage for a complete list of supported storage providers.

This persistent storage provider does not offer a supported object storage service required by SDH's checkpoint store and therefor is suitable only for SAP Data Hub development and PoC clusters.

Unless stated otherwise, the compatibility of a listed SDH version covers all its patch releases as well.

2.2.2. Persistent Volumes

Persistent storage is needed for SDH. It is required to use storage that can be created dynamically. You can find more information in the Understanding persistent storage document.

2.2.3. External Image Registry

The SDH installation requires an Image Registry where images are first mirrored from an SAP Registry and then delivered to the OCP cluster nodes. The integrated OpenShift Container Registry is not appropriate for this purpose or may require further analysis. For now an external image registry needs to be setup instead. On AWS, it is recommended to utilize Amazon Elastic Container Registry. Please refer to Using AWS ECR Registry for the Modeler for a post-configuration step to enable the registry for the Modeler.

In AWS, it is recommended to use AWS ECR Registry. Please refer to Prepare for Installation on Amazon Elastic Container Service for Kubernetes (EKS) and Using AWS ECR Registry for the Modeler for more information.

If there is not a suitable external registry already available, the Jump host can be used to host the registry. Please follow the steps in article How do I setup/install a Docker registry?.

When finished you should have an external image registry up and running at the URL My_Image_Registry_FQDN:5000. You can verify that with the following command.

# curl http://My_Image_Registry_FQDN:5000/v2/
{}

Make sure to mark the address as insecure.

Additionally, make sure to mark the registry as insecure within the Pipeline Modeler.

2.2.3.1. Update the list of insecure registries
  • Since the external image registry deployed above is insecure by default, in order to push images to the image registry and pull them on nodes it must be listed as insecure in /etc/containers/registries.conf file on all the hosts, including the Jump host:

    # vi /etc/containers/registries.conf
    ...
    [registries.insecure]
    registries = [
      "My_Image_Registry_FQDN:5000"
    ]
    ...
    
  • On the Jump host, the change is achieved simply by editing the file and restarting the host:

    # systemctl restart docker
    
  • On the OCP nodes, the configuration is achieved by adding the "My_image_Registry_FQDN:5000" to the list of .spec.registrySources.insecureRegistries in the resource image.config.openshift.io/cluster. Please refer to Configuring image settings (4.2) / (4.1) for more information.

If you plan to run the installation on the Jump host as a non-root user, please check the instructions below for additional steps.

NOTE These settings have no effect on the Kaniko Image Builder, which also needs to be aware of the insecure registry. Please refer to Marking the vflow registry as insecure for more information.

2.2.4. Checkpoint store enablement

In order to enable SAP Vora Database streaming tables, checkpoint store needs to be enabled. The store is an object storage on a particular storage back-end. Several back-end types are supported by the SDH installer that cover most of the storage cloud providers.

The enablement is strongly recommended for production clusters. Clusters having this feature disabled are suitable for development or as PoCs.

Make sure to create a desired bucket before the SDH Installation. If the checkpoint store shall reside in a directory on a bucket, the directory needs to exist as well.

3. Install Red Hat OpenShift Container Platform

3.1. Prepare the Jump host

For SAP recommendations, please refer to Installation Host

  1. Subscribe the Jump host at least to the following repositories:

    # OCP_RELEASE=4.2
    # sudo subscription-manager repos        \
        --enable=rhel-7-server-rpms          \
        --enable=rhel-7-server-extras-rpms   \
        --enable=rhel-7-server-optional-rpms \
        --enable=rhel-7-server-ose-${OCP_RELEASE:-4.2}-rpms
    
  2. Install jq binary. In order to do that, EPEL (Extra Packages for Enterprise Linux) repository needs to be enabled:

    # sudo yum install -y yum-utils https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    # sudo yum-config-manager --enable epel
    # sudo yum install -y jq
    

    Optionally, you can disable EPEL repository afterwards:

    # sudo yum-config-manager --disable epel
    
  3. Install a helm client on the Jump host.

    • Download a script from https://github.com/helm/helm and execute it with the desired version set to the latest major release of helm for your SDH release. That is v2.12 for SDH 2.6 and v2.13 for SDH 2.7.

      # DESIRED_VERSION=v2.13.1
      # curl --silent https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | \
          DESIRED_VERSION="${DESIRED_VERSION:-v2.13.1}" bash
      
  4. Download and install kubectl.

    • Either via standard repositories by installing atomic-openshift-clients:

      # sudo yum install -y openshift-clients
      

      NOTE: rhel-7-server-ose-X.Y-rpms repositories corresponding to the same major release version (e.g. 4.2) as on the cluster nodes need to be enabled.

    • Or by downloading and installing the binary manually after determining the right version:

      # curl -LO https://dl.k8s.io/release/${ver:-v1.13.12}/bin/linux/amd64/kubectl
      # chmod +x ./kubectl
      # sudo mv ./kubectl /usr/local/bin/kubectl
      
  5. In case of mpsl and mpfree SDH installation methods, make sure to install and run the SAP Host Agent as well.

    • However, in step 4, instead of downloading a *.SAR archive, as suggested by the guide, on RHEL it is recommended to download the latest RPM package (e.g. saphostagentrpm_45-20009394.rpm) and install it on the Jump host using a command like:

      # yum install saphostagentrpm_45-20009394.rpm
      
    • This way, the installation of SAPCAR listed in prerequisites is not needed.

    • Step 6 (SAR archive extraction) can then be skipped.
    • In the step 7, the command then needs to be modified to:

      # cd /usr/sap/hostctrl/exe
      # ./saphostexec -setup slplugin -passwd
      
    • Additionally, make sure to set the password for sapadm user. You will be prompted for the username and password by the maintenance planner.

      # passwd sapadm
      

3.2. Install OpenShift Container Platform

Install OpenShift Container Platform on your desired cluster hosts. Follow the OpenShift installation guide (4.2) / (4.1).

If you choose the Installer Provisioned Infrastructure (IPI) (4.2) / (4.1), please follow the Installing a cluster on AWS with customizations (4.2) / (4.1) methods to allow for customizations.

On VMware vSphere, please follow Installing a cluster on vSphere.

Several changes need to be done to the compute nodes running SDH workloads before SDH installation. These include:

  1. choose a sufficient number and type of compute instances for SDH workload
  2. pre-load needed kernel modules
  3. increasing the pids limit of CRI-O container engine
  4. configure insecure registry (if an insecure registry shall be used)

The first two items can be performed during or after OpenShift installation. The others only after the OpenShift installation.

3.2.1. Customizing IPI or UPI installation on AWS or WMvare vSphere

In order to allow for customizations, the installation need to be performed in steps:

  1. create the installation configuration file

    followed up by Modifying the installation ocnfiguration file

  2. create the ignition configuration files

    followed up by Modifying the worker ignition configuration file

  3. create the cluster

3.2.1.1. Modifying the installation configuration file

After the configuration file is created by the installer, you can specify the desired instance type of compute nodes by editing <installation_directory>/install-config.yaml. A shortened example for AWS could look like this:

apiVersion: v1
compute:
- hyperthreading: Enabled
  name: worker
  platform:
    aws:
      region: us-east-1
      type: m4.2xlarge
  replicas: 3

On AWS, to satisfy the SDH's production requirements, you can change the compute.platform.aws.type to r5.2xlarge and compute.replicas to 4.

For VMware vSphere, take a look at Sample install-config.yaml file.

Then continue to generate the ignition configuration files from the installation file by running the following command. Note that the configuration file will be deleted. Therefor, it may make sense to back it up first.

# openshift-install create ignition-configs --dir <installation_directory>
3.2.1.2. Modifying the worker ignition configuration file

The following kernel modules need to be pre-loaded on compute nodes running SDH workloads: iptable_filter, iptable_nat, ip_tables, ipt_owner, ipt_REDIRECT, nfsd and nfsv4.

That can be achieved by creating a file in /etc/modules-load.d on each node containing the list of the modules. The following snippet shows, how to do it in bash. Make sure to install jq application/RPM package and to change to the correct <installation_directory>.

# pushd <installation_directory>
# modules=( nfsd nfsv4 ip_tables ipt_REDIRECT ipt_owner )
# content="$(printf '%s\n' "${modules[@]}" | base64 -w0)"
# cp worker.{,bak.}ign                      # make a backup of the ignition file
# jq -c '.storage |= {"files": ((.files // []) + [{
    "contents": {
      "source": "data:text/plain;charset=utf-8;base64,'"${content}"'",
      "verification": {}
    },
    "filesystem": "root",
    "mode": 420,
    "path": "/etc/modules-load.d/sap-datahub-dependencies.conf"
}])} | .systemd |= {"units": ((.units // []) + [{
    "contents": "[Unit]\nDescription=Pre-load kernel modules for SAP Data Intelligence\nAfter=network.target\n\n[Service]\nType=oneshot\nExecStart=/usr/sbin/modprobe iptable_nat\nExecStart=/usr/sbin/modprobe iptable_filter\nRemainAfterExit=yes\n\n[Install]\nWantedBy=multi-user.target",
    "enabled": true,
    "name": "sdh-modules-load.service"
}])}' worker.bak.ign >worker.ign
# popd
3.2.1.3. (IPI only) Continue the installation by creating the cluster

To continue the IPI (e.g. on AWS) installation, execute the following command:

# openshift-install create cluster --dir <installation_directory>

3.3. OCP Post Installation Steps

3.3.1. (optional) Install Dynamic Persistent Storage Provider

3.3.1.1. Install OpenShift Container Storage

On AWS and WMware vSphere platforms, you have the option to deploy OCS to host the persistent storage for Data Hub. Please refer to the OCS documentation.

3.3.1.2. Install NetApp Trident

NetApp Trident has been validated for SAP Data Hub and OpenShift. More details can be found at SAP Data Hub 2 on OpenShift 4 with NetApp Trident.

3.3.2. Change the count and instance type of compute nodes

Please refer to Creating a MachineSet (4.2) / (4.1) for changing an instance type and Manually scaling a MachineSet (4.2) / (4.1) or Applying autoscaling to an OpenShift Container Platform cluster (4.2) / (4.1) for information on scaling the nodes.

3.3.3. Pre-load needed kernel modules

To apply the desired changes to the existing compute nodes, all that is necessary is to create a new machine config that will be merged with the existing configuration:

# modules=( nfsd nfsv4 ip_tables ipt_REDIRECT ipt_owner )
# content="$(printf '%s\n' "${modules[@]}" | base64 -w0)"
# oc create -f - <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 75-worker-sap-datahub
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: "data:text/plain;charset=utf-8;base64,$content"
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/modules-load.d/sap-datahub-dependencies.conf
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Pre-load kernel modules for SAP Data Hub
          After=network.target

          [Service]
          Type=oneshot
          ExecStart=/usr/sbin/modprobe iptable_nat
          ExecStart=/usr/sbin/modprobe iptable_filter
          RemainAfterExit=yes

          [Install]
          WantedBy=multi-user.target
        enabled: true
        name: sdh-modules-load.service
EOF

The changes will be rendered into machineconfigpool/worker. The workers will be restarted one-by-one until the changes are applied to all of them. See Applying configuration changes to the cluster (4.2) / (4.1) for more information.

3.3.4. Change the maximum number of PIDs per Container

The process of configuring the nodes is described at Modifying Nodes (4.2) / (4.1). In SDH case, the required settings are .spec.containerRuntimeConfig.pidsLimit in a ContainerRuntimeConfig. The result is a modified /etc/crio/crio.conf configuration file on each affected worker node with pids_limit set to the desired value.

  1. Label the particular pool of worker nodes.

    # oc label machineconfigpool/worker workload=sapdatahub
    
  2. Create the following ContainerRuntimeConfig resource.

    # oc create -f - <<EOF
    apiVersion: machineconfiguration.openshift.io/v1
    kind: ContainerRuntimeConfig
    metadata:
     name: bumped-pid-limit
    spec:
     machineConfigPoolSelector:
       matchLabels:
         workload: sapdatahub
     containerRuntimeConfig:
       pidsLimit: 16384
    EOF
    
  3. Wait until the machineconfigpool/worker becomes updated.

    # watch oc get  machineconfigpool/worker
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
    worker   rendered-worker-8f91dd5fdd2f6c5555c405294ce5f83c   True      False      False
    

3.3.5. Deploy persistent storage provider

Unless your platform already offers a supported persistent storage provider, one needs to be deployed. Please refer to Understanding persistent storage (4.2) / (4.1) for an overview of possible options.

On AWS and VMware, one can deploy (OpenShift Container Storage (OCS) running converged on OCP nodes providing both persistent volumes and object storage. Please refer to OCS Planning your Deployment and Deploying OpenShift Container Storage for more information and installation instructions.

3.3.6. Configure S3 access and bucket

Object storage is required for checkpoint store feature providing regular back-ups of its database. Several interfaces to the object storage are supported by SDH. S3 interface is one of several. Please take a look at Checkpoint Store Type at Required Input Parameters for the complete list.

SAP help page covers preparation of object store for a couple of cloud service providers.

3.3.6.1. Using NooBaa as object storage gateway

OCS contains NooBaa object data service for hybrid and multi cloud environments which provides S3 API one can use with SAP Data Hub. For SDH, one needs to provide the following:

  • S3 host URL prefixed either with https:// or http://
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • bucket name

Once OCS is deployed, one can create the access keys and bucket using one of the following:

  • via NooBaa Management Console by default exposed at noobaa-mgmt-openshift-storage.apps.<cluster_name>.<base_domain>
  • via OpenShift command line interface covered below

In both cases, the S3 endpoint provided to the SAP Data Hub cannot be secured with a self-signed certificate because a custom CA file cannot be passed to SAP Data Hub. Unless NooBaa's endpoints are secured with a proper signed certificate, one must use insecure HTTP connection. NooBaa comes with such an insecure service reachable at the following URL, where s3 stands for a service name and openshift-storage for namespace where OCS is installed:

http://s3.openshift-storage.svc.cluster.local

The service is resolvable only within the cluster. One cannot reach this URL from outside of the cluster. One can verify that the service is available with the following command.

# oc get svc -n openshift-storage -l app=noobaa
NAME          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                                   AGE
noobaa-mgmt   LoadBalancer   172.30.154.162   <pending>     80:31351/TCP,443:32681/TCP,8444:31938/TCP,8445:31933/TCP,8446:31943/TCP   7d1h
s3            LoadBalancer   172.30.44.242    <pending>     80:31487/TCP,443:30071/TCP                                                7d1h
3.3.6.1.1. Creating an S3 bucket using CLI

The bucket can be created with the command below. Make sure to double-check storage class name (e.g. using oc get sc). It can live in any OpenShift project (e.g. sdh-infra). Be sure to switch to appropriate project/namespace (e.g. sdh) first before executing the following.

# claimName="sdh-checkpoint-store"
# oc create -f - <<EOF
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ${claimName}
spec:
  generateBucketName: ${claimName}
  storageClassName: openshift-storage.noobaa.io
EOF

After a while, the object bucket will be created, the claim will get bound and the secret with the same name (sdh-checkpoint-store) as the ObjectBucketClaim (aka obc) will be created. When ready, the obc will be bound:

# oc get obc -w
NAME                   STORAGE-CLASS                 PHASE   AGE
sdh-checkpoint-store   openshift-storage.noobaa.io   Bound   41s

The name of the created bucket can be determined with the following command:

# oc get cm sdh-checkpoint-store -o jsonpath=$'{.data.BUCKET_NAME}\n'
sdh-checkpoint-store-ef4999e0-2d89-4900-9352-b1e1e7b361d9

To determine the access keys, execute the following in bash:

# claimName=sdh-checkpoint-store
# printf 'Bucket/claim %s:\n  Bucket name:\t%s\n' "$claimName" "$(oc get obc -o jsonpath='{.spec.bucketName}' "$claimName")"; \
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
  printf '  %s:\t%s\n' "$key" "$(oc get secret "$claimName" -o jsonpath="{.data.$key}" | base64 -d)"
done | column -t -s $'\t'

An example output value can be:

Bucket/claim sdh-checkpoint-store:
  Bucket name:                      sdh-checkpoint-store-ef4999e0-2d89-4900-9352-b1e1e7b361d9
  AWS_ACCESS_KEY_ID:                LQ7YciYTw8UlDLPi83MO
  AWS_SECRET_ACCESS_KEY:            8QY8j1U4Ts3RO4rERXCHGWGIhjzr0SxtlXc2xbtE

The values of sdh-checkpoint-store shall be passed to the following SLC Bridge parameters during SDH's installation in order to enable checkpoint store.

Parameter Example value
Amazon S3 Access Key LQ7YciYTw8UlDLPi83MO
Amazon S3 Secret Access Key 8QY8j1U4Ts3RO4rERXCHGWGIhjzr0SxtlXc2xbtE
Amazon S3 bucket and directory sdh-checkpoint-store-ef4999e0-2d89-4900-9352-b1e1e7b361d9
Amazon S3 Region (optional) ``

please leave unset

During a manual installation, one can use the above information like this:

Please enter S3 access key: LQ7YciYTw8UlDLPi83MO
Please enter S3 secret access key: 8QY8j1U4Ts3RO4rERXCHGWGIhjzr0SxtlXc2xbtE
Please enter S3 host (empty for default 'https://s3.amazonaws.com'): http://s3.openshift-storage.svc.cluster.local
Please enter S3 region you want to connect to (empty for default 'us-east-1'):
Please enter S3 bucket and directory (in the form my-bucket/directory): sdh-checkpoint-store-ef4999e0-2d89-4900-9352-b1e1e7b361d9

3.3.7. Set up an External Image Registry

If you haven't done so already, please follow the External Image Registry prerequisite.

3.3.8. Configure an insecure registry

In order for OpenShift to deploy SDH images from an insecure registry, the cluster needs to be told to treat the registry as insecure. Please follow Configuring image settings (4.2) / (4.1) and the registry to the .spec.registrySources.insecureRegistries list. For example:

apiVersion: config.openshift.io/v1
kind: Image
metadata:
  annotations:
    release.openshift.io/create-only: "true"
  name: cluster
spec:
  registrySources:
    insecureRegistries:
    - My_Image_Registry_FQDN:5000

NOTE: it may take a couple of tens of minutes until the nodes are reconfigured. You can use the following commands to monitor the progress:

  • watch oc get machineconfigpool
  • watch oc get nodes

3.3.9. Configure the OpenShift Cluster for SDH

3.3.9.1. Becoming a cluster-admin

Many commands below require cluster admin privileges. To become a cluster-admin, you can do one of the following:

  • Use the auth/kubeconfig generated in the working directory during the installation of the OCP cluster:

    INFO Install complete!
    INFO Run 'export KUBECONFIG=<your working directory>/auth/kubeconfig' to manage the cluster with 'oc', the OpenShift CLI.
    INFO The cluster is ready when 'oc login -u kubeadmin -p <provided>' succeeds (wait a few minutes).
    INFO Access the OpenShift web-console here: https://console-openshift-console.apps.demo1.openshift4-beta-abcorp.com
    INFO Login to the console with user: kubeadmin, password: <provided>
    # export KUBECONFIG=working_directory/auth/kubeconfig
    # oc whoami
    system:admin
    
  • As a system:admin user or a member of cluster-admin group, make another user a cluster admin to allow him to perform the SDH installation:

    1. As a cluster-admin, configure the authentication and add the desired user (e.g. dhadmin).
    2. As a cluster-admin, grant the user a permission to administer the cluster:

      # oc adm policy add-cluster-role-to-user cluster-admin dhadmin
      

You can learn more about the cluster-admin role in Cluster Roles and Local Roles article (4.2) / (4.1).

3.3.9.2. Project setup
3.3.9.2.1. Create privileged tiller service account

As a cluster-admin, create a tiller service account in the kube-system project (aka namespace) and grant it the necessary permissions:

# oc create sa -n kube-system tiller
# oc adm policy add-cluster-role-to-user cluster-admin -n kube-system -z tiller
3.3.9.2.2. Initialize helm

Set up helm and tiller for the deployment:

# helm init --service-account=tiller --upgrade --wait

Upon successful initialization, you should be able to see a tiller pod in the kube-system namespace:

# oc get pods -n kube-system
NAME                            READY     STATUS    RESTARTS   AGE
tiller-deploy-551988758-dzjx5   1/1       Running   0          1m
# helm ls
[There should be no error in the output. If there is no output at all, it means good news, no error]
3.3.9.2.3. Create sdh project

Create a dedicated project in OpenShift for the SDH deployment. For example sdh. Login to OpenShift as a cluster-admin, and perform the following configurations for the installation:

# oc new-project sdh
# oc adm policy add-scc-to-group anyuid "system:serviceaccounts:$(oc project -q)"
# oc adm policy add-scc-to-group hostmount-anyuid "system:serviceaccounts:$(oc project -q)"
# oc adm policy add-scc-to-user privileged -z "vora-vsystem-$(oc project -q)"
# oc adm policy add-scc-to-user privileged -z "vora-vsystem-$(oc project -q)-vrep"
# oc adm policy add-scc-to-user privileged -z "$(oc project -q)-elasticsearch"
# oc adm policy add-scc-to-user privileged -z "$(oc project -q)-fluentd"
# oc adm policy add-scc-to-user privileged -z "default"
# oc adm policy add-scc-to-user privileged -z "vora-vflow-server"
3.3.9.2.4. Allow administrator to manage SDH resources

As a cluster-admin, allow the project administrator to manage SDH custom resources.

# oc create -f - <<EOF
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aggregate-sapvc-admin-edit
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
rules:
- apiGroups: ["sap.com"]
  resources: ["voraclusters"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete", "deletecollection"]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: aggregate-sapvc-view
  labels:
    # Add these permissions to the "view" default role.
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
- apiGroups: ["sap.com"]
  resources: ["voraclusters"]
  verbs: ["get", "list", "watch"]
EOF
3.3.9.2.5. Mark external image registry as insecure

When Kaniko Image Builder shall be enabled and the external image registry is not secured with a certificate signed by a trusted certificate authority, the registry needs to be marked as insecure.

3.3.9.2.6. Deploy SDH Observer

Deploy sdh-observer in the sdh namespace. Please follow the appendix Deploy SDH Observer .

4. Install SDH on OpenShift

4.1. Required Input Parameters

A few important installation parameters are described below. Please refer to the Required Input Parameters (2.7) / (2.6) for their full description. Most of the parameters must be provided for the mpsl and mpfree installation methods. The Command line argument column describes corresponding options for the manual installation.

Name Condition Recommendation Command line argument
Kubernetes Namespace Always Must match the project name chosen in the Project Setup (e.g. sdh) -n sdh
Installation Type Installation or Update Choose Advanced Installation if you need to specify proxy settings or you want to choose particular storage class or there is no default storage class set. None
KUBECONFIG File Always The path to the kubeconfig file on the Jump host. It is the same file as described in Becoming a cluster-admin. If the SAP Host Agent is running on the master host, it can be set to /root/.kube/config. None1
Container Registry Installation Must be set to the external image registry. -r My_Image_Registry_FQDN:5000
Container Registry Settings for Pipeline Modeler Advanced Installation Shall be changed if the same registry is used for more than one SAP Data Hub instance. Either another <registry> or a different <prefix> or both will do. --vflow-registry <registry>/<prefix>
Certificate Domain2 Installation Shall be set either to 1. the external FQDN of the OpenShift node used to access the vsystem service if using NodePort or 2. the FQDN of the vsystem route. --cert-domain vsystem-sdh.apps.<cluster_name>.<base_domain>
Custom Certificate None3 Custom certificate for SDH services. Ideally a trusted wildcard certificate for applications under the .apps sub-domain should be configured. External traffic to the Cluster is managed by the Ingress Operator. --provide-certs
Cluster Proxy Settings Advanced Installation or Advanced Updates Make sure to configure this if the traffic to internet needs to be routed through a proxy. None
Cluster HTTP Proxy Advanced Installation or Advanced Updates Make sure to set this if the traffic to internet needs to be routed through a proxy. --cluster-http-proxy "$HTTP_PROXY"
Cluster HTTPS Proxy Advanced Installation or Advanced Updates Make sure to set this if the traffic to internet needs to be routed through a proxy. --cluster-https-proxy "$HTTPS_PROXY"
Cluster No Proxy Advanced Installation or Advanced Updates Make sure to set this if the traffic to internet needs to be routed through a proxy. --cluster-https-proxy "$NO_PROXY"
Checkpoint Store Configuration Installation Recommended for production deployments. --enable-checkpoint-store
Checkpoint Store Type If Checkpoint Store Configuration parameter is enabled. Set to s3 if using for example OCS's NooBaa service as the object storage. See Using NooBaa as object storage gateway for more details. --checkpoint-store-type s3
StorageClass Configuration Advanced Installation Configure this if you want to choose different dynamic storage provisioners for different SDH components or if there's no default storage class set or you want to choose non-default storage class for the SDH components. None
Default StorageClass Advanced Installation and if storage classes are configured Set this if there's no default storage class set or you want to choose non-default storage class for the SDH components. --pv-storage-class ceph-rbd
Additional Installer Parameters Advanced Installation Useful for reducing the minimum memory requirements of the HANA pod, enabling Kaniko Image Builder and much more. -e vora-cluster.components.dlog.replicationFactor=2--enable-kaniko=yes
AWS IAM Role for SAP Data Hub Modeler Advanced Installation and if the Pipeline Modeler registry is an Amazon Elastic Container Registry Do not specify. Configure the registry secret instead. --vflow-aws-iam-role
Docker Container Log Path Configuration Advanced Installation Do not configure. Configuring this will not address problems with fluentd pods. The fluentd daemonset shall be patched by SDH Observer. --docker-log-path

4.2. Kaniko Image Builder

By default, Pipeline Modeler (vflow) pod uses Docker Daemon on the node, where it runs, to build container images before they are run. This was possible on OCP releases prior to 4.0. Since then, OCP uses CRI-O container runtime.

To enable Pipeline Modeler to build images on recent OCP releases, it must be configured to use kaniko image builder. This is achieved by passing --enable-kaniko=yes parameter to the install.sh script during the manual installation. For the other installation methods, one can enable it by appending --enable-kaniko=yes to SLP_EXTRA_PARAMETERS (Additional Installation Parameters).

4.2.1. Registry requirements for the Kaniko Image Builder

The Kaniko Image Builder supports out-of-the-box only connections to secure image registries with a certificate signed by a trusted certificate authority.

In order to use an insecure image registry (e.g. the proposed external image registry) in combination with the builder, the registry must be whitelisted in Pipeline Modeler by marking it as insecure.

4.3. Installation using the Maintenance Planner and SL Plugin (mpsl)

Is a web-based installation method recommended by SAP offering you an option to send analytics data and feedback to SAP. All the necessary prerequisites have been satisfied by applying all the steps described above. The Installing SAP Data Hub using SLC Bridge (SL Plugin) with Maintenance Planner and SAP Host Agent (2.7) / (2.6) documentation will guide you through the process.

NOTE: Make sure to enable Advanced Installation and to add --enable-kaniko=yes parameter to the Advanced Installer Parameters.

4.4. Installation using SL Plugin without Maintenance Planner (mpfree)

Is an alternative command-line-based installation method. Please refer to the SAP Data Hub documentation (2.7) / (2.6) for more information and the exact procedure.

NOTE: Make sure to enable Advanced Installation and to add --enable-kaniko=yes parameter to the Advanced Installer Parameters.

4.5. Manual Installation using an installation script (manual)

4.5.1. Download and unpack the SDH binaries

Download and unpack SDH installation binary onto the Jump host.

  1. Go to SAP Software Download Center, login with your SAP account and search for DATA HUB 2 or access this link.

  2. Download the SAP Data Hub Foundation file, for example: DHFOUNDATION07_3-80004015.ZIP (SAP DATA HUB - FOUNDATION 2.7).

  3. Unpack the installer file. For example, when you unpack the DHFOUNDATION07_3-80004015.ZIP package, it will create the installation folder SAPDataHub-2.7.155-Foundation.

    # unzip DHFOUNDATION07_3-80004015.ZIP
    

4.5.2. Install SAP Data Hub

4.5.2.1. Remarks on the installation options
Feature Installation method Parameter Name
Storage Class Manual --pv-storage-class="$storage_class_name"
Advanced mpsl or mpfree installations Default storage class
Kaniko Image Builder Manual --enable-kaniko=yes
Advanced mpsl or mpfree installations Additional Installation Parameters or SLP_EXTRA_PARAMETERS
Diagnostics Node Exporter Manual -e=vora-diagnostics.resources.prometheusNodeExporter.resources.limits.cpu=200m
-e=vora-diagnostics.resources.prometheusNodeExporter.resources.limits.memory=100M
Advanced mpsl or mpfree installations Additional Installation Parameters or SLP_EXTRA_PARAMETERS

Storage Class

When there is no default dynamic storage provisioner defined, the preferred one needs to be specified explicitly.
If the default dynamic storage provisioner has been defined, the parameter can be omitted. To define the default dynamic storage provisioner, please follow the document Changing the Default StorageClass (4.2) / (4.1).

Kaniko Image Builder

Can be enabled starting from 2.5. It became necessary to enable on OCP releases 4.x. During mpsl and mpfree installation methods, one needs to append --enable-kaniko=yes to the list of Additional Installation Parameters or SLP_EXTRA_PARAMETERS.
See Kaniko Image Builder for more information.

Diagnostics Node Exporter (SDH 2.6 only)

The Node Exporter pods deployed as part of Data Hub's diagnostics suite consume more resources than is allowed by their limits. To prevent from installation failures, make sure to increase their limits with the following parameters passed either to the ./install.sh script during the manual installation or as Additional Installation Parameters.

-e=vora-diagnostics.resources.prometheusNodeExporter.resources.limits.cpu=200m
-e=vora-diagnostics.resources.prometheusNodeExporter.resources.limits.memory=100M

The issue is documented as Diagnostics Prometheus Node Exporter pods not starting.

4.5.2.2. Executing the installation script

Run the SDH installer as described in Manually Installing SAP Data Hub on the Kubernetes Cluster (2.7) / (2.6).

4.6. SDH Post installation steps

4.6.1. (Optional) Expose SDH services externally

There are multiple possibilities how to make SDH services accessible outside of the cluster. Compared to Kubernetes, OpenShift offers additional method, which is recommended for most of the scenarios including SDH services. It's based on OpenShift Ingress Operator. The other methods documented in the official SAP Data Hub documentation (2.7) / (2.6) are still available.

4.6.1.1. Using OpenShift Ingress Operator

OpenShift allows you to access the Data Hub services via Ingress Controllers as opposed to regular NodePorts. For example, instead of accessing the vsystem service via https://worker-node.example.com:32322, after the service exposure, you will be able to access it at https://vsystem-sdh.apps.<cluster_name>.<base_domain>. This is an alternative to the official guide documentation to Expose the Service From Outside the Network (2.7) / (2.6).

There are two kinds of routes. The reencrypt kind, allows for a custom signed or self-signed certificate to be used. The other is a passthrough kind which uses the pre-installed certificate generated by the installer or passed to the installer.

4.6.1.1.1. Export services with an reencrypt route

With this kind of route, different certificates are used on client and service sides of the route. The router stands in the middle and re-encrypts the communication coming from either side using a certificate corresponding to the opposite side. In this case, the client side is secured by a provided certificate and the service side is encrypted with the original certificate generated or passed to the SAP Data Hub installer.

The reencrypt route allows for securing the client connection with a proper signed certificate.

  1. Look up the vsystem service:

    # oc project sdh            # switch to the Data Hub project
    # oc get services | grep "vsystem "
    vsystem   ClusterIP   172.30.227.186   <none>   8797/TCP   19h
    

    When exported, the resulting hostname will look like vsystem-${SDH_NAMESPACE}.apps.<cluster_name>.<base_domain>. However, an arbitrary hostname can be chosen instead as long as it resolves correctly to the IP of the router.

  2. (optional) Get or generate the certificates. In the following example, a self-signed certificate is created. Unless generated and provided, the ingress operator will use the default router certificate signed with a CA certificate located in the router-ca secret in the openshift-ingress-operator namespace.

    # openssl genpkey -algorithm RSA -out sdhroute-privkey.pem \
            -pkeyopt rsa_keygen_bits:2048 -pkeyopt rsa_keygen_pubexp:3
    # openssl req -new -key sdhroute-privkey.pem -out sdhroute.csr \
            -subj "/C=DE/ST=BW/L=Walldorf/O=SAP SE/CN=vsystem-$(oc project -q).apps.<cluster_name>.<base_domain>"
    # openssl x509 -req -days 365 -in sdhroute.csr -signkey sdhroute-privkey.pem -out sdhroute.crt
    

    Please refer to Generating Certificates for more information.
    If you want to export more SDH services without using multiple certificates, the Common Name (CN) attribute could be set to the *.apps.<cluster_name>.<base_domain> instead, which will match all its possible subdomains.
    The sdhroute.crt self-signed certificate can be imported to a web browser or passed to any other vsystem client.

  3. Obtain the SDH's root certificate authority bundle generated at the SDH's installation time. The generated bundle is available at SAPDataHub-*-Foundation/deployment/vsolutions/certs/vrep/ca/ca.crt when installing manually. But it is also available as in the ca-bundle.pem secret in the sdh namespace.

    # # in case of manual installation
    # cp path/to/SAPDataHub-*-Foundation/deployment/vsolutions/certs/vrep/ca/ca.crt sdh-service-ca-bundle.pem
    # # otherwise get it from the ca-bundle.pem secret
    # oc get -o go-template='{{index .data "ca-bundle.pem"}}' secret/ca-bundle.pem | base64 -d >sdh-service-ca-bundle.pem
    
  4. Create the reencrypt route for the vsystem service like this:

    # oc create route reencrypt --cert=sdhroute.crt --key=sdhroute-privkey.pem \
        --dest-ca-cert=sdh-service-ca-bundle.pem --service=vsystem
    # oc get route
    NAME      HOST/PORT                                      SERVICES  PORT      TERMINATION  WILDCARD
    vsystem   vsystem-sdh.apps.<cluster_name>.<base_domain>  vsystem   vsystem   reencrypt    None
    
4.6.1.1.2. Export services with a passthrough route

With the passthrough route, the communication is encrypted by the SDH service's certificate all the way to the client. It can be treated as secure by the clients as long as the SDH installer has been given proper Certificate Domain to generate a certificate having the Common Name matching the route's hostname and the certificate is imported or passed to the client.

  1. Obtain the SDH's root certificate as documented in step 3 of Export services with an reencrypt route.
  2. Print its attributes and make sure its Common Name (CN) matches the expected hostname or wildcard domain.

    # openssl x509 -noout -subject -in sdh-service-ca-bundle.pem
    subject= /C=DE/ST=BW/L=Walldorf/O=SAP SE/CN=*.apps.<cluster_name>.<base_domain>
    

    In this case, the certificate will be valid for any subdomain of the .apps.<cluster_name>.<base_domain>.

  3. Look up the vsystem service:

    # oc project sdh            # switch to the Data Hub project
    # oc get services | grep "vsystem "
    vsystem   ClusterIP   172.30.227.186   <none>   8797/TCP   19h
    
  4. Create the route:

    # oc create route passthrough --service=vsystem
    # oc get route
    NAME      HOST/PORT                                      PATH  SERVICES  PORT      TERMINATION  WILDCARD
    vsystem   vsystem-sdh.apps.<cluster_name>.<base_domain>        vsystem   vsystem   passthrough  None
    

    Verify that it matches the Common Name (CN) attribute from step 2. You can modify the hostname with --hostname parameter. Make sure it resolves to the router's IP.

  5. Import the self-signed certificate to your web browser and access the SDH web console at https://vsystem-sdh.apps.<cluster_name>.<base_domain> to verify.

4.6.1.2. Using NodePorts

NOTE For OpenShift, an exposure using routes is preferred.

Exposing SAP Data Hub vsystem

  • Either with an auto-generated node port:

    # oc expose service vsystem --type NodePort --name=vsystem-nodeport --generator=service/v2
    # oc get -o jsonpath=$'{.spec.ports[0].nodePort}\n' services vsystem-nodeport
    30617
    
  • Or with a specific node port (e.g. 32123):

    # oc expose service vsystem --type NodePort --name=vsystem-nodeport --generator=service/v2 --dry-run -o yaml | \
        oc patch -p '{"spec":{"ports":[{"port":8797, "nodePort": 32123}]}}' --local -f - -o yaml | oc create -f -
    

The original service remains accessible on the same ClusterIP:Port as before. Additionally, it is now accessible from outside of the cluster under the node port.

Exposing SAP Vora Transaction Coordinator and HANA Wire

# oc expose service vora-tx-coordinator-ext --type NodePort --name=vora-tx-coordinator-nodeport --generator=service/v2
# oc get -o jsonpath=$'tx-coordinator:\t{.spec.ports[0].nodePort}\nhana-wire:\t{.spec.ports[1].nodePort}\n' \
    services vora-tx-coordinator-nodeport
tx-coordinator: 32445
hana-wire:      32192

The output shows the generated node ports for the newly exposed services.

4.6.2. (AWS only) Configure registry secret for the Modeler

Please follow Using AWS ECR Registry for the Modeler if this registry shall be used for the Pipeline Modeler.

4.6.3. SDH Validation

Validate SDH installation on OCP to make sure everything works as expected. Please follow the instructions in Testing Your Installation (2.7) / (2.6).

5. Appendix

5.1. SDH uninstallation

Choose one of the uninstallation methods based on what method has been used for the installation.

When done, you may continue with a new installation round in the same or another namespace.

5.1.1. Using the SL Plugin

Please follow the SAP documentation Uninstall SAP Data Hub Using the SL Plugin (2.7) / (2.6) if you have installed SDH using either mpsl or mpfree methods.

(SDH 2.7 only) if a complete uninstallation is desired (purge), one needs to additionally remove also ${NAMESPACE}-vsystem-application-runtime-storage persistent volume. Otherwise a re-installation to the same namespace will fail.

5.1.2. Manual uninstallation

The installation script allows for project clean-up (2.7) / (2.6) where all the deployed pods, persistent volume claims, secrets, etc. are deleted. If the manual installation method was used, the SDH can be uninstalled using the same script and different set of parameters. The snippet below is an example where the SDH installation resides in the project sdh. In addition to running this script, the project needs to be deleted as well if the same project shall host the new installation.

# ./install.sh --delete --purge --force-deletion --namespace=sdh \
    --docker-registry=registry.local.example:5000
# oc delete project sdh
# # start the new installation

The deletion of the project often takes quite a while. Until fully uninstalled, the project will be listed as Terminating in the output of oc get project. You may speed the process up with the following command. Again please mind the namespace.

# oc delete pods --all --grace-period=0 --force --namespace sdh

NOTE: Make sure not to run the same installation script more than once at the same time even when working with different OpenShift projects.

5.2. Deploy SDH Observer

The sdh-observer observes the SDH namespace and applies fixes to deployments as they appear. It does the following:

  • adds additional emptyDir volume to vsystem-vrep Stateful Set to allow it to run on RHCOS system
  • enables the Pipeline Modeler (aka vflow) to talk to an insecure registry - needed only if the registry is insecure
  • grants fluentd pods permissions to logs
  • reconfigures the fluentd pods to parse plain text file container logs on the OCP 4 nodes
  • (optionally) marks containers manipulating iptables on RHCOS hosts as privileged when the modules are not pre-loaded and the nodes

Apart from accessing resources in sdh namespace, it also requires node-reader cluster role.

To deploy it, as a cluster-admin execute the following command in the SDH namespace before the SDH installation:

# OCPVER=4.2                 # this must match OCP minor release
# INSECURE_REGISTRY=false    # set to true if the registry is insecure
# oc process -f https://raw.githubusercontent.com/redhat-sap/sap-datahub/master/sdh-observer.yaml \
        NAMESPACE="$(oc project -q)" \
        BASE_IMAGE_TAG="${OCPVER:-4.2}" \
        MARK_REGISTRY_INSECURE=${INSECURE_REGISTRY:-0} | oc create -f -

NOTE: that the BASE_IMAGE_TAG must match one of the tags available in the quay.io/openshift/origin-cli repository. The difference between the client's minor release version and OCP server's minor release must not exceed 1.

5.3. Allow a non-root user to interact with Docker on Jump host

  • Append -G dockerroot to OPTIONS= in /etc/sysconfig/docker file on your Jump host.

  • Run the following commands on the Jump host, after you modify the /etc/sysconfig/docker file. Make sure to replace alice with your user name.

    # sudo usermod -a -G dockerroot alice
    # sudo chown root:dockerroot /var/run/docker.sock
    
  • Log out and re-log-in to the Jump host for the changes to become effective.

5.4. Grant fluentd pods permissions to logs

The diagnostics-fluentd-* pods need access to /var/log directories on nodes. For this to work. The pods need to be run as privileged. There are two steps necessary to make it happen:

  1. the ${SDH_PROJECT_NAME}-fluentd service account needs to be added on privileged scc list with the following command copied from the project setup:

    # oc project "${SDH_PROJECT_NAME}"
    # oc adm policy add-scc-to-user privileged -z "$(oc project -q)-fluentd"
    
  2. the daemonset diagnostics-fluentd need to be patched to request the privileged security context.

The recommended way to execute the second step is to deploy deploy SDH Observer. Alternatively, the patching of the daemonset can be done either before the SDH Installation (only applicable for manual installation) or afterwards.

5.4.1. Before the installation

Execute one of the patch commands below depending on the SDH version.

For SDH 2.6 during manual installation only:

# patch -p1 -B patchbaks/ -r - <<EOF
Index: foundation/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
===================================================================
--- foundation.orig/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
+++ foundation/deployment/helm/vora-diagnostic/templates/logging/fluentd.yaml
@@ -41,6 +41,7 @@ spec:
         - name: "FLUENT_ELASTICSEARCH_SCHEME"
           value: "http"
         securityContext:
+          privileged: true
           runAsUser: 0
           runAsNonRoot: false
         volumeMounts:
EOF

5.4.2. After the SDH installation

Execute the following command once the SDH installation is finished in the Data Hub's namespace.

# oc patch ds/diagnostics-fluentd  -p '{ "spec": { "template": { "spec": {
      "containers": [{ "name": "diagnostics-fluentd", "securityContext": { "privileged": true }}]
    }}}}'

The fluentd pods will get restarted automatically.

5.5. Marking the vflow registry as insecure

NOTE: applicable before, during or a after SDH installation.

In SAP Data Hub 2.6.x releases, it is possible to configure insecure registry for Pipeline Modeler (aka vflow pod) neither via installer nor in the UI.

The insecure registry needs to be set if the container registry listens on an insecure port (HTTP) or the communication is encrypted using a self-signed certificate.

Without the insecure registry set, kaniko builder cannot push built images into the configured registry for the Pipeline Modeler (see "Container Registry for Pipeline Modeler" Input Parameter at the official SAP Data Hub documentation (2.7) / (2.6)).

To mark the configured vflow registry as insecure, the SDH Observer needs to be deployed with MARK_REGISTRY_INSECURE=true parameter. If it is already deployed, it can be re-configured to take care of insecure registries by executing the following command in the sdh namespace:

# oc set env dc/sdh-observer MARK_REGISTRY_INSECURE=true

Once deployed, all the existing pipeline modeler pods will be patched. It will take a couple of tens of seconds until all the modified pods become available.

For more information, take a look at SDH Helpers.

5.6. Running multiple SDH instances on a single OCP cluster

Two instances of SAP Data Hub running in parallel on a single OCP cluster have been validated. Running more instances is possible, but most probably needs an extra support statement from SAP.

Please consider the following before deploying more than one SDH instance to a cluster:

  • Each SAP Data Hub instance must run in its own namespace/project.
  • Each SAP Data Hub instance must use a different prefix or container image registry for the Pipeline Modeler. Please refer to If using a Different Container Registry for SAP Data Hub Modeler for the overview of repositories relating to the Pipeline Modeler.
  • It is recommended to dedicate particular nodes to each SDH instance.
  • It is recommended to use network policy SDN mode for completely granular network isolation configuration and improved security. Check network policy configuration for further references and examples. This, however, cannot be changed post OCP installation.
  • If running the production and test (aka blue-green) SDH deployments on a single OCP cluster, mind also the following:
    • There is no way to test an upgrade of OCP cluster before an SDH upgrade.
    • The idle (non-productive) landscape should have the same network security as the live (productive) one.

To deploy a new SDH instance to OCP cluster, please repeat the steps from project setup starting from point 6 with a new project name and continue with SDH Installation.

5.7. Using AWS ECR Registry for the Modeler

Post-sdh-installation step

SAP Data Hub installer allows to specify "AWS IAM Role for Pipeline Modeler" when AWS ECR Registry is used as the external registry. However, due to a bug in Data Hub, the Modeler cannot use it. In order to use AWS ECR Registry for Data Hub, one can follow the instructions at Provide Access Credentials for a Password Protected Container Registry (2.7) / (2.6) with the following modification.

# cat >/tmp/vsystem-registry-secret.txt <<EOF
username: "AWS_ACCESS_KEY_ID"
password: "AWS_SECRET_ACCESS_KEY"
EOF

The AWS_* credentials must belong to a user that has the power-user access to the ECR registry provided by the AmazonEC2ContainerRegistryPowerUser. Please refer to Amazon ECR Repository Policies when you need fine-grained access control.

NOTE: If the same registry shall be used for install multiple SDH instances with the same registry. Make sure to use different prefix for the Pipeline Modeler. Make sure to pre-create the needed repositories which is described at If using a Different Container Registry for SAP Data Hub Modeler

5.8. Running SDH pods on particular nodes

Due to shortcomings in SDH's installer, the validation of SDH installation fails if its daemonsets are not deployed to all the nodes in the cluster.
Therefor, the installation should be executed without a restriction on nodes. After the installation is done, the pods can be re-scheduled to desired nodes like this:

  1. choose a label to apply to the SAP Data Hub project and the desired nodes (e.g. run-sdh-project=sdhblue)

  2. label the desired nodes (in this example worker1, worker2, worker3 and worker4)

    # for node in worker{1,2,3,4}; do oc label node/$node run-sdh-project=sdhblue; done
    
  3. set the project node selector of the sdhblue namespace to match the label

    # oc patch namespace sdhblue -p '{"metadata":{"annotations":{"openshift.io/node-selector":"run-sdh-project=sdhblue"}}}'
    
  4. evacuate the pods from all the other nodes by killing them (requires jq utility installed)

    # oc project sdhblue                    # switch to the SDH project
    # label="run-sdh-project=sdhblue"       # set the chosen label
    # nodeNames="$(oc get nodes -o json | jq -c '[.items[] |
        select(.metadata.labels["'"${label%=*}"'"] == "'"${label#*=}"'") | .metadata.name]')"
    # oc get pods -o json | jq -r '.items[] | . as $pod |
        select(('"$nodeNames"' | all(. != $pod.spec.nodeName))) | "pod/\(.metadata.name)"' | xargs -r oc delete
    

NOTE: Please make sure the Data Hub instance is not being used because killing its pods will cause a downtime.

The pods will be re-launched on the nodes labeled with run-sdh-project=sdhblue. It may take several minutes before the SDH becomes available again.

6. Troubleshooting Tips

6.1. Installation or Upgrade problems

6.1.1. Vora Installation Error: timeout at “Deploying vora-consul”

Vora Installation Error: timeout at "Deploying vora-consul with: helm install --namespace vora -f values.yaml ..."

To view the log messages, you can login to the OpenShift web console, navigate to Applications -> Pods, select the failing pod e.g. vora-consul-2-0, and check the log under the Events tab.

A common error: if the external image registry is insecure, but the OpenShift cluster is configured to pull from a secure registry, you will see errors in the log. If secure registry is not feasible, follow the instructions on configuring the registry as insecure.

6.1.2. Privileged security context unassigned

If there are pods, replicasets, or statefulsets not coming up and you can see an event similar to the one below, you need to add privileged security context constraint to its service account.

# oc get events | grep securityContext
1m          32m          23        diagnostics-elasticsearch-5b5465ffb.156926cccbf56887                          ReplicaSet                                                                            Warning   FailedCreate             replicaset-controller                  Error creating: pods "diagnostics-elasticsearch-5b5465ffb-" is forbidden: unable to validate against any security context constraint: [spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]

Copy the name in the fourth column (the event name - diagnostics-elasticsearch-5b5465ffb.156926cccbf56887) and determine its corresponding service account name.

# eventname="diagnostics-elasticsearch-5b5465ffb.156926cccbf56887"
# oc get -o go-template=$'{{with .spec.template.spec.serviceAccountName}}{{.}}{{else}}default{{end}}\n' \
    "$(oc get events "${eventname}" -o jsonpath=$'{.involvedObject.kind}/{.involvedObject.name}\n')"
sdh-elasticsearch

The obtained service account name (sdh-elasticsearch) now needs to be assigned privileged scc:

# oc adm policy add-scc-to-user privileged -z sdh-elasticsearch

The pod then shall come up on its own unless this was the only problem.

6.1.3. No Default Storage Class set

If pods are failing because because of PVCs not being bound, the problem may be that the default storage class has not been set and no storage class was specified to the installer.

# oc get pods
NAME                                                  READY     STATUS    RESTARTS   AGE
hana-0                                                0/1       Pending   0          45m
vora-consul-0                                         0/1       Pending   0          45m
vora-consul-1                                         0/1       Pending   0          45m
vora-consul-2                                         0/1       Pending   0          45m

# oc describe pvc data-hana-0
Name:          data-hana-0
Namespace:     sdh
StorageClass:
Status:        Pending
Volume:
Labels:        app=vora
               datahub.sap.com/app=hana
               vora-component=hana
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
  Type    Reason         Age                  From                         Message
  ----    ------         ----                 ----                         -------
  Normal  FailedBinding  47s (x126 over 30m)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set

To fix this, either make sure to set the Default StorageClass or provide the storage class name to the installer. For the manual installation, that would be ./install.sh --pv-storage-class STORAGECLASS.

6.1.4. Checkpoint store validation

If you see the following error during the installation for the checkpoint store validation, it means the S3 bucket or the given directory does not exist. Make sure to create them first.

2018-12-05T06:30:17-0500 [INFO] Validating checkpoint store...
2018-12-05T06:30:17-0500 [INFO] Checking connection...
2018-12-05T06:30:42-0500 [INFO] AFSI CLI ouput:
2018-12-05T06:30:42-0500 [INFO] Unknown error when executing operation: Couldn't open URL : Cannot open connection: file/directory 'bucket1/dir' does not exist.
pod sdh/checkpoint-store-administration terminated (Error)
2018-12-05T06:30:42-0500 [ERROR] Connection check failed!
2018-12-05T06:30:42-0500 [ERROR] Checkpoint store validation failed!

2018-12-05T06:30:42-0500 [ERROR] Please reconfigure your checkpoint store connection...

6.1.5. vsystem-app pods not coming up

If you have SELinux in enforcing mode you may see the pods launched by vsystem crash-looping because of the container named vsystem-iptables like this:

# oc get pods
NAME                                                          READY     STATUS             RESTARTS   AGE
auditlog-59b4757cb9-ccgwh                                     1/1       Running            0          40m
datahub-app-db-gzmtb-67cd6c56b8-9sm2v                         2/3       CrashLoopBackOff   11         34m
datahub-app-db-tlwkg-5b5b54955b-bb67k                         2/3       CrashLoopBackOff   10         30m
...
internal-comm-secret-gen-nd7d2                                0/1       Completed          0          36m
license-management-gjh4r-749f4bd745-wdtpr                     2/3       CrashLoopBackOff   11         35m
shared-k98sh-7b8f4bf547-2j5gr                                 2/3       CrashLoopBackOff   4          2m
...
vora-tx-lock-manager-7c57965d6c-rlhhn                         2/2       Running            3          40m
voraadapter-lsvhq-94cc5c564-57cx2                             2/3       CrashLoopBackOff   11         32m
voraadapter-qkzrx-7575dcf977-8x9bt                            2/3       CrashLoopBackOff   11         35m
vsystem-5898b475dc-s6dnt                                      2/2       Running            0          37m

When you inspect one of those pods, you can see an error message similar to the one below:

# oc logs voraadapter-lsvhq-94cc5c564-57cx2 -c vsystem-iptables
2018-12-06 11:45:16.463220|+0000|INFO |Execute: iptables -N VSYSTEM-AGENT-PREROUTING -t nat||vsystem|1|execRule|iptables.go(56)
2018-12-06 11:45:16.465087|+0000|INFO |Output: iptables: Chain already exists.||vsystem|1|execRule|iptables.go(62)
Error: exited with status: 1
Usage:
  vsystem iptables [flags]

Flags:
  -h, --help               help for iptables
      --no-wait            Exit immediately after applying the rules and don't wait for SIGTERM/SIGINT.
      --rule stringSlice   IPTables rule which should be applied. All rules must be specified as string and without the iptables command.

And in the audit log on the node, where the pod got scheduled, you should be able to find an AVC denial similar to the following. On RHCOS nodes, you may need to inspect the output of dmesg command instead.

# grep 'denied.*iptab' /var/log/audit/audit.log
type=AVC msg=audit(1544115868.568:15632): avc:  denied  { module_request } for  pid=54200 comm="iptables" kmod="ipt_REDIRECT" scontext=system_u:system_r:container_t:s0:c826,c909 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
...
# # on RHCOS
# dmesg | grep denied

To fix this, the ipt_REDIRECT kernel module needs to be loaded. Please refer to Pre-load needed kernel modules.

6.1.6. Fluentd pods cannot access /var/log

If you see errors like shown below in the logs of fluentd pods, make sure to follow the Grant fluentd pods permissions to logs to fix the problem.

# oc logs $(oc get pods -o name -l datahub.sap.com/app-component=fluentd | head -n 1) | tail -n 20
2019-04-15 18:53:24 +0000 [error]: unexpected error error="Permission denied @ rb_sysopen - /var/log/es-containers-sdh25-mortal-garfish.log.pos"
  2019-04-15 18:53:24 +0000 [error]: suppressed same stacktrace
  2019-04-15 18:53:25 +0000 [warn]: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
  2019-04-15 18:53:26 +0000 [error]: unexpected error error_class=Errno::EACCES error="Permission denied @ rb_sysopen - /var/log/es-containers-sdh25-mortal-garfish.log.pos"
  2019-04-15 18:53:26 +0000 [error]: /usr/lib64/ruby/gems/2.5.0/gems/fluentd-0.14.8/lib/fluent/plugin/in_tail.rb:151:in `initialize'
  2019-04-15 18:53:26 +0000 [error]: /usr/lib64/ruby/gems/2.5.0/gems/fluentd-0.14.8/lib/fluent/plugin/in_tail.rb:151:in `open'
...

6.1.7. License Manager cannot be initialized

The installation may fail with the following error.

2019-07-22T15:07:29+0000 [INFO] Initializing system tenant...
2019-07-22T15:07:29+0000 [INFO] Initializing License Manager in system tenant...2019-07-22T15:07:29+0000 [ERROR] Couldn't start License Manager!
The response: {"status":500,"code":{"component":"router","value":8},"message":"Internal Server Error: see logs for more info"}Error: http status code 500 Internal Server Error (500)
2019-07-22T15:07:29+0000 [ERROR] Failed to initialize vSystem, will retry in 30 sec...

In the log of license management pod, you can find an error like this:

# oc logs deploy/license-management-l4rvh
Found 2 pods, using pod/license-management-l4rvh-74595f8c9b-flgz9
+ iptables -D PREROUTING -t nat -j VSYSTEM-AGENT-PREROUTING
+ true
+ iptables -F VSYSTEM-AGENT-PREROUTING -t nat
+ true
+ iptables -X VSYSTEM-AGENT-PREROUTING -t nat
+ true
+ iptables -N VSYSTEM-AGENT-PREROUTING -t nat
iptables v1.6.2: can't initialize iptables table `nat': Permission denied
Perhaps iptables or your kernel needs to be upgraded.

This means, the vsystem-iptables container in the pod lacks permissions to manipulate iptables. It needs to be marked as privileged. Please follow the appendix Deploy SDH Observer and restart the installation.

6.2. Validation errors

May happen during the validation phase initiated by running the SDH installation script with the validate flag:

# ./install.sh --validate --namespace=sdh

6.2.1. Diagnostics Prometheus Node Exporter pods not starting

During an installation or upgrade, it may happen, that the Node Exporter pods keep restarting:

# oc get pods  | grep node-exporter
diagnostics-prometheus-node-exporter-5rkm8                        0/1       CrashLoopBackOff   6          8m
diagnostics-prometheus-node-exporter-hsww5                        0/1       CrashLoopBackOff   6          8m
diagnostics-prometheus-node-exporter-jxxpn                        0/1       CrashLoopBackOff   6          8m
diagnostics-prometheus-node-exporter-rbw82                        0/1       CrashLoopBackOff   7          8m
diagnostics-prometheus-node-exporter-s2jsz                        0/1       CrashLoopBackOff   6          8m

The validation will fail like this:

2019-08-15T15:05:01+0200 [INFO] Validating...
2019-08-15T15:05:01+0200 [INFO] Running validation for vora-cluster...OK!
2019-08-15T15:05:51+0200 [INFO] Running validation for vora-vsystem...OK!
2019-08-15T15:05:57+0200 [INFO] Running validation for vora-diagnostic...2019-08-15T15:11:06+0200 [ERROR] Failed! Please see the validation logs -> /root/wsp/clust/foundation/logs/20190815_150455/vora-diagnostic_validation_log.txt
...
2019-08-15T15:11:35+0200 [ERROR] There is a failed validation. Exiting...

# cat /root/wsp/clust/foundation/logs/20190815_150455/vora-diagnostic_validation_log.txt
2019-08-15T15:05:57+0200 [INFO] Start diagnostics readiness checks for namespace sdhup
2019-08-15T15:05:57+0200 [INFO] Check readiness of daemonset diagnostics-fluentd ... ok
2019-08-15T15:05:58+0200 [INFO] Check readiness of daemonset diagnostics-prometheus-node-exporter ............. failed
2019-08-15T15:11:06+0200 [ERROR] daemonset diagnostics-prometheus-node-exporter not ready: found 2/5 ready pods

The possible reason is that the limits on resource consumption set on the pods are too low. To address this post-installation, you can patch the daemonset like this (in the SDH's namespace):

# oc patch -p '{"spec": {"template": {"spec": {"containers": [
    { "name": "diagnostics-prometheus-node-exporter",
      "resources": {"limits": {"cpu": "200m", "memory": "100M"}}
    }]}}}}' ds/diagnostics-prometheus-node-exporter

To address this during the installation (using any installation method), add the following parameters:

-e=vora-diagnostics.resources.prometheusNodeExporter.resources.limits.cpu=200m
-e=vora-diagnostics.resources.prometheusNodeExporter.resources.limits.memory=100M

And then restart the validation (using the manual method) like this:

# ./install.sh --validate -n=sdh

6.3. Runtime troubleshooting

6.3.1. Pipeline Modeler troubleshooting

6.3.1.1. Graphs cannot be run in the Pipeline Modeler

If in the log of the vflow pod, there are problems with reaching outside of the private network like the following output shows, make sure to verify your proxy settings and make sure that the installation script is run with the following parameters:

# ./install.sh \
    --cluster-http-proxy="${HTTP_PROXY}" \
    --cluster-https-proxy="${HTTPS_PROXY}" \
    --cluster-no-proxy="${NO_PROXY}"

vflow log can be displayed with a command like oc logs $(oc get pods -o name -l vora-component=vflow | head -n 1):

W: Failed to fetch http://deb.debian.org/debian/dists/stretch/InRelease  Could not connect to deb.debian.org:80 (5.153.231.4), connection timed out [IP: 5.153.231.4 80]
W: Failed to fetch http://security.debian.org/debian-security/dists/stretch/updates/InRelease  Could not connect to security.debian.org:80 (217.196.149.233), connection timed out [IP: 217.196.149.233 80]
W: Failed to fetch http://deb.debian.org/debian/dists/stretch-updates/InRelease  Unable to connect to deb.debian.org:http: [IP: 5.153.231.4 80]
6.3.1.2. Pipeline Modeler cannot push images to the registry

If the SDH is configured to build images with kaniko and the vflow registry is not configured with a certificate signed by a trusted certificate authority, the builder will not be able to push the built images there. The Pipeline Modeler will then label the graphs as dead with a message like the following:

failed to prepare graph description: failed to prepare image: build failed for image: internal-registry.example.org:5000/vora/vflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a:com.sap.debian

To determine the cause, the log of the vflow pod needs to be inspected. There, you can notice the root issue - in this case it is the insecure registry internal-registry.example.org:5000 accessible only via HTTP protocol.

# oc logs $(oc get pods -o name -l vora-component=vflow | head -n 1)
...
INFO[0019] Using files from context: [/workspace/vflow]
INFO[0019] COPY /vflow /vflow
INFO[0019] Taking snapshot of files...
INFO[0023] ENTRYPOINT ["/vflow"]
error pushing image: failed to push to destination internal-registry.example.org:5000/vora/vflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a:com.sap.debian: Get https://internal-registry.example.org:5000/v2/: http: server gave HTTP response to HTTPS client |vflow|container|192|getPodLogs|build.go(126)
...

To resolve it, you can:

6.3.1.3. Modeler does not run when AWS ECR registry is used

If the initialization of the vflow pod fails with a message like the one below, your SDH deployment suffers from a bug that prevents it from using the AWS IAM Role for authentication against the AWS ECR Registry.

# oc logs $(oc get pods -o name -l vora-component=vflow | head -n 1)
....
2019-07-15 12:23:03.147231|+0000|INFO |Statistics Publisher started with publication interval 30s ms|vflow|statistic|38|loop|statistics_monitor.go(89)
2019-07-15 12:23:30.446482|+0000|INFO |connecting to vrep at vsystem-vrep.sdh:8738|vflow|container|1|NewImageFactory|factory.go(131)
2019-07-15 12:23:30.446993|+0000|INFO |Creating AWS ECR Repository 'sdh26/vora/vflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a'|vflow|container|1|assertRepositoryExists|ecr.go(106)
2019-07-15 12:23:35.001030|+0000|ERROR|API node execution is failed: cannot instantiate docker registry client: failed to assert repository existance: Error creating AWS ECR repository 'sdh26/vora/vflow-node-482f9340ff573d1a7a03108d18556792bb70ae2a': NoCredentialProviders: no valid providers in chain. Deprecated.
        For verbose messaging see aws.Config.CredentialsChainVerboseErrors
failed to create image factory
main.runMaster
        /data/xmake/prod-build7010/w/velocity/.../vflow/src/cmd/vflow/main.go:386
main.main
        /data/xmake/prod-build7010/w/velocity/.../vflow/src/cmd/vflow/main.go:357
runtime.main
        /data/xmake/tools/xmake-tools/FA/org.golang.download.go/go/1.11.4-bin/go/src/runtime/proc.go:201
runtime.goexit
        /data/xmake/tools/xmake-tools/FA/org.golang.download.go/go/1.11.4-bin/go/src/runtime/asm_amd64.s:1333|vflow|vflow|1|main|main.go(359)

The work-around is to use a registry pull secret.

6.3.2. Fluentd pods cannot parse container logs on nodes

If no container logs are visible in SDH's Kibana, most probably the fluentd pods cannot parse logs on the nodes. The default logging format on OpenShift 4 is plain text while SDH's fluentd pods are configured to parse json. SDH Observer can be deployed to apply the needed configuration changes.


  1. The environment variable $KUBECONFIG shall be set instead. ↩︎

  2. This setting assumes that all Data Hub services are accessed under the same name using NodePort. However, using OpenShift Ingress Controller, each service will be assigned a different hostname. Therefor, for production environment, it is necessary to provide signed certificates for these routes. You may consider configuring a custom wildcard certificate for master default subdomain↩︎

  3. Only available for the manual installation. ↩︎

Comments