Chapter 7. Configuring your cluster logging deployment

7.1. About configuring cluster logging

After installing cluster logging into your OpenShift Container Platform cluster, you can make the following configurations.

Note

You must set cluster logging to Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state.

Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades. For more information, see Support policy for unmanaged Operators.

7.1.1. About deploying and configuring cluster logging

OpenShift Container Platform cluster logging is designed to be used with the default configuration, which is tuned for small to medium sized OpenShift Container Platform clusters.

The installation instructions that follow include a sample ClusterLogging custom resource (CR), which you can use to create a cluster logging instance and configure your cluster logging deployment.

If you want to use the default cluster logging install, you can use the sample CR directly.

If you want to customize your deployment, make changes to the sample CR as needed. The following describes the configurations you can make when installing your cluster logging instance or modify after installation. See the Configuring sections for more information on working with each component, including modifications you can make outside of the ClusterLogging custom resource.

7.1.1.1. Configuring and Tuning Cluster Logging

You can configure your cluster logging environment by modifying the ClusterLogging custom resource deployed in the openshift-logging project.

You can modify any of the following components upon install or after install:

Memory and CPU
You can adjust both the CPU and memory limits for each component by modifying the resources block with valid memory and CPU values:
spec:
  logStore:
    elasticsearch:
      resources:
        limits:
          cpu:
          memory: 16Gi
        requests:
          cpu: 500m
          memory: 16Gi
      type: "elasticsearch"
  collection:
    logs:
      fluentd:
        resources:
          limits:
            cpu:
            memory:
          requests:
            cpu:
            memory:
        type: "fluentd"
  visualization:
    kibana:
      resources:
        limits:
          cpu:
          memory:
        requests:
          cpu:
          memory:
     type: kibana
  curation:
    curator:
      resources:
        limits:
          memory: 200Mi
        requests:
          cpu: 200m
          memory: 200Mi
      type: "curator"
Elasticsearch storage
You can configure a persistent storage class and size for the Elasticsearch cluster using the storageClass name and size parameters. The Cluster Logging Operator creates a PersistentVolumeClaim for each data node in the Elasticsearch cluster based on these parameters.
  spec:
    logStore:
      type: "elasticsearch"
      elasticsearch:
        nodeCount: 3
        storage:
          storageClassName: "gp2"
          size: "200G"

This example specifies each data node in the cluster will be bound to a PersistentVolumeClaim that requests "200G" of "gp2" storage. Each primary shard will be backed by a single replica.

Note

Omitting the storage block results in a deployment that includes ephemeral storage only.

  spec:
    logStore:
      type: "elasticsearch"
      elasticsearch:
        nodeCount: 3
        storage: {}
Elasticsearch replication policy

You can set the policy that defines how Elasticsearch shards are replicated across data nodes in the cluster:

  • FullRedundancy. The shards for each index are fully replicated to every data node.
  • MultipleRedundancy. The shards for each index are spread over half of the data nodes.
  • SingleRedundancy. A single copy of each shard. Logs are always available and recoverable as long as at least two data nodes exist.
  • ZeroRedundancy. No copies of any shards. Logs may be unavailable (or lost) in the event a node is down or fails.
Curator schedule
You specify the schedule for Curator in the cron format.
  spec:
    curation:
    type: "curator"
    resources:
    curator:
      schedule: "30 3 * * *"

7.1.1.2. Sample modified ClusterLogging custom resource

The following is an example of a ClusterLogging custom resource modified using the options previously described.

Sample modified ClusterLogging custom resource

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 3
      resources:
        limits:
          memory: 32Gi
        requests:
          cpu: 3
          memory: 32Gi
      storage: {}
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"
    kibana:
      resources:
        limits:
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 1Gi
      replicas: 1
  curation:
    type: "curator"
    curator:
      resources:
        limits:
          memory: 200Mi
        requests:
          cpu: 200m
          memory: 200Mi
      schedule: "*/5 * * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd:
        resources:
          limits:
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 1Gi

7.2. Changing cluster logging management state

In order to modify certain components managed by the Cluster Logging Operator or the Elasticsearch Operator, you must set the operator to the unmanaged state.

In unmanaged state, the operators do not respond to changes in the CRs. The administrator assumes full control of individual component configurations and upgrades when in unmanaged state.

Important

Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades. For more information, see Support policy for unmanaged Operators.

In managed state, the Cluster Logging Operator (CLO) responds to changes in the ClusterLogging custom resource (CR) and adjusts the logging deployment accordingly.

The OpenShift Container Platform documentation indicates in a prerequisite step when you must set the OpenShift Container Platform cluster to Unmanaged.

Note

If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO will revert changes you make to the EO, as the EO is managed by the CLO.

7.2.1. Changing the cluster logging management state

You must set the operator to the unmanaged state in order to modify the components managed by the Cluster Logging Operator:

  • the Curator CronJob,
  • the Elasticsearch CR,
  • the Kibana Deployment,
  • the log collector DaemonSet.

If you make changes to these components in managed state, the Cluster Logging Operator reverts those changes.

Note

An unmanaged cluster logging environment does not receive updates until you return the Cluster Logging Operator to Managed state.

Prerequisites

  • The Cluster Logging Operator must be installed.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    $ oc edit ClusterLogging instance
    
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      managementState: "Managed" 1
    1
    Specify the management state as Managed or Unmanaged.

7.2.2. Changing the Elasticsearch management state

You must set the operator to the unmanaged state in order to modify the Elasticsearch deployment files, which are managed by the Elasticsearch Operator.

If you make changes to these components in managed state, the Elasticsearch Operator reverts those changes.

Note

An unmanaged Elasticsearch cluster does not receive updates until you return the Elasticsearch Operator to Managed state.

Prerequisite

  • The Elasticsearch Operator must be installed.
  • Have the name of the Elasticsearch CR, in the openshift-logging project:

    $ oc get -n openshift-logging Elasticsearch
    NAME            AGE
    elasticsearch   28h

Procedure

Edit the Elasticsearch(CR) in the openshift-logging project:

$ oc edit Elasticsearch elasticsearch

apiVersion: logging.openshift.io/v1
kind: Elasticsearch
metadata:
  name: elasticsearch


....

spec:
  managementState: "Managed" 1
1
Specify the management state as Managed or Unmanaged.
Note

If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO will revert changes you make to the EO, as the EO is managed by the CLO.

7.3. Configuring cluster logging

Cluster logging is configurable using a ClusterLogging custom resource (CR) deployed in the openshift-logging project.

The Cluster Logging Operator watches for changes to Cluster Logging CRs, creates any missing logging components, and adjusts the logging deployment accordingly.

The Cluster Logging CR is based on the ClusterLogging custom resource Definition (CRD), which defines a complete cluster logging deployment and includes all the components of the logging stack to collect, store and visualize logs.

Sample ClusterLogging custom resource (CR)

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  creationTimestamp: '2019-03-20T18:07:02Z'
  generation: 1
  name: instance
  namespace: openshift-logging
spec:
  collection:
    logs:
      fluentd:
        resources: null
      type: fluentd
  curation:
    curator:
      resources: null
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        limits:
          cpu:
          memory:
        requests:
          cpu:
          memory:
      storage: {}
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      proxy:
        resources: null
      replicas: 1
      resources: null
    type: kibana

You can configure the following for cluster logging:

  • You can place cluster logging into an unmanaged state that allows an administrator to assume full control of individual component configurations and upgrades.
  • You can overwrite the image for each cluster logging component by modifying the appropriate environment variable in the cluster-logging-operator Deployment.
  • You can specify specific nodes for the logging components using node selectors.

7.3.1. Understanding the cluster logging component images

There are several components in cluster logging, each one implemented with one or more images. Each image is specified by an environment variable defined in the cluster-logging-operator deployment in the openshift-logging project and should not be changed.

You can view the images by running the following command:

$ oc -n openshift-logging set env deployment/cluster-logging-operator --list | grep _IMAGE
ELASTICSEARCH_IMAGE=registry.redhat.io/openshift4/ose-logging-elasticsearch5:v4.3 1
FLUENTD_IMAGE=registry.redhat.io/openshift4/ose-logging-fluentd:v4.3 2
KIBANA_IMAGE=registry.redhat.io/openshift4/ose-logging-kibana5:v4.3 3
CURATOR_IMAGE=registry.redhat.io/openshift4/ose-logging-curator5:v4.3 4
OAUTH_PROXY_IMAGE=registry.redhat.io/openshift4/ose-oauth-proxy:v4.3 5
1
ELASTICSEARCH_IMAGE deploys Elasticsearch.
2
FLUENTD_IMAGE deploys Fluentd.
3
KIBANA_IMAGE deploys Kibana.
4
CURATOR_IMAGE deploys Curator.
5
OAUTH_PROXY_IMAGE defines OAUTH for OpenShift Container Platform.

The values might be different depending on your environment.

Important

The logging routes are managed by the Cluster Logging Operator and cannot be modified by the user.

7.4. Configuring Elasticsearch to store and organize log data

OpenShift Container Platform uses Elasticsearch (ES) to store and organize the log data.

Some of the modifications you can make to your log store include:

  • storage for your Elasticsearch cluster;
  • how shards are replicated across data nodes in the cluster, from full replication to no replication;
  • allowing external access to Elasticsearch data.
Note

Scaling down Elasticsearch nodes is not supported. When scaling down, Elasticsearch pods can be accidentally deleted, possibly resulting in shards not being allocated and replica shards being lost.

Elasticsearch is a memory-intensive application. Each Elasticsearch node needs 16G of memory for both memory requests and limits, unless you specify otherwise in the ClusterLogging custom resource. The initial set of OpenShift Container Platform nodes might not be large enough to support the Elasticsearch cluster. You must add additional nodes to the OpenShift Container Platform cluster to run with the recommended or higher memory.

Each Elasticsearch node can operate with a lower memory setting though this is not recommended for production environments.

Note

If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO will revert changes you make to the EO, as the EO is managed by the CLO.

7.4.1. Configuring Elasticsearch CPU and memory requests

Each component specification allows for adjustments to both the CPU and memory requests. You should not have to manually adjust these values as the Elasticsearch Operator sets values sufficient for your environment.

Each Elasticsearch node can operate with a lower memory setting though this is not recommended for production deployments. For production use, you should have no less than the default 16Gi allocated to each pod. Preferably you should allocate as much as possible, up to 64Gi per pod.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    ....
    spec:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            resources: 1
              limits:
                memory: 16Gi
              requests:
                cpu: 500m
                memory: 16Gi
    1
    Specify the CPU and memory requests as needed. If you leave these values blank, the Elasticsearch Operator sets default values that should be sufficient for most deployments.

    If you adjust the amount of Elasticsearch memory, you must change both the request value and the limit value.

    For example:

          resources:
            limits:
              memory: "32Gi"
            requests:
              cpu: "8"
              memory: "32Gi"

    Kubernetes generally adheres the node configuration and does not allow Elasticsearch to use the specified limits. Setting the same value for the requests and limits ensures that Elasticsearch can use the memory you want, assuming the node has the CPU and memory available.

7.4.2. Configuring Elasticsearch replication policy

You can define how Elasticsearch shards are replicated across data nodes in the cluster.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit clusterlogging instance
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      logStore:
        type: "elasticsearch"
        elasticsearch:
          redundancyPolicy: "SingleRedundancy" 1
    1
    Specify a redundancy policy for the shards. The change is applied upon saving the changes.
    • FullRedundancy. Elasticsearch fully replicates the primary shards for each index to every data node. This provides the highest safety, but at the cost of the highest amount of disk required and the poorest performance.
    • MultipleRedundancy. Elasticsearch fully replicates the primary shards for each index to half of the data nodes. This provides a good tradeoff between safety and performance.
    • SingleRedundancy. Elasticsearch makes one copy of the primary shards for each index. Logs are always available and recoverable as long as at least two data nodes exist. Better performance than MultipleRedundancy, when using 5 or more nodes. You cannot apply this policy on deployments of single Elasticsearch node.
    • ZeroRedundancy. Elasticsearch does not make copies of the primary shards. Logs might be unavailable or lost in the event a node is down or fails. Use this mode when you are more concerned with performance than safety, or have implemented your own disk/PVC backup/restore strategy.
Note

The number of primary shards for the index templates is equal to the number of Elasticsearch data nodes.

7.4.3. Configuring Elasticsearch storage

Elasticsearch requires persistent storage. The faster the storage, the faster the Elasticsearch performance.

Warning

Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Edit the ClusterLogging CR to specify that each data node in the cluster is bound to a Persistent Volume Claim.

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
     spec:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            nodeCount: 3
            storage:
              storageClassName: "gp2"
              size: "200G"

This example specifies each data node in the cluster is bound to a Persistent Volume Claim that requests "200G" of AWS General Purpose SSD (gp2) storage.

7.4.4. Configuring Elasticsearch for emptyDir storage

You can use emptyDir with Elasticsearch, which creates an ephemeral deployment in which all of a pod’s data is lost upon restart.

Note

When using emptyDir, if Elasticsearch is restarted or redeployed, you will lose data.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Edit the ClusterLogging CR to specify emptyDir:

     spec:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            nodeCount: 3
            storage: {}

7.4.5. Exposing Elasticsearch as a route

By default, Elasticsearch deployed with cluster logging is not accessible from outside the logging cluster. You can enable a route with re-encryption termination for external access to Elasticsearch for those tools that access its data.

Externally, you can access Elasticsearch by creating a reencrypt route, your OpenShift Container Platform token and the installed Elasticsearch CA certificate. Then, access an Elasticsearch node with a cURL request that contains:

Internally, you can access Elastiscearch using the Elasticsearch cluster IP:

You can get the Elasticsearch cluster IP using either of the following commands:

$ oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging

172.30.183.229
oc get service elasticsearch -n openshift-logging

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
elasticsearch   ClusterIP   172.30.183.229   <none>        9200/TCP   22h

$ oc exec elasticsearch-cdm-oplnhinv-1-5746475887-fj2f8 -n openshift-logging -- curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://172.30.183.229:9200/_cat/health"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    29  100    29    0     0    108      0 --:--:-- --:--:-- --:--:--   108

Prerequisites

  • Cluster logging and Elasticsearch must be installed.
  • You must have access to the project in order to be able to access to the logs.

Procedure

To expose Elasticsearch externally:

  1. Change to the openshift-logging project:

    $ oc project openshift-logging
  2. Extract the CA certificate from Elasticsearch and write to the admin-ca file:

    $ oc extract secret/elasticsearch --to=. --keys=admin-ca
    
    admin-ca
  3. Create the route for the Elasticsearch service as a YAML file:

    1. Create a YAML file with the following:

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: elasticsearch
        namespace: openshift-logging
      spec:
        host:
        to:
          kind: Service
          name: elasticsearch
        tls:
          termination: reencrypt
          destinationCACertificate: | 1
      1
      Add the Elasticsearch CA certifcate or use the command in the next step. You do not have to set the spec.tls.key, spec.tls.certificate, and spec.tls.caCertificate parameters required by some reencrypt routes.
    2. Run the following command to add the Elasticsearch CA certificate to the route YAML you created:

      $ cat ./admin-ca | sed -e "s/^/      /" >> <file-name>.yaml
    3. Create the route:

      $ oc create -f <file-name>.yaml
      
      route.route.openshift.io/elasticsearch created
  4. Check that the Elasticsearch service is exposed:

    1. Get the token of this ServiceAccount to be used in the request:

      $ token=$(oc whoami -t)
    2. Set the elasticsearch route you created as an environment variable.

      $ routeES=`oc get route elasticsearch -o jsonpath={.spec.host}`
    3. To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:

      $ curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/.operations.*/_search?size=1" | jq

      The response appears similar to the following:

        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100   944  100   944    0     0     62      0  0:00:15  0:00:15 --:--:--   204
      {
        "took": 441,
        "timed_out": false,
        "_shards": {
          "total": 3,
          "successful": 3,
          "skipped": 0,
          "failed": 0
        },
        "hits": {
          "total": 89157,
          "max_score": 1,
          "hits": [
            {
              "_index": ".operations.2019.03.15",
              "_type": "com.example.viaq.common",
              "_id": "ODdiNWIyYzAtMjg5Ni0TAtNWE3MDY1MjMzNTc3",
              "_score": 1,
              "_source": {
                "_SOURCE_MONOTONIC_TIMESTAMP": "673396",
                "systemd": {
                  "t": {
                    "BOOT_ID": "246c34ee9cdeecb41a608e94",
                    "MACHINE_ID": "e904a0bb5efd3e36badee0c",
                    "TRANSPORT": "kernel"
                  },
                  "u": {
                    "SYSLOG_FACILITY": "0",
                    "SYSLOG_IDENTIFIER": "kernel"
                  }
                },
                "level": "info",
                "message": "acpiphp: Slot [30] registered",
                "hostname": "localhost.localdomain",
                "pipeline_metadata": {
                  "collector": {
                    "ipaddr4": "10.128.2.12",
                    "ipaddr6": "fe80::xx:xxxx:fe4c:5b09",
                    "inputname": "fluent-plugin-systemd",
                    "name": "fluentd",
                    "received_at": "2019-03-15T20:25:06.273017+00:00",
                    "version": "1.3.2 1.6.0"
                  }
                },
                "@timestamp": "2019-03-15T20:00:13.808226+00:00",
                "viaq_msg_id": "ODdiNWIyYzAtMYTAtNWE3MDY1MjMzNTc3"
              }
            }
          ]
        }
      }

7.4.6. About Elasticsearch alerting rules

You can view these alerting rules in Prometheus.

AlertDescriptionSeverity

ElasticsearchClusterNotHealthy

Cluster health status has been RED for at least 2m. Cluster does not accept writes, shards may be missing or master node hasn’t been elected yet.

critical

ElasticsearchClusterNotHealthy

Cluster health status has been YELLOW for at least 20m. Some shard replicas are not allocated.

warning

ElasticsearchBulkRequestsRejectionJumps

High Bulk Rejection Ratio at node in cluster. This node may not be keeping up with the indexing speed.

warning

ElasticsearchNodeDiskWatermarkReached

Disk Low Watermark Reached at node in cluster. Shards can not be allocated to this node anymore. You should consider adding more disk space to the node.

alert

ElasticsearchNodeDiskWatermarkReached

Disk High Watermark Reached at node in cluster. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node.

high

ElasticsearchJVMHeapUseHigh

JVM Heap usage on the node in cluster is <value>

alert

AggregatedLoggingSystemCPUHigh

System CPU usage on the node in cluster is <value>

alert

ElasticsearchProcessCPUHigh

ES process CPU usage on the node in cluster is <value>

alert

7.5. Configuring Kibana

OpenShift Container Platform uses Kibana to display the log data collected by Fluentd and indexed by Elasticsearch.

You can scale Kibana for redundancy and configure the CPU and memory for your Kibana nodes.

Note

You must set cluster logging to Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state.

Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades. For more information, see Support policy for unmanaged Operators.

7.5.1. Configure Kibana CPU and memory limits

Each component specification allows for adjustments to both the CPU and memory limits.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
        visualization:
          type: "kibana"
          kibana:
            replicas:
          resources:  1
            limits:
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi
          proxy:  2
            resources:
              limits:
                memory: 100Mi
              requests:
                cpu: 100m
                memory: 100Mi
    1
    Specify the CPU and memory limits to allocate for each node.
    2
    Specify the CPU and memory limits to allocate to the Kibana proxy.

7.5.2. Scaling Kibana for redundancy

You can scale the Kibana deployment for redundancy.

..Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    $ oc edit ClusterLogging instance
    
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
        visualization:
          type: "kibana"
          kibana:
            replicas: 1 1
    1
    Specify the number of Kibana nodes.

7.5.3. Using tolerations to control the Kibana pod placement

You can control which nodes the Kibana pods run on and prevent other workloads from using those nodes by using tolerations on the pods.

You apply tolerations to the Kibana pods through the ClusterLogging custom resource (CR) and apply taints to a node through the node specification. A taint on a node is a key:value pair that instructs the node to repel all pods that do not tolerate the taint. Using a specific key:value pair that is not on other pods ensures only the Kibana pod can run on that node.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Use the following command to add a taint to a node where you want to schedule the Kibana pod:

    $ oc adm taint nodes <node-name> <key>=<value>:<effect>

    For example:

    $ oc adm taint nodes node1 kibana=node:NoExecute

    This example places a taint on node1 that has key kibana, value node, and taint effect NoExecute. You must use the NoExecute taint effect. NoExecute schedules only pods that match the taint and remove existing pods that do not match.

  2. Edit the visualization section of the ClusterLogging custom resource (CR) to configure a toleration for the Kibana pod:

      visualization:
        type: "kibana"
        kibana:
          tolerations:
          - key: "kibana"  1
            operator: "Exists"  2
            effect: "NoExecute"  3
            tolerationSeconds: 6000 4
    1
    Specify the key that you added to the node.
    2
    Specify the Exists operator to require the key/value/effect parameters to match.
    3
    Specify the NoExecute effect.
    4
    Optionally, specify the tolerationSeconds parameter to set how long a pod can remain bound to a node before being evicted.

This toleration matches the taint created by the oc adm taint command. A pod with this toleration would be able to schedule onto node1.

7.5.4. Installing the Kibana Visualize tool

Kibana’s Visualize tab enables you to create visualizations and dashboards for monitoring container logs, allowing administrator users (cluster-admin or cluster-reader) to view logs by deployment, namespace, pod, and container.

Procedure

To load dashboards and other Kibana UI objects:

  1. If necessary, get the Kibana route, which is created by default upon installation of the Cluster Logging Operator:

    $ oc get routes -n openshift-logging
    
    NAMESPACE                  NAME                       HOST/PORT                                                            PATH     SERVICES                   PORT    TERMINATION          WILDCARD
    openshift-logging          kibana                     kibana-openshift-logging.apps.openshift.com                                   kibana                     <all>   reencrypt/Redirect   None
  2. Get the name of your Elasticsearch pods.

    $ oc get pods -l component=elasticsearch
    
    NAME                                            READY   STATUS    RESTARTS   AGE
    elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6k    2/2     Running   0          22h
    elasticsearch-cdm-5ceex6ts-2-f799564cb-l9mj7    2/2     Running   0          22h
    elasticsearch-cdm-5ceex6ts-3-585968dc68-k7kjr   2/2     Running   0          22h
  3. Create the necessary per-user configuration that this procedure requires:

    1. Log in to the Kibana dashboard as the user you want to add the dashboards to.

      https://kibana-openshift-logging.apps.openshift.com 1
      1
      Where the URL is Kibana route.
    2. If the Authorize Access page appears, select all permissions and click Allow selected permissions.
    3. Log out of the Kibana dashboard.
  4. Run the following command from the project where the pod is located using the name of any of your Elastiscearch pods:

    $ oc exec <es-pod> -- es_load_kibana_ui_objects <user-name>

    For example:

    $ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6k -- es_load_kibana_ui_objects <user-name>
Note

The metadata of the Kibana objects such as visualizations, dashboards, and so forth are stored in Elasticsearch with the .kibana.{user_hash} index format. You can obtain the user_hash using the userhash=$(echo -n $username | sha1sum | awk '{print $1}') command. By default, the Kibana shared_ops index mode enables all users with cluster admin roles to share the index, and this Kibana object metadata is saved to the .kibana index.

Any custom dashboard can be imported for a particular user either by using the import/export feature or by inserting the metadata onto the Elasticsearch index using the curl command.

7.6. Curation of Elasticsearch Data

The Elasticsearch Curator tool performs scheduled maintenance operations on a global and/or on a per-project basis. Curator performs actions based on its configuration.

The Cluster Logging Operator installs Curator and its configuration. You can configure the Curator cron schedule using the ClusterLogging custom resource and further configuration options can be found in the Curator ConfigMap, curator in the openshift-logging project, which incorporates the Curator configuration file, curator5.yaml and an OpenShift Container Platform custom configuration file, config.yaml.

OpenShift Container Platform uses the config.yaml internally to generate the Curator action file.

Optionally, you can use the action file, directly. Editing this file allows you to use any action that Curator has available to it to be run periodically. However, this is only recommended for advanced users as modifying the file can be destructive to the cluster and can cause removal of required indices/settings from Elasticsearch. Most users only must modify the Curator configuration map and never edit the action file.

7.6.1. Configuring the Curator schedule

You can specify the schedule for Curator using the cluster logging Custom Resource created by the cluster logging installation.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

To configure the Curator schedule:

  1. Edit the ClusterLogging custom resource in the openshift-logging project:

    $ oc edit clusterlogging instance
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ...
    
      curation:
        curator:
          schedule: 30 3 * * * 1
        type: curator
    1
    Specify the schedule for Curator in cron format.
    Note

    The time zone is set based on the host node where the Curator pod runs.

7.6.2. Configuring Curator index deletion

You can configure Curator to delete Elasticsearch data based on retention settings. You can configure per-project and global settings. Global settings apply to any project not specified. Per-project settings override global settings.

Prerequisite

  • Cluster logging must be installed.

Procedure

To delete indices:

  1. Edit the OpenShift Container Platform custom Curator configuration file:

    $ oc edit configmap/curator
  2. Set the following parameters as needed:

    config.yaml: |
      project_name:
        action
          unit:value

    The available parameters are:

    Table 7.1. Project options

    Variable NameDescription

    project_name

    The actual name of a project, such as myapp-devel. For OpenShift Container Platform operations logs, use the name .operations as the project name.

    action

    The action to take, currently only delete is allowed.

    unit

    The period to use for deletion, days, weeks, or months.

    value

    The number of units.

    Table 7.2. Filter options

    Variable NameDescription

    .defaults

    Use .defaults as the project_name to set the defaults for projects that are not specified.

    .regex

    The list of regular expressions that match project names.

    pattern

    The valid and properly escaped regular expression pattern enclosed by single quotation marks.

For example, to configure Curator to:

  • Delete indices in the myapp-dev project older than 1 day
  • Delete indices in the myapp-qe project older than 1 week
  • Delete operations logs older than 8 weeks
  • Delete all other projects indices after they are 31 days old
  • Delete indices older than 1 day that are matched by the ^project\..+\-dev.*$ regex
  • Delete indices older than 2 days that are matched by the ^project\..+\-test.*$ regex

Use:

  config.yaml: |
    .defaults:
      delete:
        days: 31

    .operations:
      delete:
        weeks: 8

    myapp-dev:
      delete:
        days: 1

    myapp-qe:
      delete:
        weeks: 1

    .regex:
      - pattern: '^project\..+\-dev\..*$'
        delete:
          days: 1
      - pattern: '^project\..+\-test\..*$'
        delete:
          days: 2
Important

When you use months as the $UNIT for an operation, Curator starts counting at the first day of the current month, not the current day of the current month. For example, if today is April 15, and you want to delete indices that are 2 months older than today (delete: months: 2), Curator does not delete indices that are dated older than February 15; it deletes indices older than February 1. That is, it goes back to the first day of the current month, then goes back two whole months from that date. If you want to be exact with Curator, it is best to use days (for example, delete: days: 30).

7.6.3. Troubleshooting Curator

You can use information in this section for debugging Curator. For example, if curator is in failed state, but the log messages do not provide a reason, you could increase the log level and trigger a new job, instead of waiting for another scheduled run of the cron job.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

Enable the Curator debug log and trigger next Curator iteration manually

  1. Enable debug log of Curator:

    $ oc set env cronjob/curator CURATOR_LOG_LEVEL=DEBUG CURATOR_SCRIPT_LOG_LEVEL=DEBUG

    Specify the log level:

    • CRITICAL. Curator displays only critical messages.
    • ERROR. Curator displays only error and critical messages.
    • WARNING. Curator displays only error, warning, and critical messages.
    • INFO. Curator displays only informational, error, warning, and critical messages.
    • DEBUG. Curator displays only debug messages, in addition to all of the above.

      The default value is INFO.

      Note

      Cluster logging uses the OpenShift Container Platform custom environment variable CURATOR_SCRIPT_LOG_LEVEL in OpenShift Container Platform wrapper scripts (run.sh and convert.py). The environment variable takes the same values as CURATOR_LOG_LEVEL for script debugging, as needed.

  2. Trigger next curator iteration:

    $ oc create job --from=cronjob/curator <job_name>
  3. Use the following commands to control the CronJob:

    • Suspend a CronJob:

      $ oc patch cronjob curator -p '{"spec":{"suspend":true}}'
    • Resume a CronJob:

      $ oc patch cronjob curator -p '{"spec":{"suspend":false}}'
    • Change a CronJob schedule:

      $ oc patch cronjob curator -p '{"spec":{"schedule":"0 0 * * *"}}' 1
      1
      The schedule option accepts schedules in cron format.

7.6.4. Configuring Curator in scripted deployments

Use the information in this section if you must configure Curator in scripted deployments.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.
  • Set cluster logging to the unmanaged state.

Procedure

Use the following snippets to configure Curator in your scripts:

  • For scripted deployments

    1. Create and modify the configuration:

      1. Copy the Curator configuration file and the OpenShift Container Platform custom configuration file from the Curator configuration map and create separate files for each:

        $ oc extract configmap/curator --keys=curator5.yaml,config.yaml --to=/my/config
      2. Edit the /my/config/curator5.yaml and /my/config/config.yaml files.
    2. Delete the existing Curator config map and add the edited YAML files to a new Curator config map.

      $ oc delete configmap curator ; sleep 1
      $ oc create configmap curator \
          --from-file=curator5.yaml=/my/config/curator5.yaml \
          --from-file=config.yaml=/my/config/config.yaml \
          ; sleep 1

      The next iteration will use this configuration.

  • If you are using the action file:

    1. Create and modify the configuration:

      1. Copy the Curator configuration file and the action file from the Curator configuration map and create separate files for each:

        $ oc extract configmap/curator --keys=curator5.yaml,actions.yaml --to=/my/config
      2. Edit the /my/config/curator5.yaml and /my/config/actions.yaml files.
    2. Delete the existing Curator config map and add the edited YAML files to a new Curator config map.

      $ oc delete configmap curator ; sleep 1
      $ oc create configmap curator \
          --from-file=curator5.yaml=/my/config/curator5.yaml \
          --from-file=actions.yaml=/my/config/actions.yaml \
          ; sleep 1

      The next iteration will use this configuration.

7.6.5. Using the Curator Action file

The Curator ConfigMap in the openshift-logging project includes a Curator action file where you configure any Curator action to be run periodically.

However, when you use the action file, OpenShift Container Platform ignores the config.yaml section of the curator ConfigMap, which is configured to ensure important internal indices do not get deleted by mistake. In order to use the action file, you should add an exclude rule to your configuration to retain these indices. You also must manually add all the other patterns following the steps in this topic.

Important

The actions and config.yaml are mutually-exclusive configuration files. Once the actions file exist, OpenShift Container Platform ignores the config.yaml file. Using the action file is recommended only for advanced users as using this file can be destructive to the cluster and can cause removal of required indices/settings from Elasticsearch.

Prerequisite

  • Cluster logging and Elasticsearch must be installed.
  • Set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

Procedure

To configure Curator to delete indices:

  1. Edit the Curator ConfigMap:

    oc edit cm/curator -n openshift-logging
  2. Make the following changes to the action file:

    actions:
    1:
          action: delete_indices 1
          description: >-
            Delete .operations indices older than 30 days.
            Ignore the error if the filter does not
            result in an actionable list of indices (ignore_empty_list).
            See https://www.elastic.co/guide/en/elasticsearch/client/curator/5.2/ex_delete_indices.html
          options:
            # Swallow curator.exception.NoIndices exception
            ignore_empty_list: True
            # In seconds, default is 300
            timeout_override: ${CURATOR_TIMEOUT}
            # Don't swallow any other exceptions
            continue_if_exception: False
            # Optionally disable action, useful for debugging
            disable_action: False
          # All filters are bound by logical AND
          filters:            2
          - filtertype: pattern
            kind: regex
            value: '^\.operations\..*$'
            exclude: False    3
          - filtertype: age
            # Parse timestamp from index name
            source: name
            direction: older
            timestring: '%Y.%m.%d'
            unit: days
            unit_count: 30
            exclude: False
    1
    Specify delete_indices to delete the specified index.
    2
    Use the filers parameters to specify the index to be deleted. See the Elastic Search curator documentation for information on these parameters.
    3
    Specify false to allow the index to be deleted.

7.7. Configuring the logging collector

OpenShift Container Platform uses Fluentd to collect operations and application logs from your cluster and enriches the data with Kubernetes pod and project metadata.

You can configure log location, use an external log aggregator, and make other configurations for the log collector.

Note

You must set cluster logging to Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades. For more information, see Support policy for unmanaged Operators.

7.7.1. Viewing logging collector pods

You can use the oc get pods --all-namespaces -o wide command to see the nodes where the Fluentd are deployed.

Procedure

Run the following command in the openshift-logging project:

$ oc get pods --all-namespaces -o wide | grep fluentd

NAME                         READY     STATUS    RESTARTS   AGE     IP            NODE                           NOMINATED NODE   READINESS GATES
fluentd-5mr28                1/1       Running   0          4m56s   10.129.2.12   ip-10-0-164-233.ec2.internal   <none>           <none>
fluentd-cnc4c                1/1       Running   0          4m56s   10.128.2.13   ip-10-0-155-142.ec2.internal   <none>           <none>
fluentd-nlp8z                1/1       Running   0          4m56s   10.131.0.13   ip-10-0-138-77.ec2.internal    <none>           <none>
fluentd-rknlk                1/1       Running   0          4m56s   10.128.0.33   ip-10-0-128-130.ec2.internal   <none>           <none>
fluentd-rsm49                1/1       Running   0          4m56s   10.129.0.37   ip-10-0-163-191.ec2.internal   <none>           <none>
fluentd-wjt8s                1/1       Running   0          4m56s   10.130.0.42   ip-10-0-156-251.ec2.internal   <none>           <none>

7.7.2. Configure log collector CPU and memory limits

The log collector allows for adjustments to both the CPU and memory limits.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    $ oc edit ClusterLogging instance
    
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      collection:
        logs:
          fluentd:
            resources:
              limits: 1
                memory: 736Mi
              requests:
                cpu: 100m
                memory: 736Mi
    1
    Specify the CPU and memory limits and requests as needed. The values shown are the default values.

7.7.3. Configuring Buffer Chunk Limiting for Fluentd

If the Fluentd log collector is unable to keep up with a high number of logs, Fluentd performs file buffering to reduce memory usage and prevent data loss.

Fluentd file buffering stores records in chunks. Chunks are stored in buffers.

You can tune file buffering in your cluster by editing environment variables in the Fluentd daemon set:

Note

To modify the FILE_BUFFER_LIMIT or BUFFER_SIZE_LIMIT parameters in the Fluentd daemon set, you must set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

  • BUFFER_SIZE_LIMIT. This parameter determines the maximum size of each chunk file before Fluentd creates a new chunk. The default is 8M. This parameter sets the Fluentd chunk_limit_size variable.

    A high BUFFER_SIZE_LIMIT can collect more records per chunk file. However, bigger records take longer to be sent to the logstore.

  • FILE_BUFFER_LIMIT. This parameter determines the file buffer size per logging output. This value is only a request based on the available space on the node where a Fluentd pod is scheduled. OpenShift Container Platform does not allow Fluentd to exceed the node capacity. The default is 256Mi.

    A high FILE_BUFFER_LIMIT could translate to a higher BUFFER_QUEUE_LIMIT based the number of outputs. However, if the node’s space is under pressure, Fluentd can fail.

    By default, the number_of_outputs is 1 if all the logs are sent to a single resource, and is incremented by 1 for each additional resource. You might have multiple outputs if you use the Log Forwarding API, the Fluentd Forward protocol, or syslog protocol to forward logs to external locations.

    The permanent volume size must be larger than FILE_BUFFER_LIMIT multiplied by the number of outputs.

  • BUFFER_QUEUE_LIMIT. This parameter is the maximum number of buffer chunks allowed. The BUFFER_QUEUE_LIMIT parameter is not directly tunable. OpenShift Container Platform calculates this value based on the number of logging outputs, the chunk size, and the filesystem space available. The default is 32 chunks. To change the BUFFER_QUEUE_LIMIT, you must change the value of FILE_BUFFER_LIMIT. The BUFFER_QUEUE_LIMIT parameter sets the Fluentd queue_limit_length parameter.

    OpenShift Container Platform calculates the BUFFER_QUEUE_LIMIT as (FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT)).

    Using the default set of values, the value of BUFFER_QUEUE_LIMIT is 32:

    • FILE_BUFFER_LIMIT = 256Mi
    • number_of_outputs = 1
    • BUFFER_SIZE_LIMIT = 8Mi

OpenShift Container Platform uses the Fluentd file buffer plug-in to configure how the chunks are stored. You can see the location of the buffer file using the following command:

$ oc get cm fluentd -o json | jq -r '.data."fluent.conf"'
<buffer>
   @type file 1
   path '/var/lib/flunetd/retry-elasticsearch' 2
1
The Fluentd file buffer plugin. Do not change this value.
2
The path where buffer chunks are stored.

Prerequisite

  • Set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

Procedure

To configure Buffer Chunk Limiting:

  1. Edit either of the following parameters in the fluentd daemon set.

    spec:
      template:
        spec:
          containers:
              env:
              - name: FILE_BUFFER_LIMIT 1
                value: "256"
              - name: BUFFER_SIZE_LIMIT 2
                value: 8Mi
    1
    Specify the Fluentd file buffer size per output.
    2
    Specify the maximum size of each Fluentd buffer chunk.

7.7.4. Configuring the logging collector using environment variables

You can use environment variables to modify the configuration of the Fluentd log collector.

See the Fluentd README in Github for lists of the available environment variables.

Prerequisite

  • Set cluster logging to the unmanaged state. Operators in an unmanaged state are unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

Procedure

Set any of the Fluentd environment variables as needed:

oc set env ds/fluentd <env-var>=<value>

For example:

oc set env ds/fluentd LOGGING_FILE_AGE=30

7.7.5. About logging collector alerts

The following alerts are generated by the logging collector and can be viewed on the Alerts tab of the Prometheus UI.

All the logging collector alerts are listed on the MonitoringAlerts page of the OpenShift Container Platform web console. Alerts are in one of the following states:

  • Firing. The alert condition is true for the duration of the timeout. Click the Options menu at the end of the firing alert to view more information or silence the alert.
  • Pending The alert condition is currently true, but the timeout has not been reached.
  • Not Firing. The alert is not currently triggered.

Table 7.3. Fluentd Prometheus alerts

AlertMessageDescriptionSeverity

FluentdErrorsHigh

In the last minute, <value> errors reported by fluentd <instance>.

Fluentd is reporting a higher number of issues than the specified number, default 10.

Critical

FluentdNodeDown

Prometheus could not scrape fluentd <instance> for more than 10m.

Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance.

Critical

FluentdQueueLengthBurst

In the last minute, fluentd <instance> buffer queue length increased more than 32. Current value is <value>.

Fluentd is reporting that it is overwhelmed.

Warning

FluentdQueueLengthIncreasing

In the last 12h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>.

Fluentd is reporting queue usage issues.

Critical

7.8. Collecting and storing Kubernetes events

The OpenShift Container Platform Event Router is a pod that watches Kubernetes events and logs them for collection by cluster logging. You must manually deploy the Event Router.

The Event Router collects events from all projects and writes them to STDOUT. Fluentd collects those events and forwards them into the OpenShift Container Platform Elasticsearch instance. Elasticsearch indexes the events to the infra index.

Important

The Event Router adds additional load to Fluentd and can impact the number of other log messages that can be processed.

7.8.1. Deploying and configuring the Event Router

Use the following steps to deploy the Event Router into your cluster. You should always deploy the Event Router to the openshift-logging project to ensure it collects events from across the cluster.

The following Template object creates the service account, cluster role, and cluster role binding required for the Event Router. The template also configures and deploys the Event Router pod. You can use this template without making changes, or change the Deployment object CPU and memory requests.

Prerequisites

  • You need proper permissions to create service accounts and update cluster role bindings. For example, you can run the following template with a user that has the cluster-admin role.
  • Cluster logging must be installed.

Procedure

  1. Create a template for the Event Router:

    kind: Template
    apiVersion: v1
    metadata:
      name: eventrouter-template
      annotations:
        description: "A pod forwarding kubernetes events to cluster logging stack."
        tags: "events,EFK,logging,cluster-logging"
    objects:
      - kind: ServiceAccount 1
        apiVersion: v1
        metadata:
          name: eventrouter
          namespace: ${NAMESPACE}
      - kind: ClusterRole 2
        apiVersion: v1
        metadata:
          name: event-reader
        rules:
        - apiGroups: [""]
          resources: ["events"]
          verbs: ["get", "watch", "list"]
      - kind: ClusterRoleBinding  3
        apiVersion: v1
        metadata:
          name: event-reader-binding
        subjects:
        - kind: ServiceAccount
          name: eventrouter
          namespace: ${NAMESPACE}
        roleRef:
          kind: ClusterRole
          name: event-reader
      - kind: ConfigMap 4
        apiVersion: v1
        metadata:
          name: eventrouter
          namespace: ${NAMESPACE}
        data:
          config.json: |-
            {
              "sink": "stdout"
            }
      - kind: Deployment 5
        apiVersion: apps/v1
        metadata:
          name: eventrouter
          namespace: ${NAMESPACE}
          labels:
            component: eventrouter
            logging-infra: eventrouter
            provider: openshift
        spec:
          selector:
            matchLabels:
              component: eventrouter
              logging-infra: eventrouter
              provider: openshift
          replicas: 1
          template:
            metadata:
              labels:
                component: eventrouter
                logging-infra: eventrouter
                provider: openshift
              name: eventrouter
            spec:
              serviceAccount: eventrouter
              containers:
                - name: kube-eventrouter
                  image: ${IMAGE}
                  imagePullPolicy: IfNotPresent
                  resources:
                    requests:
                      cpu: ${CPU}
                      memory: ${MEMORY}
                  volumeMounts:
                  - name: config-volume
                    mountPath: /etc/eventrouter
              volumes:
                - name: config-volume
                  configMap:
                    name: eventrouter
    parameters:
      - name: IMAGE
        displayName: Image
        value: "registry.redhat.io/openshift4/ose-logging-eventrouter:latest"
      - name: CPU  6
        displayName: CPU
        value: "100m"
      - name: MEMORY 7
        displayName: Memory
        value: "128Mi"
      - name: NAMESPACE
        displayName: Namespace
        value: "openshift-logging" 8
    1
    Creates a Service Account in the openshift-logging project for the Event Router.
    2
    Creates a ClusterRole to monitor for events in the cluster.
    3
    Creates a ClusterRoleBinding to bind the ClusterRole to the ServiceAccount.
    4
    Creates a ConfigMap in the openshift-logging project to generate the required config.json file.
    5
    Creates a Deployment object in the openshift-logging project to generate and configure the Event Router pod.
    6
    Specifies the minimum amount of memory to allocate to the Event Router pod. Defaults to 128Mi.
    7
    Specifies the minimum amount of CPU to allocate to the Event Router pod. Defaults to 100m.
    8
    Specifies the openshift-logging project to install objects in.
  2. Use the following command to process and apply the template:

    $ oc process -f <templatefile> | oc apply -n openshift-logging -f -

    For example:

    $ oc process -f eventrouter.yaml | oc apply -n openshift-logging -f -

    Example output

    serviceaccount/logging-eventrouter created
    clusterrole.authorization.openshift.io/event-reader created
    clusterrolebinding.authorization.openshift.io/event-reader-binding created
    configmap/logging-eventrouter created
    deployment.apps/logging-eventrouter created

  3. Validate that the Event Router installed in the openshift-logging project:

    1. View the new Event Router Pod:

      $ oc get pods --selector  component=eventrouter -o name -n openshift-logging

      Example output

      pod/cluster-logging-eventrouter-d649f97c8-qvv8r

    2. View the events collected by the Event Router:

      $ oc logs <cluster_logging_eventrouter_pod> -n openshift-logging

      For example:

      $ oc logs cluster-logging-eventrouter-d649f97c8-qvv8r -n openshift-logging

      Example output

      {"verb":"ADDED","event":{"metadata":{"name":"openshift-service-catalog-controller-manager-remover.1632d931e88fcd8f","namespace":"openshift-service-catalog-removed","selfLink":"/api/v1/namespaces/openshift-service-catalog-removed/events/openshift-service-catalog-controller-manager-remover.1632d931e88fcd8f","uid":"787d7b26-3d2f-4017-b0b0-420db4ae62c0","resourceVersion":"21399","creationTimestamp":"2020-09-08T15:40:26Z"},"involvedObject":{"kind":"Job","namespace":"openshift-service-catalog-removed","name":"openshift-service-catalog-controller-manager-remover","uid":"fac9f479-4ad5-4a57-8adc-cb25d3d9cf8f","apiVersion":"batch/v1","resourceVersion":"21280"},"reason":"Completed","message":"Job completed","source":{"component":"job-controller"},"firstTimestamp":"2020-09-08T15:40:26Z","lastTimestamp":"2020-09-08T15:40:26Z","count":1,"type":"Normal"}}

      You can also use Kibana to view events by creating an index pattern using the Elasticsearch infra index.

7.9. Using tolerations to control cluster logging pod placement

You can use taints and tolerations to ensure that cluster logging pods run on specific nodes and that no other workload can run on those nodes.

Taints and tolerations are simple key:value pair. A taint on a node instructs the node to repel all pods that do not tolerate the taint.

The key is any string, up to 253 characters and the value is any string up to 63 characters. The string must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.

Sample cluster logging CR with tolerations

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: openshift-logging
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 1
      tolerations: 1
      - key: "logging"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 6000
      resources:
        limits:
          memory: 8Gi
        requests:
          cpu: 100m
          memory: 1Gi
      storage: {}
      redundancyPolicy: "ZeroRedundancy"
  visualization:
    type: "kibana"
    kibana:
      tolerations: 2
      - key: "logging"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 6000
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 100m
          memory: 1Gi
      replicas: 1
  curation:
    type: "curator"
    curator:
      tolerations: 3
      - key: "logging"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 6000
      resources:
        limits:
          memory: 200Mi
        requests:
          cpu: 100m
          memory: 100Mi
      schedule: "*/5 * * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd:
        tolerations: 4
        - key: "logging"
          operator: "Exists"
          effect: "NoExecute"
          tolerationSeconds: 6000
        resources:
          limits:
            memory: 2Gi
          requests:
            cpu: 100m
            memory: 1Gi

1
This toleration is added to the Elasticsearch pods.
2
This toleration is added to the Kibana pod.
3
This toleration is added to the Curator pod.
4
This toleration is added to the logging collector pods.

7.9.1. Using tolerations to control the Elasticsearch pod placement

You can control which nodes the Elasticsearch pods runs on and prevent other workloads from using those nodes by using tolerations on the pods.

You apply tolerations to Elasticsearch pods through the ClusterLogging custom resource (CR) and apply taints to a node through the node specification. A taint on a node is a key:value pair that instructs the node to repel all pods that do not tolerate the taint. Using a specific key:value pair that is not on other pods ensures only Elasticsearch pods can run on that node.

By default, the Elasticsearch pods have the following toleration:

tolerations:
- effect: "NoExecute"
  key: "node.kubernetes.io/disk-pressure"
  operator: "Exists"

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Use the following command to add a taint to a node where you want to schedule the cluster logging pods:

    $ oc adm taint nodes <node-name> <key>=<value>:<effect>

    For example:

    $ oc adm taint nodes node1 elasticsearch=node:NoExecute

    This example places a taint on node1 that has key elasticsearch, value node, and taint effect NoExecute. Nodes with the NoExecute effect schedule only pods that match the taint and remove existing pods that do not match.

  2. Edit the logstore section of the ClusterLogging custom resource (CR) to configure a toleration for the Elasticsearch pods:

      logStore:
        type: "elasticsearch"
        elasticsearch:
          nodeCount: 1
          tolerations:
          - key: "elasticsearch"  1
            operator: "Exists"  2
            effect: "NoExecute"  3
            tolerationSeconds: 6000  4
    1
    Specify the key that you added to the node.
    2
    Specify the Exists operator to require a taint with the key elasticsearch to be present on the Node.
    3
    Specify the NoExecute effect.
    4
    Optionally, specify the tolerationSeconds parameter to set how long a pod can remain bound to a node before being evicted.

This toleration matches the taint created by the oc adm taint command. A pod with this toleration could be scheduled onto node1.

7.9.2. Using tolerations to control the Kibana pod placement

You can control which nodes the Kibana pods run on and prevent other workloads from using those nodes by using tolerations on the pods.

You apply tolerations to the Kibana pods through the ClusterLogging custom resource (CR) and apply taints to a node through the node specification. A taint on a node is a key:value pair that instructs the node to repel all pods that do not tolerate the taint. Using a specific key:value pair that is not on other pods ensures only the Kibana pod can run on that node.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Use the following command to add a taint to a node where you want to schedule the Kibana pod:

    $ oc adm taint nodes <node-name> <key>=<value>:<effect>

    For example:

    $ oc adm taint nodes node1 kibana=node:NoExecute

    This example places a taint on node1 that has key kibana, value node, and taint effect NoExecute. You must use the NoExecute taint effect. NoExecute schedules only pods that match the taint and remove existing pods that do not match.

  2. Edit the visualization section of the ClusterLogging custom resource (CR) to configure a toleration for the Kibana pod:

      visualization:
        type: "kibana"
        kibana:
          tolerations:
          - key: "kibana"  1
            operator: "Exists"  2
            effect: "NoExecute"  3
            tolerationSeconds: 6000 4
    1
    Specify the key that you added to the node.
    2
    Specify the Exists operator to require the key/value/effect parameters to match.
    3
    Specify the NoExecute effect.
    4
    Optionally, specify the tolerationSeconds parameter to set how long a pod can remain bound to a node before being evicted.

This toleration matches the taint created by the oc adm taint command. A pod with this toleration would be able to schedule onto node1.

7.9.3. Using tolerations to control the Curator Pod placement

You can control which node the Curator Pod runs on and prevent other workloads from using those nodes by using tolerations on the Pod.

You apply tolerations to the Curator Pod through the ClusterLogging custom resource (CR) and apply taints to a node through the node specification. A taint on a node is a key:value pair that instructs the node to repel all Pods that do not tolerate the taint. Using a specific key:value pair that is not on other Pods ensures only the Curator Pod can run on that node.

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Use the following command to add a taint to a node where you want to schedule the Curator Pod:

    $ oc adm taint nodes <node-name> <key>=<value>:<effect>

    For example:

    $ oc adm taint nodes node1 curator=node:NoExecute

    This example places a taint on node1 that has key curator, value node, and taint effect NoExecute. You must use the NoExecute taint effect. NoExecute schedules only Pods that match the taint and remove existing Pods that do not match.

  2. Edit the curation section of the ClusterLogging custom resource (CR) to configure a toleration for the Curator Pod:

      curation:
        type: "curator"
        curator:
          tolerations:
          - key: "curator"  1
            operator: "Exists"  2
            effect: "NoExecute"  3
            tolerationSeconds: 6000  4
    1
    Specify the key that you added to the node.
    2
    Specify the Exists operator to require the key/value/effect parameters to match.
    3
    Specify the NoExecute effect.
    4
    Optionally, specify the tolerationSeconds parameter to set how long a Pod can remain bound to a node before being evicted.

This toleration matches the taint that is created by the oc adm taint command. A Pod with this toleration would be able to schedule onto node1.

7.9.4. Using tolerations to control the log collector pod placement

You can ensure which nodes the logging collector pods run on and prevent other workloads from using those nodes by using tolerations on the pods.

You apply tolerations to logging collector pods through the ClusterLogging custom resource (CR) and apply taints to a node through the node specification. You can use taints and tolerations to ensure the pod does not get evicted for things like memory and CPU issues.

By default, the logging collector pods have the following toleration:

tolerations:
- key: "node-role.kubernetes.io/master"
  operator: "Exists"
  effect: "NoExecute"

Prerequisites

  • Cluster logging and Elasticsearch must be installed.

Procedure

  1. Use the following command to add a taint to a node where you want logging collector pods to schedule logging collector pods:

    $ oc adm taint nodes <node-name> <key>=<value>:<effect>

    For example:

    $ oc adm taint nodes node1 collector=node:NoExecute

    This example places a taint on node1 that has key collector, value node, and taint effect NoExecute. You must use the NoExecute taint effect. NoExecute schedules only pods that match the taint and removes existing pods that do not match.

  2. Edit the collection section of the ClusterLogging custom resource (CR) to configure a toleration for the logging collector pods:

      collection:
        logs:
          type: "fluentd"
          rsyslog:
            tolerations:
            - key: "collector"  1
              operator: "Exists"  2
              effect: "NoExecute"  3
              tolerationSeconds: 6000  4
    1
    Specify the key that you added to the node.
    2
    Specify the Exists operator to require the key/value/effect parameters to match.
    3
    Specify the NoExecute effect.
    4
    Optionally, specify the tolerationSeconds parameter to set how long a pod can remain bound to a node before being evicted.

This toleration matches the taint created by the oc adm taint command. A pod with this toleration would be able to schedule onto node1.

7.9.5. Additional resources

For more information about taints and tolerations, see Controlling pod placement using node taints.

7.10. Forward logs to third party systems

By default, OpenShift Container Platform cluster logging sends logs to the default internal Elasticsearch logstore, defined in the ClusterLogging custom resource.

You can configure cluster logging to send logs to destinations outside of your OpenShift Container Platform cluster instead of the default Elasticsearch logstore using the following methods:

  • Sending logs using the Fluentd forward protocol. You can create a Configmap to use the Fluentd forward protocol to securely send logs to an external logging aggregator that accepts the Fluent forward protocol.
  • Sending logs using syslog. You can create a Configmap to use the syslog protocol to send logs to an external syslog (RFC 3164) server.

Alternatively, you can use the Log Forwarding API, currently in Technology Preview. The Log Forwarding API, which is easier to configure than the Fluentd protocol and syslog, exposes configuration for sending logs to the internal Elasticsearch logstore and to external Fluentd log aggregation solutions.

Important

The Log Forwarding API is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

The methods for forwarding logs using a ConfigMap are deprecated and will be replaced by the Log Forwarding API in a future release.

7.10.1. Forwarding logs using the Fluentd forward protocol

You can use the Fluentd forward protocol to send a copy of your logs to an external log aggregator, instead of the default Elasticsearch logstore. On the OpenShift Container Platform cluster, you use the Fluentd forward protocol to send logs to a server configured to accept the protocol. You are responsible to configure the external log aggregator to receive the logs from OpenShift Container Platform.

Note

This method for forwarding logs is deprecated in OpenShift Container Platform and will be replaced by the Log Forwarding API in a future release.

To configure OpenShift Container Platform to send logs using the Fluentd forward protocol, create a ConfigMap called secure-forward in the openshift-logging namespace that points to an external log aggregator.

Important

Starting with the OpenShift Container Platform 4.3, the process for using the Fluentd forward protocol has changed. You now need to create a ConfigMap, as described below.

Additionally, you can add any certificates required by your configuration to a secret named secure-forward that will be mounted to the Fluentd Pods.

Sample secure-forward.conf

<store>
  @type forward
  <security>
    self_hostname ${hostname} # ${hostname} is a placeholder.
    shared_key "fluent-receiver"
  </security>
  transport tls
  tls_verify_hostname false           # Set false to ignore server cert hostname.

  tls_cert_path '/etc/ocp-forward/ca-bundle.crt'
  <buffer>
    @type file
    path '/var/lib/fluentd/secureforwardlegacy'
    queued_chunks_limit_size "#{ENV['BUFFER_QUEUE_LIMIT'] || '1024' }"
    chunk_limit_size "#{ENV['BUFFER_SIZE_LIMIT'] || '1m' }"
    flush_interval "#{ENV['FORWARD_FLUSH_INTERVAL'] || '5s'}"
    flush_at_shutdown "#{ENV['FLUSH_AT_SHUTDOWN'] || 'false'}"
    flush_thread_count "#{ENV['FLUSH_THREAD_COUNT'] || 2}"
    retry_max_interval "#{ENV['FORWARD_RETRY_WAIT'] || '300'}"
    retry_forever true
    # the systemd journald 0.0.8 input plugin will just throw away records if the buffer
    # queue limit is hit - 'block' will halt further reads and keep retrying to flush the
    # buffer to the remote - default is 'exception' because in_tail handles that case
    overflow_action "#{ENV['BUFFER_QUEUE_FULL_ACTION'] || 'exception'}"
  </buffer>
  <server>
    host fluent-receiver.openshift-logging.svc  # or IP
    port 24224
  </server>
</store>

Sample secure-forward ConfigMap based on the configuration

apiVersion: v1
data:
 secure-forward.conf: "<store>
     \ @type forward
     \ <security>
     \   self_hostname ${hostname} # ${hostname} is a placeholder.
     \   shared_key \"fluent-receiver\"
     \ </security>
     \ transport tls
     \ tls_verify_hostname false           # Set false to ignore server cert hostname.
     \ tls_cert_path '/etc/ocp-forward/ca-bundle.crt'
     \ <buffer>
     \   @type file
     \   path '/var/lib/fluentd/secureforwardlegacy'
     \   queued_chunks_limit_size \"#{ENV['BUFFER_QUEUE_LIMIT'] || '1024' }\"
     \   chunk_limit_size \"#{ENV['BUFFER_SIZE_LIMIT'] || '1m' }\"
     \   flush_interval \"#{ENV['FORWARD_FLUSH_INTERVAL'] || '5s'}\"
     \   flush_at_shutdown \"#{ENV['FLUSH_AT_SHUTDOWN'] || 'false'}\"
     \   flush_thread_count \"#{ENV['FLUSH_THREAD_COUNT'] || 2}\"
     \   retry_max_interval \"#{ENV['FORWARD_RETRY_WAIT'] || '300'}\"
     \   retry_forever true
     \   # the systemd journald 0.0.8 input plugin will just throw away records if the buffer
     \   # queue limit is hit - 'block' will halt further reads and keep retrying to flush the
     \   # buffer to the remote - default is 'exception' because in_tail handles that case
     \   overflow_action \"#{ENV['BUFFER_QUEUE_FULL_ACTION'] || 'exception'}\"
     \ </buffer>
     \ <server>
     \   host fluent-receiver.openshift-logging.svc  # or IP
     \   port 24224
     \ </server>
     </store>"
kind: ConfigMap
metadata:
  creationTimestamp: "2020-01-15T18:56:04Z"
  name: secure-forward
  namespace: openshift-logging
  resourceVersion: "19148"
  selfLink: /api/v1/namespaces/openshift-logging/configmaps/secure-forward
  uid: 6fd83202-93ab-d851b1d0f3e8

Procedure

To configure OpenShift Container Platform to forward logs using the Fluentd forward protocol:

  1. Create a configuration file named secure-forward.conf for the forward parameters:

    1. Configure the secrets and TLS information:

       <store>
        @type forward
      
        self_hostname ${hostname} 1
        shared_key <SECRET_STRING> 2
      
        transport tls 3
      
        tls_verify_hostname true 4
        tls_cert_path <path_to_file> 5
      1
      Specify the default value of the auto-generated certificate common name (CN).
      2
      Enter the Shared key between nodes
      3
      Specify tls to enable TLS validation.
      4
      Set to true to verify the server cert hostname. Set to false to ignore server cert hostname.
      5
      Specify the path to private CA certificate file as /etc/ocp-forward/ca_cert.pem.

      To use mTLS, see the Fluentd documentation for information about client certificate, key parameters, and other settings.

    2. Configure the name, host, and port for your external Fluentd server:

        <server>
          name 1
          host 2
          hostlabel 3
          port 4
        </server>
        <server> 5
          name
          host
        </server>
      1
      Optionally, enter a name for this server.
      2
      Specify the host name or IP of the server.
      3
      Specify the host label of the server.
      4
      Specify the port of the server.
      5
      Optionally, add additional servers. If you specify two or more servers, forward uses these server nodes in a round-robin order.

      For example:

        <server>
          name externalserver1
          host 192.168.1.1
          hostlabel externalserver1.example.com
          port 24224
        </server>
        <server>
          name externalserver2
          host externalserver2.example.com
          port 24224
        </server>
        </store>
  2. Create a ConfigMap named secure-forward in the openshift-logging namespace from the configuration file:

    $ oc create configmap secure-forward --from-file=secure-forward.conf -n openshift-logging
  3. Optional: Import any secrets required for the receiver:

    $ oc create secret generic secure-forward --from-file=<arbitrary-name-of-key1>=cert_file_from_fluentd_receiver --from-literal=shared_key=value_from_fluentd_receiver

    For example:

    $ oc create secret generic secure-forward --from-file=ca-bundle.crt=ca-for-fluentd-receiver/ca.crt --from-literal=shared_key=fluentd-receiver
  4. Refresh the fluentd Pods to apply the secure-forward secret and secure-forward ConfigMap:

    $ oc delete pod --selector logging-infra=fluentd
  5. Configure the external log aggregator to accept messages securely from OpenShift Container Platform.

7.10.2. Forwarding logs using the syslog protocol

You can use the syslog protocol to send a copy of your logs to an external syslog server, instead of the default Elasticsearch logstore. Note the following about this syslog protocol:

  • uses syslog protocol (RFC 3164), not RFC 5424;
  • does not support TLS and thus, is not secure;
  • does not provide Kubernetes metadata, systemd data, or other metadata.
Note

This method for forwarding logs is deprecated in OpenShift Container Platform and will be replaced by the Log Forwarding API in a future release.

There are two versions of the syslog protocol:

  • out_syslog: The non-buffered implementation, which communicates through UDP, does not buffer data and writes out results immediately.
  • out_syslog_buffered: The buffered implementation, which communicates through TCP, buffers data into chunks.

To configure log forwarding using the syslog protocol, create a configuration file, called syslog.conf, with the information needed to forward the logs. Then use that file to create a ConfigMap called syslog in the openshift-logging namespace, which OpenShift Container Platform uses when forwarding the logs. You are responsible to configure your syslog server to receive the logs from OpenShift Container Platform.

Important

Starting with the OpenShift Container Platform 4.3, the process for using the syslog protocol has changed. You now need to create a ConfigMap, as described below.

You can forward logs to multiple syslog servers by specifying separate <store> stanzas in the configuration file.

Sample syslog.conf

<store>
@type syslog_buffered 1
remote_syslog rsyslogserver.openshift-logging.svc.cluster.local 2
port 514 3
hostname ${hostname} 4
remove_tag_prefix tag 5
tag_key ident,systemd.u.SYSLOG_IDENTIFIER 6
facility local0 7
severity info 8
use_record true 9
payload_key message 10
</store>

1
The syslog protocol, either: syslog or syslog_buffered.
2
The fully qualified domain name (FQDN) or IP address of the syslog server.
3
The port number to connect on. Defaults to 514.
4
The name of the syslog server.
5
Removes the prefix from the tag. Defaults to '' (empty).
6
The field to set the syslog key.
7
The syslog log facility or source.
8
The syslog log severity.
9
Determines whether to use the severity and facility from the record if available.
10
Optional. The key to set the payload of the syslog message. Defaults to message.
Note

Configuring the payload_key parameter prevents other parameters from being forwarded to the syslog.

Sample syslog ConfigMap based on the sample syslog.conf

kind: ConfigMap
apiVersion: v1
metadata:
  name: syslog
  namespace: openshift-logging
data:
  syslog.conf: |
    <store>
     @type syslog_buffered
     remote_syslog syslogserver.openshift-logging.svc.cluster.local
     port 514
     hostname ${hostname}
     remove_tag_prefix tag
     tag_key ident,systemd.u.SYSLOG_IDENTIFIER
     facility local0
     severity info
     use_record true
     payload_key message
    </store>

Procedure

To configure OpenShift Container Platform to forward logs using the syslog protocol:

  1. Create a configuration file named syslog.conf that contains the following parameters within the <store> stanza:

    1. Specify the syslog protocol type:

      @type syslog_buffered 1
      1
      Specify the protocol to use, either: syslog or syslog_buffered.
    2. Configure the name, host, and port for your external syslog server:

      remote_syslog <remote> 1
      port <number> 2
      hostname <name> 3
      1
      Specify the FQDN or IP address of the syslog server.
      2
      Specify the port of the syslog server.
      3
      Specify a name for this syslog server.

      For example:

      remote_syslog syslogserver.openshift-logging.svc.cluster.local
      port 514
      hostname fluentd-server
    3. Configure the other syslog variables as needed:

      remove_tag_prefix 1
      tag_key <key> 2
      facility <value>  3
      severity <value>  4
      use_record <value> 5
      payload_key message 6
      1
      Add this parameter to remove the tag field from the syslog prefix.
      2
      Specify the field to set the syslog key.
      3
      Specify the syslog log facility or source. For values, see RTF 3164.
      4
      Specify the syslog log severity. For values, see link:RTF 3164.
      5
      Specify true to use the severity and facility from the record if available. If true, the container_name, namespace_name, and pod_name are included in the output content.
      6
      Specify the key to set the payload of the syslog message. Defaults to message.

      For example:

      facility local0
      severity info

      The configuration file appears similar to the following:

      <store>
      @type syslog_buffered
      remote_syslog syslogserver.openshift-logging.svc.cluster.local
      port 514
      hostname ${hostname}
      tag_key ident,systemd.u.SYSLOG_IDENTIFIER
      facility local0
      severity info
      use_record false
      </store>
  2. Create a ConfigMap named syslog in the openshift-logging namespace from the configuration file:

    $ oc create configmap syslog --from-file=syslog.conf -n openshift-logging

    The Cluster Logging Operator redeploys the Fluentd Pods. If the Pods do not redeploy, you can delete the Fluentd Pods to force them to redeploy.

    $ oc delete pod --selector logging-infra=fluentd

7.10.3. Forwarding logs using the Log Forwarding API

The Log Forwarding API enables you to configure custom pipelines to send container and node logs to specific endpoints within or outside of your cluster. You can send logs by type to the internal OpenShift Container Platform Elasticsearch instance and to remote destinations not managed by OpenShift Container Platform cluster logging, such as an existing logging service, an external Elasticsearch cluster, external log aggregation solutions, or a Security Information and Event Management (SIEM) system.

Important

The Log Fowarding API is currently a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend to use them for production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

See the Red Hat Technology Preview features support scope for more information.

You can send different types of logs to different systems allowing you to control who in your organization can access each type. Optional TLS support ensures that you can send logs using secure communication as required by your organization.

Note

Using the Log Forwarding API is optional. If you want to forward logs to only the internal OpenShift Container Platform Elasticsearch instance, do not configure the Log Forwarding API.

7.10.3.1. Understanding the Log Forwarding API

Forwarding cluster logs using the Log Forwarding API requires a combination of outputs and pipelines to send logs to specific endpoints inside and outside of your OpenShift Container Platform cluster.

Note

If you want to use only the default internal OpenShift Container Platform Elasticsearch logstore, do not configure the Log Forwarding feature.

By default, the Cluster Logging Operator sends logs to the default internal Elasticsearch logstore, as defined in the ClusterLogging custom resource. To use the Log Forwarding feature, you create a custom logforwarding configuration file to send logs to endpoints you specify.

An output is the destination for log data and a pipeline defines simple routing for one source to one or more outputs.

An output can be either:

  • elasticsearch to forward logs to an external Elasticsearch v5.x cluster and/or the internal OpenShift Container Platform Elasticsearch instance.
  • forward to forward logs to an external log aggregation solution. This option uses the Fluentd forward protocols.
Note

The endpoint must be a server name or FQDN, not an IP Address, if the cluster-wide proxy using the CIDR annotation is enabled.

A pipeline associates the source of the data to an output. The source of the data is one of the following:

  • logs.app - Container logs generated by user applications running in the cluster, except infrastructure container applications.
  • logs.infra - Logs generated by infrastructure components running in the cluster and OpenShift Container Platform nodes, such as journal logs. Infrastructure components are pods that run in the openshift*, kube*, or default projects.
  • logs.audit - Logs generated by the node audit system (auditd), which are stored in the /var/log/audit/audit.log file, and the audit logs from the Kubernetes apiserver and the OpenShift apiserver.

Note the following:

  • The internal OpenShift Container Platform Elasticsearch instance does not provide secure storage for audit logs. We recommend you ensure that the system to which you forward audit logs is compliant with your organizational and governmental regulations and is properly secured. OpenShift Container Platform cluster logging does not comply with those regulations.
  • An output supports TLS communication using a secret. Secrets must have keys of: tls.crt, tls.key, and ca-bundler.crt which point to the respective certificates for which they represent. Secrets must have the key shared_key for use when using forward in a secure manner.
  • You are responsible to create and maintain any additional configurations that external destinations might require, such as keys and secrets, service accounts, port opening, or global proxy configuration.

The following example creates three outputs:

  • the internal OpenShift Container Platform Elasticsearch instance,
  • an unsecured externally-managed Elasticsearch instance,
  • a secured external log aggregator using the forward protocols.

Three pipelines send:

  • the application logs to the internal OpenShift Container Platform Elasticsearch,
  • the infrastructure logs to an external Elasticsearch instance,
  • the audit logs to the secured device over the forward protocols.

Sample log forwarding outputs and pipelines

apiVersion: "logging.openshift.io/v1alpha1"
kind: "LogForwarding"
metadata:
  name: instance 1
  namespace: openshift-logging
spec:
  disableDefaultForwarding: true 2
  outputs: 3
   - name: elasticsearch 4
     type: "elasticsearch"  5
     endpoint: elasticsearch.openshift-logging.svc:9200 6
     secret: 7
        name: fluentd
   - name: elasticsearch-insecure
     type: "elasticsearch"
     endpoint: elasticsearch-insecure.svc.messaging.cluster.local
     insecure: true 8
   - name: secureforward-offcluster
     type: "forward"
     endpoint: https://secureforward.offcluster.com:24224
     secret:
        name: secureforward
  pipelines: 9
   - name: container-logs 10
     inputSource: logs.app 11
     outputRefs: 12
     - elasticsearch
     - secureforward-offcluster
   - name: infra-logs
     inputSource: logs.infra
     outputRefs:
     - elasticsearch-insecure
   - name: audit-logs
     inputSource: logs.audit
     outputRefs:
     - secureforward-offcluster

1
The name of the log forwarding CR must be instance.
2
Parameter to disable the default log forwarding behavior.
3
Configuration for the outputs.
4
A name to describe the output.
5
The type of output, either elasticsearch or forward.
6
Enter the endpoint, either the server name, FQDN, or IP address. If the cluster-wide proxy using the CIDR annotation is enabled, the endpoint must be a server name or FQDN, not an IP Address. For the internal OpenShift Container Platform Elasticsearch instance, specify elasticsearch.openshift-logging.svc:9200.
7
Optional name of the secret required by the endpoint for TLS communication. The secret must exist in the openshift-logging project.
8
Optional setting if the endpoint does not use a secret, resulting in insecure communication.
9
Configuration for the pipelines.
10
A name to describe the pipeline.
11
The data source: logs.app, logs.infra, or logs.audit.
12
The name of one or more outputs configured in the CR.
Fluentd log handling when the external log aggregator is unavailable

If your external logging aggregator becomes unavailable and cannot receive logs, Fluentd continues to collect logs and stores them in a buffer. When the log aggregator becomes available, log forwarding resumes, including the buffered logs. If the buffer fills completely, Fluentd stops collecting logs. OpenShift Container Platform rotates the logs and deletes them. You cannot adjust the buffer size or add a persistent volume claim (PVC) to the Fluentd daemon set or pods.

7.10.3.2. Enabling the Log Forwarding API

You must enable the Log Forwarding API before you can forward logs using the API.

Procedure

To enable the Log Forwarding API:

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
  2. Add the clusterlogging.openshift.io/logforwardingtechpreview annotation and set to enabled:

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      annotations:
        clusterlogging.openshift.io/logforwardingtechpreview: enabled 1
      name: "instance"
      namespace: "openshift-logging"
    spec:
    
    ...
    
      collection: 2
        logs:
          type: "fluentd"
          fluentd: {}
    1
    Enables and disables the Log Forwarding API. Set to enabled to use log forwarding. To use the only the OpenShift Container Platform Elasticsearch instance, set to disabled or do not add the annotation.
    2
    The spec.collection block must be defined to use Fluentd in the ClusterLogging CR.

7.10.3.3. Configuring log forwarding using the Log Forwarding API

To configure the Log Forwarding, edit the ClusterLogging custom resource (CR) to add the clusterlogging.openshift.io/logforwardingtechpreview: enabled annotation and create a `LogForwarding`to specify the outputs, pipelines, and enable log forwarding.

If you enable Log Forwarding, you should define a pipeline all for three source types: logs.app, logs.infra, and logs.audit. The logs from any undefined source type are dropped. For example, if you specify a pipeline for the logs.app and log-audit types, but do not specify a pipeline for the logs.infra type, logs.infra logs are dropped.

Procedure

To configure log forwarding using the API:

  1. Create a Log Forwarding CR YAML file similar to the following:

    apiVersion: "logging.openshift.io/v1alpha1"
    kind: "LogForwarding"
    metadata:
      name: instance 1
      namespace: openshift-logging 2
    spec:
      disableDefaultForwarding: true 3
      outputs: 4
       - name: elasticsearch
         type: "elasticsearch"
         endpoint: elasticsearch.openshift-logging.svc:9200
         secret:
            name: fluentd
       - name: elasticsearch-insecure
         type: "elasticsearch"
         endpoint: elasticsearch-insecure.svc.messaging.cluster.local
         insecure: true
       - name: secureforward-offcluster
         type: "forward"
         endpoint: https://secureforward.offcluster.com:24224
         secret:
            name: secureforward
      pipelines: 5
       - name: container-logs
         inputSource: logs.app
         outputRefs:
         - elasticsearch
         - secureforward-offcluster
       - name: infra-logs
         inputSource: logs.infra
         outputRefs:
         - elasticsearch-insecure
       - name: audit-logs
         inputSource: logs.audit
         outputRefs:
         - secureforward-offcluster
    1
    The name of the log forwarding CR must be instance.
    2
    The namespace for the log forwarding CR must be openshift-logging.
    3
    Set to true disable the default log forwarding behavior.
    4
    Add one or more endpoints:
    • Specify the type of output, either elasticsearch or forward.
    • Enter a name for the output.
    • Enter the endpoint, either the server name, FQDN, or IP address. If the cluster-wide proxy using the CIDR annotation is enabled, the endpoint must be a server name or FQDN, not an IP Address. For the internal OpenShift Container Platform Elasticsearch instance, specify elasticsearch.openshift-logging.svc:9200.
    • Optional: Enter the name of the secret required by the endpoint for TLS communication. The secret must exist in the openshift-logging project.
    • Specify insecure: true if the endpoint does not use a secret, resulting in insecure communication.
    5
    Add one or more pipelines:
    • Enter a name for the pipeline
    • Specify the source type: logs.app, logs.infra, or logs.audit.
    • Specify the name of one or more outputs configured in the CR.

      Note

      If you set disableDefaultForwarding: true you must configure a pipeline and output for all three types of logs, application, infrastructure, and audit. If you do not specify a pipeline and output for a log type, those logs are not stored and will be lost.

  2. Create the CR object:

    $ oc create -f <file-name>.yaml
7.10.3.3.1. Example log forwarding custom resources

A typical Log Forwarding configuration would be similar to the following examples.

The following Log Forwarding custom resource sends all logs to a secured external Elasticsearch logstore:

Sample custom resource to forward to an Elasticsearch logstore

apiVersion: logging.openshift.io/v1alpha1
kind: LogForwarding
metadata:
  name: instance
  namespace: openshift-logging
spec:
  disableDefaultForwarding: true
  outputs:
    - name: user-created-es
      type: elasticsearch
      endpoint: 'elasticsearch-server.openshift-logging.svc:9200'
      secret:
        name: piplinesecret
  pipelines:
    - name: app-pipeline
      inputSource: logs.app
      outputRefs:
        - user-created-es
    - name: infra-pipeline
      inputSource: logs.infra
      outputRefs:
        - user-created-es
    - name: audit-pipeline
      inputSource: logs.audit
      outputRefs:
        - user-created-es

The following Log Forwarding custom resource sends all logs to a secured Fluentd instance using the Fluentd forward protocol.

Sample custom resource to use the forward protocol

apiVersion: logging.openshift.io/v1alpha1
kind: LogForwarding
metadata:
  name: instance
  namespace: openshift-logging
spec:
  disableDefaultForwarding: true
  outputs:
    - name: fluentd-created-by-user
      type: forward
      endpoint: 'fluentdserver.openshift-logging.svc:24224'
      secret:
        name: fluentdserver
  pipelines:
    - name: app-pipeline
      inputSource: logs.app
      outputRefs:
        - fluentd-created-by-user
    - name: infra-pipeline
      inputSource: logs.infra
      outputRefs:
        - fluentd-created-by-user
    - name: clo-default-audit-pipeline
      inputSource: logs.audit
      outputRefs:
        - fluentd-created-by-user

7.10.3.4. Disabling the Log Forwarding API

To disable the Log Forwarding API and to stop forwarding logs to the speified endpoints, remove the metadata.annotations.clusterlogging.openshift.io/logforwardingtechpreview:enabled parameter from the ClusterLogging CR and delete the Log Forwarding CR. The container and node logs will be forwarded to the internal OpenShift Container Platform Elasticsearch instance.

Note

Setting disableDefaultForwarding=false prevents cluster logging from sending logs to the specified endpoints and to default internal OpenShift Container Platform Elasticsearch instance.

Procedure

To disable the Log Forwarding API:

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
  2. Remove the clusterlogging.openshift.io/logforwardingtechpreview annotation:

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      annotations:
        clusterlogging.openshift.io/logforwardingtechpreview: enabled 1
      name: "instance"
      namespace: "openshift-logging"
    ....
    1
    Remove this annotation.
  3. Delete the Log Forwarding Custom Resource:

    $ oc delete LogForwarding instance -n openshift-logging

7.11. Configuring systemd-journald and Fluentd

Because Fluentd reads from the journal, and the journal default settings are very low, journal entries can be lost because the journal cannot keep up with the logging rate from system services.

We recommend setting RateLimitInterval=1s and RateLimitBurst=10000 (or even higher if necessary) to prevent the journal from losing entries.

7.11.1. Configuring systemd-journald for cluster logging

As you scale up your project, the default logging environment might need some adjustments.

For example, if you are missing logs, you might have to increase the rate limits for journald. You can adjust the number of messages to retain for a specified period of time to ensure that cluster logging does not use excessive resources without dropping logs.

You can also determine if you want the logs compressed, how long to retain logs, how or if the logs are stored, and other settings.

Procedure

  1. Create a journald.conf file with the required settings:

    Compress=yes 1
    ForwardToConsole=no 2
    ForwardToSyslog=no
    MaxRetentionSec=1month 3
    RateLimitBurst=10000 4
    RateLimitInterval=1s
    Storage=persistent 5
    SyncIntervalSec=1s 6
    SystemMaxUse=8g 7
    SystemKeepFree=20% 8
    SystemMaxFileSize=10M 9
    1
    Specify whether you want logs compressed before they are written to the file system. Specify yes to compress the message or no to not compress. The default is yes.
    2
    Configure whether to forward log messages. Defaults to no for each. Specify:
    • ForwardToConsole to forward logs to the system console.
    • ForwardToKsmg to forward logs to the kernel log buffer.
    • ForwardToSyslog to forward to a syslog daemon.
    • ForwardToWall to forward messages as wall messages to all logged-in users.
    3
    Specify the maximum time to store journal entries. Enter a number to specify seconds. Or include a unit: "year", "month", "week", "day", "h" or "m". Enter 0 to disable. The default is 1month.
    4
    Configure rate limiting. If, during the time interval defined by RateLimitIntervalSec, more logs than specified in RateLimitBurst are received, all further messages within the interval are dropped until the interval is over. It is recommended to set RateLimitInterval=1s and RateLimitBurst=10000, which are the defaults.
    5
    Specify how logs are stored. The default is persistent:
    • volatile to store logs in memory in /var/log/journal/.
    • persistent to store logs to disk in /var/log/journal/. systemd creates the directory if it does not exist.
    • auto to store logs in in /var/log/journal/ if the directory exists. If it does not exist, systemd temporarily stores logs in /run/systemd/journal.
    • none to not store logs. systemd drops all logs.
    6
    Specify the timeout before synchronizing journal files to disk for ERR, WARNING, NOTICE, INFO, and DEBUG logs. systemd immediately syncs after receiving a CRIT, ALERT, or EMERG log. The default is 1s.
    7
    Specify the maximum size the journal can use. The default is 8g.
    8
    Specify how much disk space systemd must leave free. The default is 20%.
    9
    Specify the maximum size for individual journal files stored persistently in /var/log/journal. The default is 10M.
    Note

    If you are removing the rate limit, you might see increased CPU utilization on the system logging daemons as it processes any messages that would have previously been throttled.

    For more information on systemd settings, see https://www.freedesktop.org/software/systemd/man/journald.conf.html. The default settings listed on that page might not apply to OpenShift Container Platform.

  2. Convert the journal.conf file to base64:

    $ export jrnl_cnf=$( cat /journald.conf | base64 -w0 )
  3. Create a new MachineConfig for master or worker and add the journal.conf parameters:

    For example:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 50-corp-journald
    spec:
      config:
        ignition:
          version: 2.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,${jrnl_cnf}
            mode: 0644 1
            overwrite: true
            path: /etc/systemd/journald.conf 2
    1
    Set the permissions for the journal.conf file. It is recommended to set 0644 permissions.
    2
    Specify the path to the base64-encoded journal.conf file.
  4. Create the MachineConfig:

    $ oc apply -f <filename>.yaml

    The controller detects the new MachineConfig and generates a new rendered-worker-<hash> version.

  5. Monitor the status of the rollout of the new rendered configuration to each node:

    $ oc describe machineconfigpool/worker
    
    
    Name:         worker
    Namespace:
    Labels:       machineconfiguration.openshift.io/mco-built-in=
    Annotations:  <none>
    API Version:  machineconfiguration.openshift.io/v1
    Kind:         MachineConfigPool
    
    ...
    
    Conditions:
      Message:
      Reason:                All nodes are updating to rendered-worker-913514517bcea7c93bd446f4830bc64e