Chapter 10. Log storage

10.1. About log storage

You can use an internal Loki or Elasticsearch log store on your cluster for storing logs, or you can use a ClusterLogForwarder custom resource (CR) to forward logs to an external store.

10.1.1. Log storage types

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system offered as an alternative to Elasticsearch as a log store for the logging.

Elasticsearch indexes incoming log records completely during ingestion. Loki only indexes a few fixed labels during ingestion and defers more complex parsing until after the logs have been stored. This means Loki can collect logs more quickly.

10.1.1.1. About the Elasticsearch log store

The logging Elasticsearch instance is optimized and tested for short term storage, approximately seven days. If you want to retain your logs over a longer term, it is recommended you move the data to a third-party storage system.

Elasticsearch organizes the log data from Fluentd into datastores, or indices, then subdivides each index into multiple pieces called shards, which it spreads across a set of Elasticsearch nodes in an Elasticsearch cluster. You can configure Elasticsearch to make copies of the shards, called replicas, which Elasticsearch also spreads across the Elasticsearch nodes. The ClusterLogging custom resource (CR) allows you to specify how the shards are replicated to provide data redundancy and resilience to failure. You can also specify how long the different types of logs are retained using a retention policy in the ClusterLogging CR.

Note

The number of primary shards for the index templates is equal to the number of Elasticsearch data nodes.

The Red Hat OpenShift Logging Operator and companion OpenShift Elasticsearch Operator ensure that each Elasticsearch node is deployed using a unique deployment that includes its own storage volume. You can use a ClusterLogging custom resource (CR) to increase the number of Elasticsearch nodes, as needed. See the Elasticsearch documentation for considerations involved in configuring storage.

Note

A highly-available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host.

Role-based access control (RBAC) applied on the Elasticsearch indices enables the controlled access of the logs to the developers. Administrators can access all logs and developers can access only the logs in their projects.

10.1.2. Querying log stores

You can query Loki by using the LogQL log query language.

10.1.3. Additional resources

10.2. Installing log storage

You can use the OpenShift CLI (oc) or the OpenShift Container Platform web console to deploy a log store on your OpenShift Container Platform cluster.

Note

The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator.

10.2.1. Deploying a Loki log store

You can use the Loki Operator to deploy an internal Loki log store on your OpenShift Container Platform cluster. After install the Loki Operator, you must configure Loki object storage by creating a secret, and create a LokiStack custom resource (CR).

10.2.1.1. Deployment Sizing

Sizing for Loki follows the format of N<x>.<size> where the value <N> is number of instances and <size> specifies performance capabilities.

Note

1x.extra-small is for demo purposes only, and is not supported.

Table 10.1. Loki Sizing

 1x.extra-small1x.small1x.medium

Data transfer

Demo use only.

500GB/day

2TB/day

Queries per second (QPS)

Demo use only.

25-50 QPS at 200ms

25-75 QPS at 200ms

Replication factor

None

2

3

Total CPU requests

5 vCPUs

36 vCPUs

54 vCPUs

Total Memory requests

7.5Gi

63Gi

139Gi

Total Disk requests

150Gi

300Gi

450Gi

10.2.1.1.1. Supported API Custom Resource Definitions

LokiStack development is ongoing, not all APIs are supported currently supported.

CustomResourceDefinition (CRD)ApiVersionSupport state

LokiStack

lokistack.loki.grafana.com/v1

Supported in 5.5

RulerConfig

rulerconfig.loki.grafana/v1beta1

Technology Preview

AlertingRule

alertingrule.loki.grafana/v1beta1

Technology Preview

RecordingRule

recordingrule.loki.grafana/v1beta1

Technology Preview

Important

Usage of RulerConfig, AlertingRule and RecordingRule custom resource definitions (CRDs). is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

10.2.1.2. Installing the Loki Operator by using the OpenShift Container Platform web console

To install and configure logging on your OpenShift Container Platform cluster, additional Operators must be installed. This can be done from the Operator Hub within the web console.

OpenShift Container Platform Operators use custom resources (CR) to manage applications and their components. High-level configuration and settings are provided by the user within a CR. The Operator translates high-level directives into low-level actions, based on best practices embedded within the Operator’s logic. A custom resource definition (CRD) defines a CR and lists all the configurations available to users of the Operator. Installing an Operator creates the CRDs, which are then used to generate CRs.

Prerequisites

  • You have access to a supported object store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation).
  • You have administrator permissions.
  • You have access to the OpenShift Container Platform web console.

Procedure

  1. In the OpenShift Container Platform web console Administrator perspective, go to OperatorsOperatorHub.
  2. Type Loki Operator in the Filter by keyword field. Click Loki Operator in the list of available Operators, and then click Install.

    Important

    The Community Loki Operator is not supported by Red Hat.

  3. Select stable or stable-x.y as the Update channel.

    Note

    The stable channel only provides updates to the most recent release of logging. To continue receiving updates for prior releases, you must change your subscription channel to stable-x.y, where x.y represents the major and minor version of logging you have installed. For example, stable-5.7.

    The Loki Operator must be deployed to the global operator group namespace openshift-operators-redhat, so the Installation mode and Installed Namespace are already selected. If this namespace does not already exist, it is created for you.

  4. Select Enable operator-recommended cluster monitoring on this namespace.

    This option sets the openshift.io/cluster-monitoring: "true" label in the Namespace object. You must select this option to ensure that cluster monitoring scrapes the openshift-operators-redhat namespace.

  5. For Update approval select Automatic, then click Install.

    If the approval strategy in the subscription is set to Automatic, the update process initiates as soon as a new Operator version is available in the selected channel. If the approval strategy is set to Manual, you must manually approve pending updates.

Verification

  1. Go to OperatorsInstalled Operators.
  2. Make sure the openshift-logging project is selected.
  3. In the Status column, verify that you see green checkmarks with InstallSucceeded and the text Up to date.
Note

An Operator might display a Failed status before the installation finishes. If the Operator install completes with an InstallSucceeded message, refresh the page.

10.2.1.3. Creating a secret for Loki object storage by using the web console

To configure Loki object storage, you must create a secret. You can create a secret by using the OpenShift Container Platform web console.

Prerequisites

  • You have administrator permissions.
  • You have access to the OpenShift Container Platform web console.
  • You installed the Loki Operator.

Procedure

  1. Go to WorkloadsSecrets in the Administrator perspective of the OpenShift Container Platform web console.
  2. From the Create drop-down list, select From YAML.
  3. Create a secret that uses the access_key_id and access_key_secret fields to specify your credentials and the bucketnames, endpoint, and region fields to define the object storage location. AWS is used in the following example:

    Example Secret object

    apiVersion: v1
    kind: Secret
    metadata:
      name: logging-loki-s3
      namespace: openshift-logging
    stringData:
      access_key_id: AKIAIOSFODNN7EXAMPLE
      access_key_secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
      bucketnames: s3-bucket-name
      endpoint: https://s3.eu-central-1.amazonaws.com
      region: eu-central-1

Additional resources

10.2.1.4. Creating a LokiStack custom resource by using the web console

You can create a LokiStack custom resource (CR) by using the OpenShift Container Platform web console.

Prerequisites

  • You have administrator permissions.
  • You have access to the OpenShift Container Platform web console.
  • You installed the Loki Operator.

Procedure

  1. Go to the OperatorsInstalled Operators page. Click the All instances tab.
  2. From the Create new drop-down list, select LokiStack.
  3. Select YAML view, and then use the following template to create a LokiStack CR:

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki 1
      namespace: openshift-logging
    spec:
      size: 1x.small 2
      storage:
        schemas:
        - version: v12
          effectiveDate: '2022-06-01'
        secret:
          name: logging-loki-s3 3
          type: s3 4
      storageClassName: <storage_class_name> 5
      tenants:
        mode: openshift-logging
    1
    Use the name logging-loki.
    2
    Select your Loki deployment size.
    3
    Specify the secret used for your log storage.
    4
    Specify the corresponding storage type.
    5
    Enter the name of a storage class for temporary storage. For best performance, specify a storage class that allocates block storage. Available storage classes for your cluster can be listed by using the oc get storageclasses command.

10.2.1.5. Installing Loki Operator by using the CLI

To install and configure logging on your OpenShift Container Platform cluster, additional Operators must be installed. This can be done from the OpenShift Container Platform CLI.

OpenShift Container Platform Operators use custom resources (CR) to manage applications and their components. High-level configuration and settings are provided by the user within a CR. The Operator translates high-level directives into low-level actions, based on best practices embedded within the Operator’s logic. A custom resource definition (CRD) defines a CR and lists all the configurations available to users of the Operator. Installing an Operator creates the CRDs, which are then used to generate CRs.

Prerequisites

  • You have administrator permissions.
  • You installed the OpenShift CLI (oc).
  • You have access to a supported object store. For example: AWS S3, Google Cloud Storage, Azure, Swift, Minio, or OpenShift Data Foundation.

Procedure

  1. Create a Subscription object:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: loki-operator
      namespace: openshift-operators-redhat 1
    spec:
      charsion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: loki-operator
      namespace: openshift-operators-redhat 2
    spec:
      channel: stable 3
      name: loki-operator
      source: redhat-operators 4
      sourceNamespace: openshift-marketplace
    1 2
    You must specify the openshift-operators-redhat namespace.
    3
    Specify stable, or stable-5.<y> as the channel.
    4
    Specify redhat-operators. If your OpenShift Container Platform cluster is installed on a restricted network, also known as a disconnected cluster, specify the name of the CatalogSource object you created when you configured the Operator Lifecycle Manager (OLM).
  2. Apply the Subscription object:

    $ oc apply -f <filename>.yaml

10.2.1.6. Creating a secret for Loki object storage by using the CLI

To configure Loki object storage, you must create a secret. You can do this by using the OpenShift CLI (oc).

Prerequisites

  • You have administrator permissions.
  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).

Procedure

  • Create a secret in the directory that contains your certificate and key files by running the following command:

    $ oc create secret generic -n openshift-logging <your_secret_name> \
     --from-file=tls.key=<your_key_file>
     --from-file=tls.crt=<your_crt_file>
     --from-file=ca-bundle.crt=<your_bundle_file>
     --from-literal=username=<your_username>
     --from-literal=password=<your_password>
Note

Use generic or opaque secrets for best results.

Verification

  • Verify that a secret was created by running the following command:

    $ oc get secrets

Additional resources

10.2.1.7. Creating a LokiStack custom resource by using the CLI

You can create a LokiStack custom resource (CR) by using the OpenShift CLI (oc).

Prerequisites

  • You have administrator permissions.
  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).

Procedure

  1. Create a LokiStack CR:

    Example LokiStack CR

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      size: 1x.small 1
      storage:
        schemas:
        - version: v12
          effectiveDate: "2022-06-01"
        secret:
          name: logging-loki-s3 2
          type: s3 3
      storageClassName: <storage_class_name> 4
      tenants:
        mode: openshift-logging

    1
    Supported size options for production instances of Loki are 1x.small and 1x.medium.
    2
    Enter the name of your log store secret.
    3
    Enter the type of your log store secret.
    4
    Enter the name of a storage class for temporary storage. For best performance, specify a storage class that allocates block storage. Available storage classes for your cluster can be listed by using oc get storageclasses.
  2. Apply the LokiStack CR:

    $ oc apply -f <filename>.yaml

Verification

  • Verify the installation by listing the pods in the openshift-logging project by running the following command and observing the output:

    $ oc get pods -n openshift-logging

    Confirm that you see several pods for components of the logging, similar to the following list:

    Example output

    NAME                                           READY   STATUS    RESTARTS   AGE
    cluster-logging-operator-78fddc697-mnl82       1/1     Running   0          14m
    collector-6cglq                                2/2     Running   0          45s
    collector-8r664                                2/2     Running   0          45s
    collector-8z7px                                2/2     Running   0          45s
    collector-pdxl9                                2/2     Running   0          45s
    collector-tc9dx                                2/2     Running   0          45s
    collector-xkd76                                2/2     Running   0          45s
    logging-loki-compactor-0                       1/1     Running   0          8m2s
    logging-loki-distributor-b85b7d9fd-25j9g       1/1     Running   0          8m2s
    logging-loki-distributor-b85b7d9fd-xwjs6       1/1     Running   0          8m2s
    logging-loki-gateway-7bb86fd855-hjhl4          2/2     Running   0          8m2s
    logging-loki-gateway-7bb86fd855-qjtlb          2/2     Running   0          8m2s
    logging-loki-index-gateway-0                   1/1     Running   0          8m2s
    logging-loki-index-gateway-1                   1/1     Running   0          7m29s
    logging-loki-ingester-0                        1/1     Running   0          8m2s
    logging-loki-ingester-1                        1/1     Running   0          6m46s
    logging-loki-querier-f5cf9cb87-9fdjd           1/1     Running   0          8m2s
    logging-loki-querier-f5cf9cb87-fp9v5           1/1     Running   0          8m2s
    logging-loki-query-frontend-58c579fcb7-lfvbc   1/1     Running   0          8m2s
    logging-loki-query-frontend-58c579fcb7-tjf9k   1/1     Running   0          8m2s
    logging-view-plugin-79448d8df6-ckgmx           1/1     Running   0          46s

10.2.2. Loki object storage

The Loki Operator supports AWS S3, as well as other S3 compatible object stores such as Minio and OpenShift Data Foundation. Azure, GCS, and Swift are also supported.

The recommended nomenclature for Loki storage is logging-loki-<your_storage_provider>.

The following table shows the type values within the LokiStack custom resource (CR) for each storage provider. For more information, see the section on your storage provider.

Table 10.2. Secret type quick reference

Storage providerSecret type value

AWS

s3

Azure

azure

Google Cloud

gcs

Minio

s3

OpenShift Data Foundation

s3

Swift

swift

10.2.2.1. AWS storage

Prerequisites

Procedure

  • Create an object storage secret with the name logging-loki-aws by running the following command:

    $ oc create secret generic logging-loki-aws \
      --from-literal=bucketnames="<bucket_name>" \
      --from-literal=endpoint="<aws_bucket_endpoint>" \
      --from-literal=access_key_id="<aws_access_key_id>" \
      --from-literal=access_key_secret="<aws_access_key_secret>" \
      --from-literal=region="<aws_region_of_your_bucket>"

10.2.2.2. Azure storage

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You created a bucket on Azure.

Procedure

  • Create an object storage secret with the name logging-loki-azure by running the following command:

    $ oc create secret generic logging-loki-azure \
      --from-literal=container="<azure_container_name>" \
      --from-literal=environment="<azure_environment>" \ 1
      --from-literal=account_name="<azure_account_name>" \
      --from-literal=account_key="<azure_account_key>"
    1
    Supported environment values are AzureGlobal, AzureChinaCloud, AzureGermanCloud, or AzureUSGovernment.

10.2.2.3. Google Cloud Platform storage

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You created a project on Google Cloud Platform (GCP).
  • You created a bucket in the same project.
  • You created a service account in the same project for GCP authentication.

Procedure

  1. Copy the service account credentials received from GCP into a file called key.json.
  2. Create an object storage secret with the name logging-loki-gcs by running the following command:

    $ oc create secret generic logging-loki-gcs \
      --from-literal=bucketname="<bucket_name>" \
      --from-file=key.json="<path/to/key.json>"

10.2.2.4. Minio storage

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You have Minio deployed on your cluster.
  • You created a bucket on Minio.

Procedure

  • Create an object storage secret with the name logging-loki-minio by running the following command:

    $ oc create secret generic logging-loki-minio \
      --from-literal=bucketnames="<bucket_name>" \
      --from-literal=endpoint="<minio_bucket_endpoint>" \
      --from-literal=access_key_id="<minio_access_key_id>" \
      --from-literal=access_key_secret="<minio_access_key_secret>"

10.2.2.5. OpenShift Data Foundation storage

Prerequisites

Procedure

  1. Create an ObjectBucketClaim custom resource in the openshift-logging namespace:

    apiVersion: objectbucket.io/v1alpha1
    kind: ObjectBucketClaim
    metadata:
      name: loki-bucket-odf
      namespace: openshift-logging
    spec:
      generateBucketName: loki-bucket-odf
      storageClassName: openshift-storage.noobaa.io
  2. Get bucket properties from the associated ConfigMap object by running the following command:

    BUCKET_HOST=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_HOST}')
    BUCKET_NAME=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_NAME}')
    BUCKET_PORT=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_PORT}')
  3. Get bucket access key from the associated secret by running the following command:

    ACCESS_KEY_ID=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
    SECRET_ACCESS_KEY=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
  4. Create an object storage secret with the name logging-loki-odf by running the following command:

    $ oc create -n openshift-logging secret generic logging-loki-odf \
    --from-literal=access_key_id="<access_key_id>" \
    --from-literal=access_key_secret="<secret_access_key>" \
    --from-literal=bucketnames="<bucket_name>" \
    --from-literal=endpoint="https://<bucket_host>:<bucket_port>"

10.2.2.6. Swift storage

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You created a bucket on Swift.

Procedure

  • Create an object storage secret with the name logging-loki-swift by running the following command:

    $ oc create secret generic logging-loki-swift \
      --from-literal=auth_url="<swift_auth_url>" \
      --from-literal=username="<swift_usernameclaim>" \
      --from-literal=user_domain_name="<swift_user_domain_name>" \
      --from-literal=user_domain_id="<swift_user_domain_id>" \
      --from-literal=user_id="<swift_user_id>" \
      --from-literal=password="<swift_password>" \
      --from-literal=domain_id="<swift_domain_id>" \
      --from-literal=domain_name="<swift_domain_name>" \
      --from-literal=container_name="<swift_container_name>"
  • You can optionally provide project-specific data, region, or both by running the following command:

    $ oc create secret generic logging-loki-swift \
      --from-literal=auth_url="<swift_auth_url>" \
      --from-literal=username="<swift_usernameclaim>" \
      --from-literal=user_domain_name="<swift_user_domain_name>" \
      --from-literal=user_domain_id="<swift_user_domain_id>" \
      --from-literal=user_id="<swift_user_id>" \
      --from-literal=password="<swift_password>" \
      --from-literal=domain_id="<swift_domain_id>" \
      --from-literal=domain_name="<swift_domain_name>" \
      --from-literal=container_name="<swift_container_name>" \
      --from-literal=project_id="<swift_project_id>" \
      --from-literal=project_name="<swift_project_name>" \
      --from-literal=project_domain_id="<swift_project_domain_id>" \
      --from-literal=project_domain_name="<swift_project_domain_name>" \
      --from-literal=region="<swift_region>"

10.2.3. Deploying an Elasticsearch log store

You can use the OpenShift Elasticsearch Operator to deploy an internal Elasticsearch log store on your OpenShift Container Platform cluster.

Note

The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator.

10.2.3.1. Storage considerations for Elasticsearch

A persistent volume is required for each Elasticsearch deployment configuration. On OpenShift Container Platform this is achieved using persistent volume claims (PVCs).

Note

If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: block in the LocalVolume object. Elasticsearch cannot use raw block volumes.

The OpenShift Elasticsearch Operator names the PVCs using the Elasticsearch resource name.

Fluentd ships any logs from systemd journal and /var/log/containers/*.log to Elasticsearch.

Elasticsearch requires sufficient memory to perform large merge operations. If it does not have enough memory, it becomes unresponsive. To avoid this problem, evaluate how much application log data you need, and allocate approximately double that amount of free storage capacity.

By default, when storage capacity is 85% full, Elasticsearch stops allocating new data to the node. At 90%, Elasticsearch attempts to relocate existing shards from that node to other nodes if possible. But if no nodes have a free capacity below 85%, Elasticsearch effectively rejects creating new indices and becomes RED.

Note

These low and high watermark values are Elasticsearch defaults in the current release. You can modify these default values. Although the alerts use the same default values, you cannot change these values in the alerts.

10.2.3.2. Installing the OpenShift Elasticsearch Operator by using the web console

The OpenShift Elasticsearch Operator creates and manages the Elasticsearch cluster used by OpenShift Logging.

Prerequisites

  • Elasticsearch is a memory-intensive application. Each Elasticsearch node needs at least 16GB of memory for both memory requests and limits, unless you specify otherwise in the ClusterLogging custom resource.

    The initial set of OpenShift Container Platform nodes might not be large enough to support the Elasticsearch cluster. You must add additional nodes to the OpenShift Container Platform cluster to run with the recommended or higher memory, up to a maximum of 64GB for each Elasticsearch node.

    Elasticsearch nodes can operate with a lower memory setting, though this is not recommended for production environments.

  • Ensure that you have the necessary persistent storage for Elasticsearch. Note that each Elasticsearch node requires its own storage volume.

    Note

    If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: block in the LocalVolume object. Elasticsearch cannot use raw block volumes.

Procedure

  1. In the OpenShift Container Platform web console, click OperatorsOperatorHub.
  2. Click OpenShift Elasticsearch Operator from the list of available Operators, and click Install.
  3. Ensure that the All namespaces on the cluster is selected under Installation mode.
  4. Ensure that openshift-operators-redhat is selected under Installed Namespace.

    You must specify the openshift-operators-redhat namespace. The openshift-operators namespace might contain Community Operators, which are untrusted and could publish a metric with the same name as OpenShift Container Platform metric, which would cause conflicts.

  5. Select Enable operator recommended cluster monitoring on this namespace.

    This option sets the openshift.io/cluster-monitoring: "true" label in the Namespace object. You must select this option to ensure that cluster monitoring scrapes the openshift-operators-redhat namespace.

  6. Select stable-5.x as the Update channel.
  7. Select an Update approval strategy:

    • The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
    • The Manual strategy requires a user with appropriate credentials to approve the Operator update.
  8. Click Install.

Verification

  1. Verify that the OpenShift Elasticsearch Operator installed by switching to the OperatorsInstalled Operators page.
  2. Ensure that OpenShift Elasticsearch Operator is listed in all projects with a Status of Succeeded.

10.2.3.3. Installing the OpenShift Elasticsearch Operator by using the CLI

You can use the OpenShift CLI (oc) to install the OpenShift Elasticsearch Operator.

Prerequisites

  • Ensure that you have the necessary persistent storage for Elasticsearch. Note that each Elasticsearch node requires its own storage volume.

    Note

    If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: block in the LocalVolume object. Elasticsearch cannot use raw block volumes.

    Elasticsearch is a memory-intensive application. By default, OpenShift Container Platform installs three Elasticsearch nodes with memory requests and limits of 16 GB. This initial set of three OpenShift Container Platform nodes might not have enough memory to run Elasticsearch within your cluster. If you experience memory issues that are related to Elasticsearch, add more Elasticsearch nodes to your cluster rather than increasing the memory on existing nodes.

  • You have administrator permissions.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create a Namespace object as a YAML file:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-operators-redhat 1
      annotations:
        openshift.io/node-selector: ""
      labels:
        openshift.io/cluster-monitoring: "true" 2
    1
    You must specify the openshift-operators-redhat namespace. To prevent possible conflicts with metrics, configure the Prometheus Cluster Monitoring stack to scrape metrics from the openshift-operators-redhat namespace and not the openshift-operators namespace. The openshift-operators namespace might contain community Operators, which are untrusted and could publish a metric with the same name as metric, which would cause conflicts.
    2
    String. You must specify this label as shown to ensure that cluster monitoring scrapes the openshift-operators-redhat namespace.
  2. Apply the Namespace object by running the following command:

    $ oc apply -f <filename>.yaml
  3. Create an OperatorGroup object as a YAML file:

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-operators-redhat
      namespace: openshift-operators-redhat 1
    spec: {}
    1
    You must specify the openshift-operators-redhat namespace.
  4. Apply the OperatorGroup object by running the following command:

    $ oc apply -f <filename>.yaml
  5. Create a Subscription object to subscribe the namespace to the OpenShift Elasticsearch Operator:

    Example Subscription

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: elasticsearch-operator
      namespace: openshift-operators-redhat 1
    spec:
      channel: stable-x.y 2
      installPlanApproval: Automatic 3
      source: redhat-operators 4
      sourceNamespace: openshift-marketplace
      name: elasticsearch-operator

    1
    You must specify the openshift-operators-redhat namespace.
    2
    Specify stable, or stable-x.y as the channel. See the following note.
    3
    Automatic allows the Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available. Manual requires a user with appropriate credentials to approve the Operator update.
    4
    Specify redhat-operators. If your OpenShift Container Platform cluster is installed on a restricted network, also known as a disconnected cluster, specify the name of the CatalogSource object created when you configured the Operator Lifecycle Manager (OLM).
    Note

    Specifying stable installs the current version of the latest stable release. Using stable with installPlanApproval: "Automatic" automatically upgrades your Operators to the latest stable major and minor release.

    Specifying stable-x.y installs the current minor version of a specific major release. Using stable-x.y with installPlanApproval: "Automatic" automatically upgrades your Operators to the latest stable minor release within the major release.

  6. Apply the subscription by running the following command:

    $ oc apply -f <filename>.yaml

    The OpenShift Elasticsearch Operator is installed to the openshift-operators-redhat namespace and copied to each project in the cluster.

Verification

  1. Run the following command:

    $ oc get csv -n --all-namespaces
  2. Observe the output and confirm that pods for the OpenShift Elasticsearch Operator exist in each namespace

    Example output

    NAMESPACE                                          NAME                            DISPLAY                            VERSION          REPLACES                        PHASE
    default                                            elasticsearch-operator.v5.7.1   OpenShift Elasticsearch Operator   5.7.1            elasticsearch-operator.v5.7.0   Succeeded
    kube-node-lease                                    elasticsearch-operator.v5.7.1   OpenShift Elasticsearch Operator   5.7.1            elasticsearch-operator.v5.7.0   Succeeded
    kube-public                                        elasticsearch-operator.v5.7.1   OpenShift Elasticsearch Operator   5.7.1            elasticsearch-operator.v5.7.0   Succeeded
    kube-system                                        elasticsearch-operator.v5.7.1   OpenShift Elasticsearch Operator   5.7.1            elasticsearch-operator.v5.7.0   Succeeded
    non-destructive-test                               elasticsearch-operator.v5.7.1   OpenShift Elasticsearch Operator   5.7.1            elasticsearch-operator.v5.7.0   Succeeded
    openshift-apiserver-operator                       elasticsearch-operator.v5.7.1   OpenShift Elasticsearch Operator   5.7.1            elasticsearch-operator.v5.7.0   Succeeded
    openshift-apiserver                                elasticsearch-operator.v5.7.1   OpenShift Elasticsearch Operator   5.7.1            elasticsearch-operator.v5.7.0   Succeeded
    ...

10.2.4. Configuring log storage

You can configure which log storage type your logging uses by modifying the ClusterLogging custom resource (CR).

Prerequisites

  • You have administrator permissions.
  • You have installed the OpenShift CLI (oc).
  • You have installed the Red Hat OpenShift Logging Operator and an internal log store that is either the LokiStack or Elasticsearch.
  • You have created a ClusterLogging CR.
Note

The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator.

Procedure

  1. Modify the ClusterLogging CR logStore spec:

    ClusterLogging CR example

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
    # ...
    spec:
    # ...
      logStore:
        type: <log_store_type> 1
        elasticsearch: 2
          nodeCount: <integer>
          resources: {}
          storage: {}
          redundancyPolicy: <redundancy_type> 3
        lokistack: 4
          name: {}
    # ...

    1
    Specify the log store type. This can be either lokistack or elasticsearch.
    2
    Optional configuration options for the Elasticsearch log store.
    3
    Specify the redundancy type. This value can be ZeroRedundancy, SingleRedundancy, MultipleRedundancy, or FullRedundancy.
    4
    Optional configuration options for LokiStack.

    Example ClusterLogging CR to specify LokiStack as the log store

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      managementState: Managed
      logStore:
        type: lokistack
        lokistack:
          name: logging-loki
    # ...

  2. Apply the ClusterLogging CR by running the following command:

    $ oc apply -f <filename>.yaml

10.3. Configuring the LokiStack log store

In logging documentation, LokiStack refers to the logging supported combination of Loki and web proxy with OpenShift Container Platform authentication integration. LokiStack’s proxy uses OpenShift Container Platform authentication to enforce multi-tenancy. Loki refers to the log store as either the individual component or an external store.

10.3.1. Creating a new group for the cluster-admin user role

Important

Querying application logs for multiple namespaces as a cluster-admin user, where the sum total of characters of all of the namespaces in the cluster is greater than 5120, results in the error Parse error: input size too long (XXXX > 5120). For better control over access to logs in LokiStack, make the cluster-admin user a member of the cluster-admin group. If the cluster-admin group does not exist, create it and add the desired users to it.

Use the following procedure to create a new group for users with cluster-admin permissions.

Procedure

  1. Enter the following command to create a new group:

    $ oc adm groups new cluster-admin
  2. Enter the following command to add the desired user to the cluster-admin group:

    $ oc adm groups add-users cluster-admin <username>
  3. Enter the following command to add cluster-admin user role to the group:

    $ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin

10.3.2. Enabling stream-based retention with Loki

With Logging version 5.6 and higher, you can configure retention policies based on log streams. Rules for these may be set globally, per tenant, or both. If you configure both, tenant rules apply before global rules.

  1. To enable stream-based retention, create a LokiStack custom resource (CR):

    Example global stream-based retention

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global: 1
          retention: 2
            days: 20
            streams:
            - days: 4
              priority: 1
              selector: '{kubernetes_namespace_name=~"test.+"}' 3
            - days: 1
              priority: 1
              selector: '{log_type="infrastructure"}'
      managementState: Managed
      replicationFactor: 1
      size: 1x.small
      storage:
        schemas:
        - effectiveDate: "2020-10-11"
          version: v11
        secret:
          name: logging-loki-s3
          type: aws
      storageClassName: standard
      tenants:
        mode: openshift-logging

    1
    Sets retention policy for all log streams. Note: This field does not impact the retention period for stored logs in object storage.
    2
    Retention is enabled in the cluster when this block is added to the CR.
    3
    Contains the LogQL query used to define the log stream.

    Example per-tenant stream-based retention

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global:
          retention:
            days: 20
        tenants: 1
          application:
            retention:
              days: 1
              streams:
                - days: 4
                  selector: '{kubernetes_namespace_name=~"test.+"}' 2
          infrastructure:
            retention:
              days: 5
              streams:
                - days: 1
                  selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
      managementState: Managed
      replicationFactor: 1
      size: 1x.small
      storage:
        schemas:
        - effectiveDate: "2020-10-11"
          version: v11
        secret:
          name: logging-loki-s3
          type: aws
      storageClassName: standard
      tenants:
        mode: openshift-logging

    1
    Sets retention policy by tenant. Valid tenant types are application, audit, and infrastructure.
    2
    Contains the LogQL query used to define the log stream.
  2. Apply the LokiStack CR:

    $ oc apply -f <filename>.yaml
Note

This is not for managing the retention for stored logs. Global retention periods for stored logs to a supported maximum of 30 days is configured with your object storage.

10.3.3. Troubleshooting Loki rate limit errors

If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429) errors.

These errors can occur during normal operation. For example, when adding the logging to a cluster that already has some logs, rate limit errors might occur while the logging tries to ingest all of the existing log entries. In this case, if the rate of addition of new logs is less than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without requiring user intervention.

In cases where the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack custom resource (CR).

Important

The LokiStack CR is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.

Conditions

  • The Log Forwarder API is configured to forward logs to Loki.
  • Your system sends a block of messages that is larger than 2 MB to Loki. For example:

    "values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
    \"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}
  • After you enter oc logs -n openshift-logging -l component=collector, the collector logs in your cluster show a line containing one of the following error messages:

    429 Too Many Requests Ingestion rate limit exceeded

    Example Vector error message

    2023-08-25T16:08:49.301780Z  WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true

    Example Fluentd error message

    2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"

    The error is also visible on the receiving end. For example, in the LokiStack ingester pod:

    Example Loki ingester error message

    level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream

Procedure

  • Update the ingestionBurstSize and ingestionRate fields in the LokiStack CR:

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global:
          ingestion:
            ingestionBurstSize: 16 1
            ingestionRate: 8 2
    # ...
    1
    The ingestionBurstSize field defines the maximum local rate-limited sample size per distributor replica in MB. This value is a hard limit. Set this value to at least the maximum logs size expected in a single push request. Single requests that are larger than the ingestionBurstSize value are not permitted.
    2
    The ingestionRate field is a soft limit on the maximum amount of ingested samples per second in MB. Rate limit errors occur if the rate of logs exceeds the limit, but the collector retries sending the logs. As long as the total average is lower than the limit, the system recovers and errors are resolved without user intervention.

10.3.4. Additional Resources

10.4. Configuring the Elasticsearch log store

You can use Elasticsearch 6 to store and organize log data.

You can make modifications to your log store, including:

  • Storage for your Elasticsearch cluster
  • Shard replication across data nodes in the cluster, from full replication to no replication
  • External access to Elasticsearch data

10.4.1. Configuring log storage

You can configure which log storage type your logging uses by modifying the ClusterLogging custom resource (CR).

Prerequisites

  • You have administrator permissions.
  • You have installed the OpenShift CLI (oc).
  • You have installed the Red Hat OpenShift Logging Operator and an internal log store that is either the LokiStack or Elasticsearch.
  • You have created a ClusterLogging CR.
Note

The OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator.

Procedure

  1. Modify the ClusterLogging CR logStore spec:

    ClusterLogging CR example

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
    # ...
    spec:
    # ...
      logStore:
        type: <log_store_type> 1
        elasticsearch: 2
          nodeCount: <integer>
          resources: {}
          storage: {}
          redundancyPolicy: <redundancy_type> 3
        lokistack: 4
          name: {}
    # ...

    1
    Specify the log store type. This can be either lokistack or elasticsearch.
    2
    Optional configuration options for the Elasticsearch log store.
    3
    Specify the redundancy type. This value can be ZeroRedundancy, SingleRedundancy, MultipleRedundancy, or FullRedundancy.
    4
    Optional configuration options for LokiStack.

    Example ClusterLogging CR to specify LokiStack as the log store

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      managementState: Managed
      logStore:
        type: lokistack
        lokistack:
          name: logging-loki
    # ...

  2. Apply the ClusterLogging CR by running the following command:

    $ oc apply -f <filename>.yaml

10.4.2. Forwarding audit logs to the log store

By default, OpenShift Logging does not store audit logs in the internal OpenShift Container Platform Elasticsearch log store. You can send audit logs to this log store so, for example, you can view them in Kibana.

To send the audit logs to the default internal Elasticsearch log store, for example to view the audit logs in Kibana, you must use the Log Forwarding API.

Important

The internal OpenShift Container Platform Elasticsearch log store does not provide secure storage for audit logs. Verify that the system to which you forward audit logs complies with your organizational and governmental regulations and is properly secured. Logging does not comply with those regulations.

Procedure

To use the Log Forward API to forward audit logs to the internal Elasticsearch instance:

  1. Create or edit a YAML file that defines the ClusterLogForwarder CR object:

    • Create a CR to send all log types to the internal Elasticsearch instance. You can use the following example without making any changes:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        pipelines: 1
        - name: all-to-default
          inputRefs:
          - infrastructure
          - application
          - audit
          outputRefs:
          - default
      1
      A pipeline defines the type of logs to forward using the specified output. The default output forwards logs to the internal Elasticsearch instance.
      Note

      You must specify all three types of logs in the pipeline: application, infrastructure, and audit. If you do not specify a log type, those logs are not stored and will be lost.

    • If you have an existing ClusterLogForwarder CR, add a pipeline to the default output for the audit logs. You do not need to define the default output. For example:

      apiVersion: "logging.openshift.io/v1"
      kind: ClusterLogForwarder
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        outputs:
         - name: elasticsearch-insecure
           type: "elasticsearch"
           url: http://elasticsearch-insecure.messaging.svc.cluster.local
           insecure: true
         - name: elasticsearch-secure
           type: "elasticsearch"
           url: https://elasticsearch-secure.messaging.svc.cluster.local
           secret:
             name: es-audit
         - name: secureforward-offcluster
           type: "fluentdForward"
           url: https://secureforward.offcluster.com:24224
           secret:
             name: secureforward
        pipelines:
         - name: container-logs
           inputRefs:
           - application
           outputRefs:
           - secureforward-offcluster
         - name: infra-logs
           inputRefs:
           - infrastructure
           outputRefs:
           - elasticsearch-insecure
         - name: audit-logs
           inputRefs:
           - audit
           outputRefs:
           - elasticsearch-secure
           - default 1
      1
      This pipeline sends the audit logs to the internal Elasticsearch instance in addition to an external instance.

10.4.3. Configuring log retention time

You can configure a retention policy that specifies how long the default Elasticsearch log store keeps indices for each of the three log sources: infrastructure logs, application logs, and audit logs.

To configure the retention policy, you set a maxAge parameter for each log source in the ClusterLogging custom resource (CR). The CR applies these values to the Elasticsearch rollover schedule, which determines when Elasticsearch deletes the rolled-over indices.

Elasticsearch rolls over an index, moving the current index and creating a new index, when an index matches any of the following conditions:

  • The index is older than the rollover.maxAge value in the Elasticsearch CR.
  • The index size is greater than 40 GB × the number of primary shards.
  • The index doc count is greater than 40960 KB × the number of primary shards.

Elasticsearch deletes the rolled-over indices based on the retention policy you configure. If you do not create a retention policy for any log sources, logs are deleted after seven days by default.

Prerequisites

  • The Red Hat OpenShift Logging Operator and the OpenShift Elasticsearch Operator must be installed.

Procedure

To configure the log retention time:

  1. Edit the ClusterLogging CR to add or modify the retentionPolicy parameter:

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    ...
    spec:
      managementState: "Managed"
      logStore:
        type: "elasticsearch"
        retentionPolicy: 1
          application:
            maxAge: 1d
          infra:
            maxAge: 7d
          audit:
            maxAge: 7d
        elasticsearch:
          nodeCount: 3
    ...
    1
    Specify the time that Elasticsearch should retain each log source. Enter an integer and a time designation: weeks(w), hours(h/H), minutes(m) and seconds(s). For example, 1d for one day. Logs older than the maxAge are deleted. By default, logs are retained for seven days.
  2. You can verify the settings in the Elasticsearch custom resource (CR).

    For example, the Red Hat OpenShift Logging Operator updated the following Elasticsearch CR to configure a retention policy that includes settings to roll over active indices for the infrastructure logs every eight hours and the rolled-over indices are deleted seven days after rollover. OpenShift Container Platform checks every 15 minutes to determine if the indices need to be rolled over.

    apiVersion: "logging.openshift.io/v1"
    kind: "Elasticsearch"
    metadata:
      name: "elasticsearch"
    spec:
    ...
      indexManagement:
        policies: 1
          - name: infra-policy
            phases:
              delete:
                minAge: 7d 2
              hot:
                actions:
                  rollover:
                    maxAge: 8h 3
            pollInterval: 15m 4
    ...
    1
    For each log source, the retention policy indicates when to delete and roll over logs for that source.
    2
    When OpenShift Container Platform deletes the rolled-over indices. This setting is the maxAge you set in the ClusterLogging CR.
    3
    The index age for OpenShift Container Platform to consider when rolling over the indices. This value is determined from the maxAge you set in the ClusterLogging CR.
    4
    When OpenShift Container Platform checks if the indices should be rolled over. This setting is the default and cannot be changed.
    Note

    Modifying the Elasticsearch CR is not supported. All changes to the retention policies must be made in the ClusterLogging CR.

    The OpenShift Elasticsearch Operator deploys a cron job to roll over indices for each mapping using the defined policy, scheduled using the pollInterval.

    $ oc get cronjob

    Example output

    NAME                     SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
    elasticsearch-im-app     */15 * * * *   False     0        <none>          4s
    elasticsearch-im-audit   */15 * * * *   False     0        <none>          4s
    elasticsearch-im-infra   */15 * * * *   False     0        <none>          4s

10.4.4. Configuring CPU and memory requests for the log store

Each component specification allows for adjustments to both the CPU and memory requests. You should not have to manually adjust these values as the OpenShift Elasticsearch Operator sets values sufficient for your environment.

Note

In large-scale clusters, the default memory limit for the Elasticsearch proxy container might not be sufficient, causing the proxy container to be OOMKilled. If you experience this issue, increase the memory requests and limits for the Elasticsearch proxy.

Each Elasticsearch node can operate with a lower memory setting though this is not recommended for production deployments. For production use, you should have no less than the default 16Gi allocated to each pod. Preferably you should allocate as much as possible, up to 64Gi per pod.

Prerequisites

  • The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    ....
    spec:
        logStore:
          type: "elasticsearch"
          elasticsearch:1
            resources:
              limits: 2
                memory: "32Gi"
              requests: 3
                cpu: "1"
                memory: "16Gi"
            proxy: 4
              resources:
                limits:
                  memory: 100Mi
                requests:
                  memory: 100Mi
    1
    Specify the CPU and memory requests for Elasticsearch as needed. If you leave these values blank, the OpenShift Elasticsearch Operator sets default values that should be sufficient for most deployments. The default values are 16Gi for the memory request and 1 for the CPU request.
    2
    The maximum amount of resources a pod can use.
    3
    The minimum resources required to schedule a pod.
    4
    Specify the CPU and memory requests for the Elasticsearch proxy as needed. If you leave these values blank, the OpenShift Elasticsearch Operator sets default values that are sufficient for most deployments. The default values are 256Mi for the memory request and 100m for the CPU request.

When adjusting the amount of Elasticsearch memory, the same value should be used for both requests and limits.

For example:

      resources:
        limits: 1
          memory: "32Gi"
        requests: 2
          cpu: "8"
          memory: "32Gi"
1
The maximum amount of the resource.
2
The minimum amount required.

Kubernetes generally adheres the node configuration and does not allow Elasticsearch to use the specified limits. Setting the same value for the requests and limits ensures that Elasticsearch can use the memory you want, assuming the node has the memory available.

10.4.5. Configuring replication policy for the log store

You can define how Elasticsearch shards are replicated across data nodes in the cluster.

Prerequisites

  • The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit clusterlogging instance
    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    
    ....
    
    spec:
      logStore:
        type: "elasticsearch"
        elasticsearch:
          redundancyPolicy: "SingleRedundancy" 1
    1
    Specify a redundancy policy for the shards. The change is applied upon saving the changes.
    • FullRedundancy. Elasticsearch fully replicates the primary shards for each index to every data node. This provides the highest safety, but at the cost of the highest amount of disk required and the poorest performance.
    • MultipleRedundancy. Elasticsearch fully replicates the primary shards for each index to half of the data nodes. This provides a good tradeoff between safety and performance.
    • SingleRedundancy. Elasticsearch makes one copy of the primary shards for each index. Logs are always available and recoverable as long as at least two data nodes exist. Better performance than MultipleRedundancy, when using 5 or more nodes. You cannot apply this policy on deployments of single Elasticsearch node.
    • ZeroRedundancy. Elasticsearch does not make copies of the primary shards. Logs might be unavailable or lost in the event a node is down or fails. Use this mode when you are more concerned with performance than safety, or have implemented your own disk/PVC backup/restore strategy.
Note

The number of primary shards for the index templates is equal to the number of Elasticsearch data nodes.

10.4.6. Scaling down Elasticsearch pods

Reducing the number of Elasticsearch pods in your cluster can result in data loss or Elasticsearch performance degradation.

If you scale down, you should scale down by one pod at a time and allow the cluster to re-balance the shards and replicas. After the Elasticsearch health status returns to green, you can scale down by another pod.

Note

If your Elasticsearch cluster is set to ZeroRedundancy, you should not scale down your Elasticsearch pods.

10.4.7. Configuring persistent storage for the log store

Elasticsearch requires persistent storage. The faster the storage, the faster the Elasticsearch performance.

Warning

Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.

Prerequisites

  • The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.

Procedure

  1. Edit the ClusterLogging CR to specify that each data node in the cluster is bound to a Persistent Volume Claim.

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
    # ...
    spec:
      logStore:
        type: "elasticsearch"
        elasticsearch:
          nodeCount: 3
          storage:
            storageClassName: "gp2"
            size: "200G"

This example specifies each data node in the cluster is bound to a Persistent Volume Claim that requests "200G" of AWS General Purpose SSD (gp2) storage.

Note

If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: block in the LocalVolume object. Elasticsearch cannot use raw block volumes.

10.4.8. Configuring the log store for emptyDir storage

You can use emptyDir with your log store, which creates an ephemeral deployment in which all of a pod’s data is lost upon restart.

Note

When using emptyDir, if log storage is restarted or redeployed, you will lose data.

Prerequisites

  • The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.

Procedure

  1. Edit the ClusterLogging CR to specify emptyDir:

     spec:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            nodeCount: 3
            storage: {}

10.4.9. Performing an Elasticsearch rolling cluster restart

Perform a rolling restart when you change the elasticsearch config map or any of the elasticsearch-* deployment configurations.

Also, a rolling restart is recommended if the nodes on which an Elasticsearch pod runs requires a reboot.

Prerequisites

  • The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.

Procedure

To perform a rolling cluster restart:

  1. Change to the openshift-logging project:

    $ oc project openshift-logging
  2. Get the names of the Elasticsearch pods:

    $ oc get pods -l component=elasticsearch
  3. Scale down the collector pods so they stop sending new logs to Elasticsearch:

    $ oc -n openshift-logging patch daemonset/collector -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-collector": "false"}}}}}'
  4. Perform a shard synced flush using the OpenShift Container Platform es_util tool to ensure there are no pending operations waiting to be written to disk prior to shutting down:

    $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query="_flush/synced" -XPOST

    For example:

    $ oc exec -c elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6  -c elasticsearch -- es_util --query="_flush/synced" -XPOST

    Example output

    {"_shards":{"total":4,"successful":4,"failed":0},".security":{"total":2,"successful":2,"failed":0},".kibana_1":{"total":2,"successful":2,"failed":0}}

  5. Prevent shard balancing when purposely bringing down nodes using the OpenShift Container Platform es_util tool:

    $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query="_cluster/settings" -XPUT -d '{ "persistent": { "cluster.routing.allocation.enable" : "primaries" } }'

    For example:

    $ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query="_cluster/settings" -XPUT -d '{ "persistent": { "cluster.routing.allocation.enable" : "primaries" } }'

    Example output

    {"acknowledged":true,"persistent":{"cluster":{"routing":{"allocation":{"enable":"primaries"}}}},"transient":

  6. After the command is complete, for each deployment you have for an ES cluster:

    1. By default, the OpenShift Container Platform Elasticsearch cluster blocks rollouts to their nodes. Use the following command to allow rollouts and allow the pod to pick up the changes:

      $ oc rollout resume deployment/<deployment-name>

      For example:

      $ oc rollout resume deployment/elasticsearch-cdm-0-1

      Example output

      deployment.extensions/elasticsearch-cdm-0-1 resumed

      A new pod is deployed. After the pod has a ready container, you can move on to the next deployment.

      $ oc get pods -l component=elasticsearch-

      Example output

      NAME                                            READY   STATUS    RESTARTS   AGE
      elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6k    2/2     Running   0          22h
      elasticsearch-cdm-5ceex6ts-2-f799564cb-l9mj7    2/2     Running   0          22h
      elasticsearch-cdm-5ceex6ts-3-585968dc68-k7kjr   2/2     Running   0          22h

    2. After the deployments are complete, reset the pod to disallow rollouts:

      $ oc rollout pause deployment/<deployment-name>

      For example:

      $ oc rollout pause deployment/elasticsearch-cdm-0-1

      Example output

      deployment.extensions/elasticsearch-cdm-0-1 paused

    3. Check that the Elasticsearch cluster is in a green or yellow state:

      $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query=_cluster/health?pretty=true
      Note

      If you performed a rollout on the Elasticsearch pod you used in the previous commands, the pod no longer exists and you need a new pod name here.

      For example:

      $ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query=_cluster/health?pretty=true
      {
        "cluster_name" : "elasticsearch",
        "status" : "yellow", 1
        "timed_out" : false,
        "number_of_nodes" : 3,
        "number_of_data_nodes" : 3,
        "active_primary_shards" : 8,
        "active_shards" : 16,
        "relocating_shards" : 0,
        "initializing_shards" : 0,
        "unassigned_shards" : 1,
        "delayed_unassigned_shards" : 0,
        "number_of_pending_tasks" : 0,
        "number_of_in_flight_fetch" : 0,
        "task_max_waiting_in_queue_millis" : 0,
        "active_shards_percent_as_number" : 100.0
      }
      1
      Make sure this parameter value is green or yellow before proceeding.
  7. If you changed the Elasticsearch configuration map, repeat these steps for each Elasticsearch pod.
  8. After all the deployments for the cluster have been rolled out, re-enable shard balancing:

    $ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query="_cluster/settings" -XPUT -d '{ "persistent": { "cluster.routing.allocation.enable" : "all" } }'

    For example:

    $ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query="_cluster/settings" -XPUT -d '{ "persistent": { "cluster.routing.allocation.enable" : "all" } }'

    Example output

    {
      "acknowledged" : true,
      "persistent" : { },
      "transient" : {
        "cluster" : {
          "routing" : {
            "allocation" : {
              "enable" : "all"
            }
          }
        }
      }
    }

  9. Scale up the collector pods so they send new logs to Elasticsearch.

    $ oc -n openshift-logging patch daemonset/collector -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-collector": "true"}}}}}'

10.4.10. Exposing the log store service as a route

By default, the log store that is deployed with logging is not accessible from outside the logging cluster. You can enable a route with re-encryption termination for external access to the log store service for those tools that access its data.

Externally, you can access the log store by creating a reencrypt route, your OpenShift Container Platform token and the installed log store CA certificate. Then, access a node that hosts the log store service with a cURL request that contains:

Internally, you can access the log store service using the log store cluster IP, which you can get by using either of the following commands:

$ oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging

Example output

172.30.183.229

$ oc get service elasticsearch -n openshift-logging

Example output

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
elasticsearch   ClusterIP   172.30.183.229   <none>        9200/TCP   22h

You can check the cluster IP address with a command similar to the following:

$ oc exec elasticsearch-cdm-oplnhinv-1-5746475887-fj2f8 -n openshift-logging -- curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://172.30.183.229:9200/_cat/health"

Example output

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    29  100    29    0     0    108      0 --:--:-- --:--:-- --:--:--   108

Prerequisites

  • The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.
  • You must have access to the project to be able to access to the logs.

Procedure

To expose the log store externally:

  1. Change to the openshift-logging project:

    $ oc project openshift-logging
  2. Extract the CA certificate from the log store and write to the admin-ca file:

    $ oc extract secret/elasticsearch --to=. --keys=admin-ca

    Example output

    admin-ca

  3. Create the route for the log store service as a YAML file:

    1. Create a YAML file with the following:

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: elasticsearch
        namespace: openshift-logging
      spec:
        host:
        to:
          kind: Service
          name: elasticsearch
        tls:
          termination: reencrypt
          destinationCACertificate: | 1
      1
      Add the log store CA certifcate or use the command in the next step. You do not have to set the spec.tls.key, spec.tls.certificate, and spec.tls.caCertificate parameters required by some reencrypt routes.
    2. Run the following command to add the log store CA certificate to the route YAML you created in the previous step:

      $ cat ./admin-ca | sed -e "s/^/      /" >> <file-name>.yaml
    3. Create the route:

      $ oc create -f <file-name>.yaml

      Example output

      route.route.openshift.io/elasticsearch created

  4. Check that the Elasticsearch service is exposed:

    1. Get the token of this service account to be used in the request:

      $ token=$(oc whoami -t)
    2. Set the elasticsearch route you created as an environment variable.

      $ routeES=`oc get route elasticsearch -o jsonpath={.spec.host}`
    3. To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:

      curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}"

      The response appears similar to the following:

      Example output

      {
        "name" : "elasticsearch-cdm-i40ktba0-1",
        "cluster_name" : "elasticsearch",
        "cluster_uuid" : "0eY-tJzcR3KOdpgeMJo-MQ",
        "version" : {
        "number" : "6.8.1",
        "build_flavor" : "oss",
        "build_type" : "zip",
        "build_hash" : "Unknown",
        "build_date" : "Unknown",
        "build_snapshot" : true,
        "lucene_version" : "7.7.0",
        "minimum_wire_compatibility_version" : "5.6.0",
        "minimum_index_compatibility_version" : "5.0.0"
      },
        "<tagline>" : "<for search>"
      }

10.4.11. Removing unused components if you do not use the default Elasticsearch log store

As an administrator, in the rare case that you forward logs to a third-party log store and do not use the default Elasticsearch log store, you can remove several unused components from your logging cluster.

In other words, if you do not use the default Elasticsearch log store, you can remove the internal Elasticsearch logStore and Kibana visualization components from the ClusterLogging custom resource (CR). Removing these components is optional but saves resources.

Prerequisites

  • Verify that your log forwarder does not send log data to the default internal Elasticsearch cluster. Inspect the ClusterLogForwarder CR YAML file that you used to configure log forwarding. Verify that it does not have an outputRefs element that specifies default. For example:

    outputRefs:
    - default
Warning

Suppose the ClusterLogForwarder CR forwards log data to the internal Elasticsearch cluster, and you remove the logStore component from the ClusterLogging CR. In that case, the internal Elasticsearch cluster will not be present to store the log data. This absence can cause data loss.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
  2. If they are present, remove the logStore and visualization stanzas from the ClusterLogging CR.
  3. Preserve the collection stanza of the ClusterLogging CR. The result should look similar to the following example:

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
      namespace: "openshift-logging"
    spec:
      managementState: "Managed"
      collection:
        type: "fluentd"
        fluentd: {}
  4. Verify that the collector pods are redeployed:

    $ oc get pods -l component=collector -n openshift-logging