Red Hat Training

A Red Hat training course is available for OpenShift Container Platform

Chapter 29. Aggregating Container Logs

29.1. Overview

As an OpenShift Container Platform cluster administrator, you can deploy the EFK stack to aggregate logs for a range of OpenShift Container Platform services. Application developers can view the logs of the projects for which they have view access. The EFK stack aggregates logs from hosts and applications, whether coming from multiple containers or even deleted pods.

The EFK stack is a modified version of the ELK stack and is comprised of:

  • Elasticsearch: An object store where all logs are stored.
  • Fluentd: Gathers logs from nodes and feeds them to Elasticsearch.
  • Kibana: A web UI for Elasticsearch.

After deployment in a cluster, the stack aggregates logs from all nodes and projects into Elasticsearch, and provides a Kibana UI to view any logs. Cluster administrators can view all logs, but application developers can only view logs for projects they have permission to view. The stack components communicate securely.

Note

Managing Docker Container Logs discusses the use of json-file logging driver options to manage container logs and prevent filling node disks.

Aggregated logging is only supported using the journald driver in Docker. See Updating Fluentd’s Log Source After a Docker Log Driver Update for more information.

29.2. Pre-deployment Configuration

  1. An Ansible playbook is available to deploy and upgrade aggregated logging. You should familiarize yourself with the advanced installation and configuration section. This provides information for preparing to use Ansible and includes information about configuration. Parameters are added to the Ansible inventory file to configure various areas of the EFK stack.
  2. Review the sizing guidelines to determine how best to configure your deployment.
  3. Ensure that you have deployed a router for the cluster.
  4. Ensure that you have the necessary storage for Elasticsearch. Note that each Elasticsearch replica requires its own storage volume. See Elasticsearch for more information.
  5. Determine if you need highly-available Elasticsearch. A highly-available environment requires multiple replicas of each shard. By default, OpenShift Container Platform creates one shard for each index and zero replicas of those shards. To create high availability, set the openshift_logging_es_number_of_replicas Ansible variable to a value higher than 1. High availability also requires at least three Elasticsearch nodes, each on a different host. See Elasticsearch for more information.
  6. Choose a project. Once deployed, the EFK stack collects logs for every project within your OpenShift Container Platform cluster. The examples in this section use the default project logging. The Ansible playbook creates the project for you if it does not already exist. You will only need to create a project if you want to specify a node-selector on it. Otherwise, the openshift-logging role will create a project.

    $ oc adm new-project logging --node-selector=""
    $ oc project logging
    Note

    Specifying an empty node selector on the project is recommended, as Fluentd should be deployed throughout the cluster and any selector would restrict where it is deployed. To control component placement, specify node selectors per component to be applied to their deployment configurations.

29.3. Specifying Logging Ansible Variables

Parameters for the EFK deployment may be specified to the inventory host file to override the default parameter values. Read the Elasticsearch and the Fluentd sections before choosing parameters:

Note

By default the Elasticsearch service uses port 9300 for TCP communication between nodes in a cluster.

ParameterDescription

openshift_logging_image_prefix

The prefix for logging component images. For example, setting the prefix to registry.access.redhat.com/openshift3/ creates registry.access.redhat.com/openshift3/logging-fluentd:latest.

openshift_logging_image_version

The version for logging component images. For example, setting the version to v3.5 creates registry.access.redhat.com/openshift3/logging-fluentd:v3.5.

openshift_logging_use_ops

If set to true, configures a second Elasticsearch cluster and Kibana for operations logs. Fluentd splits logs between the main cluster and a cluster reserved for operations logs (which consists of /var/log/messages on nodes and the logs from the projects default, openshift, and openshift-infra). This means a second Elasticsearch and Kibana are deployed. The deployments are distinguishable by the -ops suffix included in their names and have parallel deployment options listed below.

openshift_logging_master_url

The URL for the Kubernetes master, this does not need to be public facing but should be accessible from within the cluster.

openshift_logging_master_public_url

The public facing URL for the Kubernetes master. This is used for Authentication redirection by the Kibana proxy.

openshift_logging_namespace

The namespace where Aggregated Logging will be deployed.

openshift_logging_install_logging

Set to true to install logging. Set to false to uninstall logging.

openshift_logging_image_pull_secret

Specify the name of an existing pull secret to be used for pulling component images from an authenticated registry.

openshift_logging_curator_default_days

The default minimum age (in days) Curator uses for deleting log records.

openshift_logging_curator_run_hour

The hour of the day Curator will run.

openshift_logging_curator_run_minute

The minute of the hour Curator will run.

openshift_logging_curator_run_timezone

The timezone Curator uses for figuring out its run time. Provide the timezone as a string in the tzselect(8) or timedatectl(1) "Region/Locality" format, for example America/New_York or UTC.

openshift_logging_curator_script_log_level

The script log level for Curator.

openshift_logging_curator_log_level

The log level for the Curator process.

openshift_logging_curator_cpu_limit

The amount of CPU to allocate to Curator.

openshift_logging_curator_memory_limit

The amount of memory to allocate to Curator.

openshift_logging_curator_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Curator instances.

openshift_logging_curator_ops_cpu_limit

Equivalent to openshift_logging_curator_cpu_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_curator_ops_memory_limit

Equivalent to openshift_logging_curator_memory_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_hostname

The external host name for web clients to reach Kibana.

openshift_logging_kibana_cpu_limit

The amount of CPU to allocate to Kibana.

openshift_logging_kibana_memory_limit

The amount of memory to allocate to Kibana.

openshift_logging_kibana_proxy_debug

When true, set the Kibana Proxy log level to DEBUG.

openshift_logging_kibana_proxy_cpu_limit

The amount of CPU to allocate to Kibana proxy.

openshift_logging_kibana_proxy_memory_limit

The amount of memory to allocate to Kibana proxy.

openshift_logging_kibana_replica_count

The number of replicas to which Kibana should be scaled up.

openshift_logging_kibana_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Kibana instances.

openshift_logging_kibana_key

The public facing key to use when creating the kibana route.

openshift_logging_kibana_cert

The cert that matches the key when creating the kibana route.

openshift_logging_kibana_ca

Optional. The CA to goes with the key and cert used when creating the kibana route.

openshift_logging_kibana_ops_hostname

Equivalent to openshift_logging_kibana_hostname for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_cpu_limit

Equivalent to openshift_logging_kibana_cpu_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_memory_limit

Equivalent to openshift_logging_kibana_memory_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_proxy_debug

Equivalent to openshift_logging_kibana_proxy_debug for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_proxy_cpu_limit

Equivalent to openshift_logging_kibana_proxy_cpu_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_proxy_memory_limit

Equivalent to openshift_logging_kibana_proxy_memory_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_kibana_ops_replica_count

Equivalent to openshift_logging_kibana_replica_count for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_fluentd_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Fluentd instances. Any node where Fluentd should run (typically, all) must have this label before Fluentd will be able to run and collect logs.

When scaling up the Aggregated Logging cluster after installation, the openshift_logging role labels nodes provided by openshift_logging_fluentd_hosts with this node selector.

As part of the installation, it is recommended that you add the Fluentd node selector label to the list of persisted node labels.

openshift_logging_fluentd_cpu_limit

The CPU limit for Fluentd pods.

openshift_logging_fluentd_memory_limit

The memory limit for Fluentd pods.

openshift_logging_fluentd_use_journal

Set to true if Fluentd should read log entries from Journal. The default is empty space which will cause Fluentd to determine which log driver is being used.

openshift_logging_fluentd_journal_read_from_head

Set to true if Fluentd should read from the head of Journal when first starting up, using this may cause a delay in Elasticsearch receiving current log records.

openshift_logging_fluentd_hosts

List of nodes that should be labeled for Fluentd to be deployed.

openshift_logging_es_host

The name of the Elasticsearch service where Fluentd should send logs.

openshift_logging_es_port

The port for the Elasticsearch service where Fluentd should send logs.

openshift_logging_es_ca

The location of the CA Fluentd uses to communicate with openshift_logging_es_host.

openshift_logging_es_client_cert

The location of the client certificate Fluentd uses for openshift_logging_es_host.

openshift_logging_es_client_key

The location of the client key Fluentd uses for openshift_logging_es_host.

openshift_logging_es_cluster_size

Elasticsearch nodes to deploy. High availability requires at least three or more.

openshift_logging_es_cpu_limit

The amount of CPU limit for the Elasticsearch cluster.

openshift_logging_es_memory_limit

Amount of RAM to reserve per Elasticsearch instance. It must be at least 512M. Possible suffixes are G,g,M,m.

openshift_logging_es_number_of_replicas

The number of replicas per primary shard for each new index. Defaults to '0'. A minimum of 1 is advisable for production clusters. For a highly-available environment, set this value to 2 or higher and have at least three Elasticsearch nodes, each on a different host.

openshift_logging_es_number_of_shards

The number of primary shards for every new index created in ES. Defaults to '1'.

openshift_logging_es_pv_selector

A key/value map added to a PVC in order to select specific PVs.

openshift_logging_es_pvc_dynamic

If available for your cluster, set to true to have PVC claims annotated so that their backing storage is dynamically provisioned.

openshift_logging_es_pvc_size

Size of the persistent volume claim to create per ElasticSearch instance. For example, 100G. If omitted, no PVCs are created and ephemeral volumes are used instead.

openshift_logging_es_pvc_prefix

Prefix for the names of persistent volume claims to be used as storage for Elasticsearch nodes. A number is appended per node, such as logging-es-1. If they do not already exist, they are created with size es-pvc-size.

openshift_logging_es_recover_after_time

The amount of time Elasticsearch will wait before it tries to recover.

openshift_logging_es_storage_group

Number of a supplemental group ID for access to Elasticsearch storage volumes. Backing volumes should allow access by this group ID.

openshift_logging_es_nodeselector

A node selector specified as a map that determines which nodes are eligible targets for deploying Elasticsearch nodes. Use this map to place these instances on nodes that are reserved or optimized for running them. For example, the selector could be {"node-type":"infrastructure"}. At least one active node must have this label before Elasticsearch will deploy.

openshift_logging_es_ops_allow_cluster_reader

Set to true if cluster-reader role is allowed to read operation logs.

openshift_logging_es_ops_host

Equivalent to openshift_logging_es_host for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_port

Equivalent to openshift_logging_es_port for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_ca

Equivalent to openshift_logging_es_ca for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_client_cert

Equivalent to openshift_logging_es_client_cert for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_client_key

Equivalent to openshift_logging_es_client_key for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_cluster_size

Equivalent to openshift_logging_es_cluster_size for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_cpu_limit

Equivalent to openshift_logging_es_cpu_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_memory_limit

Equivalent to openshift_logging_es_memory_limit for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_pv_selector

Equivalent to openshift_logging_es_pv_selector for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_pvc_dynamic

Equivalent to openshift_logging_es_pvc_dynamic for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_pvc_size

Equivalent to openshift_logging_es_pvc_size for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_pvc_prefix

Equivalent to openshift_logging_es_pvc_prefix for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_recover_after_time

Equivalent to openshift_logging_es_recovery_after_time for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_storage_group

Equivalent to openshift_logging_es_storage_group for Ops cluster when openshift_logging_use_ops is set to true.

openshift_logging_es_ops_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Elasticsearch nodes. This can be used to place these instances on nodes reserved or optimized for running them. For example, the selector could be node-type=infrastructure. At least one active node must have this label before Elasticsearch will deploy.

openshift_logging_kibana_ops_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Kibana instances.

openshift_logging_curator_ops_nodeselector

A node selector that specifies which nodes are eligible targets for deploying Curator instances.

29.4. Deploying the EFK Stack

The EFK stack is deployed using an Ansible playbook to the EFK components. Run the playbook from the default OpenShift Ansible location using the default inventory file.

$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml

Running the playbook deploys all resources needed to support the stack; such as Secrets, ServiceAccounts, and DeploymentConfigs. The playbook waits to deploy the component pods until the stack is running. If the wait steps fail, the deployment could still be successful; it may be retrieving the component images from the registry which can take up to a few minutes. You can watch the process with:

$ oc get pods -w

They will eventually enter Running status. For additional details about the status of the pods during deployment by retrieving associated events:

$ oc describe pods/<pod_name>

Check the logs if the pods do not run successfully:

$ oc logs -f <pod_name>

29.5. Understanding and Adjusting the Deployment

This section describes adjustments that you can make to deployed components.

29.5.1. Ops Cluster

Note

The logs for the default, openshift, and openshift-infra projects are automatically aggregated and grouped into the .operations item in the Kibana interface.

The project where you have deployed the EFK stack (logging, as documented here) is not aggregated into .operations and is found under its ID.

If you set openshift_logging_use_ops to true in your inventory file, Fluentd is configured to split logs between the main Elasticsearch cluster and another cluster reserved for operations logs, which are defined as node system logs and the projects default, openshift, and openshift-infra. Therefore, a separate Elasticsearch cluster, a separate Kibana, and a separate Curator are deployed to index, access, and manage operations logs. These deployments are set apart with names that include -ops. Keep these separate deployments in mind if you enable this option. Most of the following discussion also applies to the operations cluster if present, just with the names changed to include -ops.

29.5.2. Elasticsearch

Elasticsearch (ES) is an object store where all logs are stored.

Elasticsearch organizes the log data into datastores, each called an index. Elasticsearch subdivides each index into multiple pieces called shards, which it spreads across a set of Elasticsearch nodes in your cluster. You can configure Elasticsearch to make copies of the shards, called replicas. Elasticsearch also spreads replicas across the Elactisearch nodes. The combination of shards and replicas is intended to provide redundancy and resilience to failure. For example, if you configure three shards for the index with one replica, Elasticsearch generates a total of six shards for that index: three primary shards and three replicas as a backup.

The OpenShift Container Platform logging installer ensures each Elasticsearch node is deployed using a unique deployment configuration that includes its own storage volume. You can create an additional deployment configuration for each Elasticsearch node you add to the logging system. During installation, you can use the openshift_logging_es_cluster_size Ansible variable to specify the number of Elasticsearch nodes.

Alternatively, you can scale up your existing cluster by modifying by modifying the openshift_logging_es_cluster_size in the inventory file and re-running the logging playbook. Additional clustering parameters can be modified and are described in Specifying Logging Ansible Variables.

Refer to Elastic’s documentation for considerations involved in choosing storage and network location as directed below.

Note

A highly-available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host, and setting the openshift_logging_es_number_of_replicas Ansible variable to a value of 1 or higher to create replicas.

Viewing all Elasticsearch Deployments

To view all current Elasticsearch deployments:

$ oc get dc --selector logging-infra=elasticsearch

Configuring Elasticsearch for High Availability

A highly-available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host, and setting the openshift_logging_es_number_of_replicas Ansible variable to a value of 1 or higher to create replicas.

Use the following scenarios as a guide for an OpenShift Container Platform cluster with three Elasticsearch nodes:

  • If you can tolerate one Elasticsearch node going down, set openshift_logging_es_number_of_replicas to 1. This ensures that two nodes have a copy of all of the Elasticsearch data in the cluster.
  • If you must tolerate two Elasticsearch nodes going down, set openshift_logging_es_number_of_replicas to 2. This ensures that every node has a copy of all of the Elasticsearch data in the cluster.

Note that there is a trade-off between high availability and performance. For example, having openshift_logging_es_number_of_replicas=2 and openshift_logging_es_number_of_shards=3 requires Elasticsearch to spend significant resources replicating the shard data among the nodes in the cluster. Also, using a higher number of replicas requires doubling or tripling the data storage requirements on each node, so you must take that into account when planning persistent storage for Elasticsearch.

Considerations when Configuring the Number of Shards

For the openshift_logging_es_number_of_shards parameter, consider:

  • For higher performance, increase the number of shards. For example, in a three node cluster, set openshift_logging_es_number_of_shards=3. This will cause each index to be split into three parts (shards), and the load for processing the index will be spread out over all 3 nodes.
  • If you have a large number of projects, you might see performance degradation if you have more than a few thousand shards in the cluster. Either reduce the number of shards or reduce the curation time.
  • If you have a small number of very large indices, you might want to configure openshift_logging_es_number_of_shards=3 or higher. Elasticsearch recommends using a maximum shard size of less than 50 GB.

Node Selector

Because Elasticsearch can use a lot of resources, all members of a cluster should have low latency network connections to each other and to any remote storage. Ensure this by directing the instances to dedicated nodes, or a dedicated region within your cluster, using a node selector.

To configure a node selector, specify the openshift_logging_es_nodeselector configuration option in the inventory file. This applies to all Elasticsearch deployments; if you need to individualize the node selectors, you must manually edit each deployment configuration after deployment. The node selector is specified as a python compatible dict. For example, {"node-type":"infra", "region":"east"}.

Persistent Elasticsearch Storage

By default, the openshift_logging Ansible role creates an ephemeral deployment in which all of a pod’s data is lost upon restart. For production usage, specify a persistent storage volume for each Elasticsearch deployment configuration. You can create the necessary persistent volume claims before deploying or have them created for you. The PVCs must be named to match the openshift_logging_es_pvc_prefix setting, which defaults to logging-es-; each PVC name will have a sequence number added to it, so logging-es-1, logging-es-2, and so on. If a PVC needed for the deployment exists already, it is used; if not, and openshift_logging_es_pvc_size has been specified, it is created with a request for that size.

Warning

Using NFS storage as a volume or a persistent volume, or using NAS such as Gluster, is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur. If NFS storage is a requirement, you can allocate a large file on a volume to serve as a storage device and mount it locally on one host. For example, if your NFS storage volume is mounted at /nfs/storage:

$ truncate -s 1T /nfs/storage/elasticsearch-1
$ mkfs.xfs /nfs/storage/elasticsearch-1
$ mount -o loop /nfs/storage/elasticsearch-1 /usr/local/es-storage
$ chown 1000:1000 /usr/local/es-storage

Then, use /usr/local/es-storage as a host-mount as described below. Use a different backing file as storage for each Elasticsearch replica.

This loopback must be maintained manually outside of OpenShift Container Platform, on the node. You must not maintain it from inside a container.

It is possible to use a local disk volume (if available) on each node host as storage for an Elasticsearch replica. Doing so requires some preparation as follows.

  1. The relevant service account must be given the privilege to mount and edit a local volume:

    $ oc adm policy add-scc-to-user privileged  \
           system:serviceaccount:logging:aggregated-logging-elasticsearch 1
    1
    Use the project you created earlier (for example, logging) when running the logging playbook.
  2. Each Elasticsearch replica definition must be patched to claim that privilege, for example (change to --selector component=es-ops for Ops cluster):

    $ for dc in $(oc get deploymentconfig --selector component=es -o name); do
        oc scale $dc --replicas=0
        oc patch $dc \
           -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
      done
  3. The Elasticsearch replicas must be located on the correct nodes to use the local storage, and should not move around even if those nodes are taken down for a period of time. This requires giving each Elasticsearch replica a node selector that is unique to a node where an administrator has allocated storage for it. To configure a node selector, edit each Elasticsearch deployment configuration and add or edit the nodeSelector section to specify a unique label that you have applied for each desired node:

    apiVersion: v1
    kind: DeploymentConfig
    spec:
      template:
        spec:
          nodeSelector:
            logging-es-node: "1" 1
    1
    This label should uniquely identify a replica with a single node that bears that label, in this case logging-es-node=1. Use the oc label command to apply labels to nodes as needed.

    To automate applying the node selector you can instead use the oc patch command:

    $ oc patch dc/logging-es-<suffix> \
       -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"1"}}}}}'
  4. Once these steps are taken, a local host mount can be applied to each replica as in this example (where we assume storage is mounted at the same path on each node) (change to --selector component=es-ops for Ops cluster):

    $ for dc in $(oc get deploymentconfig --selector component=es -o name); do
        oc set volume $dc \
              --add --overwrite --name=elasticsearch-storage \
              --type=hostPath --path=/usr/local/es-storage
        oc rollout latest $dc
        oc scale $dc --replicas=1
      done

Changing the Scale of Elasticsearch

If you need to scale up the number of Elasticsearch nodes in your cluster, you can create a deployment configuration for each Elasticsearch node you want to add.

Due to the nature of persistent volumes and how Elasticsearch is configured to store its data and recover the cluster, you cannot simply increase the replicas in an Elasticsearch deployment configuration.

The simplest way to change the scale of Elasticsearch is to modify the inventory host file a re-run the logging playbook as desribed previously. Assuming you have supplied persistent storage for the deployment, this should not be disruptive.

If you do not wish to reinstall, for instance because you have made customizations that you would like to preserve, then it is possible to add new Elasticsearch deployment configurations to the cluster using a template supplied by the deployer. This requires a more complicated procedure however.

Allowing cluster-reader to view operations logs

By default, only cluster-admin users are granted access in Elasticsearch and Kibana to view operations logs. To allow cluster-reader users to also view these logs, update the value of openshift.operations.allow_cluster_reader in the Elasticsearch configmap to true:

$ oc edit configmap/logging-elasticsearch

Please note that changes to the configmap might not appear until after redeploying the pods. Persisting these changes across deployments can be accomplished by setting openshift_logging_es_allows_cluster_reader to true in the inventory file.

29.5.3. Fluentd

Fluentd is deployed as a DaemonSet that deploys replicas according to a node label selector, which you can specify with the inventory parameter openshift_logging_fluentd_nodeselector and the default is logging-infra-fluentd. As part of the OpenShift cluster installation, it is recommended that you add the Fluentd node selector to the list of persisted node labels.

Having Fluentd Use the Systemd Journal as the Log Source

By default, Fluentd reads from /var/log/messages and /var/log/containers/<container>.log for system logs and container logs, respectively. You can instead use the systemd journal as the log source. There are three inventory parameters available:

ParameterDescription

openshift_logging_use_journal

The default is empty, which configures Fluentd to check which log driver Docker is using. If Docker is using --log-driver=journald, Fluentd reads from the systemd journal, otherwise, it assumes docker is using the json-file log driver and reads from the /var/log file sources. You can specify the openshift_logging_use_journal option as true or false to be explicit about which log source to use. Using the systemd journal requires docker-1.10 or later, and Docker must be configured to use --log-driver=journald.

Aggregated logging is only supported using the journald driver in Docker. See Updating Fluentd’s Log Source After a Docker Log Driver Update for more information.

openshift_logging_journal_read_from_head

If this setting is false, Fluentd starts reading from the end of the journal, ignoring historical logs. If this setting is true, Fluentd starts reading logs from the beginning of the journal.

Note

As of OpenShift Container Platform 3.3, Fluentd no longer reads historical log files when using the JSON file log driver. In situations where clusters have a large number of log files and are older than the EFK deployment, this avoids delays when pushing the most recent logs into Elasticsearch. Curator deleting logs are migrated soon after they are added to Elasticsearch.

Note

It may require several minutes, or hours, depending on the size of your journal, before any new log entries are available in Elasticsearch, when using openshift_logging_journal_read_from_head=true.

Warning

It is highly recommended that you use the default value for use-journal. In scenarios where upgrading OpenShift Container Platform changes the Docker log driver, if use-journal=False is explicitly specified as part of installation, Fluentd still expects to read logs generated using the json-file log driver. This results in a lack of log ingestion. If this has happened within your logging cluster, troubleshoot it.

Aggregated logging is only supported using the journald driver in Docker. See Updating Fluentd’s Log Source After a Docker Log Driver Update for more information.

Having Fluentd Send Logs to Another Elasticsearch

Note

The use of ES_COPY is being deprecated. To configure FluentD to send a copy of its logs to an external aggregator, use Fluentd Secure Forward instead.

You can configure Fluentd to send a copy of each log message to both the Elasticsearch instance included with OpenShift Container Platform aggregated logging, and to an external Elasticsearch instance. For example, if you already have an Elasticsearch instance set up for auditing purposes, or data warehousing, you can send a copy of each log message to that Elasticsearch.

This feature is controlled via environment variables on Fluentd, which can be modified as described below.

If its environment variable ES_COPY is true, Fluentd sends a copy of the logs to another Elasticsearch. The names for the copy variables are just like the current ES_HOST, OPS_HOST, and other variables, except that they add _COPY: ES_COPY_HOST, OPS_COPY_HOST, and so on. There are some additional parameters added:

  • ES_COPY_SCHEME, OPS_COPY_SCHEME - can use either http or https - defaults to https
  • ES_COPY_USERNAME, OPS_COPY_USERNAME - user name to use to authenticate to Elasticsearch using username/password auth
  • ES_COPY_PASSWORD, OPS_COPY_PASSWORD - password to use to authenticate to Elasticsearch using username/password auth
Note

Sending logs directly to an AWS Elasticsearch instance is not supported. Use Fluentd Secure Forward to direct logs to an instance of Fluentd that you control and that is configured with the fluent-plugin-aws-elasticsearch-service plug-in.

To set the parameters:

  1. Edit the DaemonSet for Fluentd:

    $ oc edit -n logging ds logging-fluentd

    Add or edit the environment variable ES_COPY to have the value "true" (with the quotes), and add or edit the COPY variables listed above.

Note

These changes will not be persisted across multiple runs of the logging playbook. You will need to edit the DaemonSet each time to update environment variables.

Configuring Fluentd to Send Logs to an External Log Aggregator

You can configure Fluentd to send a copy of its logs to an external log aggregator, and not the default Elasticsearch, using the secure-forward plug-in. From there, you can further process log records after the locally hosted Fluentd has processed them.

The logging deployment provides a secure-forward.conf section in the Fluentd configmap for configuring the external aggregator:

<store>
@type secure_forward
self_hostname pod-${HOSTNAME}
shared_key thisisasharedkey
secure yes
enable_strict_verification yes
ca_cert_path /etc/fluent/keys/your_ca_cert
ca_private_key_path /etc/fluent/keys/your_private_key
ca_private_key_passphrase passphrase
<server>
  host ose1.example.com
  port 24284
</server>
<server>
  host ose2.example.com
  port 24284
  standby
</server>
<server>
  host ose3.example.com
  port 24284
  standby
</server>
</store>

This can be updated using the oc edit command:

$ oc edit configmap/logging-fluentd

Certificates to be used in secure-forward.conf can be added to the existing secret that is mounted on the Fluentd pods. The your_ca_cert and your_private_key values must match what is specified in secure-forward.conf in configmap/logging-fluentd:

$ oc patch secrets/logging-fluentd --type=json \
  --patch "[{'op':'add','path':'/data/your_ca_cert','value':'$(base64 /path/to/your_ca_cert.pem)'}]"
$ oc patch secrets/logging-fluentd --type=json \
  --patch "[{'op':'add','path':'/data/your_private_key','value':'$(base64 /path/to/your_private_key.pem)'}]"
Note

Replace your_private_key with a generic name. This is a link to the JSON path, not a path on your host system

When configuring the external aggregator, it must be able to accept messages securely from Fluentd.

If the external aggregator is another Fluentd server, it must have the fluent-plugin-secure-forward plug-in installed and make use of the input plug-in it provides:

<source>
  @type secure_forward

  self_hostname ${HOSTNAME}
  bind 0.0.0.0
  port 24284

  shared_key thisisasharedkey

  secure yes
  cert_path        /path/for/certificate/cert.pem
  private_key_path /path/for/certificate/key.pem
  private_key_passphrase secret_foo_bar_baz
</source>

Further explanation of how to set up the fluent-plugin-secure-forward plug-in can be found here.

Throttling logs in Fluentd

For projects that are especially verbose, an administrator can throttle down the rate at which the logs are read in by Fluentd before being processed.

Warning

Throttling can contribute to log aggregation falling behind for the configured projects; log entries can be lost if a pod is deleted before Fluentd catches up.

Note

Throttling does not work when using the systemd journal as the log source. The throttling implementation depends on being able to throttle the reading of the individual log files for each project. When reading from the journal, there is only a single log source, no log files, so no file-based throttling is available. There is not a method of restricting the log entries that are read into the Fluentd process.

To tell Fluentd which projects it should be restricting, edit the throttle configuration in its ConfigMap after deployment:

$ oc edit configmap/logging-fluentd

The format of the throttle-config.yaml key is a YAML file that contains project names and the desired rate at which logs are read in on each node. The default is 1000 lines at a time per node. For example:

logging:
  read_lines_limit: 500

test-project:
  read_lines_limit: 10

.operations:
  read_lines_limit: 100

29.5.4. Kibana

To access the Kibana console from the OpenShift Container Platform web console, add the loggingPublicURL parameter in the /etc/origin/master/master-config.yaml file, with the URL of the Kibana console (the kibana-hostname parameter). The value must be an HTTPS URL:

...
assetConfig:
  ...
  loggingPublicURL: "https://kibana.example.com"
...

Setting the loggingPublicURL parameter creates a View Archive button on the OpenShift Container Platform web console under the BrowsePods<pod_name>Logs tab. This links to the Kibana console.

You can scale the Kibana deployment as usual for redundancy:

$ oc scale dc/logging-kibana --replicas=2
Note

To ensure the scale persists across multiple executions of the logging playbook, make sure to update the openshift_logging_kibana_replica_count in the inventory file.

You can see the user interface by visiting the site specified by the openshift_logging_kibana_hostname variable.

See the Kibana documentation for more information on Kibana.

29.5.5. Curator

Curator allows administrators to configure scheduled Elasticsearch maintenance operations to be performed automatically on a per-project basis. It is scheduled to perform actions daily based on its configuration. Only one Curator pod is recommended per Elasticsearch cluster. Curator is configured via a YAML configuration file with the following structure:

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE
 ...

The available parameters are:

Variable NameDescription

$PROJECT_NAME

The actual name of a project, such as myapp-devel. For OpenShift Container Platform operations logs, use the name .operations as the project name.

$ACTION

The action to take, currently only delete is allowed.

$UNIT

One of days, weeks, or months.

$VALUE

An integer for the number of units.

.defaults

Use .defaults as the $PROJECT_NAME to set the defaults for projects that are not specified.

runhour

(Number) the hour of the day in 24-hour format at which to run the Curator jobs. For use with .defaults.

runminute

(Number) the minute of the hour at which to run the Curator jobs. For use with .defaults.

For example, to configure Curator to:

  • delete indices in the myapp-dev project older than 1 day
  • delete indices in the myapp-qe project older than 1 week
  • delete operations logs older than 8 weeks
  • delete all other projects indices after they are 30 days old
  • run the Curator jobs at midnight every day

Use:

myapp-dev:
 delete:
   days: 1

myapp-qe:
  delete:
    weeks: 1

.operations:
  delete:
    weeks: 8

.defaults:
  delete:
    days: 30
  runhour: 0
  runminute: 0
Important

When you use month as the $UNIT for an operation, Curator starts counting at the first day of the current month, not the current day of the current month. For example, if today is April 15, and you want to delete indices that are 2 months older than today (delete: months: 2), Curator does not delete indices that are dated older than February 15; it deletes indices older than February 1. That is, it goes back to the first day of the current month, then goes back two whole months from that date. If you want to be exact with Curator, it is best to use days (for example, delete: days: 30).

29.5.5.1. Creating the Curator Configuration

The openshift_logging Ansible role provides a ConfigMap from which Curator reads its configuration. You may edit or replace this ConfigMap to reconfigure Curator. Currently the logging-curator ConfigMap is used to configure both your ops and non-ops Curator instances. Any .operations configurations will be in the same location as your application logs configurations.

  1. To edit the provided ConfigMap to configure your Curator instances:

    $ oc edit configmap/logging-curator
  2. To replace the provided ConfigMap instead:

    $ create /path/to/mycuratorconfig.yaml
    $ oc create configmap logging-curator -o yaml \
      --from-file=config.yaml=/path/to/mycuratorconfig.yaml | \
      oc replace -f -
  3. After you make your changes, redeploy Curator:

    $ oc rollout latest dc/logging-curator
    $ oc rollout latest dc/logging-curator-ops

29.6. Cleanup

Remove everything generated during the deployment.

$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml \
    -e openshift_logging_install_logging=False

29.7. Troubleshooting Kibana

Using the Kibana console with OpenShift Container Platform can cause problems that are easily solved, but are not accompanied with useful error messages. Check the following troubleshooting sections if you are experiencing any problems when deploying Kibana on OpenShift Container Platform:

Login Loop

The OAuth2 proxy on the Kibana console must share a secret with the master host’s OAuth2 server. If the secret is not identical on both servers, it can cause a login loop where you are continuously redirected back to the Kibana login page.

To fix this issue, delete the current OAuthClient, and use openshift-ansible to re-run the openshift_logging role:

$ oc delete oauthclient/kibana-proxy
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml

Cryptic Error When Viewing the Console

When attempting to visit the Kibana console, you may receive a browser error instead:

{"error":"invalid_request","error_description":"The request is missing a required parameter,
 includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}

This can be caused by a mismatch between the OAuth2 client and server. The return address for the client must be in a whitelist so the server can securely redirect back after logging in.

Fix this issue by replacing the OAuthClient entry:

$ oc delete oauthclient/kibana-proxy
$ ansible-playbook [-i </path/to/inventory>] \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml

If the problem persists, check that you are accessing Kibana at a URL listed in the OAuth client. This issue can be caused by accessing the URL at a forwarded port, such as 1443 instead of the standard 443 HTTPS port. You can adjust the server whitelist by editing the OAuth client:

$ oc edit oauthclient/kibana-proxy

503 Error When Viewing the Console

If you receive a proxy error when viewing the Kibana console, it could be caused by one of two issues.

First, Kibana may not be recognizing pods. If Elasticsearch is slow in starting up, Kibana may timeout trying to reach it. Check whether the relevant service has any endpoints:

$ oc describe service logging-kibana
Name:                   logging-kibana
[...]
Endpoints:              <none>

If any Kibana pods are live, endpoints will be listed. If they are not, check the state of the Kibana pods and deployment. You may need to scale the deployment down and back up again.

The second possible issue may be caused if the route for accessing the Kibana service is masked. This can happen if you perform a test deployment in one project, then deploy in a different project without completely removing the first deployment. When multiple routes are sent to the same destination, the default router will only route to the first created. Check the problematic route to see if it is defined in multiple places:

$ oc get route  --all-namespaces --selector logging-infra=support

F-5 Load Balancer and X-Forwarded-For Enabled

If you are attempting to use a F-5 load balancer in front of Kibana with X-Forwarded-For enabled, this can cause an issue in which the Elasticsearch Searchguard plug-in is unable to correctly accept connections from Kibana.

Example Kibana Error Message

Kibana: Unknown error while connecting to Elasticsearch

Error: Unknown error while connecting to Elasticsearch
Error: UnknownHostException[No trusted proxies]

To configure Searchguard to ignore the extra header:

  1. Scale down all Fluentd pods.
  2. Scale down Elasticsearch after the Fluentd pods have terminated.
  3. Add searchguard.http.xforwardedfor.header: DUMMY to the Elasticsearch configuration section.

    $ oc edit configmap/logging-elasticsearch 1
    1
    This approach requires that Elasticsearch’s configurations are within a ConfigMap.
  4. Scale Elasticsearch back up.
  5. Scale up all Fluentd pods.

29.8. Sending Logs to an External Elasticsearch Instance

Fluentd sends logs to the value of the ES_HOST, ES_PORT, OPS_HOST, and OPS_PORT environment variables of the Elasticsearch deployment configuration. The application logs are directed to the ES_HOST destination, and operations logs to OPS_HOST.

Note

Sending logs directly to an AWS Elasticsearch instance is not supported. Use Fluentd Secure Forward to direct logs to an instance of Fluentd that you control and that is configured with the fluent-plugin-aws-elasticsearch-service plug-in.

To direct logs to a specific Elasticsearch instance, edit the deployment configuration and replace the value of the above variables with the desired instance:

$ oc edit dc/<deployment_configuration>

For an external Elasticsearch instance to contain both application and operations logs, you can set ES_HOST and OPS_HOST to the same destination, while ensuring that ES_PORT and OPS_PORT also have the same value.

If your externally hosted Elasticsearch instance does not use TLS, update the _CLIENT_CERT, _CLIENT_KEY, and _CA variables to be empty. If it does use TLS, but not mutual TLS, update the _CLIENT_CERT and _CLIENT_KEY variables to be empty and patch or recreate the logging-fluentd secret with the appropriate _CA value for communicating with your Elasticsearch instance. If it uses Mutual TLS as the provided Elasticsearch instance does, patch or recreate the logging-fluentd secret with your client key, client cert, and CA.

Note

If you are not using the provided Kibana and Elasticsearch images, you will not have the same multi-tenant capabilities and your data will not be restricted by user access to a particular project.

29.9. Performing Administrative Elasticsearch Operations

As of logging version 3.2.0, an administrator certificate, key, and CA that can be used to communicate with and perform administrative operations on Elasticsearch are provided within the logging-elasticsearch secret.

Note

To confirm whether or not your EFK installation provides these, run:

$ oc describe secret logging-elasticsearch

If they are not available, refer to Manual Upgrades to ensure you are on the latest version first.

  1. Connect to an Elasticsearch pod that is in the cluster on which you are attempting to perform maintenance.
  2. To find a pod in a cluster use either:

    $ oc get pods -l component=es -o name | head -1
    $ oc get pods -l component=es-ops -o name | head -1
  3. Connect to a pod:

    $ oc rsh <your_Elasticsearch_pod>
  4. Once connected to an Elasticsearch container, you can use the certificates mounted from the secret to communicate with Elasticsearch per its Indices APIs documentation.

    Fluentd sends its logs to Elasticsearch using the index format project.{project_name}.{project_uuid}.YYYY.MM.DD where YYYY.MM.DD is the date of the log record.

    For example, to delete all logs for the logging project with uuid 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 from June 15, 2016, we can run:

    $ curl --key /etc/elasticsearch/secret/admin-key \
      --cert /etc/elasticsearch/secret/admin-cert \
      --cacert /etc/elasticsearch/secret/admin-ca -XDELETE \
      "https://localhost:9200/project.logging.3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3.2016.06.15"

29.10. Updating Fluentd’s Log Source After a Docker Log Driver Update

If the Docker log driver has changed from json-file to journald and Fluentd was previously configured with USE_JOURNAL=False, then it will not be able to pick up any new logs that are created. When the Fluentd daemonset is configured with the default value for USE_JOURNAL, then it will detect the Docker log driver upon pod start-up, and configure itself to pull from the appropriate source.

To update Fluentd to detect the correct source upon start-up:

  1. Remove the label from nodes where Fluentd is deployed:

    $ oc label node --all logging-infra-fluentd- 1
    1
    This example assumes use of the default Fluentd node selector and it being deployed on all nodes.
  2. Update the daemonset/logging-fluentd USE_JOURNAL value to be empty:
$ oc patch daemonset/logging-fluentd \
     -p '{"spec":{"template":{"spec":{"containers":[{"name":"fluentd-elasticsearch","env":[{"name": "USE_JOURNAL", "value":""}]}]}}}}'
  1. Relabel your nodes to schedule Fluentd deployments:

    $ oc label node --all logging-infra-fluentd=true 1
    1
    This example assumes use of the default Fluentd node selector and it being deployed on all nodes.