Chapter 2. Logging 5.7
2.1. Logging 5.7 Release Notes
Logging is provided as an installable component, with a distinct release cycle from the core OpenShift Container Platform. The Red Hat OpenShift Container Platform Life Cycle Policy outlines release compatibility.
The stable channel only provides updates to the most recent release of logging. To continue receiving updates for prior releases, you must change your subscription channel to stable-X where X is the version of logging you have installed.
2.1.1. Logging 5.7.4
This release includes OpenShift Logging Bug Fix Release 5.7.4.
2.1.1.1. Bug fixes
-
Before this update, when forwarding logs to CloudWatch, a
namespaceUUIDvalue was not appended to thelogGroupNamefield. With this update, thenamespaceUUIDvalue is included, so alogGroupNamein CloudWatch appears aslogGroupName: vectorcw.b443fb9e-bd4c-4b6a-b9d3-c0097f9ed286. (LOG-2701) - Before this update, when forwarding logs over HTTP to an off-cluster destination, the Vector collector was unable to authenticate to the cluster-wide HTTP proxy even though correct credentials were provided in the proxy URL. With this update, the Vector log collector can now authenticate to the cluster-wide HTTP proxy. (LOG-3381)
- Before this update, the Operator would fail if the Fluentd collector was configured with Splunk as an output, due to this configuration being unsupported. With this update, configuration validation rejects unsupported outputs, resolving the issue. (LOG-4237)
-
Before this update, when the Vector collector was updated an
enabled = truevalue in the TLS configuration for AWS Cloudwatch logs and the GCP Stackdriver caused a configuration error. With this update,enabled = truevalue will be removed for these outputs, resolving the issue. (LOG-4242) -
Before this update, the Vector collector occasionally panicked with the following error message in its log:
thread 'vector-worker' panicked at 'all branches are disabled and there is no else branch', src/kubernetes/reflector.rs:26:9. With this update, the error has been resolved. (LOG-4275) -
Before this update, an issue in the Loki Operator caused the
alert-managerconfiguration for the application tenant to disappear if the Operator was configured with additional options for that tenant. With this update, the generated Loki configuration now contains both the custom and the auto-generated configuration. (LOG-4361) - Before this update, when multiple roles were used to authenticate using STS with AWS Cloudwatch forwarding, a recent update caused the credentials to be non-unique. With this update, multiple combinations of STS roles and static credentials can once again be used to authenticate with AWS Cloudwatch. (LOG-4368)
- Before this update, Loki filtered label values for active streams but did not remove duplicates, making Grafana’s Label Browser unusable. With this update, Loki filters out duplicate label values for active streams, resolving the issue. (LOG-4389)
2.1.1.2. CVEs
2.1.2. Logging 5.7.3
This release includes OpenShift Logging Bug Fix Release 5.7.3.
2.1.2.1. Bug fixes
- Before this update, when viewing logs within the OpenShift Container Platform web console, cached files caused the data to not refresh. With this update the bootstrap files are not cached, resolving the issue. (LOG-4100)
- Before this update, the Loki Operator reset errors in a way that made identifying configuration problems difficult to troubleshoot. With this update, errors persist until the configuration error is resolved. (LOG-4156)
-
Before this update, the LokiStack ruler did not restart after changes were made to the
RulerConfigcustom resource (CR). With this update, the Loki Operator restarts the ruler pods after theRulerConfigCR is updated. (LOG-4161) -
Before this update, the vector collector terminated unexpectedly when input match label values contained a
/character within theClusterLogForwarder. This update resolves the issue by quoting the match label, enabling the collector to start and collect logs. (LOG-4176) -
Before this update, the Loki Operator terminated unexpectedly when a
LokiStackCR defined tenant limits, but not global limits. With this update, the Loki Operator can processLokiStackCRs without global limits, resolving the issue. (LOG-4198) - Before this update, Fluentd did not send logs to an Elasticsearch cluster when the private key provided was passphrase-protected. With this update, Fluentd properly handles passphrase-protected private keys when establishing a connection with Elasticsearch. (LOG-4258)
-
Before this update, clusters with more than 8,000 namespaces caused Elasticsearch to reject queries because the list of namespaces was larger than the
http.max_header_sizesetting. With this update, the default value for header size has been increased, resolving the issue. (LOG-4277) -
Before this update, label values containing a
/character within theClusterLogForwarderCR would cause the collector to terminate unexpectedly. With this update, slashes are replaced with underscores, resolving the issue. (LOG-4095) -
Before this update, the Cluster Logging Operator terminated unexpectedly when set to an unmanaged state. With this update, a check to ensure that the
ClusterLoggingresource is in the correct Management state before initiating the reconciliation of theClusterLogForwarderCR, resolving the issue. (LOG-4177) - Before this update, when viewing logs within the OpenShift Container Platform web console, selecting a time range by dragging over the histogram didn’t work on the aggregated logs view inside the pod detail. With this update, the time range can be selected by dragging on the histogram in this view. (LOG-4108)
- Before this update, when viewing logs within the OpenShift Container Platform web console, queries longer than 30 seconds timed out. With this update, the timeout value can be configured in the configmap/logging-view-plugin. (LOG-3498)
- Before this update, when viewing logs within the OpenShift Container Platform web console, clicking the more data available option loaded more log entries only the first time it was clicked. With this update, more entries are loaded with each click. (OU-188)
- Before this update, when viewing logs within the OpenShift Container Platform web console, clicking the streaming option would only display the streaming logs message without showing the actual logs. With this update, both the message and the log stream are displayed correctly. (OU-166)
2.1.2.2. CVEs
2.1.3. Logging 5.7.2
This release includes OpenShift Logging Bug Fix Release 5.7.2.
2.1.3.1. Bug fixes
-
Before this update, it was not possible to delete the
openshift-loggingnamespace directly due to the presence of a pending finalizer. With this update, the finalizer is no longer utilized, enabling direct deletion of the namespace. (LOG-3316) -
Before this update, the
run.shscript would display an incorrectchunk_limit_sizevalue if it was changed according to the OpenShift Container Platform documentation. However, when setting thechunk_limit_sizevia the environment variable$BUFFER_SIZE_LIMIT, the script would show the correct value. With this update, therun.shscript now consistently displays the correctchunk_limit_sizevalue in both scenarios. (LOG-3330) - Before this update, the OpenShift Container Platform web console’s logging view plugin did not allow for custom node placement or tolerations. This update adds the ability to define node placement and tolerations for the logging view plugin. (LOG-3749)
-
Before this update, the Cluster Logging Operator encountered an Unsupported Media Type exception when trying to send logs to DataDog via the Fluentd HTTP Plugin. With this update, users can seamlessly assign the content type for log forwarding by configuring the HTTP header Content-Type. The value provided is automatically assigned to the
content_typeparameter within the plugin, ensuring successful log transmission. (LOG-3784) -
Before this update, when the
detectMultilineErrorsfield was set totruein theClusterLogForwardercustom resource (CR), PHP multi-line errors were recorded as separate log entries, causing the stack trace to be split across multiple messages. With this update, multi-line error detection for PHP is enabled, ensuring that the entire stack trace is included in a single log message. (LOG-3878) -
Before this update,
ClusterLogForwarderpipelines containing a space in their name caused the Vector collector pods to continuously crash. With this update, all spaces, dashes (-), and dots (.) in pipeline names are replaced with underscores (_). (LOG-3945) -
Before this update, the
log_forwarder_outputmetric did not include thehttpparameter. This update adds the missing parameter to the metric. (LOG-3997) - Before this update, Fluentd did not identify some multi-line JavaScript client exceptions when they ended with a colon. With this update, the Fluentd buffer name is prefixed with an underscore, resolving the issue. (LOG-4019)
- Before this update, when configuring log forwarding to write to a Kafka output topic which matched a key in the payload, logs dropped due to an error. With this update, Fluentd’s buffer name has been prefixed with an underscore, resolving the issue.(LOG-4027)
- Before this update, the LokiStack gateway returned label values for namespaces without applying the access rights of a user. With this update, the LokiStack gateway applies permissions to label value requests, resolving the issue. (LOG-4049)
Before this update, the Cluster Logging Operator API required a certificate to be provided by a secret when the
tls.insecureSkipVerifyoption was set totrue. With this update, the Cluster Logging Operator API no longer requires a certificate to be provided by a secret in such cases. The following configuration has been added to the Operator’s CR:tls.verify_certificate = false tls.verify_hostname = false
(LOG-3445)
-
Before this update, the LokiStack route configuration caused queries running longer than 30 seconds to timeout. With this update, the LokiStack global and per-tenant
queryTimeoutsettings affect the route timeout settings, resolving the issue. (LOG-4052) -
Before this update, a prior fix to remove defaulting of the
collection.typeresulted in the Operator no longer honoring the deprecated specs for resource, node selections, and tolerations. This update modifies the Operator behavior to always prefer thecollection.logsspec over those ofcollection. This varies from previous behavior that allowed using both the preferred fields and deprecated fields but would ignore the deprecated fields whencollection.typewas populated. (LOG-4185) - Before this update, the Vector log collector did not generate TLS configuration for forwarding logs to multiple Kafka brokers if the broker URLs were not specified in the output. With this update, TLS configuration is generated appropriately for multiple brokers. (LOG-4163)
- Before this update, the option to enable passphrase for log forwarding to Kafka was unavailable. This limitation presented a security risk as it could potentially expose sensitive information. With this update, users now have a seamless option to enable passphrase for log forwarding to Kafka. (LOG-3314)
-
Before this update, Vector log collector did not honor the
tlsSecurityProfilesettings for outgoing TLS connections. After this update, Vector handles TLS connection settings appropriately. (LOG-4011) -
Before this update, not all available output types were included in the
log_forwarder_output_infometrics. With this update, metrics contain Splunk and Google Cloud Logging data which was missing previously. (LOG-4098) -
Before this update, when
follow_inodeswas set totrue, the Fluentd collector could crash on file rotation. With this update, thefollow_inodessetting does not crash the collector. (LOG-4151) - Before this update, the Fluentd collector could incorrectly close files that should be watched because of how those files were tracked. With this update, the tracking parameters have been corrected. (LOG-4149)
Before this update, forwarding logs with the Vector collector and naming a pipeline in the
ClusterLogForwarderinstanceaudit,applicationorinfrastructureresulted in collector pods staying in theCrashLoopBackOffstate with the following error in the collector log:ERROR vector::cli: Configuration error. error=redefinition of table transforms.audit for key transforms.audit
After this update, pipeline names no longer clash with reserved input names, and pipelines can be named
audit,applicationorinfrastructure. (LOG-4218)-
Before this update, when forwarding logs to a syslog destination with the Vector collector and setting the
addLogSourceflag totrue, the following extra empty fields were added to the forwarded messages:namespace_name=,container_name=, andpod_name=. With this update, these fields are no longer added to journal logs. (LOG-4219) -
Before this update, when a
structuredTypeKeywas not found, and astructuredTypeNamewas not specified, log messages were still parsed into structured object. With this update, parsing of logs is as expected. (LOG-4220)
2.1.3.2. CVEs
- CVE-2021-26341
- CVE-2021-33655
- CVE-2021-33656
- CVE-2022-1462
- CVE-2022-1679
- CVE-2022-1789
- CVE-2022-2196
- CVE-2022-2663
- CVE-2022-3028
- CVE-2022-3239
- CVE-2022-3522
- CVE-2022-3524
- CVE-2022-3564
- CVE-2022-3566
- CVE-2022-3567
- CVE-2022-3619
- CVE-2022-3623
- CVE-2022-3625
- CVE-2022-3627
- CVE-2022-3628
- CVE-2022-3707
- CVE-2022-3970
- CVE-2022-4129
- CVE-2022-20141
- CVE-2022-25147
- CVE-2022-25265
- CVE-2022-30594
- CVE-2022-36227
- CVE-2022-39188
- CVE-2022-39189
- CVE-2022-41218
- CVE-2022-41674
- CVE-2022-42703
- CVE-2022-42720
- CVE-2022-42721
- CVE-2022-42722
- CVE-2022-43750
- CVE-2022-47929
- CVE-2023-0394
- CVE-2023-0461
- CVE-2023-1195
- CVE-2023-1582
- CVE-2023-2491
- CVE-2023-22490
- CVE-2023-23454
- CVE-2023-23946
- CVE-2023-25652
- CVE-2023-25815
- CVE-2023-27535
- CVE-2023-29007
2.1.4. Logging 5.7.1
This release includes: OpenShift Logging Bug Fix Release 5.7.1.
2.1.4.1. Bug fixes
- Before this update, the presence of numerous noisy messages within the Cluster Logging Operator pod logs caused reduced log readability, and increased difficulty in identifying important system events. With this update, the issue is resolved by significantly reducing the noisy messages within Cluster Logging Operator pod logs. (LOG-3482)
-
Before this update, the API server would reset the value for the
CollectorSpec.Typefield tovector, even when the custom resource used a different value. This update removes the default for theCollectorSpec.Typefield to restore the previous behavior. (LOG-4086) - Before this update, a time range could not be selected in the OpenShift Container Platform web console by clicking and dragging over the logs histogram. With this update, clicking and dragging can be used to successfully select a time range. (LOG-4501)
- Before this update, clicking on the Show Resources link in the OpenShift Container Platform web console did not produce any effect. With this update, the issue is resolved by fixing the functionality of the "Show Resources" link to toggle the display of resources for each log entry. (LOG-3218)
2.1.4.2. CVEs
2.1.5. Logging 5.7.0
This release includes OpenShift Logging Bug Fix Release 5.7.0.
2.1.5.1. Enhancements
With this update, you can enable logging to detect multi-line exceptions and reassemble them into a single log entry.
To enable logging to detect multi-line exceptions and reassemble them into a single log entry, ensure that the ClusterLogForwarder Custom Resource (CR) contains a detectMultilineErrors field, with a value of true.
2.1.5.2. Known Issues
None.
2.1.5.3. Bug fixes
-
Before this update, the
nodeSelectorattribute for the Gateway component of the LokiStack did not impact node scheduling. With this update, thenodeSelectorattribute works as expected. (LOG-3713)
2.1.5.4. CVEs
2.2. Getting started with logging 5.7
This overview of the logging deployment process is provided for ease of reference. It is not a substitute for full documentation. For new installations, Vector and LokiStack are recommended.
As of logging version 5.5, you have the option of choosing from Fluentd or Vector collector implementations, and Elasticsearch or LokiStack as log stores. Documentation for logging is in the process of being updated to reflect these underlying component changes.
Logging is provided as an installable component, with a distinct release cycle from the core OpenShift Container Platform. The Red Hat OpenShift Container Platform Life Cycle Policy outlines release compatibility.
Prerequisites
- Log store preference: Elasticsearch or LokiStack
- Collector implementation preference: Fluentd or Vector
- Credentials for your log forwarding outputs
As of logging version 5.4.3 the OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator.
Install the Operator for the log store you’d like to use.
- For Elasticsearch, install the OpenShift Elasticsearch Operator.
For LokiStack, install the Loki Operator.
NoteIf you have installed the Loki Operator, create a
LokiStackcustom resource (CR) instance as well.
- Install the Red Hat OpenShift Logging Operator.
Create a
ClusterLoggingCR instance.- Select your collector implementation.
2.3. Understanding logging architecture
The logging subsystem consists of these logical components:
-
Collector- Reads container log data from each node and forwards log data to configured outputs. -
Store- Stores log data for analysis; the default output for the forwarder. -
Visualization- Graphical interface for searching, querying, and viewing stored logs.
These components are managed by Operators and Custom Resource (CR) YAML files.
The logging subsystem for Red Hat OpenShift collects container logs and node logs. These are categorized into types:
-
application- Container logs generated by non-infrastructure containers. -
infrastructure- Container logs from namespaceskube-*andopenshift-\*, and node logs fromjournald. -
audit- Logs fromauditd,kube-apiserver,openshift-apiserver, andovnif enabled.
The logging collector is a daemonset that deploys pods to each OpenShift Container Platform node. System and infrastructure logs are generated by journald log messages from the operating system, the container runtime, and OpenShift Container Platform.
Container logs are generated by containers running in pods running on the cluster. Each container generates a separate log stream. The collector collects the logs from these sources and forwards them internally or externally as configured in the ClusterLogForwarder custom resource.
2.3.1. Support considerations for logging
Logging is provided as an installable component, with a distinct release cycle from the core OpenShift Container Platform. The Red Hat OpenShift Container Platform Life Cycle Policy outlines release compatibility.
The supported way of configuring the logging subsystem for Red Hat OpenShift is by configuring it using the options described in this documentation. Do not use other configurations, as they are unsupported. Configuration paradigms might change across OpenShift Container Platform releases, and such cases can only be handled gracefully if all configuration possibilities are controlled. If you use configurations other than those described in this documentation, your changes will disappear because the Operators reconcile any differences. The Operators reverse everything to the defined state by default and by design.
If you must perform configurations not described in the OpenShift Container Platform documentation, you must set your Red Hat OpenShift Logging Operator to Unmanaged. An unmanaged OpenShift Logging environment is not supported and does not receive updates until you return OpenShift Logging to Managed.
The following modifications are explicitly not supported:
- Deploying logging to namespaces not specified in the documentation.
- Installing custom Elasticsearch, Kibana, Fluentd, or Loki instances on OpenShift Container Platform.
- Changes to the Kibana Custom Resource (CR) or Elasticsearch CR.
- Changes to secrets or config maps not specified in the documentation.
The logging subsystem for Red Hat OpenShift is an opinionated collector and normalizer of application, infrastructure, and audit logs. It is intended to be used for forwarding logs to various supported systems.
The logging subsystem for Red Hat OpenShift is not:
- A high scale log collection system
- Security Information and Event Monitoring (SIEM) compliant
- Historical or long term log retention or storage
- A guaranteed log sink
- Secure storage - audit logs are not stored by default
Table 2.1. Logging 5.7 outputs
| Output | Protocol | Tested with | Fluentd | Vector |
|---|---|---|---|---|
| Cloudwatch | REST over HTTP(S) | ✓ | ✓ | |
| Elasticsearch v6 | v6.8.1 | ✓ | ✓ | |
| Elasticsearch v7 | v7.12.2, 7.17.7 | ✓ | ✓ | |
| Elasticsearch v8 | v8.4.3 | ✓ | ✓ | |
| Fluent Forward | Fluentd forward v1 | Fluentd 1.14.6, Logstash 7.10.1 | ✓ | |
| Google Cloud Logging | ✓ | |||
| HTTP | HTTP 1.1 | Fluentd 1.14.6, Vector 0.21 | ✓ | ✓ |
| Kafka | Kafka 0.11 | Kafka 2.4.1, 2.7.0, 3.3.1 | ✓ | ✓ |
| Loki | REST over HTTP(S) | Loki 2.3.0, 2.7 | ✓ | ✓ |
| Splunk | HEC | v8.2.9, 9.0.0 | ✓ | |
| Syslog | RFC3164, RFC5424 | Rsyslog 8.37.0-9.el7 | ✓ | ✓ |
2.4. Configuring your logging deployment
You can configure your logging subsystem deployment with Custom Resource (CR) YAML files implemented by each Operator.
Red Hat Openshift Logging Operator:
-
ClusterLogging(CL) - Deploys the collector and forwarder which currently are both implemented by a daemonset running on each node. -
ClusterLogForwarder(CLF) - Generates collector configuration to forward logs per user configuration.
Loki Operator:
-
LokiStack- Controls the Loki cluster as log store and the web proxy with OpenShift Container Platform authentication integration to enforce multi-tenancy.
OpenShift Elasticsearch Operator:
These CRs are generated and managed by the ClusterLogging Operator, manual changes cannot be made without being overwritten by the Operator.
-
ElasticSearch- Configure and deploy an Elasticsearch instance as the default log store. -
Kibana- Configure and deploy Kibana instance to search, query and view logs.
The supported way of configuring the logging subsystem for Red Hat OpenShift is by configuring it using the options described in this documentation. Do not use other configurations, as they are unsupported. Configuration paradigms might change across OpenShift Container Platform releases, and such cases can only be handled gracefully if all configuration possibilities are controlled. If you use configurations other than those described in this documentation, your changes will disappear because the Operators reconcile any differences. The Operators reverse everything to the defined state by default and by design.
If you must perform configurations not described in the OpenShift Container Platform documentation, you must set your Red Hat OpenShift Logging Operator to Unmanaged. An unmanaged OpenShift Logging environment is not supported and does not receive updates until you return OpenShift Logging to Managed.
2.4.1. Enabling stream-based retention with Loki
With Logging version 5.6 and higher, you can configure retention policies based on log streams. Rules for these may be set globally, per tenant, or both. If you configure both, tenant rules apply before global rules.
-
To enable stream-based retention, create or edit the
LokiStackcustom resource (CR):
oc create -f <file-name>.yaml
- You can refer to the examples below to configure your LokiStack CR.
Example global stream-based retention
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
name: logging-loki
namespace: openshift-logging
spec:
limits:
global: 1
retention: 2
days: 20
streams:
- days: 4
priority: 1
selector: '{kubernetes_namespace_name=~"test.+"}' 3
- days: 1
priority: 1
selector: '{log_type="infrastructure"}'
managementState: Managed
replicationFactor: 1
size: 1x.small
storage:
schemas:
- effectiveDate: "2020-10-11"
version: v11
secret:
name: logging-loki-s3
type: aws
storageClassName: standard
tenants:
mode: openshift-logging
- 1
- Sets retention policy for all log streams. Note: This field does not impact the retention period for stored logs in object storage.
- 2
- Retention is enabled in the cluster when this block is added to the CR.
- 3
- Contains the LogQL query used to define the log stream.
Example per-tenant stream-based retention
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
name: logging-loki
namespace: openshift-logging
spec:
limits:
global:
retention:
days: 20
tenants: 1
application:
retention:
days: 1
streams:
- days: 4
selector: '{kubernetes_namespace_name=~"test.+"}' 2
infrastructure:
retention:
days: 5
streams:
- days: 1
selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
managementState: Managed
replicationFactor: 1
size: 1x.small
storage:
schemas:
- effectiveDate: "2020-10-11"
version: v11
secret:
name: logging-loki-s3
type: aws
storageClassName: standard
tenants:
mode: openshift-logging
- 1
- Sets retention policy by tenant. Valid tenant types are
application,audit, andinfrastructure. - 2
- Contains the LogQL query used to define the log stream.
- Then apply your configuration:
oc apply -f <file-name>.yaml
This is not for managing the retention for stored logs. Global retention periods for stored logs to a supported maximum of 30 days is configured with your object storage.
2.4.2. Enabling multi-line exception detection
Enables multi-line error detection of container logs.
Enabling this feature could have performance implications and may require additional computing resources or alternate logging solutions.
Log parsers often incorrectly identify separate lines of the same exception as separate exceptions. This leads to extra log entries and an incomplete or inaccurate view of the traced information.
Example java exception
java.lang.NullPointerException: Cannot invoke "String.toString()" because "<param1>" is null
at testjava.Main.handle(Main.java:47)
at testjava.Main.printMe(Main.java:19)
at testjava.Main.main(Main.java:10)
-
To enable logging to detect multi-line exceptions and reassemble them into a single log entry, ensure that the
ClusterLogForwarderCustom Resource (CR) contains adetectMultilineErrorsfield, with a value oftrue.
Example ClusterLogForwarder CR
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
name: instance
namespace: openshift-logging
spec:
pipelines:
- name: my-app-logs
inputRefs:
- application
outputRefs:
- default
detectMultilineErrors: true
2.4.2.1. Details
When log messages appear as a consecutive sequence forming an exception stack trace, they are combined into a single, unified log record. The first log message’s content is replaced with the concatenated content of all the message fields in the sequence.
Table 2.2. Supported languages per collector:
| Language | Fluentd | Vector |
|---|---|---|
| Java | ✓ | ✓ |
| JS | ✓ | ✓ |
| Ruby | ✓ | ✓ |
| Python | ✓ | ✓ |
| Golang | ✓ | ✓ |
| PHP | ✓ | |
| Dart | ✓ | ✓ |
2.4.2.2. Troubleshooting
When enabled, the collector configuration will include a new section with type: detect_exceptions
Example vector configuration section
[transforms.detect_exceptions_app-logs] type = "detect_exceptions" inputs = ["application"] languages = ["All"] group_by = ["kubernetes.namespace_name","kubernetes.pod_name","kubernetes.container_name"] expire_after_ms = 2000 multiline_flush_interval_ms = 1000
Example fluentd config section
<label @MULTILINE_APP_LOGS>
<match kubernetes.**>
@type detect_exceptions
remove_tag_prefix 'kubernetes'
message message
force_line_breaks true
multiline_flush_interval .2
</match>
</label>
2.4.3. Enabling log based alerts with Loki
Loki alerting rules use LogQL and follow Prometheus formatting. You can set log based alerts by creating an AlertingRule custom resource (CR). AlertingRule CRs may be created for application, audit, or infrastructure tenants.
| Tenant type | Valid namespaces |
|---|---|
| application | |
| audit |
|
| infrastructure |
|
Application, Audit, and Infrastructure alerts are sent to the Cluster Monitoring Operator (CMO) Alertmanager in the openshift-monitoring namespace by default unless you have disabled the local Alertmanager instance.
Application alerts are not sent to the CMO Alertmanager in the openshift-user-workload-monitoring namespace by default unless you have enabled a separate Alertmanager instance.
The AlertingRule CR contains a set of specifications and webhook validation definitions to declare groups of alerting rules for a single LokiStack instance. In addition, the webhook validation definition provides support for rule validation conditions:
-
If an
AlertingRuleCR includes an invalidintervalperiod, it is an invalid alerting rule -
If an
AlertingRuleCR includes an invalidforperiod, it is an invalid alerting rule. -
If an
AlertingRuleCR includes an invalid LogQLexpr, it is an invalid alerting rule. -
If an
AlertingRuleCR includes two groups with the same name, it is an invalid alerting rule. -
If none of above applies, an
AlertingRuleis considered a valid alerting rule.
Prerequisites
- Logging subsystem for Red Hat OpenShift Operator 5.7 and later
- OpenShift Container Platform 4.13 and later
Procedure
Create an AlertingRule CR:
oc create -f <file-name>.yaml
Populate your AlertingRule CR using the appropriate example below:
Example infrastructure AlertingRule CR
apiVersion: loki.grafana.com/v1 kind: AlertingRule metadata: name: loki-operator-alerts namespace: openshift-operators-redhat 1 labels: 2 openshift.io/cluster-monitoring: "true" spec: tenantID: "infrastructure" 3 groups: - name: LokiOperatorHighReconciliationError rules: - alert: HighPercentageError expr: | 4 sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job) / sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job) > 0.01 for: 10s labels: severity: critical 5 annotations: summary: High Loki Operator Reconciliation Errors 6 description: High Loki Operator Reconciliation Errors 7- 1
- The
namespacewhere this AlertingRule is created must have a label matching the LokiStackspec.rules.namespaceSelectordefinition. - 2
- The
labelsblock must match the LokiStackspec.rules.selectordefinition. - 3
- AlertingRules for
infrastructuretenants are only supported in theopenshift-*,kube-\*, ordefaultnamespaces. - 4
- Value for
kubernetes_namespace_name:must match the value formetadata.namespace. - 5
- Mandatory field. Must be
critical,warning, orinfo. - 6
- Mandatory field.
- 7
- Mandatory field.
Example application AlertingRule CR
apiVersion: loki.grafana.com/v1 kind: AlertingRule metadata: name: app-user-workload namespace: app-ns 1 labels: 2 openshift.io/cluster-monitoring: "true" spec: tenantID: "application" groups: - name: AppUserWorkloadHighError rules: - alert: expr: | 3 sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job) for: 10s labels: severity: critical 4 annotations: summary: 5 description: 6- 1
- The
namespacewhere this AlertingRule is created must have a label matching the LokiStackspec.rules.namespaceSelectordefinition. - 2
- The
labelsblock must match the LokiStackspec.rules.selectordefinition. - 3
- Value for
kubernetes_namespace_name:must match the value formetadata.namespace. - 4
- Mandatory field. Must be
critical,warning, orinfo. - 5
- Mandatory field. Summary of the rule.
- 6
- Mandatory field. Detailed description of the rule.
Apply the CR.
oc apply -f <file-name>.yaml
2.5. Administering your logging deployment
2.5.1. Deploying Red Hat OpenShift Logging Operator using the web console
You can use the OpenShift Container Platform web console to deploy the Red Hat OpenShift Logging Operator.
Logging is provided as an installable component, with a distinct release cycle from the core OpenShift Container Platform. The Red Hat OpenShift Container Platform Life Cycle Policy outlines release compatibility.
Procedure
To deploy the Red Hat OpenShift Logging Operator using the OpenShift Container Platform web console:
Install the Red Hat OpenShift Logging Operator:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Type Logging in the Filter by keyword field.
- Choose Red Hat OpenShift Logging from the list of available Operators, and click Install.
Select stable or stable-5.y as the Update Channel.
NoteThe
stablechannel only provides updates to the most recent release of logging. To continue receiving updates for prior releases, you must change your subscription channel tostable-XwhereXis the version of logging you have installed.- Ensure that A specific namespace on the cluster is selected under Installation Mode.
- Ensure that Operator recommended namespace is openshift-logging under Installed Namespace.
- Select Enable Operator recommended cluster monitoring on this Namespace.
Select an option for Update approval.
- The Automatic option allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
- The Manual option requires a user with appropriate credentials to approve the Operator update.
- Select Enable or Disable for the Console plugin.
- Click Install.
Verify that the Red Hat OpenShift Logging Operator is installed by switching to the Operators → Installed Operators page.
- Ensure that Red Hat OpenShift Logging is listed in the openshift-logging project with a Status of Succeeded.
Create a ClusterLogging instance.
NoteThe form view of the web console does not include all available options. The YAML view is recommended for completing your setup.
In the collection section, select a Collector Implementation.
NoteAs of logging version 5.6 Fluentd is deprecated and is planned to be removed in a future release. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed. As an alternative to Fluentd, you can use Vector instead.
In the logStore section, select a type.
NoteAs of logging version 5.4.3 the OpenShift Elasticsearch Operator is deprecated and is planned to be removed in a future release. Red Hat will provide bug fixes and support for this feature during the current release lifecycle, but this feature will no longer receive enhancements and will be removed. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator.
- Click Create.
2.5.2. Deploying the Loki Operator using the web console
You can use the OpenShift Container Platform web console to install the Loki Operator.
Prerequisites
- Supported Log Store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation)
Procedure
To install the Loki Operator using the OpenShift Container Platform web console:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
Type Loki in the Filter by keyword field.
- Choose Loki Operator from the list of available Operators, and click Install.
Select stable or stable-5.y as the Update Channel.
NoteThe
stablechannel only provides updates to the most recent release of logging. To continue receiving updates for prior releases, you must change your subscription channel tostable-XwhereXis the version of logging you have installed.- Ensure that All namespaces on the cluster is selected under Installation Mode.
- Ensure that openshift-operators-redhat is selected under Installed Namespace.
Select Enable Operator recommended cluster monitoring on this Namespace.
This option sets the
openshift.io/cluster-monitoring: "true"label in the Namespace object. You must select this option to ensure that cluster monitoring scrapes theopenshift-operators-redhatnamespace.Select an option for Update approval.
- The Automatic option allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
- The Manual option requires a user with appropriate credentials to approve the Operator update.
- Click Install.
Verify that the LokiOperator installed by switching to the Operators → Installed Operators page.
- Ensure that LokiOperator is listed with Status as Succeeded in all the projects.
Create a
SecretYAML file that uses theaccess_key_idandaccess_key_secretfields to specify your credentials andbucketnames,endpoint, andregionto define the object storage location. AWS is used in the following example:apiVersion: v1 kind: Secret metadata: name: logging-loki-s3 namespace: openshift-logging stringData: access_key_id: AKIAIOSFODNN7EXAMPLE access_key_secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY bucketnames: s3-bucket-name endpoint: https://s3.eu-central-1.amazonaws.com region: eu-central-1
Select Create instance under LokiStack on the Details tab. Then select YAML view. Paste in the following template, subsituting values where appropriate.
apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki 1 namespace: openshift-logging spec: size: 1x.small 2 storage: schemas: - version: v12 effectiveDate: '2022-06-01' secret: name: logging-loki-s3 3 type: s3 4 storageClassName: <storage_class_name> 5 tenants: mode: openshift-logging- 1
- Name should be
logging-loki. - 2
- Select your Loki deployment size.
- 3
- Define the secret used for your log storage.
- 4
- Define corresponding storage type.
- 5
- Enter the name of an existing storage class for temporary storage. For best performance, specify a storage class that allocates block storage. Available storage classes for your cluster can be listed using
oc get storageclasses.Apply the configuration:
oc apply -f logging-loki.yaml
Create or edit a
ClusterLoggingCR:apiVersion: logging.openshift.io/v1 kind: ClusterLogging metadata: name: instance namespace: openshift-logging spec: managementState: Managed logStore: type: lokistack lokistack: name: logging-loki collection: type: vectorApply the configuration:
oc apply -f cr-lokistack.yaml
2.5.3. Installing from OperatorHub using the CLI
Instead of using the OpenShift Container Platform web console, you can install an Operator from OperatorHub by using the CLI. Use the oc command to create or update a Subscription object.
Prerequisites
-
Access to an OpenShift Container Platform cluster using an account with
cluster-adminpermissions. -
Install the
occommand to your local system.
Procedure
View the list of Operators available to the cluster from OperatorHub:
$ oc get packagemanifests -n openshift-marketplace
Example output
NAME CATALOG AGE 3scale-operator Red Hat Operators 91m advanced-cluster-management Red Hat Operators 91m amq7-cert-manager Red Hat Operators 91m ... couchbase-enterprise-certified Certified Operators 91m crunchy-postgres-operator Certified Operators 91m mongodb-enterprise Certified Operators 91m ... etcd Community Operators 91m jaeger Community Operators 91m kubefed Community Operators 91m ...
Note the catalog for your desired Operator.
Inspect your desired Operator to verify its supported install modes and available channels:
$ oc describe packagemanifests <operator_name> -n openshift-marketplace
An Operator group, defined by an
OperatorGroupobject, selects target namespaces in which to generate required RBAC access for all Operators in the same namespace as the Operator group.The namespace to which you subscribe the Operator must have an Operator group that matches the install mode of the Operator, either the
AllNamespacesorSingleNamespacemode. If the Operator you intend to install uses theAllNamespaces, then theopenshift-operatorsnamespace already has an appropriate Operator group in place.However, if the Operator uses the
SingleNamespacemode and you do not already have an appropriate Operator group in place, you must create one.NoteThe web console version of this procedure handles the creation of the
OperatorGroupandSubscriptionobjects automatically behind the scenes for you when choosingSingleNamespacemode.Create an
OperatorGroupobject YAML file, for exampleoperatorgroup.yaml:Example
OperatorGroupobjectapiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: <operatorgroup_name> namespace: <namespace> spec: targetNamespaces: - <namespace>
Create the
OperatorGroupobject:$ oc apply -f operatorgroup.yaml
Create a
Subscriptionobject YAML file to subscribe a namespace to an Operator, for examplesub.yaml:Example
SubscriptionobjectapiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: <subscription_name> namespace: openshift-operators 1 spec: channel: <channel_name> 2 name: <operator_name> 3 source: redhat-operators 4 sourceNamespace: openshift-marketplace 5 config: env: 6 - name: ARGS value: "-v=10" envFrom: 7 - secretRef: name: license-secret volumes: 8 - name: <volume_name> configMap: name: <configmap_name> volumeMounts: 9 - mountPath: <directory_name> name: <volume_name> tolerations: 10 - operator: "Exists" resources: 11 requests: memory: "64Mi" cpu: "250m" limits: memory: "128Mi" cpu: "500m" nodeSelector: 12 foo: bar
- 1
- For default
AllNamespacesinstall mode usage, specify theopenshift-operatorsnamespace. Alternatively, you can specify a custom global namespace, if you have created one. Otherwise, specify the relevant single namespace forSingleNamespaceinstall mode usage. - 2
- Name of the channel to subscribe to.
- 3
- Name of the Operator to subscribe to.
- 4
- Name of the catalog source that provides the Operator.
- 5
- Namespace of the catalog source. Use
openshift-marketplacefor the default OperatorHub catalog sources. - 6
- The
envparameter defines a list of Environment Variables that must exist in all containers in the pod created by OLM. - 7
- The
envFromparameter defines a list of sources to populate Environment Variables in the container. - 8
- The
volumesparameter defines a list of Volumes that must exist on the pod created by OLM. - 9
- The
volumeMountsparameter defines a list of VolumeMounts that must exist in all containers in the pod created by OLM. If avolumeMountreferences avolumethat does not exist, OLM fails to deploy the Operator. - 10
- The
tolerationsparameter defines a list of Tolerations for the pod created by OLM. - 11
- The
resourcesparameter defines resource constraints for all the containers in the pod created by OLM. - 12
- The
nodeSelectorparameter defines aNodeSelectorfor the pod created by OLM.
Create the
Subscriptionobject:$ oc apply -f sub.yaml
At this point, OLM is now aware of the selected Operator. A cluster service version (CSV) for the Operator should appear in the target namespace, and APIs provided by the Operator should be available for creation.
2.5.4. Deleting Operators from a cluster using the web console
Cluster administrators can delete installed Operators from a selected namespace by using the web console.
Prerequisites
-
You have access to an OpenShift Container Platform cluster web console using an account with
cluster-adminpermissions.
Procedure
- Navigate to the Operators → Installed Operators page.
- Scroll or enter a keyword into the Filter by name field to find the Operator that you want to remove. Then, click on it.
On the right side of the Operator Details page, select Uninstall Operator from the Actions list.
An Uninstall Operator? dialog box is displayed.
Select Uninstall to remove the Operator, Operator deployments, and pods. Following this action, the Operator stops running and no longer receives updates.
NoteThis action does not remove resources managed by the Operator, including custom resource definitions (CRDs) and custom resources (CRs). Dashboards and navigation items enabled by the web console and off-cluster resources that continue to run might need manual clean up. To remove these after uninstalling the Operator, you might need to manually delete the Operator CRDs.
2.5.5. Deleting Operators from a cluster using the CLI
Cluster administrators can delete installed Operators from a selected namespace by using the CLI.
Prerequisites
-
Access to an OpenShift Container Platform cluster using an account with
cluster-adminpermissions. -
occommand installed on workstation.
Procedure
Check the current version of the subscribed Operator (for example,
jaeger) in thecurrentCSVfield:$ oc get subscription jaeger -n openshift-operators -o yaml | grep currentCSV
Example output
currentCSV: jaeger-operator.v1.8.2
Delete the subscription (for example,
jaeger):$ oc delete subscription jaeger -n openshift-operators
Example output
subscription.operators.coreos.com "jaeger" deleted
Delete the CSV for the Operator in the target namespace using the
currentCSVvalue from the previous step:$ oc delete clusterserviceversion jaeger-operator.v1.8.2 -n openshift-operators
Example output
clusterserviceversion.operators.coreos.com "jaeger-operator.v1.8.2" deleted