Chapter 6. File Integrity Operator

6.1. File Integrity Operator release notes

The File Integrity Operator for OpenShift Container Platform continually runs file integrity checks on RHCOS nodes.

These release notes track the development of the File Integrity Operator for OpenShift Container Platform.

For an overview of the File Integrity Operator, see Understanding the File Integrity Operator.

To access the latest release, see Updating the File Integrity Operator.

6.1.1. OpenShift File Integrity Operator 1.3.3

The following advisory is available for the OpenShift File Integrity Operator 1.3.3:

This update addresses a CVE in an underlying dependency.

6.1.1.1. New features and enhancements

  • You can install and use the File Integrity Operator in an OpenShift Container Platform cluster running in FIPS mode.

Important

To enable FIPS mode for your cluster, you must run the installation program from a RHEL computer configured to operate in FIPS mode. For more information about configuring FIPS mode on RHEL, see Installing the system in FIPS mode.

6.1.1.2. Bug fixes

6.1.2. OpenShift File Integrity Operator 1.3.2

The following advisory is available for the OpenShift File Integrity Operator 1.3.2:

This update addresses a CVE in an underlying dependency.

6.1.3. OpenShift File Integrity Operator 1.3.1

The following advisory is available for the OpenShift File Integrity Operator 1.3.1:

6.1.3.1. New features and enhancements

  • FIO now includes kubelet certificates as default files, excluding them from issuing warnings when they’re managed by OpenShift Container Platform. (OCPBUGS-14348)
  • FIO now correctly directs email to the address for Red Hat Technical Support. (OCPBUGS-5023)

6.1.3.2. Bug fixes

  • Previously, FIO did not clean up FileIntegrityNodeStatus CRDs when nodes were removed from the cluster. FIO has been updated to correctly clean up node status CRDs on node removal. (OCPBUGS-4321)
  • Previously, FIO would also erroneously indicate that new nodes failed integrity checks. FIO has been updated to correctly show node status CRDs when adding new nodes to the cluster. This provides correct node status notifications. (OCPBUGS-8502)
  • Previously, when FIO was reconciling FileIntegrity CRDs, it would pause scanning until the reconciliation was done. This caused an overly aggressive re-initialization process on nodes not impacted by the reconciliation. This problem also resulted in unnecessary daemon sets for machine config pools that are unrelated to the FileIntegrity being changed. FIO now correctly handles these cases and only pauses AIDE scanning for nodes that are affected by file integrity changes. (CMP-1097)

6.1.3.3. Known Issues

In FIO 1.3.1, adding nodes to IBM Z® clusters might result in a Failed File Integrity node status. For more information, see Adding nodes in IBM Power® clusters can result in failed File Integrity node status.

6.1.4. OpenShift File Integrity Operator 1.2.1

The following advisory is available for the OpenShift File Integrity Operator 1.2.1:

6.1.5. OpenShift File Integrity Operator 1.2.0

The following advisory is available for the OpenShift File Integrity Operator 1.2.0:

6.1.5.1. New features and enhancements

6.1.6. OpenShift File Integrity Operator 1.0.0

The following advisory is available for the OpenShift File Integrity Operator 1.0.0:

6.1.7. OpenShift File Integrity Operator 0.1.32

The following advisory is available for the OpenShift File Integrity Operator 0.1.32:

6.1.7.1. Bug fixes

  • Previously, alerts issued by the File Integrity Operator did not set a namespace, making it difficult to understand from which namespace the alert originated. Now, the Operator sets the appropriate namespace, providing more information about the alert. (BZ#2112394)
  • Previously, the File Integrity Operator did not update the metrics service on Operator startup, causing the metrics targets to be unreachable. With this release, the File Integrity Operator ensures that the metrics service is updated on Operator startup. (BZ#2115821)

6.1.8. OpenShift File Integrity Operator 0.1.30

The following advisory is available for the OpenShift File Integrity Operator 0.1.30:

6.1.8.1. New features and enhancements

  • The File Integrity Operator is now supported on the following architectures:

    • IBM Power®
    • IBM Z® and IBM® LinuxONE

6.1.8.2. Bug fixes

  • Previously, alerts issued by the File Integrity Operator did not set a namespace, making it difficult to understand where the alert originated. Now, the Operator sets the appropriate namespace, increasing understanding of the alert. (BZ#2101393)

6.1.9. OpenShift File Integrity Operator 0.1.24

The following advisory is available for the OpenShift File Integrity Operator 0.1.24:

6.1.9.1. New features and enhancements

  • You can now configure the maximum number of backups stored in the FileIntegrity Custom Resource (CR) with the config.maxBackups attribute. This attribute specifies the number of AIDE database and log backups left over from the re-init process to keep on the node. Older backups beyond the configured number are automatically pruned. The default is set to five backups.

6.1.9.2. Bug fixes

  • Previously, upgrading the Operator from versions older than 0.1.21 to 0.1.22 could cause the re-init feature to fail. This was a result of the Operator failing to update configMap resource labels. Now, upgrading to the latest version fixes the resource labels. (BZ#2049206)
  • Previously, when enforcing the default configMap script contents, the wrong data keys were compared. This resulted in the aide-reinit script not being updated properly after an Operator upgrade, and caused the re-init process to fail. Now, daemon sets run to completion and the AIDE database re-init process executes successfully. (BZ#2072058)

6.1.10. OpenShift File Integrity Operator 0.1.22

The following advisory is available for the OpenShift File Integrity Operator 0.1.22:

6.1.10.1. Bug fixes

  • Previously, a system with a File Integrity Operator installed might interrupt the OpenShift Container Platform update, due to the /etc/kubernetes/aide.reinit file. This occurred if the /etc/kubernetes/aide.reinit file was present, but later removed prior to the ostree validation. With this update, /etc/kubernetes/aide.reinit is moved to the /run directory so that it does not conflict with the OpenShift Container Platform update. (BZ#2033311)

6.1.11. OpenShift File Integrity Operator 0.1.21

The following advisory is available for the OpenShift File Integrity Operator 0.1.21:

6.1.11.1. New features and enhancements

  • Metrics related to FileIntegrity scan results and processing are displayed on the monitoring dashboard in the web console. The metrics are labeled with the prefix file_integrity_operator_.
  • If a node has an integrity failure for more than 1 second, the default PrometheusRule provided in the operator namespace alerts with a warning.
  • The following dynamic Machine Config Operator and Cluster Version Operator related filepaths are excluded from the default AIDE policy to help prevent false positives during node updates:

    • /etc/machine-config-daemon/currentconfig
    • /etc/pki/ca-trust/extracted/java/cacerts
    • /etc/cvo/updatepayloads
    • /root/.kube
  • The AIDE daemon process has stability improvements over v0.1.16, and is more resilient to errors that might occur when the AIDE database is initialized.

6.1.11.2. Bug fixes

  • Previously, when the Operator automatically upgraded, outdated daemon sets were not removed. With this release, outdated daemon sets are removed during the automatic upgrade.

6.2. Installing the File Integrity Operator

6.2.1. Installing the File Integrity Operator using the web console

Prerequisites

  • You must have admin privileges.

Procedure

  1. In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.
  2. Search for the File Integrity Operator, then click Install.
  3. Keep the default selection of Installation mode and namespace to ensure that the Operator will be installed to the openshift-file-integrity namespace.
  4. Click Install.

Verification

To confirm that the installation is successful:

  1. Navigate to the Operators → Installed Operators page.
  2. Check that the Operator is installed in the openshift-file-integrity namespace and its status is Succeeded.

If the Operator is not installed successfully:

  1. Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
  2. Navigate to the Workloads → Pods page and check the logs in any pods in the openshift-file-integrity project that are reporting issues.

6.2.2. Installing the File Integrity Operator using the CLI

Prerequisites

  • You must have admin privileges.

Procedure

  1. Create a Namespace object from a YAML file by running:

    $ oc create -f <file-name>.yaml

    Example Namespace object

    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        openshift.io/cluster-monitoring: "true"
        pod-security.kubernetes.io/enforce: privileged # 1
      name: openshift-file-integrity

    1
    In OpenShift Container Platform 4.14, the pod security label must be set to privileged at the namespace level.
  2. Create the OperatorGroup object from a YAML file by running:

    $ oc create -f <file-name>.yaml

    Example OperatorGroup object

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: file-integrity-operator
      namespace: openshift-file-integrity
    spec:
      targetNamespaces:
      - openshift-file-integrity

  3. Create the Subscription object from a YAML file by running:

    $ oc create -f <file-name>.yaml

    Example Subscription object

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: file-integrity-operator
      namespace: openshift-file-integrity
    spec:
      channel: "stable"
      installPlanApproval: Automatic
      name: file-integrity-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace

Verification

  1. Verify that the installation succeeded by inspecting the cluster service version (CSV):

    $ oc get csv -n openshift-file-integrity
  2. Verify that the File Integrity Operator is up and running:

    $ oc get deploy -n openshift-file-integrity

6.3. Updating the File Integrity Operator

As a cluster administrator, you can update the File Integrity Operator on your OpenShift Container Platform cluster.

6.3.1. Preparing for an Operator update

The subscription of an installed Operator specifies an update channel that tracks and receives updates for the Operator. You can change the update channel to start tracking and receiving updates from a newer channel.

The names of update channels in a subscription can differ between Operators, but the naming scheme typically follows a common convention within a given Operator. For example, channel names might follow a minor release update stream for the application provided by the Operator (1.2, 1.3) or a release frequency (stable, fast).

Note

You cannot change installed Operators to a channel that is older than the current channel.

Red Hat Customer Portal Labs include the following application that helps administrators prepare to update their Operators:

You can use the application to search for Operator Lifecycle Manager-based Operators and verify the available Operator version per update channel across different versions of OpenShift Container Platform. Cluster Version Operator-based Operators are not included.

6.3.2. Changing the update channel for an Operator

You can change the update channel for an Operator by using the OpenShift Container Platform web console.

Tip

If the approval strategy in the subscription is set to Automatic, the update process initiates as soon as a new Operator version is available in the selected channel. If the approval strategy is set to Manual, you must manually approve pending updates.

Prerequisites

  • An Operator previously installed using Operator Lifecycle Manager (OLM).

Procedure

  1. In the Administrator perspective of the web console, navigate to Operators → Installed Operators.
  2. Click the name of the Operator you want to change the update channel for.
  3. Click the Subscription tab.
  4. Click the name of the update channel under Update channel.
  5. Click the newer update channel that you want to change to, then click Save.
  6. For subscriptions with an Automatic approval strategy, the update begins automatically. Navigate back to the Operators → Installed Operators page to monitor the progress of the update. When complete, the status changes to Succeeded and Up to date.

    For subscriptions with a Manual approval strategy, you can manually approve the update from the Subscription tab.

6.3.3. Manually approving a pending Operator update

If an installed Operator has the approval strategy in its subscription set to Manual, when new updates are released in its current update channel, the update must be manually approved before installation can begin.

Prerequisites

  • An Operator previously installed using Operator Lifecycle Manager (OLM).

Procedure

  1. In the Administrator perspective of the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
  2. Operators that have a pending update display a status with Upgrade available. Click the name of the Operator you want to update.
  3. Click the Subscription tab. Any updates requiring approval are displayed next to Upgrade status. For example, it might display 1 requires approval.
  4. Click 1 requires approval, then click Preview Install Plan.
  5. Review the resources that are listed as available for update. When satisfied, click Approve.
  6. Navigate back to the Operators → Installed Operators page to monitor the progress of the update. When complete, the status changes to Succeeded and Up to date.

6.4. Understanding the File Integrity Operator

The File Integrity Operator is an OpenShift Container Platform Operator that continually runs file integrity checks on the cluster nodes. It deploys a daemon set that initializes and runs privileged advanced intrusion detection environment (AIDE) containers on each node, providing a status object with a log of files that are modified during the initial run of the daemon set pods.

Important

Currently, only Red Hat Enterprise Linux CoreOS (RHCOS) nodes are supported.

6.4.1. Creating the FileIntegrity custom resource

An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.

Each FileIntegrity CR is backed by a daemon set running AIDE on the nodes matching the FileIntegrity CR specification.

Procedure

  1. Create the following example FileIntegrity CR named worker-fileintegrity.yaml to enable scans on worker nodes:

    Example FileIntegrity CR

    apiVersion: fileintegrity.openshift.io/v1alpha1
    kind: FileIntegrity
    metadata:
      name: worker-fileintegrity
      namespace: openshift-file-integrity
    spec:
      nodeSelector: # 1
          node-role.kubernetes.io/worker: ""
      tolerations: # 2
      - key: "myNode"
        operator: "Exists"
        effect: "NoSchedule"
      config: # 3
        name: "myconfig"
        namespace: "openshift-file-integrity"
        key: "config"
        gracePeriod: 20 # 4
        maxBackups: 5 # 5
        initialDelay: 60 # 6
      debug: false
    status:
      phase: Active # 7

    1
    Defines the selector for scheduling node scans.
    2
    Specify tolerations to schedule on nodes with custom taints. When not specified, a default toleration that allows running on control plane and infra nodes is applied.
    3
    Define a ConfigMap containing an AIDE configuration to use.
    4
    The number of seconds to pause in between AIDE integrity checks. Frequent AIDE checks on a node might be resource intensive, so it can be useful to specify a longer interval. Default is 900 seconds (15 minutes).
    5
    The maximum number of AIDE database and log backups (leftover from the re-init process) to keep on a node. Older backups beyond this number are automatically pruned by the daemon. Default is set to 5.
    6
    The number of seconds to wait before starting the first AIDE integrity check. Default is set to 0.
    7
    The running status of the FileIntegrity instance. Statuses are Initializing, Pending, or Active.

    Initializing

    The FileIntegrity object is currently initializing or re-initializing the AIDE database.

    Pending

    The FileIntegrity deployment is still being created.

    Active

    The scans are active and ongoing.

  2. Apply the YAML file to the openshift-file-integrity namespace:

    $ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity

Verification

  • Confirm the FileIntegrity object was created successfully by running the following command:

    $ oc get fileintegrities -n openshift-file-integrity

    Example output

    NAME                   AGE
    worker-fileintegrity   14s

6.4.2. Checking the FileIntegrity custom resource status

The FileIntegrity custom resource (CR) reports its status through the .status.phase field.

Procedure

  • To query the FileIntegrity CR status, run:

    $ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status.phase }"

    Example output

    Active
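
When scripting against the cluster, it can help to wait until the phase reaches Active before reading scan results. The following is a minimal sketch, not part of the Operator: wait_for_phase is a hypothetical helper name, the attempt count and pause are arbitrary defaults, and the openshift-file-integrity namespace is assumed from the earlier examples. It relies only on the jsonpath query shown above.

```shell
# Hypothetical helper: poll a FileIntegrity object until it reports the wanted phase.
# Arguments: object name, wanted phase, maximum attempts (default 30),
# seconds to pause between attempts (default 10).
wait_for_phase() {
  name=$1
  want=$2
  attempts=${3:-30}
  pause=${4:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Same query as the procedure above; errors are ignored and retried.
    phase=$(oc get "fileintegrities/$name" -n openshift-file-integrity \
      -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "$want" ]; then
      echo "$name is $phase"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "timed out waiting for $name to reach $want (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For example, wait_for_phase worker-fileintegrity Active 30 10 polls for up to about five minutes and returns a non-zero exit status if the phase is never reached.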

6.4.3. FileIntegrity custom resource phases

  • Pending - The phase after the custom resource (CR) is created.
  • Active - The phase when the backing daemon set is up and running.
  • Initializing - The phase when the AIDE database is being reinitialized.

6.4.4. Understanding the FileIntegrityNodeStatuses object

The scan results of the FileIntegrity CR are reported in another object called FileIntegrityNodeStatuses.

$ oc get fileintegritynodestatuses

Example output

NAME                                                AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal   101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal   109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal   102s

Note

It might take some time for the FileIntegrityNodeStatus object results to be available.

There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The status of the file integrity scan is represented in the results array, which holds scan conditions.

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

The FileIntegrityNodeStatus object reports the latest status of an AIDE run and exposes the status as Failed, Succeeded, or Errored in a status field.

$ oc get fileintegritynodestatuses -w

Example output

NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal   ip-10-0-169-137.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
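
To pick out only the nodes that need attention, the table can be filtered on its STATUS column. This is a small sketch that assumes the three-column NAME, NODE, STATUS layout shown in the example output; failed_nodes is a hypothetical helper name. Run it without the -w flag so that the command terminates.

```shell
# Hypothetical helper: print each node whose STATUS column is Failed.
# Reads the table printed by `oc get fileintegritynodestatuses` on stdin,
# skips the header row, and removes duplicate node names.
failed_nodes() {
  awk 'NR > 1 && $3 == "Failed" { print $2 }' | sort -u
}

# Possible usage against a cluster:
#   oc get fileintegritynodestatuses -n openshift-file-integrity | failed_nodes
```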

6.4.5. FileIntegrityNodeStatus CR status types

These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus CR status:

  • Succeeded - The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized.
  • Failed - The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized.
  • Errored - The AIDE scanner encountered an internal error.

6.4.5.1. FileIntegrityNodeStatus CR success example

Example output of a condition with a success status

[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:57Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:46:03Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:48Z"
  }
]

In this case, all three scans succeeded and so far there are no other conditions.

6.4.5.2. FileIntegrityNodeStatus CR failure status example

To simulate a failure condition, modify one of the files AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:

$ oc debug node/ip-10-0-130-192.ec2.internal

Example output

Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.192
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...

After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time the check failed.

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r

Alternatively, to query the results for all nodes without specifying an object name, run:

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

Example output

[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:54:14Z"
  },
  {
    "condition": "Failed",
    "filesChanged": 1,
    "lastProbeTime": "2020-09-15T12:57:20Z",
    "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
    "resultConfigMapNamespace": "openshift-file-integrity"
  }
]
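
Because the earlier conditions are retained, the time of the first failure can be read directly from the array. The sketch below assumes jq is installed and that the results array is ordered oldest first, as in the example output; first_failure_time is a hypothetical helper name.

```shell
# Hypothetical helper: print the lastProbeTime of the first Failed condition
# in a results array read from stdin. Prints nothing if no condition is Failed.
first_failure_time() {
  jq -r '[.[] | select(.condition == "Failed")][0].lastProbeTime // empty'
}

# Possible usage against a cluster:
#   oc get fileintegritynodestatuses.fileintegrity.openshift.io/<object-name> \
#     -ojsonpath='{.results}' | first_failure_time
```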

The Failed condition points to a config map that gives more details about what exactly failed and why:

$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed

Example output

Name:         aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace:    openshift-file-integrity
Labels:       file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
              file-integrity.openshift.io/owner=worker-fileintegrity
              file-integrity.openshift.io/result-log=
Annotations:  file-integrity.openshift.io/files-added: 0
              file-integrity.openshift.io/files-changed: 1
              file-integrity.openshift.io/files-removed: 0

Data

integritylog:
------
AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15

Summary:
  Total number of files:  31553
  Added files:                0
  Removed files:            0
  Changed files:            1


---------------------------------------------------
Changed files:
---------------------------------------------------

changed: /hostroot/etc/resolv.conf

---------------------------------------------------
Detailed information about changes:
---------------------------------------------------


File: /hostroot/etc/resolv.conf
 SHA512   : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg

Events:  <none>

Due to the config map data size limit, AIDE logs over 1 MB are added to the failure config map as a base64-encoded gzip archive. In this case, extract the value of the integritylog key from the config map and pipe it to base64 --decode | gunzip. Compressed logs are indicated by the presence of a file-integrity.openshift.io/compressed annotation key in the config map.
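
A minimal sketch of that decoding step follows. It assumes the compressed log is stored under the integritylog key, as in the example output above; decode_aide_log is a hypothetical helper name, and -d is the short form of --decode.

```shell
# Hypothetical helper: decode a compressed AIDE log.
# Reads base64 text from stdin and writes the plain-text log to stdout.
decode_aide_log() {
  base64 -d | gunzip
}

# Possible usage against a cluster, for a failure config map that carries the
# file-integrity.openshift.io/compressed annotation:
#   oc get cm <failure-config-map> -n openshift-file-integrity \
#     -o jsonpath='{.data.integritylog}' | decode_aide_log
```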

6.4.6. Understanding events

Transitions in the status of the FileIntegrity and FileIntegrityNodeStatus objects are logged by events. The creation time of the event reflects the latest transition, such as Initializing to Active, and not necessarily the latest scan result. However, the newest event always reflects the most recent status.

$ oc get events --field-selector reason=FileIntegrityStatus

Example output

LAST SEEN   TYPE     REASON                OBJECT                                MESSAGE
97s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Pending
67s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Initializing
37s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Active

When a node scan fails, an event is created with the counts of added, changed, and removed files and the config map information.

$ oc get events --field-selector reason=NodeIntegrityStatus

Example output

LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed

Changes to the number of added, changed, or removed files result in a new event, even if the status of the node has not transitioned.

$ oc get events --field-selector reason=NodeIntegrityStatus

Example output

LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
40m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
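
The counts in a warning message can be extracted for use in scripts. The following sketch reads a, c, and r as added, changed, and removed, which is an interpretation based on the description above; parse_change_counts is a hypothetical helper name.

```shell
# Hypothetical helper: extract the file counts from NodeIntegrityStatus
# warning messages read from stdin, for example
#   node <name> has changed! a:3,c:1,r:0 \ log:<namespace>/<config-map>
# Prints "added=N changed=N removed=N" for each matching line and
# prints nothing for lines without counts.
parse_change_counts() {
  sed -n -E 's/.* a:([0-9]+),c:([0-9]+),r:([0-9]+).*/added=\1 changed=\2 removed=\3/p'
}
```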

6.5. Configuring the Custom File Integrity Operator

6.5.1. Viewing FileIntegrity object attributes

As with any Kubernetes custom resources (CRs), you can run oc explain fileintegrity, and then look at the individual attributes using:

$ oc explain fileintegrity.spec
$ oc explain fileintegrity.spec.config

6.5.2. Important attributes

Table 6.1. Important spec and spec.config attributes

Attribute | Description

spec.nodeSelector | A map of key-value pairs that must match a node’s labels for the AIDE pods to be schedulable on that node. Typically, you set a single key-value pair: node-role.kubernetes.io/worker: "" schedules AIDE on all worker nodes, and node.openshift.io/os_id: "rhcos" schedules AIDE on all Red Hat Enterprise Linux CoreOS (RHCOS) nodes.

spec.debug | A boolean attribute. If set to true, the daemon running in the AIDE daemon set’s pods outputs extra information.

spec.tolerations | Specify tolerations to schedule on nodes with custom taints. When not specified, a default toleration is applied, which allows the AIDE pods to run on control plane nodes.

spec.config.gracePeriod | The number of seconds to pause in between AIDE integrity checks. Frequent AIDE checks on a node can be resource intensive, so it can be useful to specify a longer interval. Defaults to 900, or 15 minutes.

spec.config.maxBackups | The maximum number of AIDE database and log backups left over from the re-init process to keep on a node. Older backups beyond this number are automatically pruned by the daemon. Defaults to 5.

spec.config.name | Name of a configMap that contains custom AIDE configuration. If omitted, a default configuration is created.

spec.config.namespace | Namespace of a configMap that contains custom AIDE configuration. If unset, the FIO generates a default configuration suitable for RHCOS systems.

spec.config.key | Key that contains the actual AIDE configuration in the config map specified by name and namespace. The default value is aide.conf.

spec.config.initialDelay | The number of seconds to wait before starting the first AIDE integrity check. Default is set to 0. This attribute is optional.

6.5.3. Examine the default configuration

The default File Integrity Operator configuration is stored in a config map with the same name as the FileIntegrity CR.

Procedure

  • To examine the default config, run:

    $ oc describe cm/worker-fileintegrity

6.5.4. Understanding the default File Integrity Operator configuration

Below is an excerpt from the aide.conf key of the config map:

@@define DBDIR /hostroot/etc/kubernetes
@@define LOGDIR /hostroot/etc/kubernetes
database=file:@@{DBDIR}/aide.db.gz
database_out=file:@@{DBDIR}/aide.db.gz
gzip_dbout=yes
verbose=5
report_url=file:@@{LOGDIR}/aide.log
report_url=stdout
PERMS = p+u+g+acl+selinux+xattrs
CONTENT_EX = sha512+ftype+p+u+g+n+acl+selinux+xattrs

/hostroot/boot/    	CONTENT_EX
/hostroot/root/\..* PERMS
/hostroot/root/   CONTENT_EX

The default configuration for a FileIntegrity instance provides coverage for files under the following directories:

  • /root
  • /boot
  • /usr
  • /etc

The following directories are not covered:

  • /var
  • /opt
  • Some OpenShift Container Platform-specific excludes under /etc/

6.5.5. Supplying a custom AIDE configuration

Any entries that configure AIDE internal behavior, such as DBDIR, LOGDIR, database, and database_out, are overwritten by the Operator. The Operator adds the /hostroot/ prefix to all paths to be watched for integrity changes. This makes it easier to reuse existing AIDE configurations, which are often not tailored for a containerized environment and start from the root directory.

Note

/hostroot is the directory where the pods running AIDE mount the host’s file system. Changing the configuration triggers a reinitialization of the database.
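
To get a feel for the prefixing described above, the following sketch approximates it with sed. It is an illustration only, not the Operator's implementation: add_hostroot_prefix is a hypothetical helper name, it handles only plain and !-negated selection lines that start with an absolute path, and it does not reproduce any other normalization that the Operator performs.

```shell
# Hypothetical helper: add the /hostroot prefix to AIDE selection lines.
# Lines that already start with /hostroot/ (optionally negated with "!") are
# left alone; other lines that start with "/" or "!/" get the prefix.
# Lines such as macro definitions or group definitions are not touched.
add_hostroot_prefix() {
  sed -E \
    -e '/^!?\/hostroot\//b' \
    -e 's|^(!?)/|\1/hostroot/|'
}
```

For example, piping the line !/opt/mydaemon/ through add_hostroot_prefix yields !/hostroot/opt/mydaemon/, while /hostroot/etc/ CONTENT_EX passes through unchanged.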

6.5.6. Defining a custom File Integrity Operator configuration

This example focuses on defining a custom configuration for a scanner that runs on the control plane nodes, based on the default configuration provided for the worker-fileintegrity CR. This workflow might be useful if you plan to deploy custom software that runs as a daemon set and stores its data under /opt/mydaemon on the control plane nodes.

Procedure

  1. Make a copy of the default configuration.
  2. Edit the default configuration with the files that must be watched or excluded.
  3. Store the edited contents in a new config map.
  4. Point the FileIntegrity object to the new config map through the attributes in spec.config.
  5. Extract the default configuration:

    $ oc extract cm/worker-fileintegrity --keys=aide.conf

    This creates a file named aide.conf that you can edit. To illustrate how the Operator post-processes the paths, this example adds an exclude directory without the prefix:

    $ vim aide.conf

    Example aide.conf contents

    /hostroot/etc/kubernetes/static-pod-resources
    !/hostroot/etc/kubernetes/aide.*
    !/hostroot/etc/kubernetes/manifests
    !/hostroot/etc/docker/certs.d
    !/hostroot/etc/selinux/targeted
    !/hostroot/etc/openvswitch/conf.db

    Exclude a path specific to control plane nodes:

    !/opt/mydaemon/

    Keep the rule that covers the other content in /etc:

    /hostroot/etc/	CONTENT_EX
  6. Create a config map based on this file:

    $ oc create cm master-aide-conf --from-file=aide.conf
  7. Define a FileIntegrity CR manifest that references the config map:

    apiVersion: fileintegrity.openshift.io/v1alpha1
    kind: FileIntegrity
    metadata:
      name: master-fileintegrity
      namespace: openshift-file-integrity
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      config:
        name: master-aide-conf
        namespace: openshift-file-integrity

    The Operator processes the provided config map file and stores the result in a config map with the same name as the FileIntegrity object:

    $ oc describe cm/master-fileintegrity | grep /opt/mydaemon

    Example output

    !/hostroot/opt/mydaemon

6.5.7. Changing the custom File Integrity configuration

To change the File Integrity configuration, never edit the generated config map. Instead, change the config map that is linked to the FileIntegrity object through the name, namespace, and key attributes in spec.config.

6.6. Performing advanced custom File Integrity Operator tasks

6.6.1. Reinitializing the database

If the File Integrity Operator detects a change that was planned, you might need to reinitialize the database.

Procedure

  • Annotate the FileIntegrity custom resource (CR) with file-integrity.openshift.io/re-init:

    $ oc annotate fileintegrities/worker-fileintegrity file-integrity.openshift.io/re-init=

    The old database and log files are backed up and a new database is initialized. The old database and logs are retained on the nodes under /etc/kubernetes, as seen in the following output from a pod spawned using oc debug:

    Example output

    $ ls -lR /host/etc/kubernetes/aide.*
    -rw-------. 1 root root 1839782 Sep 17 15:08 /host/etc/kubernetes/aide.db.gz
    -rw-------. 1 root root 1839783 Sep 17 14:30 /host/etc/kubernetes/aide.db.gz.backup-20200917T15_07_38
    -rw-------. 1 root root   73728 Sep 17 15:07 /host/etc/kubernetes/aide.db.gz.backup-20200917T15_07_55
    -rw-r--r--. 1 root root       0 Sep 17 15:08 /host/etc/kubernetes/aide.log
    -rw-------. 1 root root     613 Sep 17 15:07 /host/etc/kubernetes/aide.log.backup-20200917T15_07_38
    -rw-r--r--. 1 root root       0 Sep 17 15:07 /host/etc/kubernetes/aide.log.backup-20200917T15_07_55

    To provide a persistent record, the resulting config maps are not owned by the FileIntegrity object, so you must clean them up manually. As a result, any previous integrity failures remain visible in the FileIntegrityNodeStatus object.

6.6.2. Machine config integration

In OpenShift Container Platform 4, the cluster node configuration is delivered through MachineConfig objects. You can assume that the changes to files that are caused by a MachineConfig object are expected and should not cause the file integrity scan to fail. To suppress changes to files caused by MachineConfig object updates, the File Integrity Operator watches the node objects; when a node is being updated, the AIDE scans are suspended for the duration of the update. When the update finishes, the database is reinitialized and the scans resume.

This pause and resume logic only applies to updates through the MachineConfig API, as they are reflected in the node object annotations.

6.6.3. Exploring the daemon sets

Each FileIntegrity object represents a scan on a number of nodes. The scan itself is performed by pods managed by a daemon set.

To find the daemon set that represents a FileIntegrity object, run:

$ oc -n openshift-file-integrity get ds/aide-worker-fileintegrity

To list the pods in that daemon set, run:

$ oc -n openshift-file-integrity get pods -lapp=aide-worker-fileintegrity

To view the logs of a single AIDE pod, run oc logs on one of the pods:

$ oc -n openshift-file-integrity logs pod/aide-worker-fileintegrity-mr8x6

Example output

Starting the AIDE runner daemon
initializing AIDE db
initialization finished
running aide check
...

The config maps created by the AIDE daemon are not retained and are deleted after the File Integrity Operator processes them. However, on failure and error, the contents of these config maps are copied to the config map that the FileIntegrityNodeStatus object points to.

6.7. Troubleshooting the File Integrity Operator

6.7.1. General troubleshooting

Issue
You want to troubleshoot general issues with the File Integrity Operator.
Resolution
Enable the debug flag in the FileIntegrity object. The debug flag increases the verbosity of the daemons that run in the DaemonSet pods and run the AIDE checks.
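
One way to set the flag is a merge patch on the FileIntegrity object. The following is a minimal sketch, assuming that the flag is the boolean field spec.debug, which this section does not spell out. enable_fio_debug is a hypothetical helper name used only for illustration.

```shell
# Sketch: turn on debug logging for a FileIntegrity object.
# Assumption: the debug flag is the boolean field spec.debug.
# enable_fio_debug is a hypothetical helper, not part of the Operator.
enable_fio_debug() {
  # $1 is the name of the FileIntegrity object, for example
  # worker-fileintegrity.
  oc -n openshift-file-integrity patch "fileintegrities/$1" \
    --type merge -p '{"spec":{"debug":true}}'
}
```

For example, calling enable_fio_debug worker-fileintegrity patches the object used throughout this chapter. Applying the same patch with the value false turns the extra logging off again.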

6.7.2. Checking the AIDE configuration

Issue
You want to check the AIDE configuration.
Resolution
The AIDE configuration is stored in a config map with the same name as the FileIntegrity object. All AIDE configuration config maps are labeled with file-integrity.openshift.io/aide-conf.

6.7.3. Determining the FileIntegrity object’s phase

Issue
You want to determine if the FileIntegrity object exists and see its current status.
Resolution

To see the FileIntegrity object’s current status, run:

$ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status }"

Once the FileIntegrity object and the backing daemon set are created, the status should switch to Active. If it does not, check the Operator pod logs.

6.7.4. Determining that the daemon set’s pods are running on the expected nodes

Issue
You want to confirm that the daemon set exists and that its pods are running on the nodes you expect them to run on.
Resolution

Run:

$ oc -n openshift-file-integrity get pods -lapp=aide-worker-fileintegrity

Note

Adding -owide includes the IP address of the node that the pod is running on.

To check the logs of the daemon pods, run oc logs.

Check the return value of the AIDE command to see if the check passed or failed.
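
When reading a return value, it can help to decode it. The sketch below follows the exit status convention in the upstream AIDE manual for check runs: 0 means no differences, values 1 through 7 are the sum of 1 (files added), 2 (files removed), and 4 (files changed), and higher values indicate errors. It assumes that the value shown in the pod logs is AIDE's own exit status. describe_aide_status is a hypothetical helper name.

```shell
# Sketch: decode an AIDE check exit status into a short description.
# Assumption: the value passed in is AIDE's own exit status.
# describe_aide_status is a hypothetical helper, not part of the Operator.
describe_aide_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "no differences"
  elif [ "$status" -ge 1 ] && [ "$status" -le 7 ]; then
    # Values 1-7 are a bit mask: 1 = added, 2 = removed, 4 = changed.
    msg=""
    [ $((status & 1)) -ne 0 ] && msg="${msg}added "
    [ $((status & 2)) -ne 0 ] && msg="${msg}removed "
    [ $((status & 4)) -ne 0 ] && msg="${msg}changed "
    echo "differences: ${msg% }"
  else
    # Anything above 7 is an AIDE error code, not a scan result.
    echo "error (status $status)"
  fi
}
```

For example, describe_aide_status 5 reports that files were both added and changed.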