6.4. 了解 File Integrity Operator

File Integrity Operator 是一个 OpenShift Container Platform Operator,其在集群节点上持续运行文件完整性检查。它部署一个守护进程集,在每个节点上初始化并运行特权高级入侵检测环境 (AIDE) 容器,从而提供一个状态对象和在守护进程集初始运行时修改的文件日志。

重要

目前,仅支持 Red Hat Enterprise Linux CoreOS (RHCOS) 节点。

6.4.1. 创建 FileIntegrity 自定义资源

FileIntegrity 自定义资源 (CR) 的实例代表一组对一个或多个节点进行的持续文件完整性扫描。

每个 FileIntegrity CR 都由一个在符合 FileIntegrity CR 规格的节点上运行 AIDE 的守护进程集支持。

流程

  1. 创建名为 worker-fileintegrity.yamlFileIntegrity CR 示例,以便在 worker 节点上启用扫描:

    示例 FileIntegrity CR

    apiVersion: fileintegrity.openshift.io/v1alpha1
    kind: FileIntegrity
    metadata:
      name: worker-fileintegrity
      namespace: openshift-file-integrity
    spec:
      nodeSelector: 1
          node-role.kubernetes.io/worker: ""
      tolerations: 2
      - key: "myNode"
        operator: "Exists"
        effect: "NoSchedule"
      config: 3
        name: "myconfig"
        namespace: "openshift-file-integrity"
        key: "config"
        gracePeriod: 20 4
        maxBackups: 5 5
        initialDelay: 60 6
      debug: false
    status:
      phase: Active 7

    1
    定义调度节点扫描的选择器。
    2
    在带有自定义污点的节点上指定调度容限。如果没有指定,则应用允许在主节点上运行的默认容限。
    3
    定义包含要使用的 AIDE 配置的 ConfigMap
    4
    AIDE 完整性检查之间暂停的秒数。在节点中频繁进行 AIDE 检查需要大量资源,因此可以指定较长的时间间隔。默认值为 900 秒 (15 分钟)。
    5
    从 re-init 进程保留的最大 AIDE 数据库和日志备份数,以保留在节点上。守护进程会自动修剪超出这个数字的旧备份。默认设为 5。
    6
    启动第一个 AIDE 完整性检查前等待的秒数。默认设为 0。
    7
    FileIntegrity 实例的运行状态。状态为 InitializingPendingActive

    Initializing

    FileIntegrity 对象目前正在初始化或重新初始化 AIDE 数据库。

    待处理

    FileIntegrity 部署仍然被创建。

    Active

    扫描处于活动状态且持续。

  2. 将 YAML 文件应用到 openshift-file-integrity 命名空间:

    $ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity

验证

  • 运行以下命令确认 FileIntegrity 对象已创建成功:

    $ oc get fileintegrities -n openshift-file-integrity

    输出示例

    NAME                   AGE
    worker-fileintegrity   14s

6.4.2. 检查 FileIntegrity 自定义资源状态

FileIntegrity 自定义资源 (CR) 通过 .status.phase 子资源报告其状态。

流程

  • 要查询 FileIntegrity CR 状态,请运行:

    $ oc get fileintegrities/worker-fileintegrity  -o jsonpath="{ .status.phase }"

    输出示例

    Active

6.4.3. FileIntegrity 自定义资源阶段

  • 待处理 - 创建自定义资源 (CR) 后的阶段。
  • Active - 后备守护进程集启动并运行的阶段。
  • 初始化 - AIDE 数据库重新初始化的阶段。

6.4.4. 了解 FileIntegrityNodeStatuses 对象

FileIntegrity CR 的扫描结果会在名为 FileIntegrityNodeStatuses 的另一个对象中报告。

$ oc get fileintegritynodestatuses

输出示例

NAME                                                AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal   101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal   109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal   102s

注意

可能需要经过一段时间 FileIntegrityNodeStatus 对象才会可用。

每个节点都有一个结果对象。每个 FileIntegrityNodeStatus 对象的 nodeName 属性都对应于被扫描的节点。文件完整性扫描的状态在 results 数组中表示,其中包含扫描条件。

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

fileintegritynodestatus 对象报告 AIDE 运行的最新状态,并在 status 字段中公开状态为 FailedSucceededErrored

$ oc get fileintegritynodestatuses -w

输出示例

NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal   ip-10-0-169-137.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded

6.4.5. FileIntegrityNodeStatus CR 状态类型

这些条件会在对应的 FileIntegrityNodeStatus CR 状态的结果数组中报告:

  • Succeeded - 通过完整性检查;AIDE 检查中所涵盖的文件和目录自数据库上次初始化以来没有被修改。
  • Failed - 完整性检查失败;AIDE 检查中所涵盖的一些文件和目录自数据库上次初始化以来已被修改。
  • Errored - AIDE 扫描程序遇到内部错误。

6.4.5.1. FileIntegrityNodeStatus CR 成功示例

成功状态条件的输出示例

[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:57Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:46:03Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:48Z"
  }
]

在这种情况下,所有三个扫描都成功,目前没有其他条件。

6.4.5.2. FileIntegrityNodeStatus CR 失败状态示例

要模拟失败条件,请修改 AIDE 跟踪的其中一个文件。例如,在其中一个 worker 节点上修改 /etc/resolv.conf

$ oc debug node/ip-10-0-130-192.ec2.internal

输出示例

Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.192
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...

一段时间后,相应的 FileIntegrityNodeStatus 对象的结果数组中会报告 Failed 条件。之前的 Succeeded 条件被保留,便于您查明检查失败的时间。

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r

或者,如果您没有提到对象名称,则运行:

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq

输出示例

[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:54:14Z"
  },
  {
    "condition": "Failed",
    "filesChanged": 1,
    "lastProbeTime": "2020-09-15T12:57:20Z",
    "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
    "resultConfigMapNamespace": "openshift-file-integrity"
  }
]

Failed 条件指向一个配置映射,该映射详细介绍了具体的失败及失败原因:

$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed

输出示例

Name:         aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace:    openshift-file-integrity
Labels:       file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
              file-integrity.openshift.io/owner=worker-fileintegrity
              file-integrity.openshift.io/result-log=
Annotations:  file-integrity.openshift.io/files-added: 0
              file-integrity.openshift.io/files-changed: 1
              file-integrity.openshift.io/files-removed: 0

Data

integritylog:
------
AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15

Summary:
  Total number of files:  31553
  Added files:                0
  Removed files:            0
  Changed files:            1


---------------------------------------------------
Changed files:
---------------------------------------------------

changed: /hostroot/etc/resolv.conf

---------------------------------------------------
Detailed information about changes:
---------------------------------------------------


File: /hostroot/etc/resolv.conf
 SHA512   : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg

Events:  <none>

由于配置映射数据大小限制,超过 1MB 的 AIDE 日志会作为 base64 编码的 gzip 存档添加到失败配置映射中。在这种情况下,您要将以上命令的输出输出管道到 base64 --decode | gunzip。压缩的日志由配置映射中存在 file-integrity.openshift.io/compressed 注解键来表示。

6.4.6. 了解事件

FileIntegrityFileIntegrityNodeStatus 对象的状态由 事件 记录。事件的创建时间反映了最新变化,如从 Initializing 变为 Active,它不一定是最新的扫描结果。但是,最新的事件始终反映最新的状态。

$ oc get events --field-selector reason=FileIntegrityStatus

输出示例

LAST SEEN   TYPE     REASON                OBJECT                                MESSAGE
97s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Pending
67s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Initializing
37s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Active

当节点扫描失败时,会使用 add/changed/removed 和配置映射信息创建事件。

$ oc get events --field-selector reason=NodeIntegrityStatus

输出示例

LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed

更改添加、更改或删除的文件数量会导致新的事件,即使节点的状态尚未转换。

$ oc get events --field-selector reason=NodeIntegrityStatus

输出示例

LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
40m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \ log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed