Chapter 9. Viewing cluster logging status

You can view the status of the Cluster Logging Operator and the status of a number of cluster logging components.

9.1. Viewing the status of the Cluster Logging Operator

You can view the status of the Cluster Logging Operator.

Prerequisites

  • Cluster Logging and Elasticsearch must be installed.

Procedure

  1. Change to the openshift-logging project.

    $ oc project openshift-logging
  2. View the cluster logging status:

    1. Get the cluster logging status:

      $ oc get clusterlogging instance -o yaml

      The output includes information similar to the following:

      apiVersion: logging.openshift.io/v1
      kind: ClusterLogging
      
      ....
      
      status:  1
        collection:
          logs:
            fluentdStatus:
              daemonSet: fluentd  2
              nodes:
                fluentd-2rhqp: ip-10-0-169-13.ec2.internal
                fluentd-6fgjh: ip-10-0-165-244.ec2.internal
                fluentd-6l2ff: ip-10-0-128-218.ec2.internal
                fluentd-54nx5: ip-10-0-139-30.ec2.internal
                fluentd-flpnn: ip-10-0-147-228.ec2.internal
                fluentd-n2frh: ip-10-0-157-45.ec2.internal
              pods:
                failed: []
                notReady: []
                ready:
                - fluentd-2rhqp
                - fluentd-54nx5
                - fluentd-6fgjh
                - fluentd-6l2ff
                - fluentd-flpnn
                - fluentd-n2frh
        curation:  3
          curatorStatus:
          - cronJobs: curator
            schedules: 30 3 * * *
            suspended: false
        logstore: 4
          elasticsearchStatus:
          - ShardAllocationEnabled:  all
            cluster:
              activePrimaryShards:    5
              activeShards:           5
              initializingShards:     0
              numDataNodes:           1
              numNodes:               1
              pendingTasks:           0
              relocatingShards:       0
              status:                 green
              unassignedShards:       0
            clusterName:             elasticsearch
            nodeConditions:
              elasticsearch-cdm-mkkdys93-1:
            nodeCount:  1
            pods:
              client:
                failed:
                notReady:
                ready:
                - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
              data:
                failed:
                notReady:
                ready:
                - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
              master:
                failed:
                notReady:
                ready:
                - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
        visualization: 5
          kibanaStatus:
          - deployment: kibana
            pods:
              failed: []
              notReady: []
              ready:
              - kibana-7fb4fd4cc9-f2nls
            replicaSets:
            - kibana-7fb4fd4cc9
            replicas: 1
      1
      In the output, the cluster status fields appear in the status stanza.
      2
      Information on the Fluentd pods.
      3
      Information on the Curator pods, including the cron schedule (30 3 * * * runs daily at 03:30).
      4
      Information on the Elasticsearch pods, including the Elasticsearch cluster health: green, yellow, or red.
      5
      Information on the Kibana pods.
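
      If you only need one of these fields, you can extract it with a JSONPath expression instead of reading the full YAML. This is a minimal sketch; jsonpath is a standard oc output format, and the field path follows the example output above:

      $ oc get clusterlogging instance -n openshift-logging -o jsonpath='{.status.logstore.elasticsearchStatus[0].cluster.status}'

      For the example output above, this prints green.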

9.1.1. Example condition messages

The following are examples of some condition messages from the Status.Nodes section of the cluster logging instance.

A status message similar to the following indicates that a node has exceeded the configured low watermark and no shard will be allocated to this node:

  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T15:57:22Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will not be
        allocated on this node.
      reason: Disk Watermark Low
      status: "True"
      type: NodeStorage
    deploymentName: example-elasticsearch-clientdatamaster-0-1
    upgradeStatus: {}

A status message similar to the following indicates that a node has exceeded the configured high watermark and shards will be relocated to other nodes:

  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T16:04:45Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be relocated
        from this node.
      reason: Disk Watermark High
      status: "True"
      type: NodeStorage
    deploymentName: cluster-logging-operator
    upgradeStatus: {}
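
Both watermark conditions reflect disk usage on the storage backing the Elasticsearch data path. One way to confirm the usage reported in the message is to check free space from inside the affected Elasticsearch pod. This is a sketch; the pod name is from the earlier example output, and the elasticsearch container name and mount path are assumptions that can differ in your deployment:

  $ oc -n openshift-logging exec elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c \
      -c elasticsearch -- df -h /elasticsearch/persistent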

A status message similar to the following indicates that the Elasticsearch node selector in the CR does not match any nodes in the cluster:

    Elasticsearch Status:
      Shard Allocation Enabled:  shard allocation unknown
      Cluster:
        Active Primary Shards:  0
        Active Shards:          0
        Initializing Shards:    0
        Num Data Nodes:         0
        Num Nodes:              0
        Pending Tasks:          0
        Relocating Shards:      0
        Status:                 cluster health unknown
        Unassigned Shards:      0
      Cluster Name:             elasticsearch
      Node Conditions:
        elasticsearch-cdm-mkkdys93-1:
          Last Transition Time:  2019-06-26T03:37:32Z
          Message:               0/5 nodes are available: 5 node(s) didn't match node selector.
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable
        elasticsearch-cdm-mkkdys93-2:
      Node Count:  2
      Pods:
        Client:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
        Data:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
        Master:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
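
To see why the selector does not match, compare the node selector in the ClusterLogging custom resource with the labels on your nodes. This is a sketch; the spec.logStore.elasticsearch.nodeSelector path is an assumption that applies only if your CR sets a node selector there:

  $ oc get clusterlogging instance -n openshift-logging -o jsonpath='{.spec.logStore.elasticsearch.nodeSelector}'
  $ oc get nodes --show-labels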

A status message similar to the following indicates that the requested PVC could not bind to a PV:

      Node Conditions:
        elasticsearch-cdm-mkkdys93-1:
          Last Transition Time:  2019-06-26T03:37:32Z
          Message:               pod has unbound immediate PersistentVolumeClaims (repeated 5 times)
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable
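
In this case, list the persistent volume claims and describe the pending claim to see why it cannot bind. These are standard commands; substitute the name of the pending claim in your cluster:

  $ oc get pvc -n openshift-logging
  $ oc describe pvc <pvc-name> -n openshift-logging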

A status message similar to the following indicates that the Curator pod cannot be scheduled because the node selector did not match any nodes:

Curation:
    Curator Status:
      Cluster Condition:
        curator-1561518900-cjx8d:
          Last Transition Time:  2019-06-26T03:20:08Z
          Reason:                Completed
          Status:                True
          Type:                  ContainerTerminated
        curator-1561519200-zqxxj:
          Last Transition Time:  2019-06-26T03:20:01Z
          Message:               0/5 nodes are available: 1 Insufficient cpu, 5 node(s) didn't match node selector.
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable
      Cron Jobs:                 curator
      Schedules:                 */5 * * * *
      Suspended:                 false
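
To get more detail on why a pod is unschedulable, describe the pending Curator pod. The pod name here is the one from the example condition above; substitute the name from your own cluster:

  $ oc -n openshift-logging describe pod curator-1561519200-zqxxj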

A status message similar to the following indicates that the Fluentd pods cannot be scheduled because the node selector did not match any nodes:

Status:
  Collection:
    Logs:
      Fluentd Status:
        Daemon Set:  fluentd
        Nodes:
        Pods:
          Failed:
          Not Ready:
          Ready:
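
Because Fluentd runs as a daemon set, you can compare how many pods the daemon set has scheduled with how many nodes it should cover. These are standard commands; the fluentd daemon set name is taken from the example status above:

  $ oc -n openshift-logging get ds fluentd
  $ oc -n openshift-logging describe ds fluentd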