第 3 章 日志故障排除

3.1. 查看日志记录状态

您可以查看 Red Hat OpenShift Logging Operator 的状态和其他日志记录组件。

3.1.1. 查看 Red Hat OpenShift Logging Operator 的状态

您可以查看 Red Hat OpenShift Logging Operator 的状态。

先决条件

  • 安装了 Red Hat OpenShift Logging Operator 和 OpenShift Elasticsearch Operator。

流程

  1. 运行以下命令,切换到 openshift-logging 项目:

    $ oc project openshift-logging
  2. 运行以下命令来获取 ClusterLogging 实例状态:

    $ oc get clusterlogging instance -o yaml

    输出示例

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    # ...
    status:  1
      collection:
        logs:
          fluentdStatus:
            daemonSet: fluentd  2
            nodes:
              collector-2rhqp: ip-10-0-169-13.ec2.internal
              collector-6fgjh: ip-10-0-165-244.ec2.internal
              collector-6l2ff: ip-10-0-128-218.ec2.internal
              collector-54nx5: ip-10-0-139-30.ec2.internal
              collector-flpnn: ip-10-0-147-228.ec2.internal
              collector-n2frh: ip-10-0-157-45.ec2.internal
            pods:
              failed: []
              notReady: []
              ready:
              - collector-2rhqp
              - collector-54nx5
              - collector-6fgjh
              - collector-6l2ff
              - collector-flpnn
              - collector-n2frh
      logstore: 3
        elasticsearchStatus:
        - ShardAllocationEnabled:  all
          cluster:
            activePrimaryShards:    5
            activeShards:           5
            initializingShards:     0
            numDataNodes:           1
            numNodes:               1
            pendingTasks:           0
            relocatingShards:       0
            status:                 green
            unassignedShards:       0
          clusterName:             elasticsearch
          nodeConditions:
            elasticsearch-cdm-mkkdys93-1:
          nodeCount:  1
          pods:
            client:
              failed:
              notReady:
              ready:
              - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
            data:
              failed:
              notReady:
              ready:
              - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
            master:
              failed:
              notReady:
              ready:
              - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
    visualization:  4
        kibanaStatus:
        - deployment: kibana
          pods:
            failed: []
            notReady: []
            ready:
            - kibana-7fb4fd4cc9-f2nls
          replicaSets:
          - kibana-7fb4fd4cc9
          replicas: 1

    1
    在输出中,集群状态字段显示在 status 小节中。
    2
    Fluentd Pod 的相关信息。
    3
    Elasticsearch Pod 的相关信息,包括 Elasticsearch 集群健康状态 greenyellowred
    4
    Kibana Pod 的相关信息。

3.1.1.1. 情况消息示例

以下是来自 ClusterLogging 实例的 Status.Nodes 部分的一些情况消息示例。

类似于以下内容的状态消息表示节点已超过配置的低水位线,并且没有分片将分配给此节点:

输出示例

  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T15:57:22Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be not
        be allocated on this node.
      reason: Disk Watermark Low
      status: "True"
      type: NodeStorage
    deploymentName: example-elasticsearch-clientdatamaster-0-1
    upgradeStatus: {}

类似于以下内容的状态消息表示节点已超过配置的高水位线,并且分片将重新定位到其他节点:

输出示例

  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T16:04:45Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be relocated
        from this node.
      reason: Disk Watermark High
      status: "True"
      type: NodeStorage
    deploymentName: cluster-logging-operator
    upgradeStatus: {}

类似于以下内容的状态消息表示 CR 中的 Elasticsearch 节点选择器与集群中的任何节点都不匹配:

输出示例

    Elasticsearch Status:
      Shard Allocation Enabled:  shard allocation unknown
      Cluster:
        Active Primary Shards:  0
        Active Shards:          0
        Initializing Shards:    0
        Num Data Nodes:         0
        Num Nodes:              0
        Pending Tasks:          0
        Relocating Shards:      0
        Status:                 cluster health unknown
        Unassigned Shards:      0
      Cluster Name:             elasticsearch
      Node Conditions:
        elasticsearch-cdm-mkkdys93-1:
          Last Transition Time:  2019-06-26T03:37:32Z
          Message:               0/5 nodes are available: 5 node(s) didn't match node selector.
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable
        elasticsearch-cdm-mkkdys93-2:
      Node Count:  2
      Pods:
        Client:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
        Data:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
        Master:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:

类似于以下内容的状态消息表示请求的 PVC 无法绑定到 PV:

输出示例

      Node Conditions:
        elasticsearch-cdm-mkkdys93-1:
          Last Transition Time:  2019-06-26T03:37:32Z
          Message:               pod has unbound immediate PersistentVolumeClaims (repeated 5 times)
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable

类似于以下内容的状态消息表示无法调度 Fluentd Pod,因为节点选择器与任何节点都不匹配:

输出示例

Status:
  Collection:
    Logs:
      Fluentd Status:
        Daemon Set:  fluentd
        Nodes:
        Pods:
          Failed:
          Not Ready:
          Ready:

3.1.2. 查看日志记录组件的状态

您可以查看多个日志记录组件的状态。

先决条件

  • 安装了 Red Hat OpenShift Logging Operator 和 OpenShift Elasticsearch Operator。

流程

  1. 进入 openshift-logging 项目。

    $ oc project openshift-logging
  2. 查看日志记录环境的状态:

    $ oc describe deployment cluster-logging-operator

    输出示例

    Name:                   cluster-logging-operator
    
    ....
    
    Conditions:
      Type           Status  Reason
      ----           ------  ------
      Available      True    MinimumReplicasAvailable
      Progressing    True    NewReplicaSetAvailable
    
    ....
    
    Events:
      Type    Reason             Age   From                   Message
      ----    ------             ----  ----                   -------
      Normal  ScalingReplicaSet  62m   deployment-controller  Scaled up replica set cluster-logging-operator-574b8987df to 1----

  3. 查看日志记录副本集的状态:

    1. 获取副本集的名称:

      输出示例

      $ oc get replicaset

      输出示例

      NAME                                      DESIRED   CURRENT   READY   AGE
      cluster-logging-operator-574b8987df       1         1         1       159m
      elasticsearch-cdm-uhr537yu-1-6869694fb    1         1         1       157m
      elasticsearch-cdm-uhr537yu-2-857b6d676f   1         1         1       156m
      elasticsearch-cdm-uhr537yu-3-5b6fdd8cfd   1         1         1       155m
      kibana-5bd5544f87                         1         1         1       157m

    2. 获取副本集的状态:

      $ oc describe replicaset cluster-logging-operator-574b8987df

      输出示例

      Name:           cluster-logging-operator-574b8987df
      
      ....
      
      Replicas:       1 current / 1 desired
      Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
      
      ....
      
      Events:
        Type    Reason            Age   From                   Message
        ----    ------            ----  ----                   -------
        Normal  SuccessfulCreate  66m   replicaset-controller  Created pod: cluster-logging-operator-574b8987df-qjhqv----