Troubleshooting OpenShift Data Foundation

Name: NooBaaNamespaceResourceErrorState

Message: A NooBaa Namespace Resource Is In Error State

Description: A NooBaa namespace resource {{ $labels.namespace_resource_name }} is in error state for more than 5m

Severity: Warning

Resolution: Fix
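
The phrase "for more than 5m" in this description matches how a Prometheus alerting rule with a `for` duration behaves: the condition has to hold continuously for the whole period before the alert fires. The following Python sketch illustrates that behavior under simplifying assumptions. The function name `alert_state` and the sample format are hypothetical and are not part of ODF or Prometheus.

```python
from typing import List, Tuple

HOLD_SECONDS = 5 * 60  # the "5m" from the alert description


def alert_state(samples: List[Tuple[int, bool]], hold: int = HOLD_SECONDS) -> str:
    """Return 'inactive', 'pending', or 'firing' after the last sample.

    samples: (unix_timestamp, in_error) pairs in ascending time order.
    The alert fires only once in_error has been continuously true for
    at least `hold` seconds; any healthy sample resets the timer.
    """
    error_since = None
    state = "inactive"
    for ts, in_error in samples:
        if not in_error:
            # A healthy sample clears the pending timer.
            error_since = None
            state = "inactive"
            continue
        if error_since is None:
            error_since = ts
        state = "firing" if ts - error_since >= hold else "pending"
    return state
```

A resource that recovers briefly and then fails again starts a new waiting period, which is why a short-lived error may never raise this alert.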

Name: NooBaaNamespaceBucketErrorState

Message: A NooBaa Namespace Bucket Is In Error State

Description: A NooBaa namespace bucket {{ $labels.bucket_name }} is in error state for more than 5m

Severity: Warning

Resolution: Fix

Name: NooBaaBucketExceedingQuotaState

Message: A NooBaa Bucket Is In Exceeding Quota State

Description: A NooBaa bucket {{ $labels.bucket_name }} is exceeding its quota - {{ printf "%0.0f" $value }}% used

Severity: Warning

Resolution: Fix

Procedure: Resolving NooBaa Bucket Exceeding Quota State
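
The description above uses the Go template call `printf "%0.0f"`, which prints the alert value with no decimal places. Python's `%` operator accepts the same C-style verb, so the rendered text can be previewed with a small sketch. The helper `render_description` is hypothetical and only imitates the template; it is not how Prometheus or Alertmanager renders alerts.

```python
def render_description(bucket_name: str, value: float) -> str:
    """Imitate the alert description template for one bucket and value."""
    # "%0.0f" prints the value rounded to zero decimal places.
    used = "%0.0f" % value
    return f"A NooBaa bucket {bucket_name} is exceeding its quota - {used}% used"
```

For example, `render_description("my-bucket", 104.6)` yields a description ending in "105% used".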

Name: NooBaaBucketLowCapacityState

Message: A NooBaa Bucket Is In Low Capacity State

Description: A NooBaa bucket {{ $labels.bucket_name }} is using {{ printf "%0.0f" $value }}% of its capacity

Severity: Warning

Resolution: Fix

Name: NooBaaBucketNoCapacityState

Message: A NooBaa Bucket Is In No Capacity State

Description: A NooBaa bucket {{ $labels.bucket_name }} is using all of its capacity

Severity: Warning

Resolution: Fix

Name: NooBaaBucketReachingQuotaState

Message: A NooBaa Bucket Is In Reaching Quota State

Description: A NooBaa bucket {{ $labels.bucket_name }} is using {{ printf "%0.0f" $value }}% of its quota

Severity: Warning

Resolution: Fix

Name: NooBaaResourceErrorState

Message: A NooBaa Resource Is In Error State

Description: A NooBaa resource {{ $labels.resource_name }} is in error state for more than 6m

Severity: Warning

Resolution: Workaround

Name: NooBaaSystemCapacityWarning100

Message: A NooBaa System Approached Its Capacity

Description: A NooBaa system approached its capacity, usage is at 100%

Severity: Warning

Resolution: Fix

Name: NooBaaSystemCapacityWarning85

Message: A NooBaa System Is Approaching Its Capacity

Description: A NooBaa system is approaching its capacity, usage is more than 85%

Severity: Warning

Resolution: Fix

Name: NooBaaSystemCapacityWarning95

Message: A NooBaa System Is Approaching Its Capacity

Description: A NooBaa system is approaching its capacity, usage is more than 95%

Severity: Warning

Resolution: Fix
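
The three NooBaaSystemCapacityWarning alerts differ only in their usage threshold (85%, 95%, and 100%). The Python sketch below maps a usage percentage to the most severe alert whose description matches. It assumes the thresholds are read exactly as worded in the descriptions; the actual rule expressions are not shown in this document, and in a live cluster more than one of these alerts may be active at once. The function name `capacity_alert` is hypothetical.

```python
from typing import Optional


def capacity_alert(usage_percent: float) -> Optional[str]:
    """Return the most severe capacity alert name for a usage value, or None."""
    # Check the highest threshold first so only one name is returned.
    if usage_percent >= 100:
        return "NooBaaSystemCapacityWarning100"
    if usage_percent > 95:
        return "NooBaaSystemCapacityWarning95"
    if usage_percent > 85:
        return "NooBaaSystemCapacityWarning85"
    return None
```

A value of exactly 85 returns `None` here because the description says "more than 85%".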

Name: CephMdsMissingReplicas

Message: Insufficient replicas for storage metadata service.

Description: Minimum required replicas for storage metadata service not available. Might affect the working of storage cluster.

Severity: Warning

Procedure:

Check for alerts and operator status.
If the issue cannot be identified, contact Red Hat support.

Name: CephMgrIsAbsent

Message: Storage metrics collector service not available anymore.

Description: Ceph Manager has disappeared from Prometheus target discovery.

Severity: Critical

Procedure:

Inspect the user interface and logs to verify whether an update is in progress.
- If an update is in progress, this alert is temporary.
- If an update is not in progress, restart the upgrade process.
Once the upgrade is complete, check for alerts and operator status.
If the issue persists or cannot be identified, contact Red Hat support.

Name: CephNodeDown

Message: Storage node {{ $labels.node }} went down

Description: Storage node {{ $labels.node }} went down. Please check the node immediately.

Severity: Critical

Procedure:

Check which node stopped functioning and its cause.
Take appropriate actions to recover the node. If the node cannot be recovered:
- See Replacing storage nodes for Red Hat OpenShift Data Foundation.
- Contact Red Hat support.

Name: CephClusterErrorState

Message: Storage cluster is in error state

Description: Storage cluster is in error state for more than 10m.

Severity: Critical

Procedure:

Check for alerts and operator status.
If the issue cannot be identified, download log files and diagnostic information using must-gather.
Open a support ticket with Red Hat Support and attach the must-gather output.

Name: CephClusterWarningState

Message: Storage cluster is in degraded state

Description: Storage cluster is in warning state for more than 10m.

Severity: Warning

Procedure:

Check for alerts and operator status.
If the issue cannot be identified, download log files and diagnostic information using must-gather.
Open a support ticket with Red Hat Support and attach the must-gather output.
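
The must-gather step in the procedures above is normally run through `oc adm must-gather`, which accepts `--image` and `--dest-dir` options. The Python sketch below only assembles that command line. The helper name `build_must_gather_cmd` is hypothetical, and the image reference is deliberately left as a parameter: the correct ODF must-gather image depends on the installed version and is not given in this document.

```python
from typing import List


def build_must_gather_cmd(image: str, dest_dir: str) -> List[str]:
    """Assemble an `oc adm must-gather` command as an argument list.

    image: must-gather image for the installed ODF version (reader-supplied).
    dest_dir: local directory where the collected data is written.
    """
    return [
        "oc", "adm", "must-gather",
        f"--image={image}",
        f"--dest-dir={dest_dir}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a workstation that is logged in to the cluster, and the resulting directory can then be archived and attached to the support ticket.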

Name: CephDataRecoveryTakingTooLong

Message: Data recovery is slow

Description: Data recovery has been active for too long.

Severity: Warning

Name: CephOSDDiskNotResponding

Message: Disk not responding

Description: Disk device {{ $labels.device }} not responding, on host {{ $labels.host }}.

Severity: Critical

Name: CephOSDDiskUnavailable

Message: Disk not accessible

Description: Disk device {{ $labels.device }} not accessible on host {{ $labels.host }}.

Severity: Critical

Name: CephPGRepairTakingTooLong

Message: Self heal problems detected

Description: Self heal operations taking too long.

Severity: Warning

Name: CephMonHighNumberOfLeaderChanges

Message: Storage Cluster has seen many leader changes recently.

Description: Ceph Monitor "{{ $labels.job }}": instance {{ $labels.instance }} has seen {{ $value | printf "%.2f" }} leader changes per minute recently.

Severity: Warning

Name: CephMonQuorumAtRisk

Message: Storage quorum at risk

Description: Storage cluster quorum is low.

Severity: Critical

Name: ClusterObjectStoreState

Message: Cluster Object Store is in unhealthy state. Please check Ceph cluster health.

Description: Cluster Object Store is in unhealthy state for more than 15s. Please check Ceph cluster health.

Severity: Critical

Procedure:

Check the CephObjectStore CR instance.
Contact Red Hat support.
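
Checking the CephObjectStore CR instance usually starts with `oc get cephobjectstore -n openshift-storage -o json`. The Python sketch below scans that JSON output for stores whose `status.phase` is not in a healthy set. It rests on assumptions: `openshift-storage` is the default ODF namespace and may differ in a given deployment, the set of healthy phase names (`Ready`, `Connected`) is a guess that may vary by version, and the helper name `unhealthy_object_stores` is hypothetical.

```python
import json
from typing import Dict, Tuple


def unhealthy_object_stores(
    oc_json: str, healthy: Tuple[str, ...] = ("Ready", "Connected")
) -> Dict[str, str]:
    """Map each CephObjectStore name to its phase when the phase is not healthy.

    oc_json: text produced by `oc get cephobjectstore ... -o json`.
    healthy: phase names treated as healthy (an assumption; adjust as needed).
    """
    doc = json.loads(oc_json)
    result: Dict[str, str] = {}
    for item in doc.get("items", []):
        name = item.get("metadata", {}).get("name", "<unnamed>")
        # A missing status block is reported as "Unknown" rather than skipped.
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase not in healthy:
            result[name] = phase
    return result
```

An empty result suggests the CRs themselves report a healthy phase, in which case the Ceph cluster health named in the alert message is the next place to look.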

Name: CephOSDFlapping

Message: Storage daemon osd.x has restarted 5 times in the last 5 minutes. Please check the pod events or Ceph status to find out the cause.

Description: Storage OSD restarts more than 5 times in 5 minutes.

Severity: Critical
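
The flapping condition in this description ("more than 5 times in 5 minutes") can be pictured as a sliding-window count over restart timestamps. The Python sketch below is an illustration only: the real alert expression is not shown in this document, and the function name `is_flapping` and its parameters are hypothetical.

```python
from typing import List


def is_flapping(
    restart_times: List[float], max_restarts: int = 5, window: float = 300.0
) -> bool:
    """Return True if more than max_restarts restarts fall within any window.

    restart_times: restart timestamps in seconds, in any order.
    window: window length in seconds (300 s = 5 minutes).
    """
    times = sorted(restart_times)
    start = 0
    for end, t in enumerate(times):
        # Drop restarts that are older than the window relative to t.
        while t - times[start] > window:
            start += 1
        if end - start + 1 > max_restarts:
            return True
    return False
```

Six restarts spaced 50 seconds apart count as flapping under this sketch, while six restarts spaced 100 seconds apart do not, because no 5-minute span contains more than five of them.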

Name: OdfPoolMirroringImageHealth

Message: Mirroring image(s) (PV) in the pool <pool-name> are in Warning state for more than 1m. Mirroring might not work as expected.

Description: Disaster recovery is failing for one or a few applications.

Severity: Warning

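Name: OdfMirrorDaemonStatus

Message: Mirror daemon is unhealthy.

Description: Disaster recovery is failing for the entire cluster. The mirror daemon has been in an unhealthy state for more than 1 minute. Mirroring on this cluster is not working as expected.

Severity: Critical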