The Self Node Remediation pod is in CrashLoopBackOff because the snr agent cannot start in RHOCP 4

Solution Verified - Updated 2024-09-19T11:01:59+00:00

Environment

Red Hat OpenShift Container Platform 4
Self Node Remediation

Issue

The Self Node Remediation (snr) pod fails with the following error:

ERROR   safe-time-calculator    snr agent can't start: the requested value for SafeTimeToAssumeNodeRebootedSeconds is too low   {"requested SafeTimeToAssumeNodeRebootedSeconds": "3m0s", "minimal calculated value for SafeTimeToAssumeNodeRebootedSeconds": "3m5s", "error": "snr agent can't start: the requested value for SafeTimeToAssumeNodeRebootedSeconds is too low"}

Resolution

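Increase the value of safeTimeToAssumeNodeRebootedSeconds in the SelfNodeRemediationConfig CR from 180 to a value that is at least the minimal calculated value reported in the error (3m5s, that is 185 seconds, in this example), for instance 600.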

$ oc edit SelfNodeRemediationConfig self-node-remediation-config -n openshift-operators
spec:
  apiCheckInterval: 5s
  apiServerTimeout: 15s
  hostPort: 30001
  isSoftwareRebootEnabled: true
  maxApiErrorThreshold: 3
  peerApiServerTimeout: 5s
  peerDialTimeout: 5s
  peerRequestTimeout: 5s
  peerUpdateInterval: 15m
  safeTimeToAssumeNodeRebootedSeconds: 180     # change the value to 600 or any higher value
  watchdogFilePath: /dev/watchdog
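
The same change can also be applied non-interactively. The following is a minimal sketch, assuming the CR name and namespace shown above; the helper names snr_patch_body and snr_set_safe_time are hypothetical and exist only for illustration. It uses a JSON merge patch through oc patch rather than opening an editor.

```shell
# Hypothetical helper: build the JSON merge patch for the new value (in seconds).
snr_patch_body() {
  printf '{"spec":{"safeTimeToAssumeNodeRebootedSeconds":%d}}' "$1"
}

# Hypothetical wrapper around `oc patch`; the CR name and namespace are the
# ones used in this article and may differ in other clusters.
snr_set_safe_time() {
  oc patch SelfNodeRemediationConfig self-node-remediation-config \
    -n openshift-operators --type merge -p "$(snr_patch_body "$1")"
}

# Usage (requires a logged-in oc session):
#   snr_set_safe_time 600
```

After the value is changed, the diagnostic steps below can be repeated to check whether the pods leave the CrashLoopBackOff state.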

Root Cause

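The snr agent cannot start because the value requested for safeTimeToAssumeNodeRebootedSeconds in the SelfNodeRemediationConfig CR is lower than the minimum value the agent calculates. In this example, the CR requests 180 seconds (3m0s), while the minimal calculated value is 3m5s (185 seconds).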

Diagnostic Steps

Check if any Self-Node-Remediation pods are in CrashLoopBackOff state.

$ oc get pods -n openshift-operators | grep CrashLoopBackOff
self-node-remediation-ds-abc                         0/1     CrashLoopBackOff   241 (64s ago)      20h
self-node-remediation-ds-pqr                         0/1     CrashLoopBackOff   240 (52s ago)      20h

Check the logs of CrashLoopBackOff Self-Node-Remediation pods:

$ oc logs po/self-node-remediation-ds-abc -n openshift-operators
ERROR   safe-time-calculator    snr agent can't start: the requested value for SafeTimeToAssumeNodeRebootedSeconds is too low   {"requested SafeTimeToAssumeNodeRebootedSeconds": "3m0s", "minimal calculated value for SafeTimeToAssumeNodeRebootedSeconds": "3m5s", "error": "snr agent can't start: the requested value for SafeTimeToAssumeNodeRebootedSeconds is too low"}
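
The minimal calculated value in the log is printed as a duration (for example 3m5s), while the CR field is expressed in seconds. The following sketch shows one way to pull that duration out of the log line and convert it; the function names are hypothetical, and the conversion assumes the duration contains only whole h, m and s units as in the error above.

```shell
# Extract the "minimal calculated value" duration from snr agent log lines
# read on standard input. Prints nothing if no matching line is found.
min_safe_time_from_log() {
  sed -n 's/.*"minimal calculated value for SafeTimeToAssumeNodeRebootedSeconds": "\([^"]*\)".*/\1/p'
}

# Convert a duration such as "3m5s" or "1h2m3s" to whole seconds.
# Assumption: only integer h, m and s units appear.
duration_to_seconds() {
  echo "$1" | awk '{
    total = 0
    s = $0
    while (match(s, /^[0-9]+[hms]/)) {
      n = substr(s, 1, RLENGTH - 1) + 0   # numeric part
      u = substr(s, RLENGTH, 1)           # unit letter
      if (u == "h") total += n * 3600
      else if (u == "m") total += n * 60
      else total += n
      s = substr(s, RLENGTH + 1)          # drop the consumed part
    }
    print total
  }'
}

# Usage (requires a logged-in oc session):
#   min=$(oc logs po/self-node-remediation-ds-abc -n openshift-operators | min_safe_time_from_log | tail -n 1)
#   duration_to_seconds "$min"
```

The resulting number of seconds is the lower bound to compare against the value configured in the CR.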

Check the value of safeTimeToAssumeNodeRebootedSeconds:

$ oc get SelfNodeRemediationConfig self-node-remediation-config -n openshift-operators -o yaml
spec:
  apiCheckInterval: 5s
  apiServerTimeout: 15s
  hostPort: 30001
  isSoftwareRebootEnabled: true
  maxApiErrorThreshold: 3
  peerApiServerTimeout: 5s
  peerDialTimeout: 5s
  peerRequestTimeout: 5s
  peerUpdateInterval: 15m
  safeTimeToAssumeNodeRebootedSeconds: 180     
  watchdogFilePath: /dev/watchdog
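
To read only this field instead of the full YAML, a jsonpath query can be used. The sketch below assumes the same CR name and namespace as above; both helper names are hypothetical. The comparison also assumes that a configured value equal to the minimal calculated value is accepted, which the error message suggests but this article does not confirm.

```shell
# Hypothetical helper: print only the configured value (in seconds) from the CR.
snr_get_safe_time() {
  oc get SelfNodeRemediationConfig self-node-remediation-config \
    -n openshift-operators \
    -o jsonpath='{.spec.safeTimeToAssumeNodeRebootedSeconds}'
}

# Hypothetical helper: succeed when the configured value ($1, seconds) is at
# least the minimal calculated value ($2, seconds) taken from the agent log.
# Assumption: equality is treated as sufficient.
snr_safe_time_ok() {
  [ "$1" -ge "$2" ]
}

# Usage (requires a logged-in oc session; 185 is the 3m5s from the error above):
#   snr_safe_time_ok "$(snr_get_safe_time)" 185 || echo "configured value is too low"
```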

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
