Poison Pill Operator not re-creating the stopped nodes

Environment

Red Hat OpenShift Container Platform (RHOCP) 4
AWS provider
Poison Pill (PP) Operator

Issue

After the Machine Health Check and the Poison Pill Operator are triggered, the worker nodes remain stopped in NotReady status.

Resolution

The default MachineHealthCheck (MHC) recovery strategy is to destroy the machines and recreate them. This is an “active” mechanism that does not require cooperation from the failed node, so the cluster always regains its full capacity. When a node is stopped or shut down, the default MHC recovery strategy is all that is needed to bring it back. Poison Pill remediation does not work in this case because the node is not responding. Therefore, remove the "remediation" spec that points to the Poison Pill remediation template from the MHC and set "AllowedRemediation". With that change, when a node is stopped, the MHC picks up the node and recovers it automatically.
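Removing the template reference could be sketched as follows. This is a minimal sketch, not a verified procedure: it assumes the MHC references the Poison Pill template through the `spec.remediationTemplate` field, and the MHC name `example-mhc` is a hypothetical placeholder. It does not cover the "AllowedRemediation" setting mentioned above.

```shell
# Hypothetical MHC name; list the real ones with:
#   oc get machinehealthcheck -n openshift-machine-api
MHC_NAME="${MHC_NAME:-example-mhc}"
NAMESPACE="openshift-machine-api"

# JSON Patch that removes the reference to the remediation template.
# Assumption: the reference lives at spec.remediationTemplate.
PATCH='[{"op": "remove", "path": "/spec/remediationTemplate"}]'

# Wrapped in a function so that nothing is changed until it is called.
remove_remediation_template() {
  oc patch machinehealthcheck "$MHC_NAME" -n "$NAMESPACE" --type=json -p "$PATCH"
}

# Run against a logged-in cluster session when ready:
#   remove_remediation_template
```

Inspecting the MHC first with `oc get machinehealthcheck "$MHC_NAME" -n openshift-machine-api -o yaml` shows whether the field is present. A JSON Patch `remove` operation fails if the path does not exist.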

Root Cause

The behavior described here is unfortunately expected. Poison Pill is a completely passive mechanism: it can guarantee that the node enters a safe state (stopped) so that workloads can move elsewhere, but restoring capacity is only best-effort.
When the node is stopped manually, no Poison Pill (PP) process is running on it to trigger the reboot.

Diagnostic Steps

Stop a worker node from the AWS console, then check the nodes, Poison Pill remediations, machine sets, and machines:

$ oc get nodes
$ oc get ppr -A
$ oc get machineset -n openshift-machine-api 
$ oc get machine -n openshift-machine-api -o wide

After the node is deleted and recreated, it remains in NotReady status.
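To pick out the affected nodes from the `oc get nodes` output, a small filter such as the one below may help. It is an illustrative sketch: the function name `filter_not_ready` is made up for this example, and it assumes the default column layout in which the node name is the first column and STATUS is the second.

```shell
# Print the names of nodes whose STATUS column contains "NotReady".
# Reads `oc get nodes --no-headers` style output on standard input.
filter_not_ready() {
  awk '$2 ~ /NotReady/ { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get nodes --no-headers | filter_not_ready
```

Matching on the substring also catches combined statuses such as `NotReady,SchedulingDisabled`.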

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
