When starting a cluster update, nodes in paused MCPs begin updating in RHOCP 4

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4.11+

Issue

  • When starting a Red Hat OpenShift cluster upgrade, paused MachineConfigPools (MCPs) may become un-paused and begin upgrading
  • Use-cases for pausing MCPs prior to a cluster upgrade may include:
    • Canary rollouts, explained here.
    • Extended Update Support (EUS) upgrades, explained here.

Resolution

  • When upgrading OpenShift clusters that start at RHOCP 4.11 or later, trigger the update from the command-line interface (CLI) rather than the web console, explained here.
  • Unless your update strategy requires paused pools, ensure that all machine config pools (MCPs) are running and not paused, because nodes associated with a paused MCP are skipped during the update process. If you are performing a canary rollout update strategy, pause the relevant MCPs before starting the update.
  • Red Hat is aware of this bug, and it is fixed by an erratum in RHOCP 4.16.
  • With this release, nodes of paused MachineConfigPools correctly stay paused when performing a cluster update.
  • Please open a support case for more information on the bug.
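
The CLI path described above can be sketched roughly as follows. This is a minimal illustration, not an official procedure: the function names are hypothetical helpers, and the pool name and target version passed to them are placeholders that you would replace with your own values.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, pool name, and target version are hypothetical.

# Pause one MachineConfigPool so that its nodes are skipped during the update.
pause_mcp() {
  local pool="$1"
  oc patch "mcp/${pool}" --type merge --patch '{"spec":{"paused":true}}'
}

# Print the pool's spec.paused value ("true" when the pool is paused).
mcp_paused() {
  local pool="$1"
  oc get "mcp/${pool}" -o jsonpath='{.spec.paused}'
}

# Start the update from the CLI, refusing to proceed if the pool is not paused.
start_update_with_paused_pool() {
  local pool="$1" version="$2"
  if [ "$(mcp_paused "${pool}")" != "true" ]; then
    echo "MCP ${pool} is not paused; aborting" >&2
    return 1
  fi
  oc adm upgrade --to="${version}"
}
```

For example, assuming a custom pool named workerpool-canary, you might run `pause_mcp workerpool-canary` followed by `start_update_with_paused_pool workerpool-canary <target_version>`.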

Root Cause

A race condition can occur when the web console (WebUI) is used to trigger a cluster update. This may cause paused MCPs to be incorrectly unpaused and to begin upgrading unintentionally.

Diagnostic Steps

  • Confirm that the canary rollout strategy was followed as documented, then start the cluster update.
  • Observe that the worker nodes in paused MCPs begin updating.
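
One possible way to check the steps above from the CLI is to list which pools report `spec.paused: true` before and after the update is started. The helper name below is hypothetical, and the sketch assumes that a pool dropping out of this list after the update begins indicates it was unpaused.

```shell
#!/usr/bin/env bash
# Sketch only: list_paused_mcps is a hypothetical helper name.

# Print the names of all MachineConfigPools whose spec.paused is true.
list_paused_mcps() {
  oc get mcp -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.paused}{"\n"}{end}' \
    | awk -F'\t' '$2 == "true" { print $1 }'
}
```

Run `list_paused_mcps` once before starting the update and again afterwards, then compare the two outputs. `oc get nodes` can additionally show whether nodes in those pools are being cordoned and updated.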

