OCP 4.x: Nodes Stuck in Reboot Loop After Upgrade
Issue
- After upgrading an OpenShift Container Platform (OCP) cluster, some nodes (both control plane and worker nodes) fail to stay online and enter a crash/reboot loop. The following symptoms are observed:
- Kernel panic immediately before the node reboots, visible in the node's console log or in journald:
kernel: zstd_compress: no symbol version for module_layout
kernel: ------------[ cut here ]------------
kernel: kernel BUG at arch/x86/kernel/alternative.c:288!
- Portworx storage pods (and related containers such as portworx-api and the CSI driver) are in CrashLoopBackOff. The kubelet may log errors about Portworx containers failing to start or about readiness probes failing, for example: “Error syncing pod ... failed container=portworx-api ... CrashLoopBackOff” and “Failed to load PX filesystem dependencies for kernel…”. A diagnostic sketch follows this list.
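The commands below are a minimal diagnostic sketch for confirming these symptoms from a workstation with cluster-admin access; the node name and the Portworx pod-name patterns are placeholders, and the journal check assumes a persistent journal so that the previous boot (-b -1) is still available.
$ NODE=worker-0.example.com                                                                  # hypothetical node name
$ oc debug node/${NODE} -- chroot /host journalctl -k -b -1 | grep -i 'alternative.c:288'    # kernel messages from the previous boot
$ oc get pods -A -o wide | grep -iE 'portworx|px-' | grep -i CrashLoopBackOff                # Portworx pods stuck in CrashLoopBackOff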
Environment
- Red Hat OpenShift Container Platform 4.16.47 to 4.16.50
- Red Hat OpenShift Container Platform 4.17.40 to 4.17.42
- Red Hat OpenShift Container Platform 4.18.24 to 4.18.26
- Out-of-tree (O) kernel modules loaded on the affected nodes (at least one of the following; a sketch for listing them follows this list):
Oracle [oracleoks]
IBM [mmfs26]
IBM [tracedev]
HPE [ice]
HPE [numatools]
Portworx [px]
eTrust [SEOS]
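As a hedged example, one way to check whether a node is running any out-of-tree modules is shown below; the node name is a placeholder, and the check relies on /proc/modules marking out-of-tree modules with the O taint flag (for example (O) or (OE)).
$ NODE=worker-0.example.com                                                                  # hypothetical node name
$ oc debug node/${NODE} -- chroot /host cat /proc/modules | grep -E '\([A-Z]*O[A-Z]*\)'      # modules carrying the O (out-of-tree) taint flag
$ oc debug node/${NODE} -- chroot /host cat /proc/sys/kernel/tainted                         # non-zero output means the running kernel is tainted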