Troubleshooting OCP on RHV deployments

Gather the following logs before you open an issue about OCP on RHV

  • Collect and share RHV Engine Logs to troubleshoot the environment issues.
  • Collect and share VDSM Logs.
  • Collect and share must-gather data from the OCP on RHV cluster to debug Red Hat OpenShift Container Platform related issues.
  • Follow this KCS if must-gather data is not available because `$ oc adm must-gather` fails.
  • Collect and share sosreport from OCP on RHV machines to analyze issues related to nodes.
  • Make sure all the logs attached to the case are from the same timeframe.
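
The timeframe check in the last item can be sketched in a few lines. This is a minimal illustration, not part of any Red Hat tooling: the `common_window` helper and the timestamps are hypothetical, and it assumes you have already noted the first and last timestamp covered by each collected log.

```python
from datetime import datetime


def common_window(ranges):
    """Return the (start, end) window shared by every log, or None if there is none."""
    latest_start = max(start for start, _ in ranges.values())
    earliest_end = min(end for _, end in ranges.values())
    if latest_start <= earliest_end:
        return latest_start, earliest_end
    return None


# Hypothetical first/last timestamps noted from each collected log.
log_ranges = {
    "engine.log": (datetime(2023, 1, 10, 8, 0), datetime(2023, 1, 10, 12, 0)),
    "vdsm.log": (datetime(2023, 1, 10, 9, 30), datetime(2023, 1, 10, 13, 0)),
    "must-gather": (datetime(2023, 1, 10, 10, 0), datetime(2023, 1, 10, 10, 45)),
}

window = common_window(log_ranges)
if window is None:
    print("The collected logs do not share a common timeframe")
else:
    print("Shared timeframe:", window[0], "to", window[1])
```

If the function returns None, at least one log does not cover the period of the others and should be collected again.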

Additional information to gather

  • Which versions of Red Hat Virtualization Manager, the RHV hosts, and Red Hat OpenShift Container Platform are currently installed, and which strategy was used to deploy OCP on RHV (for example, IPI or UPI)?
    Note: User Provisioned Infrastructure (UPI) is a non-integrated platform on Red Hat Virtualization, so most bare metal debugging procedures apply to installation issues.

  • Confirm whether multiple Red Hat OpenShift Container Platform clusters are running on the same Red Hat Virtualization Manager. Make sure all control plane nodes run on separate hosts by creating affinity groups. Also, isolate high I/O workloads on dedicated worker nodes with taints and tolerations, and use affinity groups to keep those nodes on their own hosts as well.
    Reason: Customers often deploy multiple Red Hat OpenShift Container Platform clusters on the same Red Hat Virtualization Manager without creating affinity groups, which leaves control plane nodes on the same physical host. This often causes performance problems.
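
As a rough illustration of the placement check described above, the sketch below flags hosts that run more than one control plane VM. The VM-to-host mapping is hypothetical sample data; in practice it would be taken from the RHV Manager, and the `colocated_control_planes` helper is an assumption for this example, not an existing tool.

```python
from collections import defaultdict


def colocated_control_planes(vm_to_host, control_plane_vms):
    """Return {host: [vms]} for every host running more than one control plane VM."""
    by_host = defaultdict(list)
    for vm in control_plane_vms:
        by_host[vm_to_host[vm]].append(vm)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical placement of two clusters sharing the same RHV hosts.
placement = {
    "ocp1-master-0": "host-a",
    "ocp1-master-1": "host-b",
    "ocp1-master-2": "host-b",  # shares host-b with ocp1-master-1
    "ocp2-master-0": "host-c",
}

conflicts = colocated_control_planes(placement, list(placement))
for host, vms in conflicts.items():
    print(f"{host} runs several control plane nodes: {', '.join(vms)}")
```

An empty result means every control plane VM in the mapping runs on its own host.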

  • Verify that the minimum requirements and network setup match the official documentation.

  • If the installation fails because the installer cannot upload the Red Hat CoreOS image, check that the firewall is set up correctly and that ImageIO port 54322 is allowed, as required by the prerequisite check for deploying the cluster.
  • Check firewall, router, and switch logs for networking issues such as packet loss or limits being reached on networking equipment (bandwidth, MAC addresses, ports, and so on).
  • Confirm that the Red Hat Virtualization Manager host network range is different from the Red Hat OpenShift Container Platform pod network range.
  • If the RHV host is running a version earlier than 4.4.10, confirm whether the IP address of the newly created control plane or worker node is present in the RHV Manager. If it is not, this may be a known issue; in that case, upgrade your RHV host.
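
Two of the network checks above, the ImageIO port and the non-overlapping address ranges, can be sketched with the Python standard library. This is a minimal example under stated assumptions: the helper names are made up for illustration, the CIDR values are sample data (10.128.0.0/14 is the default OpenShift cluster network), and the host name in the comment is a placeholder.

```python
import ipaddress
import socket


def networks_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Sample ranges: the pod network against two candidate RHV network ranges.
pod_network = "10.128.0.0/14"
print(networks_overlap(pod_network, "10.130.5.0/24"))   # True: ranges collide
print(networks_overlap(pod_network, "192.168.1.0/24"))  # False: ranges are distinct

# Placeholder host name; run from the machine that runs the installer:
# port_reachable("rhv-host.example.com", 54322)
```

A successful TCP connection only shows that the port is open along that path; it does not prove that the image upload itself will succeed.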

  • Provide details on the underlying storage system. In particular, note whether software-defined storage, such as OpenShift Data Foundation, is used and how it is deployed.

  • Confirm that all Red Hat Virtualization related pods were restarted after the credentials were changed.
