Wrong Local Storage Operator device removed from node while in use by OpenShift Container Storage OSD pod

Solution In Progress - Updated -

Issue

When deploying OpenShift Container Storage (OCS) using the Local Storage Operator (LSO) in an OpenShift Container Platform (OCP) environment, the rook-ceph OSDs will consume the devices on the nodes according to the LSO configuration.

However, when a physical drive fails on a node, it can be difficult to identify which physical drive the affected OCS OSD was using before you remove it.

This script is a helper tool that lets you:
- Identify the name of the deviceset for the failed OSD
- Identify the logical name of the physical device that failed
- Identify the by-id name of the physical device that failed
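The relationship between a logical device name and its by-id name can be illustrated with a short sketch. This is not the script from this solution; it is a minimal Python example that assumes the standard Linux layout, where each entry under /dev/disk/by-id is a symbolic link to a kernel device such as sdb. The function names are hypothetical, and the code would have to run on the node itself (for example from a debug shell).

```python
import os


def map_by_id_names(by_id_dir="/dev/disk/by-id"):
    """Return {by-id name: kernel device name} for every symlink in by_id_dir."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # Resolve the link and keep only the final component, e.g. "sdb".
            mapping[name] = os.path.basename(os.path.realpath(link))
    return mapping


def by_id_names_for(device, by_id_dir="/dev/disk/by-id"):
    """Return all by-id names that resolve to the given kernel device name."""
    return [n for n, dev in map_by_id_names(by_id_dir).items() if dev == device]


if __name__ == "__main__" and os.path.isdir("/dev/disk/by-id"):
    # Print one "device<TAB>by-id name" pair per line.
    for by_id, dev in map_by_id_names().items():
        print(f"{dev}\t{by_id}")
```

A single drive usually has more than one by-id name, so `by_id_names_for` returns a list; the entry to remove is the one that appears in the LSO configuration.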

With this information you can accurately:
- Identify the correct drive by-id name that is to be removed from the LSO configuration
- Scale down the correct OSD deployment object in OCP
- Identify the correct PVC that might have to be deleted (the PVC name contains the deviceset name)
- Make sure the device matches a particular OSD ID
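As an illustration of how these items relate to each other, the following Python sketch summarizes OSD pods from parsed JSON, such as the output of `oc get pods -n openshift-storage -l app=rook-ceph-osd -o json`. It is not the script from this solution. It assumes the labels Rook normally sets on PVC-backed OSD pods (`ceph-osd-id`, `ceph.rook.io/DeviceSet`, `ceph.rook.io/pvc`), the usual `rook-ceph-osd-<id>` deployment naming, and the default `openshift-storage` namespace; verify all of these in your cluster. The function name is hypothetical.

```python
def summarize_osd_pods(pod_list):
    """Return one summary dict per OSD pod found in a parsed pod list."""
    rows = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        labels = meta.get("labels", {})
        osd_id = labels.get("ceph-osd-id")  # assumed Rook label
        if osd_id is None:
            continue  # not an OSD pod
        rows.append({
            "osd_id": osd_id,
            "pod": meta.get("name"),
            "node": pod.get("spec", {}).get("nodeName"),
            # Assumed naming convention for the OSD deployment object.
            "deployment": f"rook-ceph-osd-{osd_id}",
            # Assumed Rook labels for PVC-backed OSDs.
            "deviceset": labels.get("ceph.rook.io/DeviceSet"),
            "pvc": labels.get("ceph.rook.io/pvc"),
        })
    return rows
```

To use it, load the command output with `json.load` and pass the resulting dictionary to `summarize_osd_pods`. The row for the failed OSD ID then gives the deployment to scale down and the PVC to check, which can be compared against the deviceset name.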

Environment

OpenShift Container Storage 4.4 and later
