Wrong Local Storage Operator device removed from node while in use by OpenShift Container Storage OSD pod
Issue
When OpenShift Container Storage (OCS) is deployed using the Local Storage Operator (LSO) in an OpenShift Container Platform (OCP) environment, the rook-ceph OSDs consume the devices on the nodes according to the LSO configuration.
However, when a physical drive fails on a specific node, it can be difficult to identify which failed physical drive the OCS OSD was using before you remove it.
This script is a helper tool for the following tasks:
- Identify the name of the deviceset for the failed OSD
- Identify the logical name of the physical device that failed
- Identify the by-id name of the physical device that failed
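The by-id lookup in the last item can be illustrated with a minimal Python sketch. This is not the helper script itself. It assumes it runs on the node (or in a debug shell with the host's /dev visible) and that the by-id names are the symbolic links found under /dev/disk/by-id. The function name by_id_names is hypothetical.

```python
import os


def by_id_names(device, by_id_dir="/dev/disk/by-id"):
    """Return the link names in by_id_dir that resolve to the given device.

    `device` is a logical device path such as /dev/sdb (example value only).
    A device usually has several by-id links (wwn-*, scsi-*, ata-*, ...),
    so a sorted list is returned rather than a single name.
    """
    target = os.path.realpath(device)
    names = []
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # Each entry is a symlink; compare its resolved path with the device.
        if os.path.realpath(link) == target:
            names.append(name)
    return names
```

Calling by_id_names("/dev/sdb") on a node would list every by-id link that points at that device, which can then be compared with the entries in the LSO configuration.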
With this information, you can accurately do the following:
- Identify the correct drive by-id name that is to be removed from the LSO configuration
- Scale down the correct OSD deployment object in OCP
- Identify the correct PVC that might have to be deleted (the PVC name contains the deviceset name)
- Make sure the device matches a particular OSD ID
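As a rough illustration of how an OSD ID can be tied to its deployment and PVC, the following Python sketch works on the parsed JSON of an OSD deployment listing. It is not the helper script itself. The label names ceph-osd-id and ceph.rook.io/pvc are assumptions about the metadata Rook sets on OSD deployments, and the function names are hypothetical; verify the labels in your cluster before relying on them.

```python
# Assumed Rook label names on OSD deployments; confirm them in your cluster.
OSD_ID_LABEL = "ceph-osd-id"
PVC_LABEL = "ceph.rook.io/pvc"


def osd_pvc_map(deployment_list):
    """Map each OSD ID to its deployment name and PVC name.

    `deployment_list` is the parsed JSON of a Kubernetes List of deployments.
    Deployments without the OSD ID label are skipped.
    """
    result = {}
    for item in deployment_list.get("items", []):
        meta = item.get("metadata", {})
        labels = meta.get("labels", {})
        osd_id = labels.get(OSD_ID_LABEL)
        if osd_id is None:
            continue
        result[osd_id] = {
            "deployment": meta.get("name"),
            "pvc": labels.get(PVC_LABEL),
        }
    return result


def pvc_matches_deviceset(pvc_name, deviceset):
    """Check that the PVC name contains the deviceset name."""
    return bool(pvc_name) and deviceset in pvc_name
```

The input could come from the output of `oc get deployment -n openshift-storage -l app=rook-ceph-osd -o json` loaded with json.load, assuming the default openshift-storage namespace and the app=rook-ceph-osd label. Checking the result against the deviceset name before scaling down a deployment or deleting a PVC reduces the chance of acting on the wrong OSD.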
Environment
OpenShift Container Storage 4.4 and later