OCS / ODF Database Workloads Must Not Use CephFS PVs/PVCs (RDBMSs, NoSQL, PostgreSQL, Mongo DBs, etc.)


Issue

Issues have been reported by customers running database applications on PVs backed by CephFS. In some scenarios, administrators were also taking snapshots of the CephFS PVs. This is a corner case that is difficult to hit, but when it occurs it can cause severe impact (a CephFS service outage).
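For database workloads, the safer pattern is to request a block-backed (Ceph RBD) PVC rather than a CephFS-backed one. The sketch below is a hypothetical example: the PVC name is illustrative, and the storage class `ocs-storagecluster-ceph-rbd` is the default RBD class created by OCS/ODF deployments, so verify the class names in your cluster with `oc get storageclass` before applying.

```shell
# Hypothetical example: create a block-backed (Ceph RBD) PVC for a database
# instead of a CephFS-backed one. Verify the storage class name in your
# cluster first with: oc get storageclass
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data        # hypothetical PVC name
spec:
  accessModes:
    - ReadWriteOnce          # block volumes attach to a single node, which suits a DB pod
  resources:
    requests:
      storage: 50Gi
  storageClassName: ocs-storagecluster-ceph-rbd   # RBD (block), not CephFS
EOF
```

The key point is `storageClassName`: CephFS-backed classes (typically `ocs-storagecluster-cephfs`) are intended for shared `ReadWriteMany` use, while RBD classes provide the block semantics databases expect.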

The impact includes damage to the metadata of files used by the database application (only the file metadata is damaged, not the data itself). This can cause both Ceph MDS pods to crash when the database application deletes a file with damaged metadata. In that state, Ceph can no longer serve any I/O for CephFS volumes, causing loss of data access to all CephFS PVs (at least one MDS (Metadata Server) pod must be up and running to serve CephFS I/O).
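The following commands are a hypothetical sketch of how an administrator could check the situation described above, assuming the default `openshift-storage` namespace and the `rook-ceph-mds` label and CSI driver name used by OCS/ODF deployments; adjust them for your cluster.

```shell
# Check whether at least one MDS pod is running
# (CephFS I/O stops entirely if both MDS pods are down):
oc get pods -n openshift-storage -l app=rook-ceph-mds

# List PVs provisioned by the CephFS CSI driver, with their bound PVCs,
# to spot database workloads that should be moved to block (RBD) storage:
oc get pv -o json | jq -r '.items[]
  | select(.spec.csi.driver? == "openshift-storage.cephfs.csi.ceph.com")
  | "\(.spec.claimRef.namespace)/\(.spec.claimRef.name)"'
```

If an MDS pod is in `CrashLoopBackOff`, the documents referenced below describe how to identify the affected PV and stop the crash loop.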

To identify potentially damaged files, a tool is being developed:
OCS 4.X How to use first-damage.py tool to detect damaged files

In case of an MDS crash, this document may help identify the CephFS PV that contains the damaged file (and the associated application):
OCS/ODF How to identify the PV cephfs associated to a damaged file reported by mds
and this document explains how to stop the MDS pods from crashing:
ODF - MDS pods in constant CrashLoopBackOff (Bad Session)

Environment

Red Hat OpenShift Container Storage (RHOCS) 4.x
Red Hat OpenShift Data Foundation (RHODF) 4.x
