OCS / ODF OSD removal fails with error: unknown parameter name "FORCE_OSD_REMOVAL"

Issue

  • On ODF 4.10.8, an OSD is being removed by following the steps in
    Steps to replace failed OSD in Red Hat OpenShift Container Storage 4.X
    or in the "Replacing devices" section of the ODF 4.10 documentation.

  • This command

    oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=3 | oc create -n openshift-storage -f -

    creates a job and a removal pod, but the removal pod ocs-osd-removal-job keeps running and never completes.

  • To find out why, check the pod's logs with oc logs ocs-osd-removal-job:

    2022-12-07 13:31:16.627495 D | exec: Running command: ceph osd dump --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
    2022-12-07 13:31:19.427571 I | cephosd: validating status of osd.3
    2022-12-07 13:31:19.427625 I | cephosd: osd.3 is marked 'DOWN'. Removing it
    2022-12-07 13:31:19.427732 D | exec: Running command: ceph osd find 3 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
    2022-12-07 13:31:20.029956 D | exec: Running command: ceph osd out osd.3 --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
    2022-12-07 13:31:20.658128 I | cephosd: removing the OSD deployment "rook-ceph-osd-3"
    2022-12-07 13:31:20.658174 D | op-k8sutil: removing rook-ceph-osd-3 deployment if it exists
    2022-12-07 13:31:20.658186 I | op-k8sutil: removing deployment rook-ceph-osd-3 if it exists
    2022-12-07 13:31:20.686597 I | op-k8sutil: Removed deployment rook-ceph-osd-3
    2022-12-07 13:31:20.693035 I | op-k8sutil: "rook-ceph-osd-3" still found. waiting...
    2022-12-07 13:31:22.733871 I | op-k8sutil: confirmed rook-ceph-osd-3 does not exist
    2022-12-07 13:31:22.746526 I | cephosd: removing the osd prepare job "rook-ceph-osd-prepare-6f8f4e58014c4c06de3d8d181ee62d11"
    2022-12-07 13:31:22.762293 I | cephosd: removing the OSD PVC "ocs-deviceset-volume01-0-data-10l8bdn"
    2022-12-07 13:31:22.774728 D | exec: Running command: ceph osd purge osd.3 --force --yes-i-really-mean-it --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
    

    Note that the pod is waiting for the "ceph osd purge osd.3" command to finish: it is the last command logged, and nothing follows it.
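
    As a quick way to spot where the removal pod is stuck, the last command it logged can be pulled out of the log stream. The following is a minimal sketch, not part of the documented procedure: last_exec_command is a hypothetical helper name, and it assumes the log format shown above, where each command follows "exec: Running command:" and its connection flags begin at --connect-timeout.

    ```shell
    # Hypothetical helper: read removal-pod logs on stdin and print the last
    # command logged as "exec: Running command:", without the connection flags.
    # Assumes the log format shown in this article.
    last_exec_command() {
      grep 'exec: Running command:' \
        | tail -n 1 \
        | sed -e 's/.*exec: Running command: //' -e 's/ --connect-timeout=.*//'
    }

    # Usage against a live cluster (requires oc and access to the namespace;
    # the label selector assumes the job is named ocs-osd-removal-job):
    #   oc logs -n openshift-storage -l job-name=ocs-osd-removal-job --tail=-1 \
    #     | last_exec_command
    ```

    If the printed command is the purge and the log does not advance, the pod is most likely blocked on that Ceph call rather than on a Kubernetes resource.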

  • Delete the job with oc delete job job_name, which also deletes the removal pod, and try again with the FORCE_OSD_REMOVAL option. This may fail with the following error:

    # oc process -n openshift-storage ocs-osd-removal -p FORCE_OSD_REMOVAL=true -p FAILED_OSD_IDS=3 | oc create -f -
    error: unknown parameter name "FORCE_OSD_REMOVAL"
    error: no objects passed to create
    # 
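
Before retrying, it may help to confirm which parameters the installed ocs-osd-removal template actually defines, since the error above means FORCE_OSD_REMOVAL is not among them. The sketch below is an illustration only: has_template_param is a hypothetical helper name, and it assumes that `oc process --parameters` prints a table with a header row and the parameter name in the first column.

```shell
# Hypothetical helper: succeed if the named parameter appears in the first
# column of `oc process --parameters` output read from stdin.
# Assumes a header row followed by one row per parameter.
has_template_param() {
  awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Usage against a live cluster (requires oc and access to the namespace):
#   oc process -n openshift-storage ocs-osd-removal --parameters \
#     | has_template_param FORCE_OSD_REMOVAL \
#     && echo "FORCE_OSD_REMOVAL is defined in this template" \
#     || echo "FORCE_OSD_REMOVAL is not defined in this template"
```

A missing parameter points at the template object on the cluster rather than at the command line, so the check separates a typo from a template that does not carry the option.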
    

Environment

  • OCS 4.8
  • ODF 4.9 and higher
