Disabling Garbage Collection of Data Volumes Managed by GitOps

Solution Verified - Updated -

Environment

OpenShift Virtualization 4.12 and higher

OpenShift GitOps (any compatible version)

Issue

When DataVolumes are managed by a GitOps Application, the automatic removal of DataVolumes by the Garbage Collector can lead to errors that prevent the Application from succeeding.

Resolution

The workaround is to disable Garbage Collection for any GitOps-managed DataVolumes.

Add the annotation cdi.kubevirt.io/storage.deleteAfterCompletion: "false" to each of these DataVolumes. The annotation can either be set directly in the DataVolume YAML or be applied through a Kustomization using the commonAnnotations feature.

An example annotated DataVolume follows:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  annotations:
    cdi.kubevirt.io/storage.deleteAfterCompletion: "false"
  name: fedora
spec:
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
    volumeMode: Filesystem
  source:
    http:
      url: https://download.fedoraproject.org/pub/fedora/linux/releases/35/Cloud/x86_64/images/Fedora-Cloud-Base-35-1.2.x86_64.raw.xz
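
For the Kustomization route, a minimal sketch of a kustomization.yaml might look like the following. It assumes the DataVolume manifest is stored next to it in a file named datavolume.yaml; that file name is only an illustration and does not come from the original article. Note that commonAnnotations adds the annotation to every resource listed in the Kustomization, not only to DataVolumes.

```yaml
# kustomization.yaml (sketch; the resource file name is an assumption)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Hypothetical file containing the DataVolume manifest(s)
  - datavolume.yaml

# Applied to every resource in this Kustomization, so each DataVolume
# is rendered with garbage collection disabled.
commonAnnotations:
  cdi.kubevirt.io/storage.deleteAfterCompletion: "false"
```

With this approach the individual DataVolume manifests do not need to carry the annotation themselves, which may be convenient when many DataVolumes live in the same Application.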

Another way to avoid Application failures caused by deleted DataVolumes is to use DataVolumeTemplates exclusively within Application-managed VirtualMachine definitions. In this case, the DataVolumes are created by the VirtualMachine rather than by the Application, so they are not managed by ArgoCD and do not trigger an error when they are garbage collected.
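
As an illustration of the DataVolumeTemplates approach, a VirtualMachine definition could embed the DataVolume roughly as sketched below. The VirtualMachine name, the DataVolume name fedora-rootdisk, the disk name, and the memory request are hypothetical values chosen for the example; the storage settings and image URL are copied from the DataVolume example above.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora                  # hypothetical VM name
spec:
  running: false
  # The DataVolume is declared inside the VM, so only the VirtualMachine
  # itself is a resource tracked by the GitOps Application.
  dataVolumeTemplates:
  - metadata:
      name: fedora-rootdisk     # hypothetical DataVolume name
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        volumeMode: Filesystem
      source:
        http:
          url: https://download.fedoraproject.org/pub/fedora/linux/releases/35/Cloud/x86_64/images/Fedora-Cloud-Base-35-1.2.x86_64.raw.xz
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 2Gi         # illustrative value
      volumes:
      - name: rootdisk
        # Must match the name of the dataVolumeTemplates entry above
        dataVolume:
          name: fedora-rootdisk
```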

Root Cause

Since KubeVirt v0.57.0 and OpenShift Virtualization 4.12, DataVolumes created in the cluster are automatically annotated with cdi.kubevirt.io/storage.deleteAfterCompletion set to true.
Once a DataVolume reaches the Succeeded phase, the Garbage Collector deletes it.
Further, if the DataVolume is recreated after garbage collection, the CDI admission webhook rejects it with an error if the PVC it would populate already exists.

This behavior clashes with GitOps engines such as OpenShift GitOps (ArgoCD): the engine tries to keep the DataVolume resource present, each attempt to recreate it results in an error, and that error can cause a larger sync operation to end prematurely in failure.

