Disabling Garbage Collection of Data Volumes Managed by GitOps
Environment
OpenShift Virtualization 4.12 and higher
OpenShift GitOps (any compatible version)
Issue
When DataVolumes are managed by a GitOps Application, the automatic removal of DataVolumes by the Garbage Collector can lead to errors that prevent the Application from succeeding.
Resolution
The workaround is to disable Garbage Collection of any GitOps-managed DataVolumes.
Add the annotation cdi.kubevirt.io/storage.deleteAfterCompletion: "false" to each managed DataVolume. The annotation can be set directly in the DataVolume YAML, or applied across all resources with Kustomize's commonAnnotations feature.
An example annotated DataVolume follows:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  annotations:
    cdi.kubevirt.io/storage.deleteAfterCompletion: "false"
  name: fedora
spec:
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
    volumeMode: Filesystem
  source:
    http:
      url: https://download.fedoraproject.org/pub/fedora/linux/releases/35/Cloud/x86_64/images/Fedora-Cloud-Base-35-1.2.x86_64.raw.xz
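When many DataVolumes are managed in the same Application, setting the annotation once with Kustomize avoids editing every manifest. A minimal kustomization.yaml sketch follows; the resource file name is an assumption for illustration:

```yaml
# kustomization.yaml
# commonAnnotations adds the annotation to every resource in this overlay,
# including the DataVolume above.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

commonAnnotations:
  cdi.kubevirt.io/storage.deleteAfterCompletion: "false"

resources:
  - datavolume.yaml  # assumed file name containing the DataVolume manifest
```

Note that commonAnnotations applies to every resource listed, so keep DataVolumes in their own overlay if other resources should not carry this annotation.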
Another way to avoid Application failures caused by deleted DataVolumes is to use dataVolumeTemplates exclusively within Application-managed VirtualMachine definitions. DataVolumes created from a template are owned by the VirtualMachine rather than tracked by ArgoCD, so their garbage collection does not trigger a sync error.
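The dataVolumeTemplates approach can be sketched as follows. This is a minimal illustrative VirtualMachine, not a tested manifest; the resource names (fedora-vm, fedora-vm-rootdisk) and sizing are assumptions:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: fedora-vm
spec:
  running: false
  # The DataVolume is defined inline; KubeVirt creates and owns it,
  # so ArgoCD does not track the resulting DataVolume object.
  dataVolumeTemplates:
    - metadata:
        name: fedora-vm-rootdisk
      spec:
        pvc:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
        source:
          http:
            url: https://download.fedoraproject.org/pub/fedora/linux/releases/35/Cloud/x86_64/images/Fedora-Cloud-Base-35-1.2.x86_64.raw.xz
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
        - name: rootdisk
          # References the DataVolume created from the template above.
          dataVolume:
            name: fedora-vm-rootdisk
```

Because only the VirtualMachine is declared in Git, garbage collection of the generated DataVolume leaves nothing for ArgoCD to reconcile.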
Root Cause
Since KubeVirt v0.57.0 and OpenShift Virtualization 4.12, DataVolumes created in the cluster are automatically annotated with cdi.kubevirt.io/storage.deleteAfterCompletion set to "true".
Once the DataVolume reaches the Succeeded phase, the Garbage Collector steps in to delete the DataVolume.
Further, if the DataVolume is recreated after garbage collection, the admission webhook for CDI will reject it with an error if the PVC it would populate already exists.
This behavior clashes with GitOps engines like OpenShift GitOps (ArgoCD) because attempts to ensure the DataVolume resource remains present will result in errors, which can cause a larger sync operation to end prematurely in failure.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.