OpenShift APIs for Data Protection (OADP) FAQ


An important part of any platform used to host business and user workloads is data protection. Data protection includes operations such as on-demand backup, scheduled backup, and restore. These operations allow the objects within a cluster to be backed up to a storage provider, either locally or on a public cloud, and the cluster to be restored from that backup in the event of a failure or scheduled maintenance.

Red Hat has created OpenShift API for Data Protection, or OADP, for this purpose. OADP brings an API to the OpenShift Container Platform that Red Hat partners can leverage in creating a disaster recovery and data protection solution.

Frequently Asked Questions

What is OADP?

OADP (OpenShift APIs for Data Protection) is an operator that Red Hat has created to provide backup and restore APIs in the OpenShift cluster.
You can read more about OADP in the following links:
OADP documentation
OADP Customer Portal: verified solutions, articles, and discussions with support
OADP blog posts
OADP Troubleshooting Guide
Backup OpenShift applications using the OpenShift API for Data Protection with Multicloud Object Gateway

How to upgrade OADP

Please reference the official documentation for OADP upgrades.

Need a UI with OADP?

Please check out our partnership and collaboration with CloudCasa! CloudCasa can provide a hosted or on-premises web-based user interface that integrates with OADP. For details, please refer to our partner page.

What APIs does the operator provide?

OADP provides the following APIs:

  • Backup
  • Restore
  • Schedule
  • BackupStorageLocation
  • VolumeSnapshotLocation

Red Hat has not added, removed or modified any of the APIs as documented in the Velero upstream project. The Velero site has more details on the Velero API Types.

Can OADP back up my entire cluster?

No. OADP is meant to back up customer applications on the OpenShift Platform. OADP will not successfully back up and restore operators or etcd. There are a variety of ways to customize a backup, via namespaces or labels, to avoid backing up inappropriate resources.
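As a hedged sketch, a Backup custom resource scoped by namespace or label might look like the following (the namespace and label values here are placeholders, not part of the original FAQ):

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: app-backup          # placeholder name
  namespace: openshift-adp
spec:
  # Back up only these namespaces (placeholder values)
  includedNamespaces:
    - my-app-ns
  # Alternatively, select only resources carrying this label
  labelSelector:
    matchLabels:
      app: my-app
```

Either scoping mechanism keeps cluster-level resources such as operators and etcd out of the backup.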

Is there an upstream project for OADP?

Yes. The OADP operator is developed in the oadp-operator upstream project.

What is the support status of the OADP operator?

Please refer to the OADP support policy

Is OADP a full end-to-end data protection solution?

OpenShift API for Data Protection (OADP) features provide options for backing up and restoring applications. You can find more detail regarding OADP's features in our documentation

What data can OADP protect?

OADP provides APIs to back up and restore OpenShift cluster resources (YAML definitions), internal images, and persistent volume data.

What is the OADP operator installing?

The OADP operator installs Velero, and the OpenShift plugins for Velero to use, for backup and restore operations.

Can I install multiple versions of OADP in the same cluster?

No. Each OADP version can have different CustomResourceDefinitions, and only one version of OADP will work properly if multiple OADP versions are installed in the same cluster.
It is recommended to have a single version of OADP (and Velero) installed in the cluster. Refer to the OADP and Velero version relationship.

Can I install OADP alongside MTC?

Yes, as long as the OADP version you install is the same version that MTC depends on.
For example, you cannot install MTC 1.7, which expects OADP 1.0, alongside OADP 1.1; you can only install OADP 1.0 in this scenario.

Does OADP support CSI snapshots?

Yes, please refer to the documentation

How does OADP's Restic option manage incremental backups?

Velero looks for the most recent Restic backup of the current volume in the same backup location. If one is found, Velero passes that Restic snapshot ID to the Restic CLI. This means that Restic will only back up files that have changed since the most recent backup, and will reuse the existing data for the rest.

Is there a recorded demo of OADP?

Yes! The OADP team did a great presentation and demonstration of OADP. Check it out here. The first half is a very informative Q&A, followed by the demo.

On what versions of OpenShift Container Platform can OADP be installed?

The OADP 1.0 and OADP 1.1 operators can be found within the embedded OperatorHub in the OpenShift web console, and are fully supported. Please refer to our support policy.

Are there plans to include a data mover with OADP?

The data mover is in tech preview with OADP 1.1 and OADP 1.2. Documentation can be found here

How do I determine the version of Velero OADP installed?

After OADP installation, the velero deployment will contain the tag of the image in use. If you install OADP with the default configuration, you will be using upstream tagged images with the version called out in the deployment. You can also check the version matrix.

Where can I find examples of using OADP APIs for backup/restore?

The OADP operator page in the upstream oadp-operator project has examples that walk through usage.

Using S3 compatible storage that does not have an associated region

There are S3-compatible storage implementations that do not require a region to be set up. In these cases, simply substitute a valid AWS region such as "us-east-1" in the DPA YAML configuration, as shown in the OADP with MCG documentation. Reference the velero issue.

  • A user should provide:
    • s3Url: https://foo/storage
    • region: us-east-1

backupLocations:
  - velero:
      config:
        profile: "default"
        region: us-east-1
        s3Url: https://foo/storage <s3 endpoint>
        insecureSkipTLSVerify: "true"
        s3ForcePathStyle: "true"
      provider: aws
      default: true
      credential:
        key: cloud
        name: cloud-credentials
      objectStorage:
        bucket: <bucket_name>
        prefix: <prefix>

Can OADP restore routes with base domain from the restore cluster?

OADP will restore routes with the base domain of the restore cluster when the route being restored is a generated route.

A generated route is a route that does not specify route.spec.host at creation, letting OpenShift generate the hostname for the route. Generated routes have the annotation "openshift.io/host.generated: 'true'". If you manually add this annotation to a route, unexpected behavior may occur during restore. If the user has modified the host value of a generated route, the host value can be lost on restore.

There is no mechanism at this time in OADP to dynamically set the route host value based on the cluster base domain name for a non-generated route.

Likewise, for a generated route, the host value will be stripped by oadp-operator so that it is regenerated on the restore cluster. Any modifications to .spec.host of a generated route will be lost on restore.
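As a sketch, a generated route looks like the following (the name, namespace, and hostname are placeholder values):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app          # placeholder name
  namespace: my-app-ns  # placeholder namespace
  annotations:
    # Present because spec.host was NOT set at creation;
    # OpenShift generated the hostname itself
    openshift.io/host.generated: "true"
spec:
  # On restore, oadp-operator strips this value so the
  # restore cluster regenerates it with its own base domain
  host: my-app-my-app-ns.apps.example.com
  to:
    kind: Service
    name: my-app
```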

Can I turn off internal registry image backup?

If you experience issues during backup or restore due to errors related to internal registry image (imagestream) backup, you can turn off image backup functionality like so in the DataProtectionApplication spec:

spec:
  backupImages: false # set this to disable image backup/restore

Set a backup to expire

When you create a backup, you can specify a TTL (time to live) by adding the --ttl flag. If Velero sees that an existing backup resource has expired, it removes:

  • The backup resource
  • The backup file from cloud object storage
  • All PersistentVolume snapshots
  • All associated Restores

Upstream Documentation with Details
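As a sketch, the TTL can also be set directly in the Backup spec (the name and TTL value are placeholders):

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: short-lived-backup  # placeholder name
  namespace: openshift-adp
spec:
  # Backup becomes eligible for garbage collection 24h after creation
  ttl: 24h0m0s
```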

Issues restoring an OADP backup: application unable to access data

When a Namespace is created in OpenShift, it is assigned a unique user ID (UID) range, a supplemental group (GID) range, and unique SELinux MCS labels. This information is stored in the metadata.annotations field of the Namespace. Every time a new Namespace is created, OpenShift assigns it a new range from its available pool of UIDs and updates the metadata.annotations field to reflect the assigned values. We will refer to these annotations as SCC (SecurityContextConstraints) annotations.

However, if the Namespace resource already has those annotations set, OpenShift does not re-assign new values for the Namespace. It instead assumes that the existing values are valid and moves on.

These are the SCC annotations on OpenShift namespaces:
* openshift.io/sa.scc.mcs
* openshift.io/sa.scc.supplemental-groups
* openshift.io/sa.scc.uid-range
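On a namespace, these annotations look like the following sketch (the ranges and MCS label are example values; OpenShift assigns each namespace its own):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app-ns  # placeholder name
  annotations:
    # Example values assigned by OpenShift at namespace creation
    openshift.io/sa.scc.mcs: "s0:c27,c4"
    openshift.io/sa.scc.supplemental-groups: "1000710000/10000"
    openshift.io/sa.scc.uid-range: "1000710000/10000"
```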

Workloads may not have data access after restore if:
* There is a pre-existing namespace with different SCC annotations than at backup time, such as on a different cluster; OADP will reuse the pre-existing namespace.
* The backup used a label selector and the namespace the workloads run in does not carry the label. OADP will not back up the namespace but will create a new namespace without the previous namespace's annotations during restore, causing a new UID range to be assigned to the namespace.

This can be an issue for customer workloads because OpenShift assigns a pod securityContext UID based on namespace annotations, which in this case have changed since the persistent volume data was backed up:
* The container UID no longer matches the file owner's UID.
* The application may complain that it cannot read or write data owned by a different UID.

Simple mitigations include:
* Adding the selector's label to the namespace containing the workload when using a label selector to filter objects to include in the backup.
* Removing the pre-existing namespace before restoring.

Advanced mitigations
* Updating the owners of files in the restored cluster by following Fixing UID ranges after migration, steps 2-4.

There are risks associated with restoring a namespace in a new cluster from backup, including the potential for UID range collisions with another namespace. To mitigate these risks, customers can optionally follow Fixing UID ranges after migration, steps 1-4.

For more information on OpenShift's UID/GID ranges, reference A Guide to OpenShift and UIDs.

OADP's data mover and Volsync setup and configuration

OADP 1.1.x and VolSync 0.6.x, 0.7.x

  • An annotation is required
    • For users that have upgraded to VolSync version >= 0.6.0, please note an annotation is required on the openshift-adp namespace for data mover operations to continue to work.
    • Execute the following command to annotate the openshift-adp namespace with `volsync.backube/privileged-movers='true'`:
oc annotate --overwrite namespace/openshift-adp volsync.backube/privileged-movers='true'

OADP 1.2.x and VolSync 0.7.x

  • Please ensure the annotation used in previous versions is now removed or set to 'false'
    • OADP 1.2.x no longer requires VolSync's privileged movers, and the annotation can in fact cause errors. A user could see a failed backup that timed out with:
Error: container's runAsUser breaks non-root policy
Error from server (BadRequest): container "restic" in pod "volsync-src-vsb-xxxyyy" is waiting to start: CreateContainerConfigError
  • By default OADP leverages the pod/workload security context to back up data via VolSync/data mover. Find more information in VolSync's security context documentation.
  • By default dpa.spec.features.dataMover.volumeOptions.sourceVolumeOptions.moverSecurityContext is set to true.
  • Execute the following command to ensure the annotation is set to false:
oc annotate --overwrite namespace/openshift-adp volsync.backube/privileged-movers='false'

Note: In some more advanced use cases where a customer may want or need root escalation of privileges, the option remains to use the annotation and set moverSecurityContext to false.

pod volume backup failed: running Restic backup, stderr=Fatal: unable to open config file: blob.GetProperties: storage

Errors such as

pod volume backup failed: running Restic backup, stderr=Fatal: unable to open config file: blob.GetProperties: storage: service returned error: StatusCode=404, ErrorCode=404 The specified container does not exist

occur when the restic folder in object storage has been deleted. Try the following:
OADP-1.1.x

oc get resticrepositories -n openshift-adp

OADP-1.2.x

oc get backuprepositories.velero.io -n openshift-adp

If you see one that corresponds to your object storage, delete it so velero recreates the restic repository.

Backing up data from one cluster and restoring to another cluster

  • To successfully back up data on one cluster and restore it to another, please ensure in your DPA config on both clusters that:
    • The backup storage location (BSL) and volume snapshot location have the same names and paths.
    • The same object storage location credentials are shared across the clusters.
    • The upstream Velero documentation is helpful in this case.
    • For volume backup and restore, please refer to the latest OADP documentation and the data mover sections.
    • Allow OADP to create the namespace on the destination cluster for best results.
    • When restoring PVCs into a namespace where the volumes already exist, first delete any PVCs that need to be updated before the restore. For Restic use cases, the Deployment (or DC, etc.) of the mounting pod must also be removed, assuming that the Deployment is also in the backup being restored.

Can OADP modify nodeSelector during restore?

  • Not at this time. You can manually modify nodeSelectors after restore. The OADP team or the community could implement a RestoreItemAction plugin that does this in a future release.

Disaster recovery - Using Schedules and Read-Only Backup Storage Locations

During disaster recovery, it is recommended that you set your backup location accessMode to ReadOnly to prevent additions or deletions to the backup storage location during the restore process.

You would set accessMode to ReadOnly like so in the DataProtectionApplication spec:

...
spec:
  backupLocations:
    - velero:
        accessMode: ReadOnly
...

Proceed to restore from backup.

OADP Restore fails with ArgoCD

If ArgoCD is in use during a restore, it is possible to see the restore fail. This can be caused by the app.kubernetes.io/instance label used by ArgoCD. This label identifies which resources ArgoCD needs to manage, which can conflict with OADP managing resources on restore.

To resolve this issue, you can set .spec.resourceTrackingMethod in the ArgoCD YAML to annotation+label or annotation. If issues still persist, disable ArgoCD before the restore, and enable it again once the restore completes.
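As a sketch, this setting lives on the ArgoCD custom resource (the name and namespace are placeholders, and the apiVersion may differ depending on your operator version):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: argocd        # placeholder name
  namespace: argocd   # placeholder namespace
spec:
  # Track resources by annotation instead of the
  # app.kubernetes.io/instance label, avoiding conflicts
  # with OADP-managed resources on restore
  resourceTrackingMethod: annotation
```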

Please do let us know when these errors occur so we can work to resolve the issue.

Can I install OADP into multiple OpenShift Projects to enable project owners?

We will be providing additional documentation to cover this use case in the near future; however, it is worth noting here that it is possible to install OADP into multiple namespaces to enable project owners to manage their own OADP instance. All deployments of OADP must be at the same version; installing different versions of OADP on the same cluster is NOT supported.

  • It is required that each individual deployment of OADP have a unique set of credentials and BackupStorageLocation configuration. The workflow has been validated with Restic and CSI.
  • It is worth noting that by default each OADP deployment has cluster level access across namespaces. We recommend that OCP administrators review the security and RBAC settings carefully.

I am trying to use OADP with a ROSA cluster, and need help

We have recently updated the documentation for installing and configuring OADP with ROSA clusters. Please see the documentation here

I set a very short TTL for backup, but the data still exists after TTL expires

The effects of expiration are not applied immediately, they are applied when the gc-controller runs its reconciliation loop every hour.

Datamover enabled backup/restore is stuck in WaitingForPluginOperations

  • One potential cause of a backup being stuck in a wait state is restic locks. To determine whether the root cause is a restic lock, cycle through the VSBs with the following command:
 oc get vsb -n <protected-ns> -o yaml | grep resticrepository
  • Look for an error referencing the restic lock: Fatal: unable to create lock in backend: repository is already locked
  • The lock must be removed for the backup/restore to be successful.
  • Please reference:
    • https://restic.readthedocs.io/en/stable/100_references.html?highlight=lock#locks
    • https://forum.restic.net/t/detecting-stale-locks/1889/15
    • https://github.com/restic/restic/issues/1450

Note: Data mover backups in OADP 1.1 require VSB cleanup when run via a schedule or repetitively

If you are running velero backups via the velero scheduler with the data mover enabled, VSBs need to be removed to avoid inconsistent content in the velero restore. This also applies to repetitive backups executed manually or programmatically. Please see the VSB cleanup instructions in this document.

A fix for this issue will NOT be released for OADP 1.1.x. Customers will need to upgrade to OADP 1.2.x for the fix.

To avoid issues with the velero scheduler, we recommend executing the velero backup via a k8s cron job that includes the backup command and the VSB cleanup.

Datamover backup cleanup for OADP 1.1

In OADP 1.1, some resources can be left behind by datamover.

Remove snapshots in bucket

There will be snapshots in the bucket specified in the DPA .spec.backupLocation.objectStorage.bucket under /<protected-ns>.
- Delete this folder to delete all snapshots in your bucket.

In the /<protected-ns> folder, there will be additional folder(s) prefixed with /<volumeSnapshotContent name>-pvc, where volumeSnapshotContent name is the volumeSnapshotContent created by the data mover per PVC.
- Delete one of these folders to delete a single snapshot in your bucket.

Remove cluster resources: There are two main scenarios:

1. Datamover completes: volumeSnapshotBackup/volumeSnapshotRestore CRs still exist in the application namespace.

Datamover backup:
oc delete vsb -n <app-ns> --all

Datamover restore:
oc delete vsr -n <app-ns> --all

  • Note: There will also be volumeSnapshotContents that can be deleted if needed
    oc delete volumesnapshotcontent --all

2. Datamover partiallyFails or Fails: VSB/VSR CRs exist in the application namespace, as well as extra resources created by these controllers.

Datamover backup:

oc delete vsb -n <app-ns> --all

oc delete volumesnapshot -A --all

oc delete volumesnapshotcontent --all

oc delete pvc -n <protected-ns> --all

oc delete replicationsource -n <protected-ns> --all

Datamover restore:

oc delete vsr -n <app-ns> --all

oc delete volumesnapshot -A --all

oc delete volumesnapshotcontent --all

oc delete replicationdestination -n <protected-ns> --all

Failed to check and update snapshot content in a VolumeSnapshotContent Object

Users may notice that during a backup a VolumeSnapshotContent object is created per volume. Users may also notice a VolumeSnapshotContent object in an error state similar to the following. This is a known transient issue and should resolve as OpenShift reconciles the VSCs. For more details see: Get error when creating volume snapshot

      Failed to check and update snapshot content: failed to remove
      VolumeSnapshotBeingCreated annotation on the content
      snapcontent-6cd696d3-2cf4-4c4d-8d96-439dc090b10b: "snapshot controller
      failed to update snapcontent-6cd696d3-2cf4-4c4d-8d96-439dc090b10b on API
      server: Operation cannot be fulfilled on
      volumesnapshotcontents.snapshot.storage.k8s.io
      \"snapcontent-6cd696d3-2cf4-4c4d-8d96-439dc090b10b\": the object has been
      modified; please apply your changes to the latest version and try again"

How does OADP's data mover for CSI snapshots work?

We have a nice blog post describing in detail how the OADP data mover works. Also see our public documentation.

Removing old snapshots and Pruning with Data Mover backups

  • The restic retain and prune functions are known NOT to work with OADP's Data Mover in 1.1.x and 1.2.x. Customers should plan accordingly.

VolumeSnapshotContent in failed state

You may find the following error causing backups to take longer than usual or to ultimately fail:

snapshot controller failed to update or failed to remove...xxxxx
the object has been modified; please apply your changes to the latest version and try again
  • This is a known issue and a fix is in progress.
  • It may be possible to delete the failed VolumeSnapshotContent object; it should automatically be recreated. Please ensure your backup can be restored successfully with the correct data in such cases.

Backup partially failing for BuildConfig application

Backup may partially fail when build pods are part of the backup, due to the build pods being in a completed state.

The workaround is to exclude volumes from the backup of build pods by adding the following annotation to the build pods: backup.velero.io/backup-volumes-excludes=buildworkdir,container-storage-root,build-blob-cache

Build pods can be identified by the label or annotation openshift.io/build.name=somebuildname

Command to annotate build pods

NS=<backup-includedNamespace> && oc annotate -n $NS $(oc get pods -n $NS -oname -l openshift.io/build.name=<buildName>) backup.velero.io/backup-volumes-excludes=buildworkdir,container-storage-root,container-storage-run,build-blob-cache

Is it possible to backup 3scale API Mgmt with OADP

Please refer to this article

Source file not found, at least one source file could not be read

  • When using filesystem level volume backups (FSB), you may observe something similar to the following in the backup logs
time="2023-09-06T06:04:37Z" level=error msg="Error backing up item" backup=openshift-adp/schedule-202309045804985 error="pod volume backup failed: running Restic backup, stderr={\"message_type\":\"error\",\"error\":{\"Op\":\"lstat\",\"Path\":\"application-name-77d8987g765v-76vqv_2023-08-09-22-04.log\",\"Err\":2},\"during\":\"archival\",\"item\":\"/host_pods/3dfghjkytdcvb-rfbijn-47b85-ftrv3i-34567ng4i/volumes/kubernetes.io~azure-file/pvc-45678-dfgjy6-dfjg6vft-7657fv654c/application-name-77d89cf56c-76vqv_2023-08-09-22-04.log\"}
  • or
\"error\":{\"Op\":\"lstat\",\"Path\":\"indices/IF-3KSDFGHG3-
Zy/0/index/_2x79j_1_Lucene91_1.dvm\",\"Err\":2},\"during\":\"archival\",\"item\":\"/
host_pods/41ce-a170-28casfdafdg/volumes/kubernetes.io~csi/
pvc-4a8ba939-8bd0-443b-91aa-cb0a36066586/mount/indices/IF-3K2QkQLSG3-ZyKKa1Nw/0/index/
_2x79j_1_Lucene91_1.dvm\"}\nWarning: at least one source file could not be read\n: error
running restic

This happens when there is churn in the filesystem and a file is no longer present while performing the pod volume backup. It is likely that the file was present during restic's initial scan of the volume but was removed by the time the file was actually backed up to the restic store.
- Consider using a CSI backup
- Consider excluding the volume

error validating existing CRs against new CRD's schema for "podvolumerestores.velero.io"

If you have an InstallPlan error upon upgrading to OADP 1.2+ from prior versions, it means you created a restore prior to upgrading.

During a velero restore with restic enabled, a podvolumerestore custom resource is created when restoring persistent volume data.

In OADP 1.2, podvolumerestore objects have a new required field that OADP 1.1 podvolumerestore objects won't have. As long as the podvolumerestores are not in progress, they are safe to remove; new ones will be regenerated when another restore is created.

Delete all podvolumerestores to proceed with the upgrade.

Using cacert with velero command aliased via velero deployment

Some users may want to use velero CLI without installing it locally on their system.

They can do so using aliased velero command like so

alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'

If you want to use a cacert with this command, you can copy the cert into the velero pod like so

$ oc get dataprotectionapplications.oadp.openshift.io <dpa-name> -o jsonpath='{.spec.backupLocations[0].velero.objectStorage.caCert}' | base64 -d | oc exec -i deploy/velero -c velero -- bash -c "cat > /tmp/your-cacert.txt"
velero describe backup <backup-name> --details --cacert /tmp/your-cacert.txt

In future versions of OADP, we may mount the cert to the velero pod for your convenience to eliminate the extra step above.

S3 bucket versioning

Using S3 bucket versioning is known to cause failures with OADP backups. The OADP team does NOT recommend the use of S3 bucket versioning.
References:
- AWS Bucket Versioning
- Minio Bucket Versioning
- GCP Bucket Versioning
