Migration Toolkit for Containers (MTC) restore of a large volume fails with a "signal: killed" error
Environment
- Red Hat OpenShift Service on Azure (ARO)
- 4
Issue
- When migrating a large PV (>4TB) using MTC, the restore phase fails.
- If the PV size is reduced to 1TB and the migration is tested again, the restore phase succeeds.
Resolution
- Use the command below to raise the memory requests and limits of the restic daemonset, then retry the restore.
(After the update, the restic pods restart.)
$ oc edit daemonsets restic -n openshift-migration
...
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 100m
            memory: 2Gi
...
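As a non-interactive alternative to editing the daemonset by hand, the same change could be sketched with `oc set resources`. The helper name `set_restic_memory` is hypothetical, and the sketch assumes that the memory request and limit are set to the same value, as in the excerpt above. The value to pass is not specified in this article and depends on the memory available on the nodes.

```shell
# Hypothetical helper: set the memory request and limit of the restic
# daemonset to the same value, passed as the first argument (for example 8Gi).
# Assumption: request and limit are kept equal, as in the excerpt above.
set_restic_memory() {
  oc set resources daemonset restic -n openshift-migration \
    --limits=memory="$1" --requests=memory="$1"
}
```

For example, `set_restic_memory 8Gi` would request and limit 8Gi; that figure is illustrative only, not a recommended size.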
Root Cause
- The error log indicates that the restic process in the restic daemonset pod was terminated with "signal: killed".
- The restic pod likely ran out of memory (OOM) while restoring the large volume.
error='pod volume restore failed: error restoring volume: error running restic restore,'
stderr=: signal: killed' logSource='/remote-source/velero/app/pkg/restore/restore.go:1464'
restore=openshift-migration/migration-xxxxx-stage-xxxxx
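One way to look for supporting evidence of an OOM kill is to read the last termination reason recorded for the containers of a restic pod. This is a minimal sketch: the helper name `last_termination_reason` is hypothetical, and the pod name must be supplied by the caller (for example from `oc get pods -n openshift-migration`). The output may be empty if only the restic child process was killed and the container itself kept running, so an empty result does not rule out an OOM condition.

```shell
# Hypothetical helper: print the last termination reason (for example
# OOMKilled) of each container in the pod named by the first argument.
# Prints nothing if no container in the pod has been terminated.
last_termination_reason() {
  oc get pod "$1" -n openshift-migration \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}
```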
Diagnostic Steps
- Check the velero pod logs:
$ oc -n openshift-migration logs -f velero-pod
level=error msg='unable to successfully complete restic restores of pod's volumes'
error='pod volume restore failed: error restoring volume: error running restic restore,'
cmd=restic restore --repo=azure:velero:/velero/restic/xxx-xxxxx-xxxx
--password-file=/tmp/credentials/openshift-migration/velero-restic-credentials-repository-password
--cache-dir=/scratch/.cache/restic xxxx --target=. --delete --skip-unchanged,.
stdout=Skip Unchanged True\nrestoring (Snapshot xxxxx of [/host_pods/xxxxx-xxxx-xxxx-xxxx-
xxxxx/volumes/kubernetes.io~cinder/pvc-xxxxx-xxxx-xxxx-xxxx-xxxxx]
at 2023-03-27 07:11:20.949008456 +0000 UTC by root@velero) to . \{n,.
stderr=: signal: killed' logSource='/remote-source/velero/app/pkg/restore/restore.go:1464'
restore=openshift-migration/migration-xxxxx-stage-xxxxx
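To pick the relevant entries out of a long log, the output can be filtered for the "signal: killed" string shown above. This is a minimal sketch; the helper name `count_killed_restores` is hypothetical, and it simply counts matching lines read from standard input.

```shell
# Hypothetical helper: count log lines that contain "signal: killed".
# Reads log text from standard input and prints the number of matches.
# "|| true" keeps the exit status zero when there are no matches.
count_killed_restores() {
  grep -c 'signal: killed' || true
}
```

It could be used as `oc -n openshift-migration logs velero-pod | count_killed_restores`, without `-f` so that the command terminates.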
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.