11.4. Common issues and concerns

This section describes common issues and concerns that can cause problems during migration.

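11.4.1. Direct volume migration does not complete

If direct volume migration does not complete, the target cluster might not have the same node-selector annotations as the source cluster.

Migration Toolkit for Containers (MTC) migrates namespaces with all annotations to preserve security context constraints and scheduling requirements. During direct volume migration, MTC creates Rsync transfer pods on the target cluster in the namespaces that were migrated from the source cluster. If a target cluster namespace does not have the same annotations as the source cluster namespace, the Rsync transfer pods cannot be scheduled and remain in a Pending state.

You can identify and fix this issue by performing the following procedure.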

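Procedure

Check the status of the MigMigration CR:

$ oc describe migmigration <migmigration_name> -n openshift-migration

The output includes the following status message:

Example output

Some or all transfer pods are not running for more than 10 mins on destination cluster

On the source cluster, obtain the details of a migrated namespace:
```
$ oc get namespace <namespace> -o yaml 1
```
1: Specify the migrated namespace.

On the target cluster, edit the migrated namespace:
```
$ oc edit namespace <namespace>
```

Add the missing openshift.io/node-selector annotations to the migrated namespace as in the following example: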

apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: "region=east"
...

Run the migration plan again.
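
The comparison in this procedure can also be scripted. The following is a minimal sketch, not part of MTC, that assumes you have saved the namespace from each cluster as JSON (for example with `oc get namespace <namespace> -o json`) and loaded it into a dictionary. The function name `missing_annotations` and the sample namespace `demo` are hypothetical.

```python
# Hypothetical helper: report annotations that exist on the source namespace
# but are absent (or have a different value) on the target namespace.
NODE_SELECTOR = "openshift.io/node-selector"


def missing_annotations(source_ns: dict, target_ns: dict) -> dict:
    """Return source annotations that the target namespace lacks or overrides."""
    src = (source_ns.get("metadata") or {}).get("annotations") or {}
    dst = (target_ns.get("metadata") or {}).get("annotations") or {}
    return {key: value for key, value in src.items() if dst.get(key) != value}


# Illustrative input only: in practice these dictionaries would come from
# json.loads() applied to the saved output of each cluster.
source = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {"name": "demo", "annotations": {NODE_SELECTOR: "region=east"}},
}
target = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "demo"}}

print(missing_annotations(source, target))
```

Any key that the function returns is a candidate to add to the target namespace before you rerun the migration plan.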

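11.4.2. Error messages and resolutions

This section describes common error messages that you might encounter with the Migration Toolkit for Containers (MTC) and how to resolve their underlying causes.

11.4.2.1. CA certificate error displayed when accessing the MTC console for the first time

If a CA certificate error message is displayed the first time you try to access the MTC console, the likely cause is the use of self-signed CA certificates in one of the clusters.

To resolve this issue, navigate to the oauth-authorization-server URL displayed in the error message and accept the certificate. To resolve this issue permanently, add the certificate to the trust store of your web browser.

If an Unauthorized message is displayed after you have accepted the certificate, navigate to the MTC console and refresh the web page.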

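11.4.2.2. OAuth timeout error in the MTC console

If a connection has timed out message is displayed in the MTC console after you have accepted a self-signed certificate, the cause is likely to be one of the following:

Interrupted network access to the OAuth server
Interrupted network access to the OpenShift Container Platform console
Proxy configuration that blocks access to the oauth-authorization-server URL. See "MTC console inaccessible because of OAuth timeout error" for details.

To determine the cause of the timeout:

Inspect the MTC console web page with a browser web inspector.
Check the Migration UI pod log for errors.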

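11.4.2.3. Certificate signed by unknown authority error

If you use a self-signed certificate to secure a cluster or a replication repository for MTC, certificate verification might fail with the following error message: Certificate signed by unknown authority.

You can create a custom CA certificate bundle file and upload it in the MTC web console when you add a cluster or a replication repository.

Procedure

Download a CA certificate from a remote endpoint and save it as a CA bundle file: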

$ echo -n | openssl s_client -connect <host_FQDN>:<port> \ 1
  | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > <ca_bundle.cert> 2

1: Specify the host FQDN and port of the endpoint, for example, api.my-cluster.example.com:6443.
2: Specify the name of the CA bundle file.
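
The sed filter in the command above keeps only the PEM certificate blocks from the openssl s_client output. As an illustration of that filtering step, the following sketch does the same in Python on text you have already captured. The function names and the sample text are hypothetical, and the certificate body is a placeholder rather than a real certificate.

```python
import re
from typing import List

# Matches one PEM block, from the BEGIN marker to the END marker inclusive.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def extract_pem_certificates(s_client_output: str) -> List[str]:
    """Return every PEM certificate block found in the captured output."""
    return PEM_BLOCK.findall(s_client_output)


def build_ca_bundle(s_client_output: str) -> str:
    """Join the extracted blocks into the text of a CA bundle file."""
    certs = extract_pem_certificates(s_client_output)
    return "\n".join(certs) + ("\n" if certs else "")


# Placeholder input that only imitates the layout of s_client output.
sample_output = (
    "CONNECTED(00000003)\n"
    "Server certificate\n"
    "-----BEGIN CERTIFICATE-----\n"
    "PLACEHOLDERBASE64DATA\n"
    "-----END CERTIFICATE-----\n"
    "subject=CN = example\n"
)

print(build_ca_bundle(sample_output))
```

Writing the returned text to a file yields the same kind of bundle that the shell pipeline redirects into `<ca_bundle.cert>`.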

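11.4.2.4. Backup storage location errors in the Velero pod log

If a Velero Backup custom resource contains a reference to a backup storage location (BSL) that does not exist, the Velero pod log might display the following error messages:

$ oc logs <Velero_Pod> -n openshift-migration

Example output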

level=error msg="Error checking repository for stale locks" error="error getting backup storage location: BackupStorageLocation.velero.io \"ts-dpa-1\" not found" error.file="/remote-source/src/github.com/vmware-tanzu/velero/pkg/restic/repository_manager.go:259"

You can ignore these error messages. A missing BSL cannot cause a migration to fail.
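
When you read a long Velero pod log, it can help to separate these ignorable messages from the errors that need attention. The following sketch is one possible filter, assuming the log has been saved as text. The function name `is_ignorable_bsl_error` is hypothetical, and the sample lines are shortened from the examples in this section.

```python
def is_ignorable_bsl_error(line: str) -> bool:
    """Return True for error lines that only report a missing BSL."""
    return (
        "level=error" in line
        and "BackupStorageLocation.velero.io" in line
        and "not found" in line
    )


# Sample lines shortened from the log examples in this section.
log_lines = [
    r'level=error msg="Error checking repository for stale locks" '
    r'error="error getting backup storage location: '
    r'BackupStorageLocation.velero.io \"ts-dpa-1\" not found"',
    'level=error msg="Error backing up item" backup=velero/monitoring '
    'error="timed out waiting for all PodVolumeBackups to complete"',
]

# Keep only the errors that are not explained by a missing BSL.
remaining = [line for line in log_lines if not is_ignorable_bsl_error(line)]
print(len(remaining))
```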

11.4.2.5. Pod volume backup timeout error in the Velero pod log

If a migration fails because Restic times out, the following error is displayed in the Velero pod log.

level=error msg="Error backing up item" backup=velero/monitoring error="timed out waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1

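The default value of restic_timeout is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value might delay the return of error messages.

Procedure

In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
Click Migration Toolkit for Containers Operator.
In the MigrationController tab, click migration-controller.
In the YAML tab, update the following parameter value:
```
spec:
  restic_timeout: 1h 1
```
1: Valid units are h (hours), m (minutes), and s (seconds), for example, 3h30m15s.

Click Save.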
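
To sanity-check a restic_timeout value before saving it, you can convert it to seconds. The following sketch handles only the form shown in the callout above (optional hours, then minutes, then seconds, as whole numbers). It is an illustration and not the parser that MTC or Velero uses, and `parse_timeout` is a hypothetical name.

```python
import re

# Accepts values such as "1h", "45m", or "3h30m15s": whole numbers only,
# with units in the order hours, minutes, seconds.
_DURATION = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_timeout(value: str) -> int:
    """Convert a duration string such as '3h30m15s' to a number of seconds."""
    match = _DURATION.fullmatch(value)
    if not value or match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


print(parse_timeout("1h"))        # default value: 3600 seconds
print(parse_timeout("3h30m15s"))  # 12615 seconds
```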

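11.4.2.6. Restic verification errors in the MigMigration custom resource

If data verification fails when migrating a persistent volume with the file system data copy method, the following error is displayed in the MigMigration CR.

Example output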

status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: 2020-04-16T20:35:16Z
    message: There were verify errors found in 1 Restic volume restores. See restore `<registry-example-migration-rvwcm>`
      for details 1
    status: "True"
    type: ResticVerifyErrors 2

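1: The error message identifies the Restore CR name.
2: ResticVerifyErrors is a general error warning type that includes verification errors.

Note

A data verification error does not cause the migration process to fail.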
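
Because the migration itself still succeeds, this warning is easy to miss. The following sketch shows one way to pull the Restore CR name out of the condition shown above, assuming the MigMigration CR has been exported as JSON and loaded into a dictionary. The function name is hypothetical, and the message pattern is inferred from the single example in this section, so treat it as an assumption.

```python
import re
from typing import List

# Pattern inferred from the example message; angle brackets are optional
# because the documentation shows the name as a placeholder.
RESTORE_NAME = re.compile(r"See restore `<?([^`>]+)>?`")


def restic_verify_restores(migmigration: dict) -> List[str]:
    """Return Restore CR names referenced by active ResticVerifyErrors conditions."""
    names = []
    conditions = (migmigration.get("status") or {}).get("conditions") or []
    for condition in conditions:
        if condition.get("type") != "ResticVerifyErrors":
            continue
        if condition.get("status") != "True":
            continue
        match = RESTORE_NAME.search(condition.get("message", ""))
        if match:
            names.append(match.group(1))
    return names


# Dictionary form of the example condition from this section.
example = {
    "status": {
        "conditions": [
            {
                "category": "Warn",
                "durable": True,
                "message": "There were verify errors found in 1 Restic volume "
                "restores. See restore `<registry-example-migration-rvwcm>` "
                "for details",
                "status": "True",
                "type": "ResticVerifyErrors",
            }
        ]
    }
}

print(restic_verify_restores(example))
```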

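You can check the Restore CR to identify the source of the data verification error.

Procedure

Log in to the target cluster.

View the Restore CR:

$ oc describe restore <registry-example-migration-rvwcm> -n openshift-migration

The output identifies the persistent volume with PodVolumeRestore errors.

Example output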

status:
  phase: Completed
  podVolumeRestoreErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration
  podVolumeRestoreResticErrors:
  - kind: PodVolumeRestore
    name: <registry-example-migration-rvwcm-98t49>
    namespace: openshift-migration

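View the PodVolumeRestore CR:

$ oc describe podvolumerestore <registry-example-migration-rvwcm-98t49> -n openshift-migration

The output identifies the Restic pod that logged the errors.

Example output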

  completionTimestamp: 2020-05-01T20:49:12Z
  errors: 1
  resticErrors: 1
  ...
  resticPod: <restic-nr2v5>

View the Restic pod log to locate the errors:
```
$ oc logs -f <restic-nr2v5>
```

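11.4.2.7. Restic permission error when migrating from NFS storage with root_squash enabled

If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody and does not have permission to perform the migration. The following error is displayed in the Restic pod log.

Example output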

backup=openshift-migration/<backup_id> controller=pod-volume-backup error="fork/exec /usr/bin/restic: permission denied" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/controller/pod_volume_backup_controller.go:280" error.function="github.com/vmware-tanzu/velero/pkg/controller.(*podVolumeBackupController).processBackup" logSource="pkg/controller/pod_volume_backup_controller.go:280" name=<backup_id> namespace=openshift-migration

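You can resolve this issue by creating a supplemental group for Restic and adding the group ID to the MigrationController CR manifest.

Procedure

Create a supplemental group for Restic on the NFS storage.
Set the setgid bit on the NFS directories so that group ownership is inherited.
Add the restic_supplemental_groups parameter to the MigrationController CR manifest on the source and target clusters:
```
spec:
  restic_supplemental_groups: <group_id> 1
```
1: Specify the supplemental group ID.

Wait for the Restic pods to restart so that the changes are applied.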
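
If you want to confirm that the setgid step took effect, the mode bits of the directory can be inspected. The following sketch checks a mode value for the setgid bit with the Python standard library. The helper name `has_setgid` is hypothetical, and on a real system you would pass it `os.stat(path).st_mode` for the exported NFS directory.

```python
import stat


def has_setgid(mode: int) -> bool:
    """Return True if the mode describes a directory with the setgid bit set."""
    return stat.S_ISDIR(mode) and bool(mode & stat.S_ISGID)


# Illustrative mode values: a directory with mode 2775 and one with mode 0775.
with_setgid = stat.S_IFDIR | 0o2775
without_setgid = stat.S_IFDIR | 0o0775

print(has_setgid(with_setgid), has_setgid(without_setgid))
```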

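11.4.3. Applying the Skip SELinux relabel workaround with `spc_t` automatically on workloads running on OpenShift Container Platform

When you attempt to migrate a namespace and a substantial volume associated with it by using the Migration Toolkit for Containers (MTC), the rsync-server might freeze without providing any further information for troubleshooting the issue.

11.4.3.1. Diagnosing the need for the Skip SELinux relabel workaround

Search the kubelet logs from the node where the rsync-server for Direct Volume Migration (DVM) runs for the error Unable to attach or mount volumes for pod…timed out waiting for the condition.

Example kubelet log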

kubenswrapper[3879]: W0326 16:30:36.749224    3879 volume_linux.go:49] Setting volume ownership for /var/lib/kubelet/pods/8905d88e-6531-4d65-9c2a-eff11dc7eb29/volumes/kubernetes.io~csi/pvc-287d1988-3fd9-4517-a0c7-22539acd31e6/mount and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699

kubenswrapper[3879]: E0326 16:32:02.706363    3879 kubelet.go:1841] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[8db9d5b032dab17d4ea9495af12e085a], unattached volumes=[crane2-rsync-server-secret 8db9d5b032dab17d4ea9495af12e085a kube-api-access-dlbd2 crane2-stunnel-server-config crane2-stunnel-server-secret crane2-rsync-server-config]: timed out waiting for the condition" pod="caboodle-preprod/rsync-server"

kubenswrapper[3879]: E0326 16:32:02.706496    3879 pod_workers.go:965] "Error syncing pod, skipping" err="unmounted volumes=[8db9d5b032dab17d4ea9495af12e085a], unattached volumes=[crane2-rsync-server-secret 8db9d5b032dab17d4ea9495af12e085a kube-api-access-dlbd2 crane2-stunnel-server-config crane2-stunnel-server-secret crane2-rsync-server-config]: timed out waiting for the condition" pod="caboodle-preprod/rsync-server" podUID=8905d88e-6531-4d65-9c2a-eff11dc7eb29
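
Searching for this error can be automated once the kubelet log has been saved as text. The following sketch looks for the mount timeout message and collects the affected pods. The function name is hypothetical, the pattern is based only on the example lines above, and the sample lines are shortened copies of them.

```python
import re
from typing import Iterable, List

# Based on the example log: the "skipping pod" message, followed later on the
# same line by the timeout text and the pod reference.
MOUNT_TIMEOUT = re.compile(
    r'"Unable to attach or mount volumes for pod; skipping pod".*'
    r'timed out waiting for the condition" pod="([^"]+)"'
)


def pods_with_mount_timeouts(log_lines: Iterable[str]) -> List[str]:
    """Return the pods (namespace/name) that hit the volume mount timeout."""
    pods: List[str] = []
    for line in log_lines:
        match = MOUNT_TIMEOUT.search(line)
        if match and match.group(1) not in pods:
            pods.append(match.group(1))
    return pods


# Shortened copies of the example kubelet log lines.
sample_lines = [
    'kubenswrapper[3879]: W0326 16:30:36.749224    3879 volume_linux.go:49] '
    "Setting volume ownership for /var/lib/kubelet/pods/... and fsGroup set.",
    'kubenswrapper[3879]: E0326 16:32:02.706363    3879 kubelet.go:1841] '
    '"Unable to attach or mount volumes for pod; skipping pod" '
    'err="unmounted volumes=[...], unattached volumes=[...]: '
    'timed out waiting for the condition" pod="caboodle-preprod/rsync-server"',
]

print(pods_with_mount_timeouts(sample_lines))
```

A result that names an rsync-server pod suggests that the workaround in the next section applies.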

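11.4.3.2. Resolving using the Skip SELinux relabel workaround

To resolve this issue, set the migration_rsync_super_privileged parameter to true in both the source and destination MigClusters by using the MigrationController custom resource (CR).

Example MigrationController CR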

apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  name: migration-controller
  namespace: openshift-migration
spec:
  migration_rsync_super_privileged: true 1
  azure_resource_group: ""
  cluster_name: host
  mig_namespace_limit: "10"
  mig_pod_limit: "100"
  mig_pv_limit: "100"
  migration_controller: true
  migration_log_reader: true
  migration_ui: true
  migration_velero: true
  olm_managed: true
  restic_timeout: 1h
  version: 1.8.3

1: The value of the migration_rsync_super_privileged parameter indicates whether to run Rsync pods as super privileged containers (spc_t SELinux context). Valid settings are true or false.
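
As an alternative to editing the whole CR, only the changed field can be sent as a JSON merge patch. The following sketch builds such a patch body. It assumes that you apply it yourself on each cluster, for example with `oc patch migrationcontroller migration-controller -n openshift-migration --type merge -p '<patch>'`, and the function name `super_privileged_patch` is hypothetical.

```python
import json


def super_privileged_patch(enabled: bool = True) -> str:
    """Return a JSON merge patch that sets migration_rsync_super_privileged."""
    return json.dumps({"spec": {"migration_rsync_super_privileged": enabled}})


# The same patch body would be applied on both the source and target clusters.
print(super_privileged_patch())
```

A merge patch leaves the other spec fields, such as restic_timeout, unchanged.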
