rhos7: live-migration of an instance on iSCSI works one time, but migrating back leaves instance unusable

Issue

  • Live migration of an instance on iSCSI works one time, but migrating the instance back leaves it unusable.

Steps to reproduce the problem:

1. Create two boot volumes from an image

[stack@osp7dr1 ~(OC-admin)]$ cinder list
+--------------------------------------+--------+--------------+------+---------------+----------+--------------------------------------+
|                  ID                  | Status | Display Name | Size |  Volume Type  | Bootable |             Attached to              |
+--------------------------------------+--------+--------------+------+---------------+----------+--------------------------------------+
| 6db84a20-cf8d-4fc4-9d7f-1d6455badbd6 | in-use | root_vmA1_12 |  10  |  DX2-ISCSI-RG |   true   | 70fd494a-3065-4f42-b25c-3f320ce1393b |
| ae0fe26d-570f-4cbf-9b6c-464435a03516 | in-use | root_vmA1_2  |  10  | DX2-ISCSI-TPP |   true   | 2841a824-e693-40de-80b0-94cf72a03c97 |
+--------------------------------------+--------+--------------+------+---------------+----------+--------------------------------------+
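
For reference, the two boot volumes can be created roughly as follows (a hedged sketch; the Glance image ID is a placeholder, while the names, sizes and volume types are taken from the listing above):

cinder create --image-id <glance-image-id> --volume-type DX2-ISCSI-RG  --display-name root_vmA1_12 10
cinder create --image-id <glance-image-id> --volume-type DX2-ISCSI-TPP --display-name root_vmA1_2  10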

2. Create two instances from the boot volumes

[stack@osp7dr1 ~(OC-admin)]$ nova list --name vmA1 --fields OS-EXT-SRV-ATTR:host,status,name
+--------------------------------------+------------------------------+--------+---------+
| ID                                   | OS-EXT-SRV-ATTR: Host        | Status | Name    |
+--------------------------------------+------------------------------+--------+---------+
| 70fd494a-3065-4f42-b25c-3f320ce1393b | osp7r1-compute-1.localdomain | ACTIVE | vmA1_12 |
| 2841a824-e693-40de-80b0-94cf72a03c97 | osp7r1-compute-2.localdomain | ACTIVE | vmA1_2  |
+--------------------------------------+------------------------------+--------+---------+
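
The instances can be booted from those volumes roughly like this (a hedged sketch; the flavor and network ID are placeholders, not values taken from this environment):

nova boot --flavor m1.small --nic net-id=<tenant-net-id> \
    --boot-volume 6db84a20-cf8d-4fc4-9d7f-1d6455badbd6 vmA1_12
nova boot --flavor m1.small --nic net-id=<tenant-net-id> \
    --boot-volume ae0fe26d-570f-4cbf-9b6c-464435a03516 vmA1_2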

3. Configure Live Migration
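
The exact live-migration configuration used in this reproduction is not recorded here; a minimal, lab-only sketch of a typical OSP 7 setup on both compute nodes looks roughly like this (unauthenticated TCP transport, an assumption rather than the verified configuration):

# /etc/libvirt/libvirtd.conf -- let libvirtd accept incoming migration connections
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"          # lab only; use TLS/SASL in production

# /etc/sysconfig/libvirtd -- make libvirtd actually listen on the network
LIBVIRTD_ARGS="--listen"

# /etc/nova/nova.conf -- drive live migration over TCP
[libvirt]
live_migration_uri = qemu+tcp://%s/system

# restart the services on each compute node
systemctl restart libvirtd openstack-nova-compute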

4. Live Migrate instance "vmA1_12" from osp7r1-compute-1.localdomain to osp7r1-compute-2.localdomain

[stack@osp7dr1 ~(OC-admin)]$ nova live-migration vmA1_12 osp7r1-compute-2.localdomain

nova list --name vmA1 --fields OS-EXT-SRV-ATTR:host,status,name            
+--------------------------------------+------------------------------+-----------+---------+
| ID                                   | OS-EXT-SRV-ATTR: Host        | Status    | Name    |
+--------------------------------------+------------------------------+-----------+---------+
| 70fd494a-3065-4f42-b25c-3f320ce1393b | osp7r1-compute-1.localdomain | MIGRATING | vmA1_12 |
| 2841a824-e693-40de-80b0-94cf72a03c97 | osp7r1-compute-2.localdomain | ACTIVE    | vmA1_2  |
+--------------------------------------+------------------------------+-----------+---------+
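
While the migration is in flight, its progress can also be followed from the hypervisor side (hedged example; instance-00000002 is assumed to be the libvirt domain backing vmA1_12, based on the virsh list output shown further down):

# on the source compute node, poll the migration job statistics
virsh domjobinfo instance-00000002

# or, as the overcloud admin, list migrations
nova migration-list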

5. After the migration, check the status of the instances - both should now be active on osp7r1-compute-2.localdomain

nova list --name vmA1 --fields OS-EXT-SRV-ATTR:host,status,name

+--------------------------------------+------------------------------+--------+---------+
| ID                                   | OS-EXT-SRV-ATTR: Host        | Status | Name    |
+--------------------------------------+------------------------------+--------+---------+
| 2841a824-e693-40de-80b0-94cf72a03c97 | osp7r1-compute-2.localdomain | ACTIVE | vmA1_2  |
| 70fd494a-3065-4f42-b25c-3f320ce1393b | osp7r1-compute-2.localdomain | ACTIVE | vmA1_12 |
+--------------------------------------+------------------------------+--------+---------+
[root@osp7r1-compute-2 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 2     instance-00000005              running
 3     instance-00000002              running

[root@osp7r1-compute-1 ~]# virsh list
 Id    Name                           State
----------------------------------------------------

[root@osp7r1-compute-1 ~]#

Check the multipath devices on both compute nodes:

[root@osp7r1-compute-1 ~]# multipath -ll
[root@osp7r1-compute-1 ~]#

[root@osp7r1-compute-2 ~]# multipath -ll
3600000e00d28000000280d7e00080000 dm-2 FUJITSU ,ETERNUS_DXL
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 11:0:0:1 sdf 8:80  active ready running
| `- 12:0:0:1 sdg 8:96  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 13:0:0:1 sdh 8:112 active ready running
  `- 14:0:0:1 sdi 8:128 active ready running
3600000e00d28000000280d7e00060000 dm-0 FUJITSU ,ETERNUS_DXL
size=10G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 13:0:0:0 sdd 8:48  active ready running
| `- 14:0:0:0 sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 11:0:0:0 sdb 8:16  active ready running
  `- 12:0:0:0 sdc 8:32  active ready running
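
To tie each multipath map to an instance and its Cinder volume, the domain XML can be checked on the compute node (hedged example; the serial element carries the Cinder volume ID and source dev carries the /dev/mapper path, as seen in the XML excerpt later in this article):

# for every running domain, print the backing device and the Cinder volume ID
for dom in $(virsh list --name); do
    echo "== $dom =="
    virsh dumpxml "$dom" | grep -E 'source dev|serial'
done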

Make sure both instances are running and can write data to their disks:

Instance vmA1_12:

[root@ros2client ~]# ssh -i key-admin.pem -p 6840 cloud-user@172.0.0.112
Last login: Thu Dec  3 07:30:21 2015 from 172.0.0.33
[cloud-user@vma1-12 ~]$
[cloud-user@vma1-12 ~]$
[cloud-user@vma1-12 ~]$ sudo -i
[root@vma1-12 ~]# dd if=/dev/zero of=/home/file count=1000000 conv=sync bs=1024
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 1.62285 s, 631 MB/s

Instance vmA1_2:

[root@ros2client ~]# ssh -i key-admin.pem -p 6860 cloud-user@172.0.0.112
Last login: Thu Dec  3 07:30:36 2015 from 172.0.0.33
[cloud-user@vma1-2 ~]$ sudo -i
[root@vma1-2 ~]#  dd if=/dev/zero of=/home/file count=1000000 conv=sync bs=1024
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 1.63544 s, 626 MB/s
[root@vma1-2 ~]#

6. Now live migrate instance vmA1_12 back to osp7r1-compute-1

[stack@osp7dr1 ~(OC-admin)]$ nova live-migration vmA1_12 osp7r1-compute-1.localdomain

nova list --name vmA1 --fields OS-EXT-SRV-ATTR:host,status,name
+--------------------------------------+------------------------------+-----------+---------+
| ID                                   | OS-EXT-SRV-ATTR: Host        | Status    | Name    |
+--------------------------------------+------------------------------+-----------+---------+
| 2841a824-e693-40de-80b0-94cf72a03c97 | osp7r1-compute-2.localdomain | ACTIVE    | vmA1_2  |
| 70fd494a-3065-4f42-b25c-3f320ce1393b | osp7r1-compute-2.localdomain | MIGRATING | vmA1_12 |
+--------------------------------------+------------------------------+-----------+---------+

nova list --name vmA1 --fields OS-EXT-SRV-ATTR:host,status,name
+--------------------------------------+------------------------------+--------+---------+
| ID                                   | OS-EXT-SRV-ATTR: Host        | Status | Name    |
+--------------------------------------+------------------------------+--------+---------+
| 70fd494a-3065-4f42-b25c-3f320ce1393b | osp7r1-compute-1.localdomain | ACTIVE | vmA1_12 |
| 2841a824-e693-40de-80b0-94cf72a03c97 | osp7r1-compute-2.localdomain | ACTIVE | vmA1_2  |
+--------------------------------------+------------------------------+--------+---------+

7. Check the status of the instances after the live migration finished

Instance vmA1_12:

[root@ros2client ~]# ssh -i key-admin.pem -p 6840 cloud-user@172.0.0.112
Last login: Thu Dec  3 07:46:53 2015 from 172.0.0.33
[cloud-user@vma1-12 ~]$ sudo -i
[root@vma1-12 ~]# dd if=/dev/zero of=/home/file count=1000000 conv=sync bs=1024
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 1.52992 s, 669 MB/s

Instance vmA1_2:

[root@vma1-2 ~]# exit
logout
-bash: /root/.bash_logout: Input/output error
Bus error
[cloud-user@vma1-2 ~]$ exit

[root@ros2client ~]# ssh -i key-admin.pem -p 6860 cloud-user@172.0.0.112
ssh_exchange_identification: Connection closed by remote host

*** ERROR ***
The root disk of instance vmA1_2 is no longer usable: the guest reports I/O and bus errors, and new SSH connections to it are closed immediately.
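
If the guest can no longer be reached over SSH, the I/O errors can usually also be seen in the instance console log (hedged example; this output was not captured in the reproduction):

nova console-log vmA1_2 | tail -n 50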

8. Check the multipath status on osp7r1-compute-2 where instance vmA1_2 lives:

[root@osp7r1-compute-2 ~]# multipath -ll
3600000e00d28000000280d7e00070000 dm-0
size=10G features='0' hwhandler='0' wp=rw

[root@osp7r1-compute-2 ~]# lsblk --scsi
NAME HCTL       TYPE VENDOR   MODEL             REV TRAN
sda  0:2:0:0    disk FTS      PRAID EP400i     4.25
sdf  11:0:0:1   disk FUJITSU  ETERNUS_DXL      1033 iscsi
sdg  12:0:0:1   disk FUJITSU  ETERNUS_DXL      1033 iscsi
sdh  13:0:0:1   disk FUJITSU  ETERNUS_DXL      1033 iscsi
sdi  14:0:0:1   disk FUJITSU  ETERNUS_DXL      1033 iscsi
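
The iSCSI sessions on the compute node can be inspected to confirm which LUNs are still attached to the host (hedged example; output not captured here):

# show targets, LUNs and the attached SCSI disks per session
iscsiadm -m session -P 3 | grep -E 'Target:|Lun:|Attached scsi disk'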

Check the configured disk device in the libvirt XML file:

[root@osp7r1-compute-2 qemu]# virsh list
 Id    Name                           State
----------------------------------------------------
 2     instance-00000005              running

  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/mapper/3600000e00d28000000280d7e00070000'/>
      <target dev='vda' bus='virtio'/>
      <serial>ae0fe26d-570f-4cbf-9b6c-464435a03516</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

[root@osp7r1-compute-2 qemu]# ls -al /dev/mapper/3600000e00d28000000280d7e00070000
lrwxrwxrwx. 1 root root 7 Dec  3 13:49 /dev/mapper/3600000e00d28000000280d7e00070000 -> ../dm-0
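
The domain XML points the root disk of the remaining instance at /dev/mapper/3600000e00d28000000280d7e00070000, i.e. the dm-0 map shown above that no longer has any paths, which explains the I/O errors inside vmA1_2. The map itself can be inspected with dmsetup (hedged example; output not captured here):

dmsetup info 3600000e00d28000000280d7e00070000
dmsetup table 3600000e00d28000000280d7e00070000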

Environment

  • Red Hat OpenStack Platform 7
  • OSP-d 7.1.0, OSDP 7.0.2
  • iSCSI-backed Cinder volumes hosting the instances
