Live migration fails due to a QEMU error in an in-place upgrade environment

Solution In Progress

Issue

  • When this issue occurs, the live migration fails with the following error on the destination host (overcloud-compute-1). A way to inspect the relevant device settings is sketched after the instance history below.
  • QEMU log on the destination host overcloud-compute-1:
2022-10-18 04:56:01.615+0000: starting up libvirt version: 7.0.0, package: 14.3.module+el8.4.0+11878+84e54169 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2021-07-20-02:46:13, ), qemu version: 5.2.0qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8, kernel: 4.18.0-305.19.1.el8_4.x86_64, hostname: overcloud-compute-1.localdomain
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-1-instance-0000001d \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-instance-0000001d/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-instance-0000001d/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-instance-0000001d/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=instance-0000001d,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-instance-0000001d/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
-cpu SandyBridge,vme=on,hypervisor=on,arat=on,xsaveopt=on \
-m 10240 \
-object memory-backend-ram,id=pc.ram,size=10737418240 \
-overcommit mem-lock=off \
-smp 10,sockets=10,dies=1,cores=1,threads=1 \
-uuid f9b9ddb2-7ed3-4cf5-ba29-0da527dce8fb \
-smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=20.6.2-2.20210607104828.el8ost.4,serial=f9b9ddb2-7ed3-4cf5-ba29-0da527dce8fb,uuid=f9b9ddb2-7ed3-4cf5-ba29-0da527dce8fb,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot strict=on \
...<snip>...
-netdev tap,fds=35:36:37:38:39:40:41:42:43:44,id=hostnet0,vhost=on,vhostfds=45:46:47:48:49:50:51:52:53:54 \
-device virtio-net-pci,mq=on,vectors=22,rx_queue_size=512,host_mtu=1500,netdev=hostnet0,id=net0,mac=AA:AA:AA:AA:AA:AA,bus=pci.0,addr=0x3 \
-add-fd set=21,fd=56 \
-chardev pty,id=charserial0,logfile=/dev/fdset/21,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc BB.BB.BB.BB:0 \
-k ja \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/0 (label charserial0)
2022-10-18T04:56:01.902671Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
2022-10-18T04:56:03.999678Z qemu-kvm: get_pci_config_device: Bad config data: i=0x9a read: 11 device: 15 cmask: ff wmask: 0 w1cmask:0
2022-10-18T04:56:03.999711Z qemu-kvm: Failed to load PCIDevice:config
2022-10-18T04:56:03.999719Z qemu-kvm: Failed to load virtio-net:virtio
2022-10-18T04:56:03.999725Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-net'
2022-10-18T04:56:04.000902Z qemu-kvm: load of migration failed: Invalid argument
2022-10-18 04:56:04.213+0000: shutting down, reason=crashed
  • Here is the history of the instance:
1. Boot the instance on overcloud-compute-1 (RHEL7.9)
2. In-place upgrade (Hybrid mode) of both overcloud-compute-0 and overcloud-compute-1
3. Major upgrade of overcloud-compute-0 (RHEL7.9->RHEL8.4)
4. Live migration from overcloud-compute-1 to overcloud-compute-0 (RHEL7.9->RHEL8.4)
5. Major upgrade of overcloud-compute-1 (RHEL7.9->RHEL8.4)
6. Live migration from overcloud-compute-0 to overcloud-compute-1 (RHEL8.4->RHEL8.4) ... failed
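
The "Bad config data" / "Failed to load virtio-net:virtio" messages indicate that the virtio-net PCI configuration carried in the migration stream does not match the device that the destination QEMU builds; the conditions listed under Environment below (multi-queue enabled, 9 or more vCPUs) describe the instances that are susceptible. As a minimal diagnostic sketch (assuming the domain name instance-0000001d from the log above; on RHOSP 16 compute nodes libvirt is containerized, so virsh is typically run via podman exec nova_libvirt), the relevant settings and QEMU builds can be checked with:

# On the source compute node: machine type, vCPU count and multi-queue (queues=) setting of the running domain
podman exec nova_libvirt virsh dumpxml instance-0000001d | grep -E "machine=|<vcpu|queues="

# On both source and destination hosts: confirm the qemu-kvm build involved in the migration
rpm -q qemu-kvm
/usr/libexec/qemu-kvm --version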

Environment

  • Red Hat OpenStack Platform 16.2 (RHOSP)
  • nova
  • The instance meets all of the following conditions (a way to check them is sketched below):
    • Multi-queue is enabled
    • The number of allocated vCPUs is 9 or more
    • The instance has been running since OSP13
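
As a rough check for these conditions (a sketch assuming standard OpenStack CLI access; <instance>, <image> and <flavor> are placeholders), multi-queue is typically controlled by the hw_vif_multiqueue_enabled image property, and the vCPU count comes from the flavor:

# Identify the image and flavor used by the instance
openstack server show <instance> -c image -c flavor

# Multi-queue: look for hw_vif_multiqueue_enabled='true' in the image properties
openstack image show <image> -c properties

# vCPU count allocated by the flavor (condition: 9 or more)
openstack flavor show <flavor> -c vcpus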
