Instances stuck in spawning state

Solution In Progress - Updated -

Issue

  • After deploying OSP 13, we try to create VM instances, but all of them get stuck in the spawning state. None of the VMs changes to the running state, even after hours.

  • Previously, the VMs would eventually be created and start running after tens of minutes or hours.

  • We expect the VM state to change to "running" within a couple of minutes.

  • We deployed OSP 13 with OVS-DPDK and RT-KVM on a server with two NUMA nodes.

  • After several hours of waiting, we checked the VM log on the compute node. The last entries are:

2018-10-16 05:57:15.411+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-06-05-05:26:44, x86-041.build.eng.bos.redhat.com), qemu version: 2.10.0(qemu-kvm-rhev-2.10.0-21.el7_5.4), hostname: overcloud-computeovsdpdk-0.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOME=/root QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000001,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-instance-00000001/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Server-IBRS,ss=on,hypervisor=on,tsc_adjust=on,clflushopt=on,pku=on,stibp=on -m 8192 -realtime mlock=off -smp 4,sockets=1,cores=2,threads=2 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-instance-00000001,share=yes,size=8589934592,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -uuid b3f30301-3891-472a-9cb7-30bdc3aefb11 -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=17.0.3-0.20180420001141.el7ost,serial=f238eda2-f4e4-11e7-9ffd-7ed30ae9f81f,uuid=b3f30301-3891-472a-9cb7-30bdc3aefb11,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-instance-00000001/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -object secret,id=virtio-disk0-secret0,data=UBFRG5SMNp0DCM4K1ynS+TOu6kEdVIHJODh+pxhgV+w=,keyid=masterKey0,iv=XjhnhiK5MGKAUPhUCyvmpg==,format=base64 -drive 'file=rbd:cloud5_nova/b3f30301-3891-472a-9cb7-30bdc3aefb11_disk:id=cloud5_openstack:auth_supported=cephx\;none:mon_host=10.10.10.10\:6789\;10.10.10.11\:6789\;10.10.10.12\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhube4de6e6-23,server -netdev vhost-user,chardev=charnet0,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:ff:ff:ff:ff,bus=pci.0,addr=0x3 -chardev socket,id=charnet1,path=/var/lib/vhost_sockets/vhu856e6fc4-f0,server -netdev vhost-user,chardev=charnet1,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:ff:ff:ff:fe,bus=pci.0,addr=0x4 -add-fd set=0,fd=80 -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.10.10.14:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
2018-10-16T05:57:16.490733Z qemu-kvm: -chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhube4de6e6-23,server: info: QEMU waiting for connection on: disconnected:unix:/var/lib/vhost_sockets/vhube4de6e6-23,server
  • We tried to restart libvirtd, but it failed with the following error:
libvirt: QEMU Driver error : Timed out during operation: cannot acquire state change lock (held by remoteDisp...
  • We then rebooted the compute node. After the server booted, the libvirtd daemon was not running:
 systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; disabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Tue 2018-10-16 14:47:30 UTC; 24min ago
     Docs: man:libvirtd(8)
           https://libvirt.org
  Process: 5821 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=1/FAILURE)
 Main PID: 5821 (code=exited, status=1/FAILURE)

Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: Failed to start Virtualization daemon.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: Unit libvirtd.service entered failed state.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: libvirtd.service failed.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: libvirtd.service holdoff time over, scheduling restart.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: start request repeated too quickly for libvirtd.service
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: Failed to start Virtualization daemon.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: Unit libvirtd.service entered failed state.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: libvirtd.service failed.
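
The QEMU command line in the log defines each vhost-user chardev as a `server` socket without `nowait`, so QEMU blocks at startup until a client connects to that socket (in this deployment, presumably the OVS-DPDK vswitch). The final log line suggests that the connection never arrived. As a rough triage aid, not a fix, the Python sketch below scans text for the two symptoms reported above: a QEMU process still waiting on a socket, and a libvirtd unit that hit the systemd start limit. The function names and the regular expression are illustrative assumptions and are not part of any Red Hat or libvirt tooling.

```python
import re

# Matches the QEMU message printed while a server-mode chardev socket waits
# for its peer, and captures the UNIX socket path that follows "unix:".
WAITING_RE = re.compile(
    r"QEMU waiting for connection on: \S*?unix:(?P<path>[^,\s]+)"
)


def find_waiting_vhost_sockets(log_text):
    """Return the socket paths QEMU reported it was waiting on, in log order."""
    return [m.group("path") for m in WAITING_RE.finditer(log_text)]


def libvirtd_hit_start_limit(status_text):
    """Return True if `systemctl status` output shows a start-limit failure."""
    return "Result: start-limit" in status_text


if __name__ == "__main__":
    # Sample lines copied from the report above.
    sample_log = (
        "2018-10-16T05:57:16.490733Z qemu-kvm: -chardev socket,id=charnet0,"
        "path=/var/lib/vhost_sockets/vhube4de6e6-23,server: info: QEMU waiting "
        "for connection on: disconnected:unix:/var/lib/vhost_sockets/"
        "vhube4de6e6-23,server"
    )
    sample_status = "   Active: failed (Result: start-limit) since Tue 2018-10-16"
    print(find_waiting_vhost_sockets(sample_log))
    print(libvirtd_hit_start_limit(sample_status))
```

A socket path returned by the first function only indicates where QEMU is blocked; why the peer did not connect still has to be checked on the vswitch side.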

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
