Instances stuck in spawning state

Issue

After deploying OSP 13, we tried to create VM instances, but all of them are stuck in the spawning state. None of these VMs changed to the running state, even after hours.
Previously, the VMs would eventually be created and start running after tens of minutes or hours.
We expect the VM state to change to "running" within a couple of minutes.
We deployed OSP 13 with OVS-DPDK and RT-KVM on a server with two NUMA nodes.
After several hours of waiting, we checked the VM log on the compute node. The last entries are:

2018-10-16 05:57:15.411+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-06-05-05:26:44, x86-041.build.eng.bos.redhat.com), qemu version: 2.10.0(qemu-kvm-rhev-2.10.0-21.el7_5.4), hostname: overcloud-computeovsdpdk-0.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HOME=/root QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000001,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-instance-00000001/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Server-IBRS,ss=on,hypervisor=on,tsc_adjust=on,clflushopt=on,pku=on,stibp=on -m 8192 -realtime mlock=off -smp 4,sockets=1,cores=2,threads=2 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-instance-00000001,share=yes,size=8589934592,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -uuid b3f30301-3891-472a-9cb7-30bdc3aefb11 -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=17.0.3-0.20180420001141.el7ost,serial=f238eda2-f4e4-11e7-9ffd-7ed30ae9f81f,uuid=b3f30301-3891-472a-9cb7-30bdc3aefb11,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-instance-00000001/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -object secret,id=virtio-disk0-secret0,data=UBFRG5SMNp0DCM4K1ynS+TOu6kEdVIHJODh+pxhgV+w=,keyid=masterKey0,iv=XjhnhiK5MGKAUPhUCyvmpg==,format=base64 -drive 'file=rbd:cloud5_nova/b3f30301-3891-472a-9cb7-30bdc3aefb11_disk:id=cloud5_openstack:auth_supported=cephx\;none:mon_host=10.10.10.10\:6789\;10.10.10.11\:6789\;10.10.10.12\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,discard=unmap' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhube4de6e6-23,server -netdev vhost-user,chardev=charnet0,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:ff:ff:ff:ff,bus=pci.0,addr=0x3 -chardev socket,id=charnet1,path=/var/lib/vhost_sockets/vhu856e6fc4-f0,server -netdev vhost-user,chardev=charnet1,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:ff:ff:ff:fe,bus=pci.0,addr=0x4 -add-fd set=0,fd=80 -chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 10.10.10.14:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
2018-10-16T05:57:16.490733Z qemu-kvm: -chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhube4de6e6-23,server: info: QEMU waiting for connection on: disconnected:unix:/var/lib/vhost_sockets/vhube4de6e6-23,server
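In the log above, each vhost-user `-chardev socket` is opened with `server` but without `nowait`, unlike the monitor socket (`server,nowait`). In that mode QEMU holds start-up until a client connects to the socket, which matches the final "QEMU waiting for connection on" line; the expected client here is presumably the OVS-DPDK vhost-user port. The Python sketch below lists such blocking socket paths from a QEMU command line copied out of the instance log. The function name `blocking_server_sockets` is hypothetical, and the parser only understands the `server`/`nowait` spelling used in this log, so treat it as a diagnostic aid rather than a complete QEMU option parser.

```python
import shlex
from typing import List


def blocking_server_sockets(cmdline: str) -> List[str]:
    """Hypothetical helper: return the paths of -chardev socket backends that
    are opened with 'server' but without 'nowait'. QEMU waits for a client to
    connect to each of these before the guest starts."""
    args = shlex.split(cmdline)  # honours the single-quoted -smbios/-drive values
    paths = []
    for flag, value in zip(args, args[1:]):
        if flag != "-chardev":
            continue
        parts = value.split(",")
        if parts[0] != "socket":  # skip pty and other backends
            continue
        # Options are either key=value or bare flags such as 'server'/'nowait'.
        opts = dict(p.split("=", 1) if "=" in p else (p, "") for p in parts[1:])
        if "server" in opts and "nowait" not in opts and "path" in opts:
            paths.append(opts["path"])
    return paths


# Usage on a shortened excerpt of the command line from the log:
sample = (
    "/usr/libexec/qemu-kvm "
    "-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-instance-00000001/monitor.sock,server,nowait "
    "-chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhube4de6e6-23,server "
    "-chardev pty,id=charserial0,logfile=/dev/fdset/0,logappend=on"
)
print(blocking_server_sockets(sample))
# ['/var/lib/vhost_sockets/vhube4de6e6-23']
```

Run against the full command line from the log, it should report both vhost-user sockets (`vhube4de6e6-23` and `vhu856e6fc4-f0`), while the monitor socket is skipped because of `nowait`.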

We tried to restart libvirtd, but it failed with the following error:

libvirt: QEMU Driver error : Timed out during operation: cannot acquire state change lock (held by remoteDisp...

Then we rebooted the compute node. After the server booted up, the libvirtd daemon was not running:

 systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; disabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Tue 2018-10-16 14:47:30 UTC; 24min ago
     Docs: man:libvirtd(8)
           https://libvirt.org
  Process: 5821 ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS (code=exited, status=1/FAILURE)
 Main PID: 5821 (code=exited, status=1/FAILURE)

Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: Failed to start Virtualization daemon.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: Unit libvirtd.service entered failed state.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: libvirtd.service failed.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: libvirtd.service holdoff time over, scheduling restart.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: start request repeated too quickly for libvirtd.service
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: Failed to start Virtualization daemon.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: Unit libvirtd.service entered failed state.
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: libvirtd.service failed.
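The status output shows the host `libvirtd.service` as `disabled` and `failed` with `Result: start-limit`: systemd retried the start several times in quick succession and then stopped trying ("start request repeated too quickly"). When comparing several compute nodes, a small parser can pull those fields out of captured `systemctl status` text. The sketch below is hypothetical (`summarize_unit_status` is not a systemd or Red Hat tool), and its regular expressions assume the output layout shown above, which may differ between systemd versions.

```python
import re


def summarize_unit_status(status_text: str) -> dict:
    """Hypothetical helper: extract the unit-file state, active state, result
    and start-limit hint from captured `systemctl status` output."""
    summary = {}
    # e.g. "Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; disabled; ...)"
    loaded = re.search(r"Loaded: \S+ \([^;]+; (\w+);", status_text)
    if loaded:
        summary["unit_file_state"] = loaded.group(1)
    # e.g. "Active: failed (Result: start-limit) since ..."
    active = re.search(r"Active: (\w+)(?: \(Result: ([\w-]+)\))?", status_text)
    if active:
        summary["active"] = active.group(1)
        if active.group(2):
            summary["result"] = active.group(2)
    # systemd logs this line when the start rate limit is reached
    summary["start_limit_hit"] = "start request repeated too quickly" in status_text
    return summary


status = """● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; disabled; vendor preset: enabled)
   Active: failed (Result: start-limit) since Tue 2018-10-16 14:47:30 UTC; 24min ago
Oct 16 14:47:30 overcloud-computeovsdpdk-0 systemd[1]: start request repeated too quickly for libvirtd.service"""
print(summarize_unit_status(status))
# {'unit_file_state': 'disabled', 'active': 'failed', 'result': 'start-limit', 'start_limit_hit': True}
```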

Environment

Red Hat OpenStack Platform 13.0 (RHOSP)