Instance with high CPU Steal due to most instances being scheduled on the same pCPU in Red Hat OpenStack Platform

Solution In Progress - Updated -

Issue

Very high CPU steal on several instances.

Symptoms:

  • Instances regularly show more than 50% CPU steal.
  • Most instances run their vCPUs on pCPU 0 of the hypervisors. Running the following command on every hypervisor lists the pCPUs with the most vCPUs scheduled on them at that moment (each output line is a vCPU count followed by the pCPU number):
virsh list --name | xargs -I {} virsh vcpuinfo {} | grep -E '^CPU:' | awk '{print $NF}' | sort | uniq -c | sort -nr
nova4 | SUCCESS | rc=0 >>
     39 0
      9 8
      9 6
(...)
nova7 | SUCCESS | rc=0 >>
     55 0
      4 21
      4 13
      3 22
      3 18
(...)
nova8 | SUCCESS | rc=0 >>
     44 0
      5 9
      4 5
      4 4
(...)
nova10 | SUCCESS | rc=0 >>
     43 0
      7 9
      6 8
      6 7
      6 5
(...)
nova14 | SUCCESS | rc=0 >>
     21 0
      3 21
      2 9
      2 7
(...)
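
The steal figure from the first symptom can be sampled from inside a guest using the aggregate "cpu" line of /proc/stat. The following is a minimal sketch, not the method used to collect the numbers above; the function name steal_pct and the 1 second interval are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: estimate CPU steal over an interval from two /proc/stat samples.
# On the aggregate "cpu" line the fields after the label are:
# user nice system idle iowait irq softirq steal guest guest_nice

# steal_pct "<first cpu line>" "<second cpu line>"   (hypothetical helper)
# Prints steal as a percentage of all CPU time between the two samples.
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            # Sum user..steal (fields 2-9); guest time is already counted in user/nice.
            for (i = 2; i <= 9; i++) total[NR] += $i
            steal[NR] = $9
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (steal[2] - steal[1]) / dt
        }'
}

# Take two samples one second apart when /proc/stat is available.
if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    echo "steal over 1s: $(steal_pct "$first" "$second")%"
fi
```

A longer interval smooths out short bursts; tools such as top or vmstat report the same counter in their "st" column.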

Other details about the environment:

  • The hypervisors are booted with the isolcpus kernel command line parameter, configured in grub for pCPUs 0 to 3.
  • Most vCPUs appear to be scheduled on pCPU 0 of the hypervisors most of the time.
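
To confirm the isolcpus setting on a running hypervisor, the kernel command line can be inspected directly. This is a minimal sketch; the helper name isolcpus_from_cmdline is hypothetical, and the sysfs file it also reads may not be present on every kernel.

```shell
#!/bin/sh
# Sketch: report which CPUs the running kernel was asked to isolate.

# Reads a kernel command line on stdin and prints the isolcpus= value, if present.
# (hypothetical helper name)
isolcpus_from_cmdline() {
    tr ' ' '\n' | sed -n 's/^isolcpus=//p'
}

if [ -r /proc/cmdline ]; then
    echo "isolcpus on command line: $(isolcpus_from_cmdline < /proc/cmdline)"
fi

# Some kernels also expose the effective list in sysfs; the file may be
# missing on older kernels, so its absence is treated as "unknown".
if [ -r /sys/devices/system/cpu/isolated ]; then
    echo "isolated (sysfs): $(cat /sys/devices/system/cpu/isolated)"
fi
```

For the configuration described above, the first line would be expected to show a value covering pCPUs 0 to 3, for example 0-3.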

Theory:

  • A bug in the scheduler, possibly triggered by isolcpus, places most vCPUs on pCPU 0. This creates heavy contention for that CPU and high steal values inside the VMs.
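
One way to gather evidence for or against this theory is to look at the CPU affinity of the vCPUs that sit on pCPU 0. The kernel documentation describes isolcpus as removing the listed CPUs from the general SMP balancing and scheduling algorithms, so it is worth checking whether the affected vCPUs are allowed to run on other CPUs at all. The sketch below parses `virsh vcpuinfo` output; the helper name vcpus_on_pcpu is hypothetical, and the result does not by itself confirm or rule out a scheduler bug.

```shell
#!/bin/sh
# Sketch: from `virsh vcpuinfo` output on stdin, list the vCPUs currently
# running on a given pCPU together with their allowed-CPU mask.

# vcpus_on_pcpu <pcpu>   (hypothetical helper)
# Prints "vcpu=<n> affinity=<mask>" for each vCPU whose "CPU:" field equals <pcpu>.
vcpus_on_pcpu() {
    awk -v want="$1" '
        $1 == "VCPU:"                    { vcpu = $2 }
        $1 == "CPU:"                     { cpu = $2 }
        $1 == "CPU" && $2 == "Affinity:" { if (cpu == want) print "vcpu=" vcpu " affinity=" $3 }
    '
}

# Only query libvirt when virsh is installed on this host.
if command -v virsh >/dev/null 2>&1; then
    for dom in $(virsh list --name); do
        echo "== $dom"
        virsh vcpuinfo "$dom" | vcpus_on_pcpu 0
    done
fi
```

In the affinity mask, each position is one pCPU and "y" marks a CPU the vCPU may run on. A mask with "y" in many positions for a vCPU that stays on pCPU 0 suggests the vCPU is not being moved even though it is permitted elsewhere.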

Environment

Red Hat Enterprise Linux OpenStack Platform 7.0
