Instance with high CPU Steal due to most instances being scheduled on the same pCPU in Red Hat OpenStack Platform

Solution In Progress

Issue

Very high CPU steal on several instances.

Symptoms:

  • instances regularly show more than 50% CPU steal.
  • most instances run their vCPUs on pCPU 0 of their hypervisor. To see which pCPUs are busiest at a given moment, run the following on every hypervisor. It prints the number of vCPUs currently on each pCPU, highest count first:
virsh list --name | xargs -I {} virsh vcpuinfo {} | awk '/^CPU:/ {print $NF}' | sort | uniq -c | sort -nr
Sample output (first column: number of vCPUs; second column: pCPU number):
nova4 | SUCCESS | rc=0 >>
     39 0
      9 8
      9 6
(...)
nova7 | SUCCESS | rc=0 >>
     55 0
      4 21
      4 13
      3 22
      3 18
(...)
nova8 | SUCCESS | rc=0 >>
     44 0
      5 9
      4 5
      4 4
(...)
nova10 | SUCCESS | rc=0 >>
     43 0
      7 9
      6 8
      6 7
      6 5
(...)
nova14 | SUCCESS | rc=0 >>
     21 0
      3 21
      2 9
      2 7
(...)
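Raw counts are easier to compare across hypervisors as a share of all vCPUs. The short Bash sketch below turns the "<count> <pCPU>" lines printed by the counting pipeline into one summary line per host. The function name busiest_pcpu_share is hypothetical and not part of any Red Hat or libvirt tooling; it only assumes the output format shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how concentrated vCPU placement is on one hypervisor.
# stdin:  the "<count> <pCPU>" lines printed by "... | sort | uniq -c | sort -nr".
# stdout: the busiest pCPU, its vCPU count, the total vCPUs counted, and its share.
busiest_pcpu_share() {
  awk 'NF == 2 {
         total += $1                                  # vCPUs counted on this host
         if ($1 > max) { max = $1; cpu = $2 }         # remember the busiest pCPU
       }
       END {
         if (total == 0) { print "no data"; exit 1 }
         printf "pCPU %s: %d of %d vCPUs (%.0f%%)\n", cpu, max, total, 100 * max / total
       }'
}
```

On a hypervisor it can be appended to the pipeline:
virsh list --name | xargs -I {} virsh vcpuinfo {} | awk '/^CPU:/ {print $NF}' | sort | uniq -c | sort -nr | busiest_pcpu_share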

Other details about the environment:

  • the hypervisors boot with the isolcpus kernel command-line parameter, configured in GRUB to isolate pCPUs 0 to 3
  • pCPU 0, where most vCPUs are scheduled, is therefore one of the isolated CPUs
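To confirm that isolcpus is active on a hypervisor and to count how many vCPUs currently sit on isolated pCPUs, a Bash sketch along these lines could be used. The helper names are hypothetical, and the sketch assumes the plain CPU-list form of the parameter (for example isolcpus=0-3); variants with extra flags are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assume the plain cpulist syntax, e.g. "isolcpus=0-3,8".

# Expand a CPU list such as "0-3,8" into one CPU number per line.
expand_cpulist() {
  local item lo hi
  local IFS=','
  for item in $1; do
    if [[ $item == *-* ]]; then
      lo=${item%-*}; hi=${item#*-}
      seq "$lo" "$hi"
    else
      echo "$item"
    fi
  done
}

# Print the isolcpus= value from a kernel command line
# (defaults to the running kernel's /proc/cmdline).
isolcpus_from_cmdline() {
  local cmdline=${1:-$(cat /proc/cmdline)}
  grep -o 'isolcpus=[^ ]*' <<<"$cmdline" | cut -d= -f2-
}

# Read one pCPU number per line on stdin and count how many fall in the isolated set.
count_on_isolated() {
  local isolated
  isolated=$(expand_cpulist "$1" | tr '\n' ' ')
  awk -v iso="$isolated" '
    BEGIN { n = split(iso, a, " "); for (i = 1; i <= n; i++) iso_set[a[i]] = 1 }
    { total++; if ($1 in iso_set) hit++ }
    END { printf "%d of %d vCPUs on isolated pCPUs\n", hit, total }'
}
```

Example use on a hypervisor:
virsh list --name | xargs -I {} virsh vcpuinfo {} | awk '/^CPU:/ {print $NF}' | count_on_isolated "$(isolcpus_from_cmdline)"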

Theory:

  • a scheduler bug, possibly triggered by isolcpus, places most vCPUs on pCPU 0. This creates heavy contention for that CPU and, in turn, high steal values inside the VMs.
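Steal time can be observed inside a guest with top or vmstat (both show an "st" column). As a minimal sketch, the same figure can be computed from two samples of the aggregate cpu line in /proc/stat, where the eighth value is steal time (see proc(5)). The function names below are hypothetical; guest and guest_nice are left out of the total because the kernel already counts them in user and nice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for measuring steal time inside a guest.
# /proc/stat field order: cpu user nice system idle iowait irq softirq steal guest guest_nice

# Percentage of CPU time stolen between two "cpu ..." lines from /proc/stat.
steal_pct() {
  awk -v b="$1" -v a="$2" 'BEGIN {
    split(b, x, " "); split(a, y, " ")
    for (i = 2; i <= 9; i++) total += y[i] - x[i]   # user .. steal
    steal = y[9] - x[9]
    if (total <= 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * steal / total
  }'
}

# Sample the guest's counters over a window (default 5 seconds) and print steal %.
sample_steal() {
  local s1 s2
  s1=$(grep '^cpu ' /proc/stat)
  sleep "${1:-5}"
  s2=$(grep '^cpu ' /proc/stat)
  steal_pct "$s1" "$s2"
}
```

Running sample_steal in an affected instance should report values above 50 when the symptom described above is present.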

Environment

Red Hat Enterprise Linux OpenStack Platform 7.0
