Nova hypervisor state is fluctuating

Solution Verified - Updated -

Environment

Red Hat OpenStack Platform 10.0

Issue

  • The State column of nova hypervisor-list fluctuates between up and down on successive runs, without any change being made to the compute nodes

    • Initially

      [stack@ospd-spr-c1s1 ~]$ nova hypervisor-list
      +----+------------------------------------+-------+---------+
      | ID | Hypervisor hostname                | State | Status  |
      +----+------------------------------------+-------+---------+
      | 3  | sprint-dpdk-compute-6.localdomain  | up    | enabled |
      | 6  | sprint-dpdk-compute-2.localdomain  | up    | enabled |
      | 9  | sprint-dpdk-compute-4.localdomain  | down  | enabled |
      | 12 | sprint-dpdk-compute-10.localdomain | down  | enabled |
      | 15 | sprint-dpdk-compute-8.localdomain  | up    | enabled |
      | 18 | sprint-dpdk-compute-7.localdomain  | down  | enabled |
      | 21 | sprint-dpdk-compute-0.localdomain  | down  | enabled |
      | 24 | sprint-dpdk-compute-3.localdomain  | down  | enabled |
      | 27 | sprint-dpdk-compute-5.localdomain  | down  | enabled |
      | 30 | sprint-dpdk-compute-1.localdomain  | down  | enabled |
      | 33 | sprint-dpdk-compute-9.localdomain  | down  | enabled |
      +----+------------------------------------+-------+---------+
      [stack@ospd-spr-c1s1 ~]$
      
    • Later

      [stack@ospd-spr-c1s1 ~]$ nova hypervisor-list
      +----+------------------------------------+-------+---------+
      | ID | Hypervisor hostname                | State | Status  |
      +----+------------------------------------+-------+---------+
      | 3  | sprint-dpdk-compute-6.localdomain  | down  | enabled |
      | 6  | sprint-dpdk-compute-2.localdomain  | down  | enabled |
      | 9  | sprint-dpdk-compute-4.localdomain  | up    | enabled |
      | 12 | sprint-dpdk-compute-10.localdomain | up    | enabled |
      | 15 | sprint-dpdk-compute-8.localdomain  | up    | enabled |
      | 18 | sprint-dpdk-compute-7.localdomain  | up    | enabled |
      | 21 | sprint-dpdk-compute-0.localdomain  | up    | enabled |
      | 24 | sprint-dpdk-compute-3.localdomain  | down  | enabled |
      | 27 | sprint-dpdk-compute-5.localdomain  | down  | enabled |
      | 30 | sprint-dpdk-compute-1.localdomain  | up    | enabled |
      | 33 | sprint-dpdk-compute-9.localdomain  | down  | enabled |
      +----+------------------------------------+-------+---------+
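
To watch the fluctuation over time rather than comparing listings by eye, the table can be reduced to a count of down hypervisors. The following is a minimal sketch and not part of the original solution: the function name count_down_hypervisors is hypothetical, and it assumes the four-column table layout shown above.

```shell
# Count hypervisors whose State column is "down" in `nova hypervisor-list`
# output read from stdin. Assumes the column order shown above:
# ID | Hypervisor hostname | State | Status.
# count_down_hypervisors is a hypothetical helper name.
count_down_hypervisors() {
    awk -F'|' '
        # Splitting on "|" makes State the 4th field. Header, border and
        # prompt lines never hold exactly "down" there, so they are skipped.
        $4 ~ /^[ \t]*down[ \t]*$/ { n++ }
        END { print n + 0 }
    '
}
```

For example, running nova hypervisor-list | count_down_hypervisors once a minute and seeing the number change while no compute node has been rebooted matches the behaviour described here.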
      

Resolution

  • Ensure that NTP is synchronized on all nodes.
  • Ensure that every node points to the same NTP server.
  • To provide NTP details at deployment time, refer to our NTP KBase article.
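
Both checks can be scripted from the undercloud. The following is a minimal sketch under stated assumptions, not the procedure from the KBase article: it assumes the nodes run ntpd so that ntpq -pn is available, and the helper names selected_ntp_peer and all_same_peer are hypothetical. ntpq marks the peer a node is currently synchronized to with a leading asterisk; a node with no such line is not synchronized.

```shell
# Print the peer that ntpd is currently synchronized to, given `ntpq -pn`
# output on stdin. ntpq flags that peer with a leading "*". Prints "none"
# when no peer is selected (node not synchronized).
selected_ntp_peer() {
    awk '
        /^\*/ { print substr($1, 2); found = 1; exit }
        END   { if (!found) print "none" }
    '
}

# Read one peer per line on stdin. Succeed only when every line names the
# same peer and that peer is not "none".
all_same_peer() {
    local peers
    peers=$(sort -u)
    [ -n "$peers" ] && [ "$peers" != "none" ] &&
        [ "$(printf '%s\n' "$peers" | wc -l)" -eq 1 ]
}

# Example (NODE_IPS is a hypothetical list of overcloud node addresses,
# and heat-admin is assumed to be the SSH user on the nodes):
#   for ip in $NODE_IPS; do
#       ssh heat-admin@"$ip" ntpq -pn | selected_ntp_peer
#   done | all_same_peer && echo "all nodes are synchronized to the same peer"
```

The selected peer is used here as a stand-in for the configured server. With several servers configured per node, comparing the server lines of each node's /etc/ntp.conf would be the more direct check.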

Root Cause

  • First, check whether NTP is properly synchronized on each node: run the ntptime command, and the result should be OK.

  • In most cases NTP is the main cause. When the overcloud nodes are not time-synchronized, the time difference makes it difficult for services to communicate, and a hypervisor can be reported as down even though it is running.

Diagnostic Steps

  • Running ntptime showed the clock in an error (unsynchronized) state:

    ntp_gettime() returns code 5 (ERROR)
      time e06bc70b.6a813420  Thu, Apr 25 2019  5:41:31.416, (.416034384),
      maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
    ntp_adjtime() returns code 5 (ERROR)
      modes 0x0 (),
      offset 0.000 us, frequency -5.634 ppm, interval 1 s,
      maximum error 16000000 us, estimated error 16000000 us,
      status 0x2041 (PLL,UNSYNC,NANO),
      time constant 3, precision 0.001 us, tolerance 500 ppm,
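
The same check can be scripted so that it is easy to repeat on every node. The following is a minimal sketch and not part of the original solution: ntptime_ok is a hypothetical helper that reads ntptime output on stdin and treats an (ERROR) return code or the UNSYNC status flag, both visible in the output above, as unsynchronized.

```shell
# Return 0 when ntptime output on stdin reports a synchronized clock,
# non-zero otherwise. ntptime_ok is a hypothetical helper name.
ntptime_ok() {
    local out
    # Read all of stdin once so it can be inspected twice.
    out=$(cat)
    # An "(ERROR)" return code or the UNSYNC status flag means unsynchronized.
    if printf '%s\n' "$out" | grep -Eq 'returns code [0-9]+ \(ERROR\)|UNSYNC'; then
        return 1
    fi
    # Require an explicit OK result so that empty input is not reported as healthy.
    printf '%s\n' "$out" | grep -q 'returns code 0 (OK)'
}
```

For example, ntptime | ntptime_ok || echo "clock not synchronized on $(hostname)" prints a warning on a node in the state shown above.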
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
