How to delete stale allocations for active instances.


Environment

Red Hat OpenStack Platform 13.0

Issue

  • When an instance is evacuated from one compute node to another, the allocations for that instance remain on the previous host.
  • Deleting the stale allocations for the previous host from placement also deletes the allocations of the active instances from the current host.

Resolution

  • If the issue is observed on RHOSP 16.1, or on RHOSP 13 with openstack-nova-17.0.13-16.el7ost or a later version, delete the stale allocations from placement with the nova-manage command. Follow the steps given in the article to use the nova-manage command to delete stale allocations.
  • Delete the allocations from the resource provider manually by following the steps below.
  • Obtain the UUID of the compute node and set it to a variable.

    # nova hypervisor-list
    # uuid="<UUID obtained from the above command>"
    
  • Get the placement endpoint and a token by running the commands below.

    # PLACEMENTENDPOINT=`openstack endpoint list --service placement --interface public -f value -c URL`
    # TOKEN=`openstack token issue -f value -c id`
    
  • Get the allocations of the resource provider.

    # curl -X GET ${PLACEMENTENDPOINT}/resource_providers/${uuid}/allocations -H "X-Auth-Token:${TOKEN}" -H "OpenStack-API-Version: placement latest" | jq .
    
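  • If the above output lists stale allocations, delete them as shown below. Replace <ID> with the UUID of the allocation (the instance UUID) obtained from the above output. Note that this request removes the allocations of that instance from every host, including the host where it is currently running.

    # curl -X DELETE ${PLACEMENTENDPOINT}/allocations/<ID> -H "X-Auth-Token:${TOKEN}" -H "OpenStack-API-Version: placement latest"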
    
  • After deleting the allocations, verify that the instances are still active.

  • Then perform a cold migration of the instances so that their allocations are created again.

    # openstack server stop <UUID>
    # openstack server migrate <UUID>
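
Because the DELETE request above is keyed by the instance UUID and affects the instance on every host, a small guard around it can help avoid mistakes. The following is a minimal sketch and not part of the original procedure: is_uuid, delete_allocation, and the CONFIRM_DELETE switch are hypothetical names introduced for illustration. It assumes PLACEMENTENDPOINT and TOKEN are set as in the steps above, and by default it only prints the request instead of sending it.

```shell
# Succeed only if $1 looks like a lowercase UUID.
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Delete the placement allocations of one consumer (instance).
# $1: instance UUID taken from the allocations output.
# The request is sent only when CONFIRM_DELETE=1 (hypothetical switch);
# otherwise the function prints what it would run.
delete_allocation() {
    consumer=$1
    if ! is_uuid "$consumer"; then
        echo "refusing: '$consumer' is not a UUID" >&2
        return 1
    fi
    if [ -z "${PLACEMENTENDPOINT:-}" ] || [ -z "${TOKEN:-}" ]; then
        echo "refusing: PLACEMENTENDPOINT or TOKEN is empty" >&2
        return 1
    fi
    # Drop one trailing slash so the URL does not contain "//".
    url="${PLACEMENTENDPOINT%/}/allocations/${consumer}"
    if [ "${CONFIRM_DELETE:-0}" = 1 ]; then
        curl -X DELETE "$url" \
            -H "X-Auth-Token:${TOKEN}" \
            -H "OpenStack-API-Version: placement latest"
    else
        echo "DRY RUN: curl -X DELETE $url"
    fi
}
```

Under these assumptions, running delete_allocation with an instance UUID prints the request for review, and running it again with CONFIRM_DELETE=1 set in the environment sends it.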
    

Root Cause

  • When an instance is evacuated while the compute service on the source host is stopped, the allocation records of the instance in placement are not deleted automatically.
  • Because the stopped nova-compute service cannot communicate with the nova-conductor service running on the controller, the evacuation details are not shared with it, and the source nova-compute service still assumes that it holds the instance.

Diagnostic Steps

  • The following error is seen in nova-compute.log for an instance that was evacuated to a different compute node.

    Instance e01183b1-92d4-4bcf-a2b6-8248164acaf8 has been moved to another host compute-0.redhat.local. There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 2, u'MEMORY_MB': 1024, u'DISK_GB': 10}}.
    
  • You can see the allocation for the instance e01183b1-92d4-4bcf-a2b6-8248164acaf8 when you retrieve the placement information for the previous compute node.

    (overcloud) [stack@undercloud-0 ~]$ curl -X GET ${PLACEMENTENDPOINT}/resource_providers/e145be5c-f81a-4e11-812d-8b08f15a7183/allocations -H "X-Auth-Token:${TOKEN}" -H "Openstack-API-Version: placement latest" | jq .
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
    100   257  100   257    0     0    354      0 --:--:-- --:--:-- --:--:--   354
    {
      "allocations": {
        "fc638f78-5c0a-47ca-b2cd-c6584c1cc3d8": {
          "resources": {
            "DISK_GB": 10,
            "MEMORY_MB": 1024,
            "VCPU": 2
          }
        },
        "e01183b1-92d4-4bcf-a2b6-8248164acaf8": {
          "resources": {
            "DISK_GB": 10,
            "MEMORY_MB": 1024,
            "VCPU": 2
          }
        }
      },
      "resource_provider_generation": 48
    }
    
  • Also check the placement information for the current host of the instance e01183b1-92d4-4bcf-a2b6-8248164acaf8. The same instance holds an allocation there as well.

    (overcloud) [stack@undercloud-0 ~]$ curl -X GET ${PLACEMENTENDPOINT}/resource_providers/2db8af7a-7c02-43a2-98c7-9345feb0d9a7/allocations -H "X-Auth-Token:${TOKEN}" -H "Openstack-API-Version: placement latest" | jq .
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
    100   461  100   461    0     0  12055      0 --:--:-- --:--:-- --:--:-- 12131
    {
      "allocations": {
        "cb47c89b-44d0-475c-89da-6facb70ebecf": {
          "resources": {
            "DISK_GB": 10,
            "MEMORY_MB": 1024,
            "VCPU": 2
          }
        },
        "e01183b1-92d4-4bcf-a2b6-8248164acaf8": {
          "resources": {
            "DISK_GB": 10,
            "MEMORY_MB": 1024,
            "VCPU": 2
          }
        }
      },
      "resource_provider_generation": 32
    }
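
The checks above can be partly automated. The following is a minimal sketch under stated assumptions: the function names moved_instances, consumer_uuids, and shared_consumers are hypothetical, the log pattern is taken from the message shown above, and the two allocation outputs are assumed to have been saved to files. It uses only grep, sed, tr, and sort, so it matches instance UUIDs textually rather than parsing the JSON.

```shell
# Read nova-compute.log lines from stdin and print
# "<instance UUID> <destination host>" for each evacuation warning.
moved_instances() {
    sed -n 's/.*Instance \([0-9a-f-]\{36\}\) has been moved to another host \([^ ]*\)\. There are allocations remaining.*/\1 \2/p'
}

# Read an allocations response from stdin and print the consumer
# (instance) UUIDs that appear as JSON keys, one per line.
consumer_uuids() {
    grep -oE '"[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}"[[:space:]]*:' |
        tr -d '": ' | sort -u
}

# Print the instances that hold allocations on both hosts.
# $1: file with the allocations output of the previous host
# $2: file with the allocations output of the current host
shared_consumers() {
    prev_ids=$(consumer_uuids < "$1")
    # Nothing on the previous host means nothing can be shared.
    [ -n "$prev_ids" ] || return 0
    # Each line of prev_ids is used as a fixed, whole-line pattern.
    consumer_uuids < "$2" | grep -Fx -e "$prev_ids" || true
}
```

For example, after saving the two curl outputs shown above to prev.json and cur.json (hypothetical file names), running shared_consumers prev.json cur.json lists the instances whose allocation on the previous host is a candidate for cleanup. Running moved_instances with nova-compute.log on standard input lists the instances the compute service reported as moved.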
    
