Undercloud. New nova instances were deleted after failed scale-out. Scale-out retries are now failing

Solution Verified

Issue

After a failed compute scale-out, the new undercloud Nova instances that represented the new compute nodes, and that were in ERROR and ACTIVE states, were deleted. The original scale-out failure was caused by a timeout that was too short, but the same situation can arise under other conditions. After the Nova instances are deleted, the scale-out cannot be re-run because of the following Ironic errors (the original output was reformatted to improve readability):

2018-07-06 19:35:59.016 7273 ERROR ironic.drivers.modules.ipmitool
    [req-1ad88caa-6d60-4f8b-b2e3-8445684e5f01 - - - - -]
        IPMI Error while attempting "ipmitool -I lanplus -H 192.168.168.1 -L ADMINISTRATOR -U root -R 3 -N 5 -f /tmp/tmpqj2KRZ power status"
            for node OLD_NODE_UUID. Error: Unexpected error while running command.
        Stderr: u'> Error: no response from RAKP 1 message\n>
                            Error: no response from RAKP 3 message\n>
                            Error: no response from RAKP 1 message\n
                            Set Session Privilege Level to ADMINISTRATOR failed\n
                            Error: Unable to establish IPMI v2 / RMCP+ session\n'
2018-07-06 19:35:59.017 7273 WARNING ironic.drivers.modules.ipmitool
    [req-1ad88caa-6d60-4f8b-b2e3-8445684e5f01 - - - - -]
        IPMI power status failed for node OLD_NODE_UUID with error: Unexpected error while running command.
        Stderr: u'> Error: no response from RAKP 1 message\n>
                            Error: no response from RAKP 3 message\n>
                            Error: no response from RAKP 1 message\n
                            Set Session Privilege Level to ADMINISTRATOR failed\n
                            Error: Unable to establish IPMI v2 / RMCP+ session\n'.
2018-07-06 19:35:59.017 7273 WARNING ironic.conductor.manager
    [req-1ad88caa-6d60-4f8b-b2e3-8445684e5f01 - - - - -]
        During sync_power_state, could not get power state for node OLD_NODE_UUID, attempt 1 of 3. Error: IPMI call failed: power status..

OLD_NODE_UUID stands for the bare-metal nodes that were used during the failed scale-out to create Nova instances for the new compute nodes. At the time the messages above were generated, no such nodes existed in the environment: they had been manually deleted from undercloud Nova and Ironic and then re-introspected.
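To confirm that the conductor is still polling nodes that no longer exist, the node identifiers in the sync_power_state warnings can be compared with the nodes currently registered in Ironic. The following is a minimal sketch, not a supported tool. It assumes the log text has already been read into a string and that the set of currently registered node UUIDs has been collected separately; the function names and the CURRENT_NODE_UUID placeholder are hypothetical.

```python
import re

# Matches the ironic.conductor.manager warning shown in the excerpt above.
POWER_SYNC_RE = re.compile(
    r"could not get power state for node (?P<node>[^\s,]+), "
    r"attempt (?P<attempt>\d+) of (?P<total>\d+)"
)


def nodes_failing_power_sync(log_text):
    """Return the node identifiers named in sync_power_state warnings."""
    return {match.group("node") for match in POWER_SYNC_RE.finditer(log_text)}


def stale_nodes(log_text, known_nodes):
    """Return nodes that are still being polled but are not in known_nodes.

    known_nodes is assumed to hold the UUIDs currently registered in Ironic.
    """
    return nodes_failing_power_sync(log_text) - set(known_nodes)


# Sample line taken from the excerpt above, joined back into a single line.
sample = (
    "2018-07-06 19:35:59.017 7273 WARNING ironic.conductor.manager "
    "[req-1ad88caa-6d60-4f8b-b2e3-8445684e5f01 - - - - -] "
    "During sync_power_state, could not get power state for node "
    "OLD_NODE_UUID, attempt 1 of 3. Error: IPMI call failed: power status.."
)

# CURRENT_NODE_UUID is a hypothetical placeholder for a registered node.
print(stale_nodes(sample, known_nodes={"CURRENT_NODE_UUID"}))
```

Any identifier this reports is a node that Ironic still tries to reach over IPMI even though it is absent from the supplied list of registered nodes.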

Note also that the overcloud stack details shown by the "heat stack-show OVERCLOUD_STACK" command contain a ComputeCount parameter equal to the number of existing compute nodes plus the number of new compute nodes that were not actually added during the failed scale-out. The output of "openstack compute service list" run against the overcloud does not list the new compute nodes.
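The mismatch described above is simple arithmetic. The sketch below assumes the ComputeCount value and the compute service host names have been copied by hand from the two command outputs; the function name and the host names are hypothetical.

```python
def phantom_compute_count(compute_count, compute_service_hosts):
    """Return how many computes the stack expects but the overcloud lacks.

    compute_count: the ComputeCount value reported for the overcloud stack.
    compute_service_hosts: hosts running the compute service in the overcloud.
    """
    registered = len(set(compute_service_hosts))
    return compute_count - registered


# Hypothetical values: the stack expects 5 computes, only 3 are registered.
hosts = ["overcloud-compute-0", "overcloud-compute-1", "overcloud-compute-2"]
print(phantom_compute_count(5, hosts))
```

A result greater than zero means the stack still counts compute nodes that were never added during the failed scale-out.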

Environment

Red Hat OpenStack Platform 10
