Chapter 4. Debugging recommendations and known issues

Review the following sections for debugging suggestions that can help you troubleshoot your deployment.

4.1. Known issues

The following list outlines current known limitations.

BZ#1857451 - Ansible forks value should have an upper limit and Current Calculation needs to change
By default, the Ansible playbooks in mistral are configured in the ansible.cfg file to use 10*CPU_COUNT forks. If you do not use the --limit option to restrict Ansible execution to a specific node or set of nodes, Ansible runs on all existing nodes and can consume almost 100% of the available memory.
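The forks calculation described above can be sketched as a small bash function. This is a minimal sketch, assuming you want to compare the default 10*CPU_COUNT value with a capped value; the function name calc_forks and the cap of 100 are hypothetical examples, not documented defaults. The forks setting itself belongs in the [defaults] section of ansible.cfg.

```shell
# Sketch: compare the default 10*CPU_COUNT forks value with a capped value.
# MAX_FORKS and calc_forks are hypothetical names used for illustration only.

MAX_FORKS=${MAX_FORKS:-100}   # hypothetical upper limit, not a documented default

# calc_forks CPU_COUNT [CAP]
# Prints 10 * CPU_COUNT, or CAP if that product is larger than CAP.
calc_forks() {
    local cpus=$1
    local cap=${2:-$MAX_FORKS}
    local forks=$(( cpus * 10 ))
    if (( forks > cap )); then
        forks=$cap
    fi
    echo "$forks"
}

# Example ansible.cfg fragment that pins the value instead of deriving it:
#   [defaults]
#   forks = 100
```

On a 32-CPU host, the default formula yields 320 forks, while calc_forks 32 prints 100 because of the hypothetical cap. Independently of the forks value, the --limit option restricts a run to the named nodes or groups, so fewer hosts are processed concurrently.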

4.2. Introspection debugging

Review the following list of recommendations when you debug introspection.

Check your introspection DHCP range and NICs in your undercloud.conf file
If any of these values are incorrect, fix them and rerun the openstack undercloud install command.
Ensure that you do not try to introspect more nodes than your DHCP range can allow
The DHCP lease for each node remains active for approximately two minutes after introspection finishes, so those addresses are not immediately available to other nodes.
Ensure that target nodes are responsive
If all nodes fail introspection, ensure that you can ping target nodes over the native VLAN by using the configured NIC and that the out-of-band interface credentials and addresses are correct.
Check the introspection commands in the console
To debug specific nodes, watch the console when the node boots and observe the introspection commands on the node. If the node stops before it completes the PXE process, check connectivity, IP allocation, and network load. After a node exits the BIOS and boots the introspection image, failures are rare and almost always relate to connectivity issues. Ensure that the heartbeat from the introspection image reaches the undercloud without interruption.
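The DHCP range and node-count recommendations above can be checked with a short bash script before you start introspection. This is a minimal sketch, assuming an INI-style undercloud.conf that uses the local_interface and inspection_iprange keys, with the range written as start,end IPv4 addresses; key names and sections vary between releases, so confirm them against your own file. The helper names conf_value, ip_to_int, range_size, and check_capacity are hypothetical.

```shell
# Sketch: read values from undercloud.conf and check that the introspection
# DHCP range has enough addresses for the nodes you plan to introspect.
# All function names are hypothetical helpers, not part of any tool.

# conf_value FILE KEY
# Prints the value of the first uncommented KEY in FILE. Section headers are
# ignored, so the first match anywhere in the file wins.
conf_value() {
    local file=$1 key=$2
    awk -F '=' -v k="$key" '
        /^[ \t]*[#;]/ { next }                   # skip comment lines
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the key
            if (name == k) {
                sub(/^[^=]*=/, "")              # drop the key and the equals sign
                gsub(/^[ \t]+|[ \t]+$/, "")     # trim the value
                print
                exit
            }
        }
    ' "$file"
}

# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
    local IFS=.
    local -a o
    read -r -a o <<< "$1"
    echo $(( (o[0] << 24) + (o[1] << 16) + (o[2] << 8) + o[3] ))
}

# range_size START,END -> number of addresses in the inclusive range
range_size() {
    local start=${1%%,*} end=${1##*,}
    echo $(( $(ip_to_int "$end") - $(ip_to_int "$start") + 1 ))
}

# check_capacity START,END NODE_COUNT
# Returns 1 (with a message on stderr) if the range is too small.
check_capacity() {
    local size
    size=$(range_size "$1")
    if (( $2 > size )); then
        echo "not enough addresses: $2 nodes, $size addresses in $1" >&2
        return 1
    fi
    echo "ok: $2 nodes, $size addresses in $1"
}
```

For example, check_capacity "$(conf_value ~/undercloud.conf inspection_iprange)" "$(openstack baremetal node list -f value -c UUID | wc -l)". Because each lease stays active for about two minutes after introspection finishes, count the nodes from back-to-back introspection runs together, or wait for the leases to expire.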
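To confirm that target nodes are responsive, you can ping each node from the configured provisioning NIC. This is a minimal bash sketch, assuming iputils ping (for the -I and -W options) and that you already know the node addresses; check_nodes_reachable is a hypothetical helper, and the interface name and addresses are placeholders.

```shell
# Sketch: ping each target node once from the provisioning NIC and report
# the nodes that do not answer. check_nodes_reachable is a hypothetical helper.

# check_nodes_reachable IFACE IP...
# Prints each unreachable IP and returns 1 if any node failed to answer.
check_nodes_reachable() {
    local iface=$1
    shift
    local ip failed=0
    for ip in "$@"; do
        # -c 1: send one echo request; -W 2: wait up to 2 seconds for a reply;
        # -I: send the request from the given interface
        if ! ping -c 1 -W 2 -I "$iface" "$ip" >/dev/null 2>&1; then
            echo "unreachable: $ip"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_nodes_reachable eth1 192.168.24.10 192.168.24.11. For nodes managed over IPMI, a command such as ipmitool -I lanplus -H <address> -U <user> -P <password> power status is one way to confirm the out-of-band address and credentials.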

4.3. Deployment debugging

Use the following recommendations when you debug a deployment.

Inspect the DHCP servers that provide addresses on the provisioning network

Any additional DHCP servers that supply addresses on the provisioning network can prevent Red Hat OpenStack Platform director from inspecting and provisioning machines.

  • For DHCP or PXE introspection issues, enter the following command:

    $ sudo tcpdump -i any port 67 or port 68 or port 69
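  • For DHCP or PXE deployment issues, enter the following command. Replace qdhcp-<network_id> with the name of the DHCP namespace, which you can find with the sudo ip netns list command:

    $ sudo ip netns exec qdhcp-<network_id> tcpdump -i <interface> port 67 or port 68 or port 69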
Check the state of your failed or foreign disks
Use the out-of-band management of the machine to check the state of any failed or foreign disks, and ensure that their state is set to Up. Disks can exit the Up state during a deployment cycle, which can change the order in which your disks appear in the base operating system.
Use the following commands to debug failed overcloud deployments
  • openstack stack failures list overcloud
  • heat resource-list -n5 overcloud | grep -i fail
  • less /var/lib/mistral/config-download-latest/ansible.log

After you identify a failure with these commands, log in to the node where the failure occurs and review the log files in /var/log/ and /var/log/containers/.
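To spot additional DHCP servers on the provisioning network, you can reduce the tcpdump output to the list of hosts that send DHCP replies. This is a minimal bash sketch, assuming tcpdump runs with -n so that each packet line contains IP <source>.<port> > <destination>.<port> with numeric ports; dhcp_servers_from_tcpdump is a hypothetical helper name.

```shell
# Sketch: print each distinct source address that sends DHCP replies
# (UDP source port 67) in tcpdump text output read from stdin.
# Assumes numeric output (tcpdump -n). Hypothetical helper name.
dhcp_servers_from_tcpdump() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "IP") {
                    src = $(i + 1)
                    port = src
                    sub(/.*\./, "", port)       # keep the text after the last dot
                    sub(/\.[^.]*$/, "", src)    # drop the trailing port from the address
                    if (port == "67" && !(src in seen)) {
                        seen[src] = 1
                        print src
                    }
                    break
                }
            }
        }
    '
}
```

For example: sudo tcpdump -l -n -i any -c 20 udp src port 67 | dhcp_servers_from_tcpdump. Compare the result with the addresses that you expect the undercloud to use; any other address suggests an additional DHCP server.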
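In addition to checking disk state through out-of-band management, you can check whether the order in which disks appear in the base operating system changed between deployment cycles. This is a minimal bash sketch, assuming you save the output of lsblk -d -n -o NAME,SERIAL on the node before and after a deployment cycle; compare_disk_snapshots and the snapshot file names are hypothetical.

```shell
# Sketch: compare two snapshots of `lsblk -d -n -o NAME,SERIAL` output to
# detect a change in the device-name-to-serial mapping.
# compare_disk_snapshots is a hypothetical helper.

# compare_disk_snapshots BEFORE AFTER
# Returns 0 if the snapshots match, 1 (and prints the differences) otherwise.
compare_disk_snapshots() {
    if diff "$1" "$2" >/dev/null; then
        echo "disk order unchanged"
        return 0
    fi
    echo "disk order changed:"
    diff "$1" "$2" || true   # diff exits 1 when the files differ
    return 1
}
```

For example, run lsblk -d -n -o NAME,SERIAL > disks-before.txt, capture disks-after.txt the same way after the next deployment cycle, and then run compare_disk_snapshots disks-before.txt disks-after.txt. Some virtual disks report an empty serial number, which makes this comparison less informative.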
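When ansible.log is long, listing the hosts with failed tasks first can tell you which node to log in to. This is a minimal bash sketch, assuming the log contains the standard Ansible result lines that include fatal: [host] or failed: [host]; failed_hosts_from_ansible_log is a hypothetical helper, and it also reports tasks that failed but were ignored.

```shell
# Sketch: list the distinct hosts that have "fatal: [host]" or
# "failed: [host]" result lines in an Ansible log.
# Reads the file given as $1, or stdin if no argument is given.
# failed_hosts_from_ansible_log is a hypothetical helper.
failed_hosts_from_ansible_log() {
    grep -oE '(fatal|failed): \[[^]]+\]' "${1:--}" \
        | sed -E 's/^(fatal|failed): \[([^]]+)\]$/\2/' \
        | sort -u
}
```

For example: failed_hosts_from_ansible_log /var/lib/mistral/config-download-latest/ansible.log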