Chapter 4. Debugging tips

4.1. Introspection debugging

  • Check your introspection DHCP range and NICs in your undercloud.conf file. If either of these values is incorrect, fix it and rerun the openstack undercloud install command.
  • Ensure that you are not trying to introspect more nodes than your DHCP range can accommodate. Also remember that the DHCP lease for each node remains active for approximately two minutes after introspection finishes.
  • If all nodes fail introspection, ensure that you can ping target nodes over the native VLAN using the configured NIC and that the out-of-band interface credentials and addresses are correct.
  • For debugging specific nodes, watch the console when the node boots and observe the introspection commands sent to the node. If the node stops before completing the PXE process, check the connectivity, IP allocation, and the network load. Once a node exits the BIOS and boots the introspection image, failures are rare and are almost exclusively caused by connectivity issues. Ensure that the heartbeat from the introspection image is not interrupted on its way to the undercloud.
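
The DHCP range capacity check described above can be sketched in shell. This is only an illustrative sketch: it assumes that the introspection range is given as a start and an end IPv4 address, and the function names (ip_to_int, range_size, check_range) and the example addresses are hypothetical and not part of director.

```shell
#!/bin/sh
# Illustrative helpers only; none of these names come from director.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    echo "$1" | {
        # Split on dots; the IFS change applies to this read only.
        IFS=. read -r a b c d
        echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
    }
}

# Count the addresses in an inclusive start..end range.
range_size() {
    start=$(ip_to_int "$1")
    end=$(ip_to_int "$2")
    echo $(( end - start + 1 ))
}

# Compare a node count against the range size.
# Arguments: range start, range end, number of nodes to introspect.
check_range() {
    size=$(range_size "$1" "$2")
    if [ "$3" -gt "$size" ]; then
        echo "too many nodes: $3 nodes, $size addresses"
        return 1
    fi
    echo "ok: $3 nodes, $size addresses"
}
```

For example, check_range 192.168.24.100 192.168.24.120 30 reports a failure because the example range holds only 21 addresses. Because leases stay active for about two minutes after introspection finishes, addresses used by one batch of nodes might not be available immediately to the next batch.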

4.2. Deployment debugging

  • Any additional DHCP servers that supply addresses on the provisioning network can prevent director from inspecting and provisioning machines.
  • For DHCP or PXE issues:

    • For introspection issues, run the following command:

      sudo tcpdump -i any port 67 or port 68 or port 69
    • For deployment issues, run the following command:

      sudo ip netns exec qdhcp tcpdump -i <interface> port 67 or port 68 or port 69
  • For failed or foreign disks, be aware of disks that do not have an Up state according to the machine’s out-of-band management. Disks can exit the Up state during a deployment cycle and change the order in which your disks appear in the base operating system.
  • Use the following commands to debug failed overcloud deployments:

    • openstack stack failures list overcloud
    • heat resource-list -n5 overcloud | grep -i fail
    • less /var/lib/mistral/config-download-latest/ansible.log

    Review the output, log in to the node where the failure occurred, and review the log files in /var/log/ and /var/log/containers/.
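
To narrow down which node to log in to, the Ansible log can be filtered for failure lines. The following is a minimal sketch, assuming that failed tasks are logged in the standard Ansible form "fatal: [<host>]: FAILED! => ..."; the function names show_ansible_failures and failed_hosts are hypothetical helpers, not director commands.

```shell
#!/bin/sh
# Illustrative helpers only. Assumption: failures are logged as
# "fatal: [<host>]: FAILED! => ..." lines, as in standard Ansible output.

# Default log location, taken from the command list above.
DEFAULT_LOG=/var/lib/mistral/config-download-latest/ansible.log

# Print every failure line, prefixed with its line number in the log.
show_ansible_failures() {
    log_file=${1:-$DEFAULT_LOG}
    grep -n -E 'fatal:|FAILED!' "$log_file"
}

# Print the unique host names that reported a failed task.
failed_hosts() {
    log_file=${1:-$DEFAULT_LOG}
    sed -n 's/.*fatal: \[\([^]]*\)\]: FAILED!.*/\1/p' "$log_file" | sort -u
}
```

Run failed_hosts with no arguments to read the default log, or pass a different log path as the first argument. The line numbers that show_ansible_failures prints help you to jump to the surrounding task output in less.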