12.6. Troubleshooting the Overcloud after Creation
12.6.1. Overcloud Stack Modifications
Problems can occur when modifying the overcloud stack through the director. Examples of stack modifications include:
- Scaling Nodes
- Removing Nodes
- Replacing Nodes
When a modification fails, start by diagnosing the overcloud Heat stack. In particular, use the following commands to help identify problematic resources:
heat stack-list --show-nested
- List all stacks. The --show-nested option displays all child stacks and their respective parent stacks. This command helps identify the point where a stack failed.
heat resource-list overcloud
- List all resources in the overcloud stack and their current states. This helps identify which resource is causing failures in the stack. You can trace this resource failure to its respective parameters and configuration in the Heat template collection and the Puppet modules.
heat event-list overcloud
- List all events related to the overcloud stack in chronological order. This includes the initiation, completion, and failure of all resources in the stack. This helps identify points of resource failure.
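On a large stack, the resource list can run to many rows. The following is a minimal sketch of a filter that keeps only the names of failed resources. It assumes the output is the pipe-delimited table that the heat client prints by default, with the resource name in the first column and a status such as CREATE_FAILED somewhere on the row. The function name failed_resource_names is hypothetical and not part of the heat client.

```shell
# Hypothetical helper: read "heat resource-list" table output on stdin and
# print the name of every resource whose row contains a *_FAILED status.
# Assumes the default pipe-delimited table with the name in the first column.
failed_resource_names() {
    awk -F'|' '/_FAILED/ {
        gsub(/^ +| +$/, "", $2)   # trim the padding around the name column
        print $2
    }'
}

# Example usage (run on the director host with the undercloud credentials loaded):
#   heat resource-list overcloud | failed_resource_names
```

Each name printed can then be traced back to its template and parameters as described above.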
12.6.2. Controller Service Failures
The Pacemaker Configuration System (pcs) command is a tool that manages a Pacemaker cluster. Run this command on a Controller node in the cluster to perform configuration and monitoring functions. Here are a few commands to help troubleshoot Overcloud services on a high availability cluster:
pcs status
- Provides a status overview of the entire cluster including enabled resources, failed resources, and online nodes.
pcs resource show
- Shows a list of resources on their respective nodes.
pcs resource disable [resource]
- Stop a particular resource.
pcs resource enable [resource]
- Start a particular resource.
pcs cluster standby [node]
- Place a node in standby mode. The node is no longer available in the cluster. This is useful for performing maintenance on a specific node without affecting the cluster.
pcs cluster unstandby [node]
- Remove a node from standby mode. The node becomes available in the cluster again.
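The standby and unstandby commands are normally used as a pair around a maintenance task. The following is a minimal sketch of that pattern, using only the pcs subcommands listed above. The function name with_node_standby and the PCS variable are hypothetical. PCS exists only so that the sequence can be previewed with echo before running it against a real cluster.

```shell
# Command used to reach the cluster; override with PCS=echo for a dry run.
PCS="${PCS:-pcs}"

# Hypothetical helper: place NODE in standby, run the given maintenance
# command, then return NODE to the cluster even if the command failed.
with_node_standby() {
    node="$1"; shift
    $PCS cluster standby "$node" || return 1
    "$@"
    rc=$?
    $PCS cluster unstandby "$node" || return 1
    return "$rc"
}

# Example usage (node name and maintenance command are placeholders):
#   with_node_standby overcloud-controller-0 sudo yum update -y
```

Running pcs status afterwards is a reasonable way to confirm that the node is back online.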
12.6.3. Compute Service Failures
- View the status of the service using the following command:
$ sudo systemctl status openstack-nova-compute.service
Likewise, view the systemd journal for the service using the following command:
$ sudo journalctl -u openstack-nova-compute.service
- The primary log file for Compute nodes is /var/log/nova/nova-compute.log. If issues occur with Compute node communication, this log file is usually a good place to start a diagnosis.
- If performing maintenance on the Compute node, migrate the existing virtual machines from the host to an operational Compute node, then disable the node. See Section 7.8, “Migrating VMs from an Overcloud Compute Node” for more information on node migrations.
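When starting a diagnosis from the Compute log file mentioned above, it often helps to look at only the most recent error-level entries. The following is a minimal sketch, assuming each log line carries its level (for example ERROR) as a separate space-delimited word, as in the default OpenStack log format. The function name recent_compute_errors is hypothetical.

```shell
# Hypothetical helper: print the last COUNT lines at ERROR, CRITICAL, or
# TRACE level from a nova-compute log file.
# Assumes the level appears as a standalone word surrounded by spaces.
recent_compute_errors() {
    logfile="${1:-/var/log/nova/nova-compute.log}"
    count="${2:-20}"
    grep -E ' (ERROR|CRITICAL|TRACE) ' "$logfile" | tail -n "$count"
}

# Example usage on a Compute node (the log may require root to read):
#   sudo sh -c '. ./helpers.sh; recent_compute_errors'
# where helpers.sh is a placeholder for the file holding this function.
```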