Chapter 4. Troubleshooting Director-Based Upgrades
This section provides advice for troubleshooting issues with both the undercloud and overcloud.
4.1. Undercloud Upgrades
In situations where the Undercloud upgrade command (openstack undercloud upgrade) fails, use the following advice to locate the issue blocking upgrade progress:
- The openstack undercloud upgrade command prints out a progress log while it runs and saves it to .instack/install-undercloud.log. If an error occurs at any point in the upgrade process, the command halts at the point of error. Use this information to identify any issues impeding upgrade progress.
- The openstack undercloud upgrade command runs Puppet to configure Undercloud services. This generates useful Puppet reports in the following directories:
  - /var/lib/puppet/state/last_run_report.yaml - The last Puppet report generated for the Undercloud. This file shows any causes of failed Puppet actions.
  - /var/lib/puppet/state/last_run_summary.yaml - A summary of the last_run_report.yaml file.
  - /var/lib/puppet/reports - All Puppet reports for the Undercloud.
  Use this information to identify any issues impeding upgrade progress; a quick way to scan these reports for failures is shown in the example after this list.
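To scan the most recent Puppet report for failures, you can grep the report files directly. This is a minimal sketch; the exact keys in the reports can vary with the Puppet version:
$ sudo grep -B 2 -A 5 -i 'failed: true' /var/lib/puppet/state/last_run_report.yaml
$ sudo grep -i 'fail' /var/lib/puppet/state/last_run_summary.yaml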
- Check for any failed services:
$ sudo systemctl -t service
If any services have failed, check their corresponding logs. For example, if openstack-ironic-api failed, use the following commands to check the logs for that service:
$ sudo journalctl -xe -u openstack-ironic-api
$ sudo tail -n 50 /var/log/ironic/ironic-api.log
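To list only the units that are currently in a failed state, systemd also supports a state filter. This is standard systemctl usage, not specific to the director:
$ sudo systemctl list-units --type=service --state=failed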
After correcting the issue impeding the Undercloud upgrade, rerun the upgrade command:
$ openstack undercloud upgrade
The upgrade command begins again and configures the Undercloud.
4.2. Overcloud Upgrades
In situations where an Overcloud upgrade process fails, use the following advice to locate the issue blocking upgrade progress:
Check the stack listing and identify any stacks that have an UPDATE_FAILED status. The following command identifies failed stacks:
$ openstack stack failures list overcloud
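If you also want to see which nested stacks are in a failed state, you can list the stacks and filter on the status column. This is a sketch that assumes your python-heatclient version supports the --nested option:
$ openstack stack list --nested | grep "UPDATE_FAILED"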
View the failed stack and its template to identify how the stack failed:
$ openstack stack show overcloud-Controller-qyoy54dyhrll-1-gtwy5bgta3np
$ openstack stack template show overcloud-Controller-qyoy54dyhrll-1-gtwy5bgta3np
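To drill down to the specific resource that failed inside that stack, you can also list and inspect its resources. This sketch reuses the example stack name above; substitute your own stack and resource names:
$ openstack stack resource list overcloud-Controller-qyoy54dyhrll-1-gtwy5bgta3np
$ openstack stack resource show overcloud-Controller-qyoy54dyhrll-1-gtwy5bgta3np <RESOURCE_NAME>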
Check that Pacemaker is running correctly on all Controller nodes. If necessary, log into a Controller node and restart the Controller cluster:
$ sudo pcs cluster start
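To verify the cluster state before and after the restart, check the Pacemaker status. This is standard pcs usage; run it on any Controller node:
$ sudo pcs status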
Check the configuration log files for any failures. The /var/run/heat-config/deployed/ directory on each node contains these logs. These files are named in date order and are separated into standard output (*-stdout.log) and error output (*-stderr.log).
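For example, to inspect the most recent error output on a node, sort the logs by modification time and tail the newest one. This is plain shell with no director-specific tooling; the <NEWEST_DEPLOYMENT> placeholder stands for whichever file the listing shows as most recent:
$ sudo ls -lt /var/run/heat-config/deployed/ | head -n 10
$ sudo tail -n 50 /var/run/heat-config/deployed/<NEWEST_DEPLOYMENT>-stderr.log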
The director performs a set of validation checks before the upgrade process to make sure the overcloud is in a good state. If the upgrade failed and you want to retry, you might need to disable these validation checks. To disable them, temporarily add SkipUpgradeConfigTags: [validation] to the parameter_defaults section of an environment file included with your overcloud, as shown below.
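For example, the addition to the environment file looks like this; the surrounding file can be any environment file you already pass to the deploy command:
parameter_defaults:
  SkipUpgradeConfigTags: [validation]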
After correcting the issue impeding the Overcloud upgrade, check that no resources have an IN_PROGRESS status:
$ openstack stack resource list overcloud -n 5 --filter status='*IN_PROGRESS'
If any resources have an IN_PROGRESS status, wait until they either complete or fail.
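If you prefer to poll until the in-progress resources settle, a simple watch loop works. This is plain shell; adjust the interval as needed:
$ watch -n 30 "openstack stack resource list overcloud -n 5 --filter status='*IN_PROGRESS'"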
Rerun the openstack overcloud deploy command for the failed upgrade step you attempted. The following is an example of the first openstack overcloud deploy command in the upgrade process, which includes the major-upgrade-composable-steps.yaml environment file:
$ openstack overcloud deploy --templates \
  --control-scale 3 \
  --compute-scale 3 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e network_env.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml \
  --ntp-server pool.ntp.org
The openstack overcloud deploy command retries the Overcloud stack update.
