Additional log details to troubleshoot a failed overcloud stack

Solution Verified - Updated -

Environment

  • Red Hat OpenStack Platform (RHOSP) 13, 14

Issue

  • How to collect the additional information required to troubleshoot an RHOSP 13 deployment?

Resolution

Note: For RHOSP 14-specific logs, refer to the following article: [RHOSP14] Additional log details to troubleshoot the failure overcloud stack

Refer to the Director Deployment Guide, section Troubleshooting Director Issues, to troubleshoot the RHOSP 13 deployment.

  1. To review the overcloud nodes and their associated resources, execute the script below. It extracts the required details to xxx_xxx_Overcloud_details.tar.gz. Then upload the tar file to the support case that has been opened for Red Hat OpenStack:

    cat<<'EOF'>hypervisor.sh
    #!/bin/bash
    IFS=$'\n' ;
    dir_name=/tmp/OSP13
    rm -rf ${dir_name}/*
    mkdir -p ${dir_name}
    
    source /home/stack/stackrc
    stackname=`openstack stack list -c "Stack Name" -f value`
    
    director() {
    source /home/stack/stackrc
    echo -e "\n\n[stack@`hostname`~]$ $1 "  | tee -a ${dir_name}/`hostname`_env.log
    eval "$1" | tee -a ${dir_name}/`hostname`_env.log
    }
    
    stack() {
    echo -e "\n\n[stack@`hostname`~]$ $1 " | tee -a ${dir_name}/`hostname`_OC.log
    eval "$1" | tee -a ${dir_name}/`hostname`_OC.log
    }
    
    os-collect() {
        source /home/stack/stackrc
        for i in `openstack server list -c Networks -f value | cut -d \= -f 2`
        do
        host_name=`eval 'ssh -l heat-admin '$i' sudo hostname -f'`
        echo -e "\n\n[heat-admin@'$host_name'~]$ sudo journalctl -u os-collect-config" | tee -a ${dir_name}/$host_name-journal.log
        eval 'ssh -l heat-admin '$i' sudo journalctl -u os-collect-config' | tee -a ${dir_name}/$host_name-journal.log
        echo -e "\n\n[heat-admin@'$host_name'~]$ sudo os-collect-config --debug --print" | tee -a ${dir_name}/$host_name-os-collect-config.log
        eval 'ssh -l heat-admin '$i' sudo os-collect-config --debug --print' | tee -a ${dir_name}/$host_name-os-collect-config.log
        eval 'ssh -l heat-admin '$i' sudo tar -czf - /etc/puppet/' > ${dir_name}/$host_name-puppet.tar.gz
        done
    }
    
    
    oc-list(){
        source /home/stack/stackrc
        echo -e "\n\n[stack@`hostname`~]$ openstack stack environment show $stackname";
        eval 'openstack stack environment show $stackname' | tee -a ${dir_name}/`hostname`_oc_env.log
        echo -e "\n\n[stack@`hostname`~]$ openstack stack file list $stackname --fit-width"
        eval 'openstack stack file list $stackname --fit-width' | tee -a ${dir_name}/`hostname`_oc_file_list.log
        echo -e "\n\n[stack@`hostname`~]$ openstack stack output list $stackname" | tee -a ${dir_name}/`hostname`_oc_output_list.log
        eval 'openstack stack output list $stackname' | tee -a ${dir_name}/`hostname`_oc_output_list.log
        openstack stack output list $stackname -c output_key -f value | while read line; do 
            echo -e "\n\n[stack@`hostname`~]$ openstack stack output show $stackname $line --fit-width" | tee -a ${dir_name}/`hostname`_oc_output_list.log;
            eval 'openstack stack output show $stackname $line --fit-width' | tee -a ${dir_name}/`hostname`_oc_output_list.log;
        done
    }
    
    os-collect;
    oc-list;
    
    director "openstack stack list"
    director "openstack server list"
    director "openstack hypervisor list"
    director "openstack baremetal node list"
    director "openstack flavor list --long"
    director "openstack image list --long"
    director "openstack overcloud profiles list"
    for i in `openstack hypervisor list -c "Hypervisor Hostname" -f value`; do director "openstack hypervisor show $i"; done
    for i in `openstack baremetal node list -c UUID -f value`; 
    do 
            director "openstack baremetal introspection status $i"; 
            director "openstack baremetal node show $i"; 
            director "openstack baremetal introspection interface list $i"; 
            director "openstack baremetal introspection data save $i --file ${dir_name}/$i.log"; 
            director "openstack baremetal port list --node $i";
            director "openstack baremetal port list --node $i -c UUID -f value | xargs -I {} openstack baremetal port show {}";     
    done
    
    stack "openstack stack event list --nested-depth 5 $stackname"
    stack "openstack stack failures list $stackname --long"
    stack "openstack stack resource list -n 5 --filter status=CREATE_FAILED $stackname"
    stack "openstack stack resource list -n 5 --filter status=IN_PROGRESS $stackname"
    stack "openstack stack resource list --filter status=CREATE_COMPLETE $stackname"
    stack "openstack stack list --nested --fit-width"
    openstack stack resource list -n 5 --filter status=CREATE_FAILED $stackname -c resource_name -f value | grep '[a-zA-Z]' | sort --unique | while read line; do
    stack "openstack stack resource show  --fit-width $stackname $line"
    done
    openstack stack resource list -n 5 $stackname | grep -i deployment | awk '{print $4}' |grep -v $stackname | while read line; do
    stack "openstack software deployment show $line"
    stack "openstack software deployment output show --all --long $line"
    done
    
    stack "openstack stack hook poll --nested-depth 5 --fit-width $stackname"
    
    archive_name="`hostname`_`date '+%F_%H%M%S'`_Overcloud_details.tar.gz";
    tar -czf $archive_name ${dir_name};
    echo "Archived all data in archive ${archive_name}";
    EOF
    
  2. Execute the script:

    $ chmod +x hypervisor.sh
    $ bash hypervisor.sh
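
The helper functions in hypervisor.sh follow one pattern: write a prompt-style header, run the command, and append both to a log file with tee -a. The block below is a minimal sketch of that pattern and of the cut filter that extracts node addresses. The names log_dir, log_file, log_cmd and extract_ip are hypothetical, the sample Networks lines are invented for illustration, and capturing stderr with 2>&1 is an addition that the original script does not make.

```shell
#!/bin/bash
# Sketch only: log_dir, log_file, log_cmd and extract_ip are hypothetical names.
log_dir=$(mktemp -d)
log_file="${log_dir}/${HOSTNAME:-undercloud}_env.log"

# Append a prompt-style header and the command's output to the log file,
# similar to the director() and stack() helpers in hypervisor.sh.
log_cmd() {
    printf '\n\n[stack@%s ~]$ %s\n' "${HOSTNAME:-undercloud}" "$1" | tee -a "$log_file"
    # 2>&1 also records error messages (an addition, not in the original).
    eval "$1" 2>&1 | tee -a "$log_file"
}

# The script splits each Networks value on "=" and keeps the second field.
# Assumed input shape: "ctlplane=192.168.24.10".
extract_ip() {
    cut -d '=' -f 2
}

log_cmd "echo sample-output"

# Invented sample data standing in for the Networks column.
printf 'ctlplane=192.168.24.10\nctlplane=192.168.24.11\n' | extract_ip
```

Because every helper appends with tee -a, repeated runs add to existing log files, which is why the script empties its working directory first.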
    

Diagnostic Steps

  • The OpenStack Workflow (mistral) service groups multiple OpenStack tasks into workflows. It uses a set of these workflows to perform common functions across the CLI and web UI.
  • This includes bare metal node control, validations, plan management, and overcloud deployment.
  1. The OpenStack Workflow service also provides robust logging of workflow and action executions. To diagnose workflow and action failures, execute the script below and review the details in /tmp/xxx_workflow_xxx.log:

    cat<<'EOF'>mistral_action_troubleshoot.sh
    #!/bin/bash
    IFS=$'\n' ;
    file=/tmp/`hostname`_workflow_`date '+%F_%H%M%S'`.log
    
    source /home/stack/stackrc
    stackname=`openstack stack list -c "Stack Name" -f value`
    
    echofun() {
      echo -e "\n\n$1" | tee -a $file
    }
    
    workflow(){
    echofun "[stack@`hostname`~]$ $1 "
    eval "$1" | tee -a $file
    }
    
    workflow "openstack workflow execution list --fit-width"
    for i in `openstack workflow execution list | grep "ERROR" | awk '{print $2}'`;
    do 
    workflow "openstack workflow execution show --fit-width $i";
    workflow "openstack workflow execution output show $i";
    done
    
    workflow "openstack action execution list --fit-width"
    for i in `openstack action execution list -c ID -f value`
    do
    workflow "openstack action execution show $i --fit-width"
    workflow "openstack action execution output show $i"
    done
    EOF
    
  2. Execute the script:

    $ chmod +x mistral_action_troubleshoot.sh
    $ bash mistral_action_troubleshoot.sh
    
  3. If a support case has been opened with Red Hat, attach the tar archive from the Resolution section and the resulting workflow log file to the case.

  4. Generate and provide a sosreport from the affected node:

    sosreport -a --batch --all-logs
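
The workflow script selects failed executions by filtering the table output with grep and awk. The block below is a minimal sketch of that filter run against invented sample data. The table layout, IDs, and workflow names are illustrative assumptions, not real output, and sample_table and failed_ids are hypothetical names.

```shell
#!/bin/bash
# Sketch only: sample_table holds invented data, not real command output.
sample_table() {
cat <<'TABLE'
+------+----------------+---------+
| ID   | Workflow name  | State   |
+------+----------------+---------+
| aaa1 | example.wf_one | SUCCESS |
| bbb2 | example.wf_two | ERROR   |
+------+----------------+---------+
TABLE
}

# Same filter as mistral_action_troubleshoot.sh: keep rows containing
# ERROR and print the second whitespace-separated field. The first field
# is the leading "|", so the second one is the ID column.
failed_ids() {
    grep "ERROR" | awk '{print $2}'
}

sample_table | failed_ids
```

Note that grep matches ERROR anywhere in the row, so a row whose other columns contain that string would also be selected.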
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
