[RHOSP-10] Collecting additional log details to troubleshoot a failed overcloud environment

Solution Verified - Updated -

Environment

  • Red Hat OpenStack v.10

Issue

  • How do I collect the additional information required to troubleshoot an RHOSP10 deployment?

Resolution

  • Refer to the Director Deployment Guide to troubleshoot the RHOSP10 deployment.

  • The script below collects the deployment information needed to identify where the deployment failed and to work toward a resolution. Run it on the director node as the stack user; it writes the collected details to <hostname>_<timestamp>_Overcloud_details.tar.gz in the current directory.

    cat<<'EOF'>osp10_hypervisor.sh
    #!/bin/bash
    IFS=$'\n' ;   # split command output on newlines only
    dir_name=/tmp/OSP10
    rm -rf ${dir_name}/*
    mkdir -p ${dir_name}
    
    source /home/stack/stackrc
    stackname=`openstack stack list -c "Stack Name" -f value`
    
    director() {
    source /home/stack/stackrc
    echo -e "\n\n[stack@`hostname`~]$ $1 "  | tee -a ${dir_name}/`hostname`_env.log
    eval "$1" | tee -a ${dir_name}/`hostname`_env.log
    }
    
    stack() {
    echo -e "\n\n[stack@`hostname`~]$ $1 " | tee -a ${dir_name}/`hostname`_OC.log
    eval "$1" | tee -a ${dir_name}/`hostname`_OC.log
    }
    
    os-collect() {
        source /home/stack/stackrc
        # Loop over the ctlplane IP address of every overcloud node.
        for i in `openstack server list -c Networks -f value | cut -d \= -f 2`
        do
        host_name=`ssh -l heat-admin "$i" sudo hostname -f`
        # Log each command before its output so the log reads like a terminal session.
        echo -e "\n\n[heat-admin@${host_name}~]$ sudo journalctl -u os-collect-config" | tee -a ${dir_name}/$host_name-journal.log
        ssh -l heat-admin "$i" sudo journalctl -u os-collect-config | tee -a ${dir_name}/$host_name-journal.log
        echo -e "\n\n[heat-admin@${host_name}~]$ sudo os-collect-config --debug --print" | tee -a ${dir_name}/$host_name-os-collect-config.log
        ssh -l heat-admin "$i" sudo os-collect-config --debug --print | tee -a ${dir_name}/$host_name-os-collect-config.log
        # Stream a compressed copy of the node's Puppet configuration back to the director.
        ssh -l heat-admin "$i" sudo tar -czf - /etc/puppet/ > ${dir_name}/$host_name-puppet.tar.gz
        done
    }
    
    
    oc-list(){
        source /home/stack/stackrc
        # Record the stack environment, template file list, and outputs, each in its own log.
        echo -e "\n\n[stack@`hostname`~]$ openstack stack environment show $stackname" | tee -a ${dir_name}/`hostname`_oc_env.log
        openstack stack environment show $stackname | tee -a ${dir_name}/`hostname`_oc_env.log
        echo -e "\n\n[stack@`hostname`~]$ openstack stack file list $stackname" | tee -a ${dir_name}/`hostname`_oc_file_list.log
        openstack stack file list $stackname | tee -a ${dir_name}/`hostname`_oc_file_list.log
        echo -e "\n\n[stack@`hostname`~]$ openstack stack output list $stackname" | tee -a ${dir_name}/`hostname`_oc_output_list.log
        openstack stack output list $stackname | tee -a ${dir_name}/`hostname`_oc_output_list.log
        # Show the value of every stack output key.
        openstack stack output list $stackname -c output_key -f value | while read line; do
            echo -e "\n\n[stack@`hostname`~]$ openstack stack output show $stackname $line" | tee -a ${dir_name}/`hostname`_oc_output_list.log
            openstack stack output show $stackname $line | tee -a ${dir_name}/`hostname`_oc_output_list.log
        done
    }
    
    os-collect;
    oc-list;
    
    director "openstack stack list"
    director "openstack server list"
    director "openstack hypervisor list"
    director "openstack baremetal node list"
    director "openstack flavor list --long"
    director "openstack overcloud profiles list"
    for i in `openstack hypervisor list -c "Hypervisor Hostname" -f value`; do director "nova hypervisor-show $i"; done
    for i in `openstack baremetal node list -c UUID -f value`; 
    do 
            director "openstack baremetal node show $i"; 
            director "openstack baremetal introspection data save $i --file ${dir_name}/$i.log"; 
            director "openstack baremetal port list --node $i";
            director "openstack baremetal port list --node $i -c UUID -f value | xargs -I {} openstack baremetal port show {}";     
    done
    
    stack "openstack stack event list --nested-depth 5 $stackname"
    stack "openstack stack failures list $stackname --long"
    stack "openstack stack resource list -n 5 --filter status=CREATE_FAILED $stackname"
    stack "openstack stack resource list -n 5 --filter status=IN_PROGRESS $stackname"
    stack "openstack stack resource list --filter status=CREATE_COMPLETE $stackname"
    stack "openstack stack list --nested "
    openstack stack resource list -n 5 --filter status=CREATE_FAILED $stackname -c resource_name -f value | grep '[a-zA-Z]' | sort --unique | while read line; do
    stack "openstack stack resource show  $stackname $line"
    done
    openstack stack resource list -n 5 $stackname | grep -i deployment | awk '{print $4}' |grep -v $stackname | while read line; do
    stack "openstack software deployment show $line"
    stack "openstack software deployment output show --all --long $line"
    done
    
    stack "openstack stack hook poll --nested-depth 5 $stackname"
    
    # Timestamp format: date_HourMinuteSecond (%M is minutes; %m would be the month).
    archive_name="`hostname`_`date '+%F_%H%M%S'`_Overcloud_details.tar.gz";
    tar -czf "$archive_name" ${dir_name};
    echo "Archived all data in archive ${archive_name}";
    EOF
    
  • Execute the script:

    $ chmod +x osp10_hypervisor.sh
    $ bash osp10_hypervisor.sh
    
  • If this is for a case which was opened with Red Hat support, attach the resulting tar archive to the case.
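
Before attaching the archive, it may be worth confirming that it is readable and actually contains files. The following is a minimal sketch of such a check, not part of the script above; the function name check_archive is hypothetical, and it assumes only that the archive is a gzip-compressed tar file.

```shell
#!/bin/bash
# check_archive is a hypothetical helper (an assumption, not part of the
# collection script). It lists the entries of a gzip-compressed tar archive
# without extracting them and fails if the archive is unreadable or empty.
check_archive() {
    local archive="$1"
    local listing

    # tar -tzf exits non-zero when the file is missing or corrupt.
    if ! listing=$(tar -tzf "$archive" 2>/dev/null); then
        echo "ERROR: cannot read $archive" >&2
        return 1
    fi

    if [ -z "$listing" ]; then
        echo "ERROR: $archive contains no entries" >&2
        return 1
    fi

    # grep -c '' counts lines, one per archive entry.
    echo "$archive: $(printf '%s\n' "$listing" | grep -c '') entries"
}
```

To use it, pass the archive name that the collection script prints on its last line, for example: check_archive "$archive_name".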

Part B

  • Generate and provide a sosreport from each affected node.

    sosreport -a --batch --all-logs
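
Once sosreport finishes, it prints the path of the archive it created. If that output has scrolled away, a helper along the following lines can locate the most recent report. This is a sketch under stated assumptions: the function name latest_sosreport is hypothetical, and the default directory (/var/tmp) and the sosreport-*.tar.xz naming are assumptions that may differ by sos version and configuration.

```shell
#!/bin/bash
# latest_sosreport is a hypothetical helper, not part of the sos package.
# Assumption: reports are named sosreport-*.tar.xz and stored in /var/tmp
# unless another directory is given as the first argument.
latest_sosreport() {
    local dir="${1:-/var/tmp}"
    local newest

    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir"/sosreport-*.tar.xz 2>/dev/null | head -n 1)

    if [ -z "$newest" ]; then
        echo "no sosreport archive found in $dir" >&2
        return 1
    fi
    echo "$newest"
}
```

Run it as root on the node where sosreport was generated, since the report is normally readable only by root.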
    
