[RHOSP14] Additional log details to troubleshoot a failed overcloud stack

Solution Verified - Updated -

Environment

  • Red Hat OpenStack Platform 14

Issue

  • How do I collect the additional information required to troubleshoot a RHOSP 14 deployment?

Resolution

  • Red Hat OpenStack Platform 14 ships with an Ansible dynamic inventory script called tripleo-ansible-inventory, which adds all undercloud and overcloud hosts to the Ansible inventory.
  • Refer to the Director Deployment Guide to troubleshoot a Red Hat OpenStack Platform 14 deployment.
  • In RHOSP 14, Ansible replaces the mechanism that previously communicated and transported software configuration deployment data between Heat and the Heat agent (os-collect-config) on the overcloud nodes.
  • Instead of os-collect-config running on each overcloud node and polling Heat for deployment data, the Ansible control node applies the configuration by running ansible-playbook with an Ansible inventory file and a set of playbooks and tasks.
  • config-download is the name of the feature that enables using Ansible in this manner, and the term is often used to refer to the method described in this article.
  • With config-download, Heat still creates all the deployment data needed for the overcloud installation and configuration through SoftwareDeployment resources, but it does not apply any of the software deployments; the data is only made available through the Heat API. Once the stack is created, an additional config-download Mistral workflow is triggered that downloads all of the deployment data from Heat.
  • Using the downloaded deployment data, the workflow then generates Ansible playbooks and tasks that are used by the undercloud to complete the configuration of the overcloud using ansible-playbook.
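  • The config-download flow described above can also be reproduced by hand, for example to inspect or re-run the generated playbooks. The following is a minimal sketch, not a documented procedure: the run and rerun_config_download helper names, the default output directory, and the location of deploy_steps_playbook.yaml inside the downloaded directory are assumptions. By default it only prints the commands; it executes them only when DRY_RUN=0.

```shell
#!/bin/bash
# Sketch (assumptions noted inline): regenerate the config-download playbooks
# for a stack and apply them from the undercloud. Prints commands unless DRY_RUN=0.

# Hypothetical helper: print the command by default, run it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Hypothetical helper: the three steps of a manual config-download run.
rerun_config_download() {
  local stack="${1:-overcloud}"
  local config_dir="${2:-$HOME/config-download-manual}"   # assumed output directory
  # Assumed playbook location; check the downloaded directory for the real path.
  local playbook="${3:-$config_dir/deploy_steps_playbook.yaml}"

  # 1. Download the deployment data that Heat generated for the stack.
  run openstack overcloud config download --name "$stack" --config-dir "$config_dir"
  # 2. Write a static copy of the dynamic inventory next to the playbooks.
  run tripleo-ansible-inventory --static-inventory "$config_dir/inventory"
  # 3. Apply the software configuration to the overcloud nodes.
  run ansible-playbook -i "$config_dir/inventory" --become "$playbook"
}
```

For example, DRY_RUN=1 rerun_config_download overcloud ~/config-download-manual prints the three commands for review. Confirm the playbook path in the generated directory before running with DRY_RUN=0, because applying the playbooks reconfigures the overcloud nodes.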

  • This diagram details the overall sequence of how using config-download completes an overcloud deployment:
    [Diagram: overcloud deployment sequence using config-download]

  • In RHOSP 14, Ansible's dynamic inventory of hosts makes it easier to perform administrative and troubleshooting tasks against the infrastructure in a repeatable way, such as server restarts, log gathering, and environment validation.
    Disclaimer: Links contained herein to the external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
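  • As an illustration of two of the tasks mentioned above (environment validation and log gathering), the following sketch wraps a few ad-hoc ansible commands that use tripleo-ansible-inventory as the dynamic inventory. The function names are hypothetical. The overcloud group matches the one used in Part A, while role groups such as Controller are an assumption; list the actual groups with tripleo-ansible-inventory --list.

```shell
#!/bin/bash
# Sketch: ad-hoc Ansible tasks run from the undercloud through the TripleO
# dynamic inventory. Group names other than "overcloud" are assumptions.
INVENTORY=/usr/bin/tripleo-ansible-inventory

# Environment validation: confirm Ansible can reach every host in a group.
check_reachability() {
  ansible -i "$INVENTORY" "${1:-overcloud}" -m ping
}

# Quick health check: run "uptime" on every host in a group.
show_uptime() {
  ansible -i "$INVENTORY" "${1:-overcloud}" -a uptime
}

# Log gathering: copy one file from every host in a group to the undercloud.
# The fetch module stores each copy under <dest>/<hostname>/<src path>.
gather_log() {
  local group="$1" src="$2" dest="${3:-/tmp/overcloud-logs}"
  ansible -i "$INVENTORY" "$group" --become -m fetch -a "src=$src dest=$dest"
}
```

For example, after sourcing these functions on the undercloud, gather_log Controller /var/log/messages copies that log from each host in the (assumed) Controller group into /tmp/overcloud-logs/<hostname>/.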

    Part A

  • To review and troubleshoot an existing overcloud stack, run the following example script on the director node. It quickly gathers stack information, Puppet debug details, and Ansible dynamic inventory details for each node.

    cat <<'EOF' > osp14_hypervisor.sh
    #!/bin/bash
    # Collect undercloud and overcloud troubleshooting data for RHOSP 14.
    # Run on the director node as the stack user.
    IFS=$'\n'    # split command output on newlines only
    dir_name=/tmp/OSP14
    rm -rf ${dir_name}/*
    mkdir -p ${dir_name}
    
    source /home/stack/stackrc
    stackname=$(openstack stack list -c "Stack Name" -f value)
    
    # Run a command on the director and append the command and its output to a log.
    director() {
      echo -e "\n\n[stack@$(hostname)~]$ $1 " | tee -a ${dir_name}/$(hostname)_env.log
      eval "$1" | tee -a ${dir_name}/$(hostname)_env.log
    }
    
    # Same as director(), but logs Ansible and config-download output separately.
    ansible_log() {
      echo -e "\n\n[stack@$(hostname)~]$ $1 " | tee -a ${dir_name}/${stackname}_ansible_details.log
      eval "$1" | tee -a ${dir_name}/${stackname}_ansible_details.log
    }
    
    # Run a command over SSH on every overcloud node; one log file per node.
    puppet_debug() {
      for i in $(openstack server list -c Networks -f value | cut -d '=' -f 2); do
        host_name=$(ssh heat-admin@"$i" sudo hostname -f)
        echo -e "\n\n[heat-admin@${host_name}~]$ $1 " | tee -a ${dir_name}/${host_name}_puppet_debug.log
        ssh heat-admin@"$i" "$1" | tee -a ${dir_name}/${host_name}_puppet_debug.log
      done
    }
    
    director "openstack stack list"
    director "openstack server list"
    director "openstack hypervisor list"
    director "openstack baremetal node list"
    director "openstack baremetal introspection list"
    director "openstack overcloud failures"
    director "openstack overcloud status"
    director "openstack stack failures list --long ${stackname}"
    for i in $(openstack hypervisor list -c "Hypervisor Hostname" -f value); do
      director "openstack hypervisor show $i"
    done
    for i in $(openstack baremetal node list -c UUID -f value); do
      director "openstack baremetal introspection status $i"
      director "openstack baremetal node show --fit-width $i"
      director "openstack baremetal introspection interface list $i"
    done
    
    ansible_log 'ansible -i /usr/bin/tripleo-ansible-inventory --become -a "grep -ir status_code /var/lib/heat-config/deployed" overcloud'
    ansible_log "openstack overcloud config download --name ${stackname} --config-dir ${dir_name}/${stackname}_config"
    ansible_log "tripleo-ansible-inventory --list | python -m json.tool"
    ansible_log "tripleo-ansible-inventory --static-inventory ${dir_name}/static-inventory"
    
    puppet_debug "sudo hiera -c /etc/puppet/hiera.yaml step"
    # --noop reports what Puppet would change without modifying the node.
    puppet_debug "sudo puppet apply --noop --debug /var/lib/tripleo-config/puppet_step_config.pp"
    
    # %M is minutes (%m would be the month).
    archive_name="$(hostname)_$(date '+%F_%H%M%S')_Overcloud_details.tar.gz"
    tar -czf "${archive_name}" ${dir_name}
    echo "Archived all data in archive ${archive_name}"
    EOF
    
  • Execute the script:

    chmod +x osp14_hypervisor.sh
    ./osp14_hypervisor.sh
    
  • If you opened a case with Red Hat support, attach the resulting tar archive to the case.
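  • The status_code check in the script is easier to read when only failures are shown. The following sketch, using the hypothetical helper failed_deployments, scans the collected ${stackname}_ansible_details.log for deployments whose deploy_status_code is non-zero. It assumes Ansible's ad-hoc output format (a "<host> | <STATUS> | rc=<n> >>" header before each host's output) and that each matching grep line starts with the deployment file path; adjust the patterns if your output differs.

```shell
#!/bin/bash
# Sketch: list heat-config deployments that reported a non-zero status code.
# Assumed input format: ad-hoc Ansible host headers "<host> | <STATUS> | rc=<n> >>"
# followed by grep lines "<file>:<json containing "deploy_status_code": <n>>".
failed_deployments() {
  local log="$1"
  awk '
    # Remember which host the following output lines belong to.
    / [|] .* >> *$/ { host = $1; next }
    # A status code starting with 1-9 is non-zero: print host and file path.
    /"deploy_status_code": *[1-9]/ {
      split($0, parts, ":")
      print host ": " parts[1]
    }
  ' "$log"
}
```

For example, failed_deployments /tmp/OSP14/overcloud_ansible_details.log prints one "host: file" line per failed deployment when the stack is named overcloud.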

    Part B
    Generate and provide a sosreport of the undercloud:

    sudo sosreport --batch --all-logs
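  • To generate the report non-interactively and then find the archive to upload, a sketch along the following lines can be used. The helper names are hypothetical; --case-id and --tmp-dir are sosreport options, while /var/tmp as the output directory and the sosreport-*.tar.* file name pattern are assumptions for a RHEL 7 undercloud.

```shell
#!/bin/bash
# Sketch: run sosreport non-interactively on the undercloud, then print the
# newest sosreport archive. Output directory and name pattern are assumptions.
SOS_DIR="${SOS_DIR:-/var/tmp}"

# Hypothetical helper: generate the report; the case ID argument is optional.
collect_sosreport() {
  local case_id="$1"
  local args=(--batch --all-logs --tmp-dir "$SOS_DIR")
  # --case-id embeds the support case number in the archive name.
  [ -n "$case_id" ] && args+=(--case-id "$case_id")
  sudo sosreport "${args[@]}"
}

# Hypothetical helper: print the most recently modified sosreport archive.
newest_sosreport() {
  local dir="${1:-$SOS_DIR}"
  ls -1t "$dir"/sosreport-*.tar.* 2>/dev/null | head -n 1
}
```

For example, collect_sosreport 01234567 (a placeholder case number) followed by newest_sosreport prints the path of the archive to attach to the case.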
    
  • For advanced troubleshooting of OpenStack stack resources, refer to the following sections in KB 3780301, which provide a script that collects overcloud stack deployment resource and workflow execution details:

    • Diagnosis Overcloud Stack
    • Diagnosis Workflows and Executions Error
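  • Before running that script, a quick first look at failed stack resources and errored workflow executions can be taken with read-only commands such as those in the following sketch. The function names are hypothetical, the nesting depth of 5 is an arbitrary choice, and the Mistral column names "Workflow name" and "State" are assumptions. Source /home/stack/stackrc first.

```shell
#!/bin/bash
# Sketch: read-only checks for failed Heat resources and errored Mistral
# workflow executions on the undercloud (stackrc must already be sourced).

# Hypothetical helper: failed resources in the stack and its nested stacks.
stack_failures() {
  local stack="${1:-overcloud}"
  # Walk nested stacks (depth 5) and keep only resources in a FAILED state.
  openstack stack resource list --nested-depth 5 --filter status=FAILED "$stack"
  # Show failure reasons together with deployment stdout/stderr.
  openstack stack failures list --long "$stack"
}

# Hypothetical helper: workflow executions whose state is ERROR.
# Column names "Workflow name" and "State" are assumed from the Mistral client.
workflow_errors() {
  openstack workflow execution list -f value -c ID -c "Workflow name" -c State \
    | awk '$NF == "ERROR"'
}
```

An execution ID returned by workflow_errors can then be inspected with openstack workflow execution show <id>.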


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.