Execution Node becomes "Unavailable" and fails to run Ansible Automation Platform Jobs

Solution Verified - Updated -

Environment

  • Red Hat® Ansible Automation Platform 2.x

Issue

  • In Ansible Automation Platform, jobs that run on execution nodes fail with i/o timeout errors.

  • In the Instances view of the Automation Platform UI, the execution node status shows "Unavailable".

Resolution

  • Restart the receptor service on all Controller nodes and on every execution node that shows the Unavailable status, then confirm that the service is running, with the following commands:

    #  systemctl restart receptor
    #  systemctl status receptor 
    
  • Ensure bidirectional firewall access on TCP port 27199 between Controller and Execution nodes for Receptor communication.
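
The firewall requirement above can be checked and, if needed, corrected with something like the following sketch. It is not part of the original solution: the function names are hypothetical, the reachability check assumes bash (for its /dev/tcp redirection) and the coreutils timeout command, and the second helper assumes the host firewall is managed by firewalld. Because access must be bidirectional, the check should be run from a Controller node toward the execution node and again in the opposite direction.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of AAP or receptor.

# Succeed if a TCP connection to host:port can be opened within 3 seconds.
# The port defaults to 27199, the Receptor port named in this article.
check_receptor_port() {
    host="$1"
    port="${2:-27199}"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Assumption: firewalld manages the host firewall.
# Open TCP port 27199 permanently and reload the rules (run as root).
open_receptor_port() {
    firewall-cmd --permanent --add-port=27199/tcp &&
        firewall-cmd --reload
}
```

For example, `check_receptor_port <node_ip/fqdn> && echo reachable` reports whether the port answers. A failure can also point to a network firewall between the nodes, which `open_receptor_port` does not change.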

Diagnostic Steps

  • Connectivity to the execution nodes can be checked through the receptor service with the following command:

    • In AAP 2.4 and earlier versions:
    # receptorctl --socket /var/run/awx-receptor/receptor.sock ping <node_ip/fqdn> --count 5
    
    • In newer versions:
    # receptorctl --socket /var/run/receptor/receptor.sock ping <node_ip/fqdn> --count 5
    

    NOTE: Replace <node_ip/fqdn> in the above command with the IP address or FQDN of the execution node instance, as listed in the output of the awx-manage list_instances command.

  • The following errors can be observed in /var/log/receptor/receptor.log:

    ERROR 2024/04/03 03:37:26 Status file has disappeared for OmpjgDqX.
    INFO 2024/04/03 03:37:27 Running control service control
    ERROR 2024/04/03 03:37:27 Status file has disappeared for OmpjgDqX.
    INFO 2024/04/03 03:37:27 Initialization complete
    ERROR 2024/04/03 05:29:05 Status file has disappeared for OmpjgDqX.
    INFO 2024/04/03 05:29:06 Running control service control
    ERROR 2024/04/03 05:29:06 Status file has disappeared for OmpjgDqX.
    INFO 2024/04/03 05:29:06 Initialization complete
    ERROR 2024/04/03 05:39:41 Backend sending error read tcp 172.xx.xx.xx:27199->172.xx.xx.xx:33114: i/o timeout
    ERROR 2024/04/03 05:42:27 Backend sending error read tcp 172.xx.xx.xx:27199->172.xx.xx.xx:33002: i/o timeout
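
The two diagnostic steps above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a supported tool: the function names are hypothetical, the socket lookup simply takes the first of the two socket paths from this article that exists on the node, and the log path defaults to the one shown above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of AAP or receptor.

# Print the first candidate socket path that exists.
# With no arguments, try the two socket paths mentioned in this article.
find_receptor_socket() {
    if [ "$#" -eq 0 ]; then
        set -- /var/run/receptor/receptor.sock /var/run/awx-receptor/receptor.sock
    fi
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Ping an execution node through whichever socket is present.
ping_execution_node() {
    node="$1"
    sock="$(find_receptor_socket)" || {
        echo "no receptor socket found" >&2
        return 1
    }
    receptorctl --socket "$sock" ping "$node" --count 5
}

# Count the lines in a receptor log that contain "i/o timeout".
count_io_timeouts() {
    log_file="${1:-/var/log/receptor/receptor.log}"
    grep -c 'i/o timeout' "$log_file"
}
```

Run as root, `ping_execution_node <node_ip/fqdn>` performs the same ping as the commands above without the socket path having to be chosen by hand, and `count_io_timeouts` prints how many timeout lines the log currently holds.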
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
