Execution Node becomes "Unavailable" and fails to run Ansible Automation Platform Jobs
Environment
- Red Hat® Ansible Automation Platform 2.x
Issue
- In Ansible Automation Platform, jobs running via execution nodes fail with i/o timeout errors.
- The execution node status on the Instances page of the Automation Platform UI shows "Unavailable".
Resolution
- Restart the receptor service on all controller nodes and on every execution node that shows the "Unavailable" status, then confirm that the service is running:
# systemctl restart receptor
# systemctl status receptor
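The restart and status check above could be wrapped in a small helper when several nodes need the same treatment. This is a minimal sketch, not part of the original solution: the function name `restart_receptor` is hypothetical, and it assumes the script runs as root on the node being repaired.

```shell
# Sketch only: restart_receptor is a hypothetical helper name.
# Run as root on each controller node and each affected execution node.
restart_receptor() {
    # Stop here if the restart itself fails.
    systemctl restart receptor || return 1

    # "is-active --quiet" prints nothing and exits 0 only when the unit is active.
    if systemctl is-active --quiet receptor; then
        echo "receptor: active"
    else
        echo "receptor: not active" >&2
        return 1
    fi
}
```

Calling `restart_receptor` with no arguments performs the restart and returns a non-zero status if the service did not come back up, so it can be chained with other checks.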
Diagnostic Steps
- Connectivity to an execution node can be checked through the receptor service with the following command, where node_ip/fqdn is the address or FQDN of the execution node:
# receptorctl --socket /var/run/awx-receptor/receptor.sock ping node_ip/fqdn --count 5
- The following errors can be observed in /var/log/receptor/receptor.log:
ERROR 2024/04/03 03:37:26 Status file has disappeared for OmpjgDqX.
INFO 2024/04/03 03:37:27 Running control service control
ERROR 2024/04/03 03:37:27 Status file has disappeared for OmpjgDqX.
INFO 2024/04/03 03:37:27 Initialization complete
ERROR 2024/04/03 05:29:05 Status file has disappeared for OmpjgDqX.
INFO 2024/04/03 05:29:06 Running control service control
ERROR 2024/04/03 05:29:06 Status file has disappeared for OmpjgDqX.
INFO 2024/04/03 05:29:06 Initialization complete
ERROR 2024/04/03 05:39:41 Backend sending error read tcp 172.xx.xx.xx:27199->172.xx.xx.xx.xx:33114: i/o timeout
ERROR 2024/04/03 05:42:27 Backend sending error read tcp 172.xx.xx.xx.xx:27199->172.xx.xx.xx.xx:33002: i/o timeout
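To see how often the timeout occurs without reading the whole log, the matching lines can be counted. The sketch below is illustrative only: `count_io_timeouts` is a hypothetical helper name, and the pattern assumes the log lines look like the "Backend sending error ... i/o timeout" entries shown above.

```shell
# Sketch only: count_io_timeouts is a hypothetical helper name.
# Prints the number of "Backend sending error ... i/o timeout" lines in a
# receptor log. The path defaults to the log file named in this article.
count_io_timeouts() {
    awk '/Backend sending error.*i\/o timeout/ { n++ } END { print n + 0 }' \
        "${1:-/var/log/receptor/receptor.log}"
}
```

For example, `count_io_timeouts /var/log/receptor/receptor.log` prints a single number. Running it again after the receptor restart gives a rough indication of whether new timeouts are still being logged.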
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.