Jobs fail to access hosts in Ansible Automation Platform (AAP) when an IP address in the 10.0.2.* range is involved
Environment
- Ansible Automation Platform (AAP) 2.x
- And any of the following:
  - A DNS server has an IP address of 10.0.2.*
  - A failing destination has an IP address of 10.0.2.*
  - Active Directory or Kerberos servers have an IP address of 10.0.2.*
Issue
- AAP jobs consistently fail to resolve any hostname
- AAP jobs consistently fail to access a subset of target nodes
- AAP jobs consistently fail to communicate with certain URLs or APIs
- AAP Execution Environments need to use a subnet other than 10.0.2.0/24
Resolution
Add a custom configuration to the AAP Controller and Hybrid nodes so they call podman with a custom --network option.
On every Controller and Hybrid node in the AAP cluster, create a new file named /etc/tower/conf.d/ee_networking.py with the contents below:
NOTE: The contents below set the EE subnet to 192.168.255.0/24. Change it as needed, choosing a subnet that does not overlap with any network your jobs need to reach.
DEFAULT_CONTAINER_RUN_OPTIONS = ["--network", "slirp4netns:enable_ipv6=true,cidr=192.168.255.0/24"]
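Before settling on a replacement subnet, it can help to confirm that it does not overlap any network your jobs must reach; otherwise the same problem simply moves to the new range. The sketch below uses Python's standard ipaddress module. REQUIRED_NETWORKS and the helper names overlapping() and run_options() are hypothetical placeholders, not part of AAP; fill the list with your own DNS servers, target subnets, and Kerberos/AD servers.

```python
"""Check a candidate EE subnet for overlaps and build the matching setting."""
import ipaddress
from typing import List

# Hypothetical examples only: list the DNS servers, target host subnets and
# Kerberos/AD servers that jobs in your environment must reach.
REQUIRED_NETWORKS = ["10.0.2.53/32", "192.168.11.22/32", "172.16.0.0/16"]


def overlapping(candidate: str, required: List[str]) -> List[str]:
    """Return every entry of `required` that overlaps the candidate CIDR."""
    cand = ipaddress.ip_network(candidate)
    return [net for net in required
            if cand.overlaps(ipaddress.ip_network(net, strict=False))]


def run_options(cidr: str, enable_ipv6: bool = True) -> List[str]:
    """Build a DEFAULT_CONTAINER_RUN_OPTIONS value that uses `cidr`."""
    slirp_opts = f"cidr={cidr}"
    if enable_ipv6:
        slirp_opts = "enable_ipv6=true," + slirp_opts
    return ["--network", f"slirp4netns:{slirp_opts}"]


if __name__ == "__main__":
    for candidate in ("10.0.2.0/24", "192.168.255.0/24"):
        clashes = overlapping(candidate, REQUIRED_NETWORKS)
        if clashes:
            print(f"{candidate}: overlaps {', '.join(clashes)}")
        else:
            print(f"{candidate}: OK, use {run_options(candidate)!r}")
```

With enable_ipv6=True, run_options() yields the same value as the ee_networking.py contents above; with enable_ipv6=False it yields the IPv6-disabled variant described in the note on IPv6.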
Next, ensure the /etc/tower/conf.d/ee_networking.py file has the correct ownership, permissions, and SELinux context by running these commands as root:
# chown awx: /etc/tower/conf.d/ee_networking.py
# chmod ug+r /etc/tower/conf.d/ee_networking.py
# restorecon -vF /etc/tower/conf.d/ee_networking.py
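To double-check the result on each node, a small script like the one below can report the file's owner, whether the user and group read bits are set, and its SELinux label. This is a sketch assuming Python 3 on a Linux host; it only prints the label rather than judging it, since restorecon is what applies the policy default. The function names are hypothetical.

```python
"""Report ownership, read permissions and SELinux label of a conf.d file."""
import os
import pwd
import stat
import sys

CONF_FILE = "/etc/tower/conf.d/ee_networking.py"


def check_file(path: str, expected_owner: str = "awx") -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    if owner != expected_owner:
        problems.append(f"owner is {owner}, expected {expected_owner}")
    # `chmod ug+r` should leave both the user and the group read bits set.
    if not (st.st_mode & stat.S_IRUSR and st.st_mode & stat.S_IRGRP):
        problems.append(f"mode {stat.filemode(st.st_mode)} is missing ug+r")
    return problems


def selinux_label(path: str):
    """Return the file's SELinux label, or None if it cannot be read."""
    try:
        raw = os.getxattr(path, "security.selinux")
    except (OSError, AttributeError):  # no SELinux xattr, or non-Linux Python
        return None
    return raw.decode().rstrip("\x00")


def main(path: str) -> None:
    if not os.path.exists(path):
        print(f"{path}: not found")
        return
    for problem in check_file(path):
        print(f"{path}: {problem}")
    print(f"{path}: SELinux label {selinux_label(path) or 'unavailable'}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else CONF_FILE)
```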
Alternatively, you can add the DEFAULT_CONTAINER_RUN_OPTIONS setting above to /etc/tower/conf.d/custom.py, or to any other file matching /etc/tower/conf.d/*.py, instead of /etc/tower/conf.d/ee_networking.py. In that case, set the same ownership, permissions, and SELinux context on that file as shown above.
Note on IPv6: The default --network flag that AAP uses when running EE containers includes the enable_ipv6=true option. If you prefer to disable IPv6 for EE containers, use this definition in /etc/tower/conf.d/ee_networking.py (or another file of your choice) instead:
DEFAULT_CONTAINER_RUN_OPTIONS = ["--network", "slirp4netns:cidr=192.168.255.0/24"]
Root Cause
AAP runs Execution Environments as rootless podman containers.
As described at podman.io, rootless podman containers use slirp4netns for establishing network connectivity, and slirp4netns sets a default subnet of 10.0.2.0/24.
Processes running inside EE containers, such as the jobs launched by AAP, fail to connect to any external host whose IP address is 10.0.2.*. Because the container treats 10.0.2.0/24 as its own directly connected subnet, it does not send traffic for those addresses through its default route, so the packets never reach the real host. Connections to 10.0.2.* addresses therefore always time out; how long the timeout takes depends on the process making the connection.
If a DNS server in your organization is at, for example, 10.0.2.53, AAP jobs cannot reach it for name resolution and will usually fail to connect to hosts right from the start.
If a target host for your job is at IP address 10.0.2.<anything>, AAP jobs will fail to connect to this host.
If a Kerberos Key Distribution Center (KDC) or an Active Directory Domain Controller in your organization is at 10.0.2.<something>, AAP jobs will fail to use Kerberos to authenticate to target hosts.
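The routing decision described above can be illustrated with a short model. It is a simplification: it only classifies addresses against the container subnet, does not read a real routing table, and ignores the few addresses that slirp4netns itself answers on (such as its virtual gateway and DNS forwarder). The function name route_for() is hypothetical.

```python
"""Model how a rootless slirp4netns container routes a destination address."""
import ipaddress

# slirp4netns default network for rootless podman containers.
DEFAULT_NET = ipaddress.ip_network("10.0.2.0/24")


def route_for(destination: str, container_net=DEFAULT_NET) -> str:
    """Classify how the container would try to reach `destination`."""
    addr = ipaddress.ip_address(destination)
    if addr in container_net:
        # On-link: the container treats the address as a neighbour on its own
        # subnet, so the packet never goes to the default gateway and the real
        # host with that address is never reached.
        return "on-link (unreachable)"
    # Everything else goes through the default route and out via the host.
    return "default route"


if __name__ == "__main__":
    for dest in ("10.0.2.53", "10.0.3.53", "192.168.11.22"):
        print(f"{dest:>15} -> {route_for(dest)}")
    # With the custom subnet from the Resolution section, 10.0.2.53 is routed normally.
    custom_net = ipaddress.ip_network("192.168.255.0/24")
    print(f"{'10.0.2.53':>15} -> {route_for('10.0.2.53', custom_net)} (custom cidr)")
```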
Diagnostic Steps
If the problem occurs with name resolution
On AAP Execution, Hybrid, and Controller nodes, check the IP addresses of the DNS servers for any that match 10.0.2.*:
$ cat /etc/resolv.conf
nameserver 10.0.2.53      # this DNS server matches 10.0.2.*
nameserver 192.168.11.22
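On nodes with many resolver entries, the same check can be scripted. The sketch below is a minimal parser, assuming only the standard "nameserver <address>" line format of /etc/resolv.conf; the function name is hypothetical.

```python
"""Flag nameservers in resolv.conf that fall inside 10.0.2.0/24."""
import ipaddress
import sys

SLIRP_DEFAULT = ipaddress.ip_network("10.0.2.0/24")


def conflicting_nameservers(text: str) -> list:
    """Return nameserver addresses from resolv.conf text inside 10.0.2.0/24."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        # Only "nameserver <address>" lines matter; comments start with # or ;.
        if len(fields) >= 2 and fields[0] == "nameserver":
            try:
                addr = ipaddress.ip_address(fields[1])
            except ValueError:
                continue  # skip entries this simple parser cannot read
            if addr in SLIRP_DEFAULT:
                hits.append(str(addr))
    return hits


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/resolv.conf"
    try:
        with open(path) as fh:
            text = fh.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        for ns in conflicting_nameservers(text):
            print(f"nameserver {ns} is inside {SLIRP_DEFAULT}")
```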
If host access fails but name resolution succeeds
Check whether target hosts resolve to an IP address of 10.0.2.* with getent hosts <target host>, for example:
$ getent hosts my-target-host.example.com
10.0.2.99 my-target-host.example.com # Jobs will fail to access this host
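To check a whole inventory's worth of hosts at once, a loop over the node's resolver can help. The sketch below uses socket.getaddrinfo, which on glibc goes through NSS much like getent hosts does; it reflects name resolution on the node itself, and the function names are hypothetical.

```python
"""Resolve target hosts and flag any address inside 10.0.2.0/24."""
import ipaddress
import socket
import sys

SLIRP_DEFAULT = ipaddress.ip_network("10.0.2.0/24")


def addresses(host: str) -> set:
    """Return the addresses the node's resolver returns for `host`."""
    infos = socket.getaddrinfo(host, None)
    # Strip any IPv6 zone suffix such as "%eth0" before parsing.
    return {info[4][0].split("%")[0] for info in infos}


def conflicts(host: str) -> list:
    """Return the addresses of `host` that fall inside 10.0.2.0/24, sorted."""
    return sorted(a for a in addresses(host)
                  if ipaddress.ip_address(a) in SLIRP_DEFAULT)


if __name__ == "__main__":
    for host in sys.argv[1:]:
        try:
            bad = conflicts(host)
        except socket.gaierror as exc:
            print(f"{host}: lookup failed ({exc})")
            continue
        status = "OK" if not bad else "jobs will fail: " + ", ".join(bad)
        print(f"{host}: {status}")
```

Pass the hostnames as arguments, for example python3 check_hosts.py my-target-host.example.com (the script name is arbitrary).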
If Kerberos authentication fails
Test whether the KDCs are in the 10.0.2.* range.
On Execution Nodes and Hybrid Nodes, ensure the krb5-workstation RPM is installed and run the commands below to find out whether Kerberos or Active Directory servers resolve to an IP address of 10.0.2.*:
NOTE: Each run of the kinit command below generates one failed Kerberos authentication attempt.
NOTE 2: Replace YOUR-DOMAIN with the actual Kerberos domain or realm used in your organization.
# export KRB5_TRACE=/dev/stdout
# kinit anything@YOUR-DOMAIN <<< foobar | grep -F 10.0.2.
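If you would rather keep the whole trace for later review than filter it with grep, MIT Kerberos also accepts a file path in KRB5_TRACE instead of /dev/stdout. The sketch below then scans such a saved file for 10.0.2.* addresses. It assumes only that addresses appear in the trace as dotted quads, possibly followed by a port; it does not parse the trace format itself, and the function name is hypothetical.

```python
"""Scan saved text (e.g. a KRB5_TRACE log) for IPv4 addresses in 10.0.2.0/24."""
import ipaddress
import re
import sys

SLIRP_DEFAULT = ipaddress.ip_network("10.0.2.0/24")
# Dotted-quad candidates; ipaddress validates them (rejects e.g. 999.1.1.1).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def conflicting_addresses(text: str) -> list:
    """Return unique IPv4 addresses in `text` that fall inside 10.0.2.0/24."""
    found = set()
    for candidate in IPV4_RE.findall(text):
        try:
            addr = ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        if addr in SLIRP_DEFAULT:
            found.add(addr)
    return [str(a) for a in sorted(found)]


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as fh:
            for addr in conflicting_addresses(fh.read()):
                print(f"{path}: {addr} is inside {SLIRP_DEFAULT}")
```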
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.