Bind fails to resolve external IPv4 recursive queries periodically in RHEL
Issue
- Two of the nameservers in a particular data center are slow to resolve, or time out on, some external recursive IPv4 record queries. Eventually the named service must be restarted before they can resolve A records in the "problem" domains again.
- The failure occurs consistently with some "problem" domains and sporadically with other domains. Specific examples of "problem" domains include www.example.com and others.
- Impact: Mail fails to be delivered when one of the failed queries is made in response to an MX lookup.
- The problem is more apparent on A records with very short TTL (time to live) values since they cannot be cached longer than the TTL allows.
- Using nscd flush has no effect.
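To see whether the nameserver is serving an A record from cache or re-resolving it, the TTL in the answer can be checked with dig. The following is a minimal sketch, not part of the original report: the helper name a_record_ttls is hypothetical, the default server address 127.0.0.1 is an assumption (substitute the affected nameserver), and www.example.com stands in for a "problem" domain.

```shell
#!/bin/sh
# Hypothetical helper: print "TTL ADDRESS" for each A record in the answer.
# Usage: a_record_ttls NAME [SERVER]
# SERVER defaults to 127.0.0.1, which is an assumption for illustration.
a_record_ttls() {
    name=$1
    server=${2:-127.0.0.1}
    # +noall +answer prints only the answer section;
    # +time and +tries bound how long a stalled query is allowed to wait.
    dig @"$server" "$name" A +noall +answer +time=5 +tries=1 |
        awk '$4 == "A" { print $2, $5 }'
}
```

Calling a_record_ttls www.example.com twice a few seconds apart should show the TTL counting down while the record is cached. With a very short TTL the record expires quickly, so the nameserver has to recurse again and any upstream slowness or timeout becomes visible sooner.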
Environment
- Red Hat Enterprise Linux (RHEL)
- Note: Since this is a network issue, it can occur with any RHEL release.
- bind nameserver configured to perform recursion of external records for clients
- bind configuration (/etc/named.conf, zone declarations omitted):
    options {
        directory "/";
        allow-transfer { trustedslaves; };
        allow-recursion { recursive; };
        auth-nxdomain no;
        version "cowbell++";
        max-ncache-ttl 10;
        statistics-file "/var/log/named.stats";
        memstatistics-file "/var/log/named.memstats";
        zone-statistics yes;
        dump-file "/var/log/named.dump";
        recursing-file "/var/log/named.recursing";
        querylog yes;
        notify no;
        listen-on port 53 {
            any;
        };
    };