Reliability/speed of DNS lookups
Hi,
Quick DNS lookups are critical for many applications. I am considering changing the default timeout option in
/etc/resolv.conf from 5 seconds to something lower (maybe 1 second?) to reduce the impact of an
unresponsive DNS server. The "rotate" option will also help a bit.
Still, even with these changes, a 1 second delay for at least 50% of the lookups is very slow and will
affect application performance a lot. Normally a reply is probably received within 10 milliseconds
(only lookups within the organization are performed, with a fast LAN/WAN between the resolver and the
DNS servers).
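To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a simplified model that is not taken from the glibc source: exactly one server is dead, a lookup that starts on the dead server waits one full timeout before trying a healthy one, and with "rotate" each server is equally likely to be tried first. The function name and parameters are made up for illustration.

```python
def mean_lookup_delay(n_servers, timeout, normal, rotate):
    """Average lookup time in seconds when exactly one server is dead.

    Simplified model (an assumption, not glibc's exact behaviour):
    - without rotate, the dead server is listed first, so every lookup
      waits one timeout before a healthy server answers;
    - with rotate, only 1/n_servers of the lookups start on the dead server.
    """
    share_hitting_dead = 1.0 / n_servers if rotate else 1.0
    return normal + share_hitting_dead * timeout


if __name__ == "__main__":
    # Values from the post: 1 second timeout, about 10 ms normal reply time.
    for rotate in (False, True):
        delay = mean_lookup_delay(2, timeout=1.0, normal=0.010, rotate=rotate)
        print(f"rotate={rotate}: mean delay about {delay:.2f} s")
```

Under this model, two servers with "rotate" still average roughly half a second per lookup, which is many times the normal reply time.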
I'm a bit surprised that the resolver in libc is not more sophisticated.
Wouldn't it be quite simple to implement some sort of blacklist of non-responding DNS servers?
For instance, if the first DNS server in resolv.conf did not reply within the configured timeout, the
resolver could send the next queries directly to the second and third DNS servers in resolv.conf.
After a predefined number of seconds the first one could be tried again (maybe increasing the number of seconds every time,
up to a maximum of, for instance, 3600).
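A minimal sketch of such a blacklist, to make the idea concrete. This is not how the libc resolver works; the class name, method names, the 5 second initial penalty and the doubling rule are all assumptions chosen for illustration. Only the 3600 second cap comes from the suggestion above.

```python
import time


class ServerBackoff:
    """Tracks unresponsive DNS servers and when each may be tried again."""

    def __init__(self, servers, initial_delay=5.0, max_delay=3600.0,
                 clock=time.monotonic):
        self.servers = list(servers)
        self.initial_delay = initial_delay
        self.max_delay = max_delay
        self.clock = clock          # injectable so the logic can be tested
        self._delay = {}            # server -> current penalty in seconds
        self._retry_at = {}         # server -> time it may be tried again

    def mark_failed(self, server):
        # Double the penalty on each consecutive failure, capped at max_delay.
        previous = self._delay.get(server)
        if previous is None:
            delay = self.initial_delay
        else:
            delay = min(previous * 2, self.max_delay)
        self._delay[server] = delay
        self._retry_at[server] = self.clock() + delay

    def mark_ok(self, server):
        # A successful reply clears the penalty completely.
        self._delay.pop(server, None)
        self._retry_at.pop(server, None)

    def candidates(self):
        """Servers to query now, in resolv.conf order, skipping penalised ones."""
        now = self.clock()
        usable = [s for s in self.servers
                  if s not in self._retry_at or self._retry_at[s] <= now]
        # If every server is penalised, fall back to the full list
        # rather than refusing to resolve anything.
        return usable or list(self.servers)
```

A resolver loop would call candidates() before each query, mark_failed() on a timeout and mark_ok() on a reply.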
To avoid problems like this, I see that people suggest many solutions such as nscd, unbound, or load balancing/failover of
DNS servers, but those may not be easy to implement in all cases.
A bit more robustness in the libc resolver would probably be better/safer in many cases.
How do you solve this?
Best regards,
Erling Ringen Elvsrud
Responses
Hi,
I have had similar frustration in the past when failing over RHEL VMs between sites. I have tried the same resolv.conf configuration items as you and found they have very little impact on DNS-sensitive applications starting on boot.
My solution (workaround / hack) was to write a startup script that starts immediately after the networking comes up on boot and carries out the following:
1. Checks that the primary NIC is up (i.e. the NIC used to reach the DNS servers)
2. Returns a list of the DNS servers in /etc/resolv.conf (commented out or not; this makes sense in the next step)
3. Loops through each server in the list and determines whether it is up (using your chosen method, e.g. ping)
3a. If it is up, makes sure the DNS server is uncommented in /etc/resolv.conf
3b. If it is down, comments it out of /etc/resolv.conf
4. If the minimum number of DNS servers (configurable) aren't returned as 'up' from the list, repeats the loop for X tries before giving up
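The original script is not shown, so the following is only a rough sketch of steps 2 to 4 under some assumptions: the function names are invented, the reachability check is passed in as a caller-supplied function (is_up), the file is left untouched if too few servers are reachable, and there is no pause between tries. The NIC check from step 1 and the actual reading and writing of /etc/resolv.conf are left out.

```python
import re

# Matches "nameserver <addr>" whether or not it is commented out with "#".
NAMESERVER_RE = re.compile(r"^\s*#?\s*nameserver\s+(\S+)\s*$")


def list_nameservers(text):
    """Step 2: return every nameserver address, commented out or not."""
    found = []
    for line in text.splitlines():
        m = NAMESERVER_RE.match(line)
        if m and m.group(1) not in found:
            found.append(m.group(1))
    return found


def rewrite(text, is_up):
    """Steps 3a/3b: uncomment reachable servers, comment out the rest."""
    out = []
    for line in text.splitlines():
        m = NAMESERVER_RE.match(line)
        if m:
            addr = m.group(1)
            line = f"nameserver {addr}" if is_up(addr) else f"#nameserver {addr}"
        out.append(line)
    return "\n".join(out) + "\n"


def refresh(text, is_up, min_up=1, tries=3):
    """Step 4: repeat the check until min_up servers answer, or give up."""
    for _ in range(tries):
        status = {addr: is_up(addr) for addr in list_nameservers(text)}
        if sum(status.values()) >= min_up:
            return rewrite(text, status.__getitem__)
    # Assumption: when giving up, leave the file content unchanged.
    return text
```

In a real startup script, is_up could wrap a ping or a test DNS query, and the returned text would be written back to /etc/resolv.conf.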
Although this sounds long-winded, it solved the problem for me: a server was quickly able to remove from resolv.conf any DNS servers it couldn't reach during boot, which in turn meant that applications starting later in the boot process weren't at the mercy of resolv.conf having an unreachable entry.
Ideally I'd like the option to check multiple DNS servers in parallel and then just provide the quickest response.
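The "ask everyone, take the quickest answer" idea can be sketched roughly as below. This is only an illustration using threads: the function name is made up, and query is a hypothetical caller-supplied function that sends one DNS query to one server and either returns an answer or raises an exception.

```python
import concurrent.futures as cf


def first_answer(servers, query, timeout=1.0):
    """Send the same query to every server in parallel.

    Returns (server, answer) for the first successful reply.
    Raises LookupError if every server fails or none answers in time.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(servers)))
    futures = {pool.submit(query, s): s for s in servers}
    try:
        # as_completed yields futures in the order they finish.
        for fut in cf.as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                return futures[fut], fut.result()
        raise LookupError("all servers failed")
    except cf.TimeoutError:
        raise LookupError("no server answered in time") from None
    finally:
        # Do not wait for the slower servers; their replies are discarded.
        pool.shutdown(wait=False)
```

The trade-off is that every lookup generates one query per server, so the extra traffic grows with the number of servers listed.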
Replying to myself... of 4 years ago.... helping myself out?
I did end up coming up with a far more robust solution that queries DNS servers in parallel and uses the quickest response to answer the query.
The basic process is:
1. Install dnsmasq
yum install -y dnsmasq
2. Create a dnsmasq file that contains your DNS servers (eg. /etc/resolv.dnsmasq)
nameserver 10.0.0.2
nameserver 10.0.0.3
nameserver 10.0.0.4
3. Configure dnsmasq to use your dnsmasq-specific resolv file (e.g. in /etc/dnsmasq.conf)
resolv-file=/etc/resolv.dnsmasq
4. Configure resolv.conf to use the dnsmasq resolver on localhost (127.0.0.1)
search pixeldrift.local
nameserver 127.0.0.1
5. This is the critical step... Create a replacement systemd service file to provide an extra parameter to dnsmasq on startup. This will change the behaviour of dnsmasq to hit all DNS servers concurrently and use the first answer.
[Service]
ExecStart=/usr/sbin/dnsmasq -k --all-servers
Note: This will increase your DNS traffic, as a DNS request is sent to every server in your resolv.dnsmasq for each lookup. The impact can be reduced by using caching in dnsmasq.
Hi guys, has this topic been concluded? I would also like to know a possible way to fail over from the primary DNS server to the secondary DNS server. I am facing a production outage because failover is taking more than 10 seconds.
