System resolver not completing or timing out (locks)
We discovered a problem in our infrastructure, which uses dnsmasq as a local DNS cache in front of the nameservers listed in resolv.conf.
The resolv.conf looks like this:
nameserver 127.0.0.1
nameserver 172.0.0.81
options timeout:1 attempts:2
And dnsmasq.conf contains:
port=53
no-resolv
no-poll
no-negcache
interface=lo
cache-size=2000
server=172.0.0.80
The intention is for local applications to benefit from a local dnsmasq cache (and its ability to select a different resolver for specific domains, etc.). However, when 172.0.0.80 does not reply (networking issue), dnsmasq does not reply either, and the system resolver never completes. The expected behavior would have been for the system to try both dnsmasq (127.0.0.1, which forwards to 172.0.0.80) and the remote nameserver (172.0.0.81), and then give up once the configured timeout and attempts are exhausted.
Example:
nslookup testfqdn.test
;; Got SERVFAIL reply from 172.0.0.81, trying next server
[blocks there, nslookup never completes]
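To see which nameserver stalls without going through the libc resolver, each one can be probed directly over UDP with a hard timeout. The following is only a minimal sketch under stated assumptions, not what nslookup or libc does internally: the helper names build_query and probe are hypothetical, the addresses are the ones from the resolv.conf above, and it sends a single A query with no retries.

```python
import socket
import struct
import time


def build_query(name, qid=0x1234):
    """Build a minimal DNS query packet for an A record (RD flag set)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server, name, port=53, timeout=1.0):
    """Send one UDP query; return (rcode, seconds), rcode is None if no usable reply."""
    start = time.monotonic()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, port))
            data, _ = sock.recvfrom(512)
        except OSError:
            # Covers the timeout as well as network errors such as "unreachable".
            return None, time.monotonic() - start
    elapsed = time.monotonic() - start
    if len(data) < 12:
        return None, elapsed  # too short to be a DNS header
    # RCODE is the low 4 bits of the fourth header byte (0 = NOERROR, 2 = SERVFAIL).
    return data[3] & 0x0F, elapsed


if __name__ == "__main__":
    # Assumption: these are the two nameservers from the resolv.conf above.
    for server in ("127.0.0.1", "172.0.0.81"):
        rcode, elapsed = probe(server, "testfqdn.test")
        status = "no reply" if rcode is None else f"rcode={rcode}"
        print(f"{server}: {status} after {elapsed:.2f}s")
```

If 127.0.0.1 shows "no reply" while 172.0.0.81 answers with rcode=2, that would match the nslookup output above: dnsmasq stays silent while its upstream is unreachable.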
Another example is Java applications using the native name service. One Java thread blocks on the native call (libc), the other threads block on the lookupTable mutex, and the application eventually deadlocks.
Is there a way to get dnsmasq to time out its queries to upstream servers and respond with an error to the client?
Or is there a way for the system resolver to not wait forever on dnsmasq?