I'm creating this thread to get some community feedback on how the Go
builtin resolver should work going forward. The motivation for this post
specifically is issue 6579, and conversations with Mikio about how best to
handle it (if at all). In short, the issue says that if resolv.conf
contains two dead nameservers followed by a live nameserver, every lookup
will take a long time. This is true: we use the common defaults of two
tries and a 5 second timeout, so each lookup would take 20 seconds in the
above scenario (2 dead servers x 2 tries x 5 seconds). On one hand you
could say "Well, fix your resolv.conf"; on the other, we could make the
lookup logic smarter.
Mikio's first resolution to this was to send all queries to all
nameservers, and use the first valid response. This adds no real time to
standard queries, and in the complainant's scenario above, lookups run as
fast as if the two dead entries were not there. However, I am reluctant to
agree with this approach because it adds unexpected overhead. For the vast
majority of programs it's just not going to matter much, as the overhead
is relatively small. To someone writing a program that does millions of
lookups, or even to someone doing one lookup who is watching network
activity closely, it is a lot of extra traffic, as now every message is being sent
three times rather than once. In summary, I consider it 'unexpected
So, here are some of my thoughts on solutions. I'd be interested to hear
others.
1 - Do nothing. The user should be responsible for having a stable
Pro - Expected behavior, current behavior
Con - Down nameservers will kill performance
2 - Use Mikio's first resolution - send all queries to all ns, and use the
first valid response.
Pro - Fastest solution
Con - Most network overhead
3 - Use Mikio's second resolution - basically, like option 2 with a delay
built in (300ms, but it can be more or less). So if the first hasn't
answered in 300ms, send to the 2nd, and so on. It still stops after
receiving the first response.
Initially I liked the idea (and even suggested it), but realized that many
queries can take this long - especially in long chains or that hit slow
Pro - Dramatically speeds up queries when resolver(s) down
Con - Not the fastest, and has extra overhead in allocs and network
4 - Manage the order of nameservers. In essence, each lookup would call a
function to get the list of nameservers. If the first one works, do
nothing. If it does not, call a function to reorder the nameservers so
that the working nameserver is first in the list (cfg.servers), and
subsequent calls hit the working server first. This would need to
use a mutex, channel, or similar logic to prevent races.
Pro - Intelligently sorts ns order, no extra network overhead
Con - First lookup(s) slow when first nameserver(s) are down/invalid.
Order in resolv.conf is not strictly followed.
Of course, other ideas are very welcome too. The reason I'm proposing 4
is that currently the rotate option is being collected but doesn't appear
to be used. Implementing rotate would be pretty easy if 4 is implemented,
since lookups would no longer access cfg.servers directly, but would be
given a slice from a function, which can order it as needed before
returning (either 'rotating', or with the working ns first in line).
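A minimal sketch of what option 4 could look like, with rotate folded in. The serverList type and its get and promote methods are hypothetical names for illustration, not existing code in the net package, and a mutex is just one of the possible ways to prevent races.

```go
package main

import (
	"fmt"
	"sync"
)

// serverList is a hypothetical wrapper around the nameserver slice, so
// lookups ask for an ordered copy instead of reading cfg.servers directly.
type serverList struct {
	mu      sync.Mutex
	servers []string
	rotate  bool // mirrors the resolv.conf "rotate" option
	next    int  // starting offset for the next call when rotate is set
}

// get returns a copy of the servers in the order a lookup should try them.
// With rotate set, each call starts one position further along.
func (l *serverList) get() []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	n := len(l.servers)
	out := make([]string, 0, n)
	start := 0
	if l.rotate && n > 0 {
		start = l.next
		l.next = (l.next + 1) % n
	}
	for i := 0; i < n; i++ {
		out = append(out, l.servers[(start+i)%n])
	}
	return out
}

// promote moves server to the front of the list so that subsequent
// lookups try it first. Unknown servers are ignored.
func (l *serverList) promote(server string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for i, s := range l.servers {
		if s == server {
			// Shift everything before i right by one, then put the
			// working server at index 0.
			copy(l.servers[1:i+1], l.servers[:i])
			l.servers[0] = server
			return
		}
	}
}

func main() {
	l := &serverList{servers: []string{"dead1", "dead2", "live"}}
	l.promote("live") // called after a lookup only succeeded on "live"
	fmt.Println(l.get()) // prints [live dead1 dead2]
}
```

A lookup would call get, walk the returned slice, and call promote only when the server that answered was not the first one, so the common case takes the lock once and sends nothing extra on the network.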
Look forward to hearing your thoughts.