Improve DNS lookup behaviour #2277

flub · 2024-05-10T07:32:49Z

IIUC, the hickory DNS configuration we use ends up doing DNS requests over UDP with a 5-second timeout and two attempts, i.e. one retry. The attempts are sequential, so the full timeout is 10 seconds.

If the first DNS request is lost, this is essentially fatal for us: netcheck as a whole has a timeout of 5s, so it will not be able to resolve a relay URL. The relay client has no specific timeout, I think, so it might be more successful in connecting (e.g. when connecting to a relay extracted from a NodeAddr).

Given the unreliability of UDP, we should probably adopt a strategy for DNS similar to how netcheck does its probes: perform multiple requests in parallel, but stagger their start times. The aim is that most lookups never send the second query, but if things are slow or a request is lost, backup requests go out much sooner than after 5 seconds.

Some rather ad-hoc testing gave lookup times in the range of 50ms - 200ms on various public DNS servers, which leads me to suggest the following strategy:

1st request at T+0ms
2nd request at T+200ms
3rd request at T+300ms

And each request with a 3s timeout.
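
To make the schedule concrete, here is a minimal sketch of the staggering idea using only `std` threads. It is not the hickory or iroh API: `staggered_lookup` and the `lookup` closure are hypothetical names, and the real code would presumably be async rather than thread-based. As a simplification, the per-request timeout is approximated by one overall deadline (the last start offset plus the timeout), since a blocking closure cannot be cancelled.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: starts one `lookup` attempt per entry in `delays_ms`
/// (offsets in milliseconds from T+0) and returns the first successful result.
/// Attempts that have not started yet are skipped once a result is in.
fn staggered_lookup<T, F>(delays_ms: &[u64], timeout: Duration, lookup: F) -> Result<T, String>
where
    T: Send + 'static,
    F: Fn(usize) -> Result<T, String> + Send + Sync + 'static,
{
    let lookup = Arc::new(lookup);
    let done = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel();
    let start = Instant::now();

    for (i, &delay) in delays_ms.iter().enumerate() {
        let (tx, lookup, done) = (tx.clone(), Arc::clone(&lookup), Arc::clone(&done));
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(delay));
            // Skip the backup query if an earlier attempt already succeeded.
            if done.load(Ordering::SeqCst) {
                return;
            }
            // The receiver may be gone already; ignore the send error.
            let _ = tx.send(lookup(i));
        });
    }
    // Drop our own sender so the channel disconnects once all attempts finish.
    drop(tx);

    // Overall deadline: the point at which the last attempt would time out.
    let last_start = delays_ms.iter().copied().max().unwrap_or(0);
    let deadline = start + Duration::from_millis(last_start) + timeout;
    let mut last_err = String::from("no attempts configured");

    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(Ok(value)) => {
                done.store(true, Ordering::SeqCst);
                return Ok(value);
            }
            // A failed attempt: remember the error and wait for the others.
            Ok(Err(err)) => last_err = err,
            Err(mpsc::RecvTimeoutError::Timeout) => {
                done.store(true, Ordering::SeqCst);
                return Err(String::from("timed out"));
            }
            // Every attempt has finished and none succeeded.
            Err(mpsc::RecvTimeoutError::Disconnected) => return Err(last_err),
        }
    }
}
```

With the numbers proposed above, this would be called as `staggered_lookup(&[0, 200, 300], Duration::from_secs(3), resolve)`, where `resolve` stands in for whatever function performs a single DNS lookup.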

This may be a solution to #2086 as well.

divagant-martian · 2024-05-16T03:29:07Z

Just a heads up: as far as I can tell, hickory does not allow us to specify individual name servers per query. This means that the most direct way to stagger concurrent requests to the same server is to stagger calls to hickory's lookup (or similar) functions.

What does this mean?
This means that if we have multiple servers, then after shuffling, the staggered calls won't necessarily result in UDP queries that follow the staggering strategy to the letter. I looked into using hickory-proto for a more direct approach, but not only would this be a far bigger effort, we would also lose things hickory already handles for us, like backed-off retries, caching, etc.

If going with the imperfect, least complex solution seems agreeable, I think it should be simple to get done quickly.

flub added the c-iroh label May 10, 2024

github-project-automation bot added this to iroh May 10, 2024

divagant-martian self-assigned this May 13, 2024

divagant-martian moved this to 🏗 In progress in iroh May 13, 2024

divagant-martian mentioned this issue May 14, 2024

feat(iroh-net): small improvements to dns behaviour #2290

Closed

4 tasks

divagant-martian linked a pull request May 20, 2024 that will close this issue

feat(iroh-net)!: improve dns behaviour by staggering requests #2313

Merged

4 tasks

ramfox added this to the v0.17.0 milestone May 22, 2024

divagant-martian closed this as completed in #2313 May 22, 2024

github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh May 22, 2024
