IIUC the hickory DNS configuration we use ends up doing DNS requests over UDP with a 5 second timeout and two attempts, i.e. one retry. The retries are sequential, so the full timeout is 10 seconds.
If the first DNS request is lost, this is essentially fatal for us: netcheck as a whole has a timeout of 5s, so it will not be able to resolve a relay URL. The relay client has no specific timeout, I think, so it might be more successful in connecting (e.g. when connecting to a relay extracted from a NodeAddr).
Given the unreliability of UDP, we should probably adopt for DNS a strategy similar to how netcheck does its probes: perform multiple requests in parallel, but stagger their start times. The aim is that most lookups never send the second query, but if things are slow or a request is lost, backup requests go out much sooner than after 5 seconds.
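To make the idea concrete, here is a rough sketch of staggered parallel attempts using only std threads and channels. This is not hickory or netcheck code: `staggered_lookup`, its `delays` and `budget` parameters, and the `lookup` closure are all made-up names for illustration, and a single overall budget stands in for a per-request timeout to keep it short.

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: starts one `lookup` attempt per entry in `delays`
/// (offsets from "now") and returns the first successful answer.
/// Returns `None` if every attempt fails or `budget` runs out.
pub fn staggered_lookup<T, F>(delays: &[Duration], budget: Duration, lookup: F) -> Option<T>
where
    T: Send + 'static,
    F: Fn(usize) -> Option<T> + Send + Sync + 'static,
{
    let lookup = Arc::new(lookup);
    let (tx, rx) = mpsc::channel::<Option<T>>();
    let start = Instant::now();
    let deadline = start + budget;

    for (attempt, delay) in delays.iter().enumerate() {
        let start_at = (start + *delay).min(deadline);
        // Until this attempt is due, listen for answers from earlier attempts.
        loop {
            let wait = start_at.saturating_duration_since(Instant::now());
            match rx.recv_timeout(wait) {
                Ok(Some(answer)) => return Some(answer), // later attempts never start
                Ok(None) => continue,                    // an earlier attempt failed
                Err(_) => break,                         // timed out: this attempt is due
            }
        }
        if Instant::now() >= deadline {
            return None;
        }
        let (tx, lookup) = (tx.clone(), Arc::clone(&lookup));
        thread::spawn(move || {
            // The receiver may already be gone if another attempt won.
            let _ = tx.send(lookup(attempt));
        });
    }

    // All attempts are running. Drop our sender so `recv_timeout` reports
    // disconnection once every attempt has finished.
    drop(tx);
    loop {
        let wait = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(wait) {
            Ok(Some(answer)) => return Some(answer),
            Ok(None) => continue,
            Err(_) => return None, // budget exhausted or all attempts failed
        }
    }
}
```

Called with, say, delays of 0ms and 200ms (arbitrary values, not a proposal), a lookup that answers within 200ms never triggers the second attempt, while a lost first request is covered by the backup after 200ms instead of after 5 seconds. In an async codebase the same shape would presumably be written with tasks and timers rather than threads.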
Some rather ad-hoc testing gave lookup times in the range of 50ms - 200ms on various public DNS servers, which leads me to suggest the following strategy:
Just a heads up: as far as I can tell, hickory does not allow us to specify individual name servers per query. This means that the most direct way to stagger concurrent requests to the same server is to stagger calls to hickory's lookup (or similar) functions.
What does this mean?
This means that if we have multiple servers, then after shuffling, staggering the calls won't necessarily make the UDP queries themselves follow the staggering strategy to the letter. I looked into hickory-proto for a more direct approach, but not only would this be a far bigger effort, we would also lose certain things hickory already handles for us, like retries with backoff, caching, etc.
If going with the imperfect but least complex solution seems agreeable, I think it should be simple to get done quickly.
Each of these requests would also get a 3s timeout.
This may be a solution to #2086 as well.