-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(iroh-net)!: only call quinn_connect if a send addr is available #2225
Conversation
We also need to ensure that when we get messages from the discovery service, that the messages contain non-empty address fields before returning: |
Maybe on some level the problem is that OTOH if So I kind of think that instead of adding the APIs here to attempt to figure out if a node is reachable that we should make |
81fac77
to
8b483b8
Compare
to properly test that we have fixed the discovery bug, we need to run the test dns and pkarr relay servers, so they should be moved to `test_utils` so the code can be shared
… info for a given node_id
this method handles the logic of getting a mapped quic addr and launching discovery if it is needed
But we now allow for users to dial with just the I agree that this doesn't cover the entire solution: we should not fail when there are no available addrs inside the magicsock. At worst, the connection should timeout as you mentioned. But that was a more complicated issue, the solution of which might mean moving Discovery management down into the magicsock & I wanted to make sure we had a fix for this particular logic bug (which only showed up when trying to join a document ticket) sooner. |
a674622
to
d034423
Compare
`RelayUrlInfo` combines the `RelayUrl` with any "aliveness" and latency information we have. This is propgated up into the `ConnectionInfo`.
Description
A bug in the discovery flow assumed that if we had a mapped quic address for a
node_id
, then we had at least a relay URL or one direct address available for that node.This meant there were instances in which discovery should have been launched before attempting to dial the node, but was never launched, leading to
no UDP or relay address available for node
errors.Now we check if addresses are available for a
node_id
and launch discovery if we do not have any before we attempt to dial.We also now take into account the "alive" status of any relay URL information we have on a
node_id
when determining if we need to run discovery for thatnode_id
tests
This also refactors the test DNS server and the test Pkarr relay server to use
CleanupDropGuard
for resource cleanup. It also moves the functions that launch those servers into theiroh-net::test_utils
crate. This ended up being an unnecessary refactor (I ended up writing the test in thediscovery
crate anyway), but I left it in case we need to do other tests that rely on discovery outside of thediscovery
crate.Notes & open questions
The above issue uncovered a more serious bug: the endpoint currently dies when it attempts to dial a node without any available address information because we return the
no UDP or relay address available for node
error. We should not do this. In that situation, we should let that connection timeout or launch discovery services inside the magicsocket. That bug (#2226) is not fixed in this PR.Breaking changes
RelayUrlInfo
that combines the relay_url and additional information about the state of our connection to the remote node at this relay URL:NodeInfo.relay_url
(calledConnectionInfo
outside ofiroh-net
) is nowOption<RelayUrlInfo>
, changed fromOption<RelayUrl>
Change checklist