iroh-net: When guessing at a direct address we do not always send to the same one #2480
This is on the hot-path of
This has led to another issue: when my server has multiple IP addresses, only one of them can successfully punch through (in my tests it's the IPv6 address), while most of the others are LAN addresses. Because direct connection addresses are currently selected at random, the probability that the correct IP address is chosen for a successful punch-through is very low; it relies entirely on luck. Sometimes it takes a long time before my IPv6 address even gets tried (I began to wonder if the protocol dislikes IPv6, haha). I have attached a complete log. In my opinion, LAN IPs might deserve a higher priority for early punching, but since LAN success rates are already high, if a LAN connection does not succeed within a certain (short) time, more punching attempts should go to the public IPs. This is also related to #2317.
I'm not sure if this is related to the issue, but my thought is that the address selection strategy might need to be optimized.
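To make that suggestion concrete, here is a rough sketch of the kind of prioritization being described, using only hypothetical names and std types (this is not iroh-net API, and IPv6 ULA/link-local detection is skipped for simplicity):

```rust
use std::net::{IpAddr, SocketAddr};
use std::time::{Duration, Instant};

/// Hypothetical helper: order holepunch candidates so that LAN (private/link-local
/// IPv4) addresses are tried first, and once `lan_deadline` has passed without a
/// LAN success, public addresses (including IPv6) are moved to the front instead.
fn order_candidates(candidates: &mut [SocketAddr], started: Instant, lan_deadline: Duration) {
    let prefer_lan = started.elapsed() < lan_deadline;
    candidates.sort_by_key(|addr| {
        let is_lan = match addr.ip() {
            IpAddr::V4(v4) => v4.is_private() || v4.is_link_local(),
            // IPv6 ULA/link-local checks omitted in this sketch.
            IpAddr::V6(_) => false,
        };
        // sort_by_key sorts ascending, so `false` (preferred) sorts first.
        if prefer_lan { !is_lan } else { is_lan }
    });
}
```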
I think this is a misunderstanding: for holepunching it does not guess a random address, it uses all the addresses. It does take a long time to holepunch in this log and I need to check it out more to fully understand, but let's move this part back to #2317.
## Description

When we had some direct addresses but not yet a best address, we would randomly pick one and send to it on the off chance it worked. For this to help, we would have had to keep picking the same direct addr to send to, which was not happening and was a bug. The scenarios in which this guessing would speed up holepunching were also very unlikely: only if a single direct addr already known to work was added manually would it improve things. Simply removing this feature is rather nice:

- The direct connection still happens very quickly in the best-case scenario: a single ping-pong roundtrip is all it costs. No relay server or discovery mechanism is needed, so there is no functionality change.
- In all other scenarios there is no change. Connection establishment behaves exactly the same as before.
- There are no more confusing connection type changes at startup. This used to go relay -> mixed (-> mixed a few more times due to the bug) -> direct. Now it just goes relay -> direct, which is intuitively what people would expect.

The simplest way to test this is by running doctor accept and connect with direct addrs only and observing the holepunching events emitted.

## Breaking Changes

None.

## Notes & open questions

This is an alternative to #2487, and my preferred solution. Fixes #2480.

See #2512 for the flaky test: test_two_devices_roundtrip_network_change managed to hit a timeout on cross CI. Give it some more time.

## Change checklist

- [x] Self-review.
- [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant.
- [x] Tests if relevant.
- [x] All breaking changes documented.
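To illustrate the simplified behaviour this change leaves behind, here is a minimal sketch of the send-path decision once the guess is removed. The types and function names are hypothetical, not the actual magicsock code:

```rust
use std::net::SocketAddr;

/// Sketch of where a payload packet is sent after this change: with no
/// confirmed direct path we send via the relay only, instead of also
/// guessing at a random direct address.
enum SendPath {
    /// A direct UDP path that has been confirmed by a ping/pong roundtrip.
    Direct(SocketAddr),
    /// Fall back to the relay server for this node.
    Relay,
}

fn choose_send_path(best_addr: Option<SocketAddr>) -> SendPath {
    match best_addr {
        // Only use a direct address once it has been validated.
        Some(addr) => SendPath::Direct(addr),
        // Before that, rely on the relay; holepunching runs in parallel
        // and promotes a direct address once a path is confirmed.
        None => SendPath::Relay,
    }
}
```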
When we have a relay connection and some direct addresses, but have not yet figured out anything about those direct addresses (like whether they work), we try to pick a random direct address from the list and send to the relay and to this guess at the same time.
The idea is that maybe we picked a usable address, it just works, we figure this out, and we start using that path for real.
However, we do not select the direct address path stably, so each time we send a packet a new direct address path is selected. That's rather useless.
This is the bug:
iroh/iroh-net/src/magicsock/node_map/node_state.rs, line 295 (commit 4640f4d)
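To make the bug concrete, here is a minimal sketch contrasting the unstable per-send pick with the stable pick that was intended. These are hypothetical helpers, not the code at the line referenced above; rotating through the candidates stands in for the random pick so the sketch needs no extra dependencies:

```rust
use std::net::SocketAddr;

/// Buggy behaviour as described: a fresh pick on every send, so the guessed
/// direct path keeps changing between packets and is never kept long enough
/// for a pong to confirm it.
fn buggy_guess(candidates: &[SocketAddr], send_counter: usize) -> Option<SocketAddr> {
    candidates.get(send_counter % candidates.len().max(1)).copied()
}

/// What the guess would have needed to be: a stable choice that stays the
/// same across sends until it is confirmed or discarded.
fn stable_guess(candidates: &[SocketAddr]) -> Option<SocketAddr> {
    candidates.first().copied()
}
```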