Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iroh-net: When guessing at a direct address we do not always send to the same one #2480

Closed
flub opened this issue Jul 8, 2024 · 4 comments · Fixed by #2509
Closed

iroh-net: When guessing at a direct address we do not always send to the same one #2480

flub opened this issue Jul 8, 2024 · 4 comments · Fixed by #2509
Assignees
Labels
bug Something isn't working c-iroh
Milestone

Comments

@flub
Copy link
Contributor

flub commented Jul 8, 2024

When we have a relay connection and some direct addresses, but have not yet figured out any information about those direct addresses - like whether they work- we try to pick a random direct address from the list and send to relay + this guess at the same time.

The idea is that maybe we picked a usable address and it just works and we figure this out and start using this path for real.

However we do not stably select the direct address path, so each time we send a packet a new direct address path is being selected. That's rather useless.

This is the bug:

.choose_stable(&mut rand::thread_rng())

@flub flub added bug Something isn't working c-iroh labels Jul 8, 2024
@flub flub added this to the v0.21.0 milestone Jul 8, 2024
@flub flub added this to iroh Jul 8, 2024
@flub flub self-assigned this Jul 8, 2024
@flub
Copy link
Contributor Author

flub commented Jul 8, 2024

This is on the hot-path of .poll_send(), an RNG is too much overhead for this.

@zh522130
Copy link
Contributor

zh522130 commented Jul 9, 2024

This has led to another issue: when my server has multiple IP addresses, only one of them can successfully punch through (in my tests, it's the IPV6 address), with most being LAN addresses. Due to the random selection of direct connection addresses currently, the probability of the correct IP address for successful punching being chosen is very low, relying entirely on luck. Sometimes, it takes a long time before my IPV6 is used for testing (I even began to wonder if the protocol dislikes IPV6, haha). I have attached a complete log. In my opinion, for early punching, LAN IPs might need a higher priority, but since LAN success rates are already high, if a LAN connection is not successful within a certain time (as soon as possible), more opportunities for punching through with public IPs should be given. This is also related to the #2317

logs_1720497577.log

@zh522130
Copy link
Contributor

zh522130 commented Jul 9, 2024

I'm not sure if this is related to the issue, but my thought is that the address selection strategy might need to be optimized.

@ramfox ramfox moved this to 📋 Backlog in iroh Jul 10, 2024
@flub
Copy link
Contributor Author

flub commented Jul 10, 2024

This has led to another issue: when my server has multiple IP addresses, only one of them can successfully punch through (in my tests, it's the IPV6 address), with most being LAN addresses. Due to the random selection of direct connection addresses currently, the probability of the correct IP address for successful punching being chosen is very low, relying entirely on luck. Sometimes, it takes a long time before my IPV6 is used for testing (I even began to wonder if the protocol dislikes IPV6, haha). I have attached a complete log. In my opinion, for early punching, LAN IPs might need a higher priority, but since LAN success rates are already high, if a LAN connection is not successful within a certain time (as soon as possible), more opportunities for punching through with public IPs should be given. This is also related to the #2317

logs_1720497577.log

I think this is a misunderstanding, for holepunching it does not guess a random address. It uses all the addresses for holepunching. It does take a long time to holepunch in this log and I need need check it out more to fully understand. But let's move this part back to #2317.

github-merge-queue bot pushed a commit that referenced this issue Jul 18, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
## Description

When we had any direct addresses but not yet a best address we would
randomly try to pick one and send to it on the off chance it worked.
This required to always pick the same direct addr to send to which was
not happening and a bug.

However the scenarios in which this would speed up holepunching were
very unlikely: only really if a single direct addr known to work was
added manually would this improve things.

Simply removing this feature is rather nice:

- The direct connection still happens very quickly in the best-case
  scenario: a single ping-pong roundtrip is all it costs.  No relay
  server or discovery mechanism needed so no functionality change.

- In all other scenarios there is no change.  Connection establishment
  behaves exactly the same as before.

- There are no more confusing connection type changes at startup.
  This would go relay -> mixed (-> mixed a few times more due to the
  bug) -> direct.  Now it just goes relay -> direct which is intuitively
  what people would expect.

The simplest way to test this is by running doctor accept and connect
with direct addrs only and observe the holepunching events emitted.


## Breaking Changes

none

## Notes & open questions

This is an alternative to #2487.  And my preferred solution.

Fixes #2480.

See #2512 for the flaky test.

test_two_devices_roundtrip_network_change managed to hit a timeout on
cross CI.  Give it some more time.

## Change checklist

- [x] Self-review.
- [x] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [x] Tests if relevant.
- [x] All breaking changes documented.
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in iroh Jul 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working c-iroh
Projects
Archived in project
2 participants