Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(iroh-net): local swarm discovery #2376

Merged
merged 23 commits into from
Jul 8, 2024
Merged

Conversation

ramfox
Copy link
Contributor

@ramfox ramfox commented Jun 18, 2024

Description

Using swarm-discovery, this PR implements a discovery service for LocalSwarmDiscovery, allowing nodes on the same local network to discover each other without using the outside internet.

closes #2354

Notes & open questions

  1. Is LocalNodeDiscovery too much of a mouthful? swarm-discovery says, of itself, "This library offers a lightweight discovery service based on mDNS." Soooo, how comfortable are we just calling this mdns?
    edit: calling this LocalSwarmDiscovery & not mdns

  2. There are some const to bikeshed:

/// The n0 local node discovery name
const N0_MDNS_SWARM: &str = "iroh.local.node.discovery";
edit: now N0_LOCAL_SWARM: &str = "iroh.loca.swarm"

/// Provenance string
const PROVENANCE: &str = "local.node.discovery";
edit: now "local.swarm.discovery"

/// How long we will wait before we stop sending discovery items
const DISCOVERY_DURATION: Duration = Duration::from_secs(10);
  1. For the local_swarm_discovery example, I pulled some code from iroh-cli into a new iroh-progress, so that I could re-use some progress terminal output it in the example. iroh-progress might not be the right name, but what do we think of a separate crate for putting in reusable UI-ish pieces. If we aren't on board with this, I can just copy and paste the logic in for now & remove the crate.
    edit: Removed this in favor of copy and pasting the code

  2. I've added LocalSwarmDiscovery as a default discovery service, I'm assuming that's what we are going for here?

Change checklist

  • Self-review.
  • tests
  • Documentation updates if relevant.

Sorry, something went wrong.

@dignifiedquire dignifiedquire added this to the v0.19.0 milestone Jun 18, 2024
@ramfox ramfox force-pushed the ramfox/mdns_discovery_spike branch from 9587dbe to 56deadd Compare June 21, 2024 02:55
@zh522130
Copy link
Contributor

It would be great if there could be an automatic fallback to local discovery when the relay service is not reachable, rather than being forced to choose between the two.

@matheus23
Copy link
Contributor

@zh522130 AFAIK this will be the case - you can enable/disable mdns and using a relay independently of each other! And when you enable both, it'll use whatever is available to try to establish a connection :)

@dignifiedquire dignifiedquire modified the milestones: v0.19.0, v0.20.0 Jun 24, 2024
@ramfox ramfox force-pushed the ramfox/mdns_discovery_spike branch from 56deadd to 8b4c0df Compare June 25, 2024 04:03
@ramfox ramfox requested review from dignifiedquire and Frando June 25, 2024 05:00
@ramfox ramfox self-assigned this Jun 25, 2024
@ramfox ramfox added feat New feature or request c-iroh _c-iroh-legacy Formerly big iroh node with all protocols labels Jun 25, 2024
@ramfox ramfox changed the title WIP: local node discovery feat(iroh-net): local node discovery Jun 25, 2024
@matheus23
Copy link
Contributor

  1. Is LocalNodeDiscovery too much of a mouthful? swarm-discovery says, of itself, "This library offers a lightweight discovery service based on mDNS." Soooo, how comfortable are we just calling this mdns?

IIUC, there's a different way to do mDNS discovery. I'd say the two types are "swarm discovery" and "query based discovery".
Where swarm discovery tries to not explode in traffic if you have a lot of devices in your network, and query based discovery is a lot dumber: It just asks mdns anytime you're looking for something. This means potentially faster answers and no traffic if no node is looking for anything, but also potentially an explosion of traffic with lots of nodes that want to form a swarm.

I think both approaches have ups and downs - so I'd prefer not to squat the mDNS name - at least not "just" as MdnsDiscovery, but maybe MdnsSwarmDiscovery 😅 (Not a lot shorter than LocalSwarmDiscovery, but at least it mentions mdns?)

@matheus23
Copy link
Contributor

3. For the local-node-discovery example, I pulled some code from iroh-cli into a new iroh-progress, so that I could re-use some progress terminal output it in the example. iroh-progress might not be the right name, but what do we think of a separate crate for putting in reusable UI-ish pieces. If we aren't on board with this, I can just copy and paste the logic in for now & remove the crate.

iroh-cli-utils? Maybe expanding the scope from only doing progress might be a good idea.

@zh522130
Copy link
Contributor

Very good, the previous version was reporting errors during testing, but I tested the latest code today and it is now functioning properly.

iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-net/src/discovery/local_node_discovery.rs Outdated Show resolved Hide resolved
iroh-cli/Cargo.toml Outdated Show resolved Hide resolved
}
Ok(msg) => msg,
};
match msg {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a single match statement

match msg {
  Err(err) => {}
  Ok(variant..) => {}
  ...
}

impl Discovery for LocalNodeDiscovery {
fn resolve(&self, _ep: Endpoint, node_id: NodeId) -> Option<BoxStream<Result<DiscoveryItem>>> {
let (send, recv) = flume::bounded(20);
self.sender.send(Message::SendAddrs((node_id, send))).ok();
Copy link
Member

@Frando Frando Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses the blocking send from flume, which can potentially cause blocks in the async context if this is called a lot and the channel gets full, no?

Edit: Can this deadlock even? If the channel is full and the receiving tokio task happens to be scheduled on the same thread, then sending into the full channel will block forever, because we never yield and the receiver isn't polled unless tokio happens to move it to another thread (which is at tokio's decretion but not guaranteed).

Copy link
Member

@Frando Frando Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution is likely, I guess, either to change resolve in the Discovery trait to return a impl Future, or to spawn a tokio task here that sends the message with send_async.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to check if the sender is full before sending, and return an error if it is, rather than getting blocked?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would this block? If the consumer of the stream is not sufficiently fast in consuming the events, right?

It is a stream of results. So we could check the buffer capacity. If it is less than 2, send an error. If it is less than 1 (would block), discard the event.

That way a slow receiver will get a stream

event
event
error(lagged)
event

... so they know about the issue and we are guaranteed not to block!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to me to spawn a task to wait for the send_async to complete. Returning an error in the stream would mean the discovery service would need to start handling an error about lag (which, to me, ultimately means "wait until the next event" instead of just waiting for the next event).

@ramfox ramfox changed the title feat(iroh-net): local node discovery feat(iroh-net): local swarm discovery Jul 4, 2024
@ramfox ramfox force-pushed the ramfox/mdns_discovery_spike branch 2 times, most recently from c4699c0 to cd20cf4 Compare July 5, 2024 03:08
@ramfox ramfox force-pushed the ramfox/mdns_discovery_spike branch from 7774cf7 to 896c652 Compare July 5, 2024 14:57
@Arqu
Copy link
Collaborator

Arqu commented Jul 8, 2024

/netsim branch arqu/netsim_node_not_found

Copy link

github-actions bot commented Jul 8, 2024

ramfox/mdns_discovery_spike.a5c58b51a52039f8013300922c7be286ec532c25
Perf report:

test case throughput_gbps throughput_transfer
iroh_latency_20ms 1_to_1 0.98 2.28
iroh_latency_20ms 1_to_3 2.62 5.13
iroh_latency_20ms 1_to_5 3.71 5.66
iroh_latency_20ms 1_to_10 4.18 4.79
iroh_latency_20ms 2_to_2 2.24 4.42
iroh_latency_20ms 2_to_4 3.31 5.96
iroh_latency_20ms 2_to_6 4.67 6.95
iroh_latency_20ms 2_to_10 5.66 7.51
iroh_latency_200ms 1_to_1 1.02 2.46
iroh_latency_200ms 1_to_3 2.66 4.81
iroh_latency_200ms 1_to_5 3.64 5.53
iroh_latency_200ms 1_to_10 4.27 4.91
iroh_latency_200ms 2_to_2 1.96 3.83
iroh_latency_200ms 2_to_4 3.53 6.24
iroh_latency_200ms 2_to_6 4.68 7.00
iroh_latency_200ms 2_to_10 5.56 7.11
iroh 1_to_1 1.10 2.54
iroh 1_to_3 2.78 5.94
iroh 1_to_5 4.14 7.04
iroh 1_to_10 4.08 4.62
iroh 2_to_2 2.03 4.11
iroh 2_to_4 3.38 6.24
iroh 2_to_6 4.62 6.86
iroh 2_to_10 5.40 6.82

@ramfox ramfox requested a review from matheus23 July 8, 2024 16:25
@ramfox ramfox enabled auto-merge July 8, 2024 16:34
@ramfox ramfox added this pull request to the merge queue Jul 8, 2024
Merged via the queue into main with commit 3866b6f Jul 8, 2024
26 checks passed
@Arqu Arqu deleted the ramfox/mdns_discovery_spike branch July 9, 2024 11:13
matheus23 pushed a commit that referenced this pull request Nov 14, 2024
## Description

Using [swarm-discovery](https://github.com/rkuhn/swarm-discovery), this
PR implements a discovery service for `LocalSwarmDiscovery`, allowing
nodes on the same local network to discover each other without using the
outside internet.

closes #2354

## Notes & open questions

1) Is `LocalNodeDiscovery` too much of a mouthful? `swarm-discovery`
says, of itself, "This library offers a lightweight discovery service
based on mDNS." Soooo, how comfortable are we just calling this `mdns`?
edit: calling this `LocalSwarmDiscovery` & not `mdns`

2) There are some const to bikeshed:
```rust
/// The n0 local node discovery name
const N0_MDNS_SWARM: &str = "iroh.local.node.discovery";
edit: now N0_LOCAL_SWARM: &str = "iroh.loca.swarm"

/// Provenance string
const PROVENANCE: &str = "local.node.discovery";
edit: now "local.swarm.discovery"

/// How long we will wait before we stop sending discovery items
const DISCOVERY_DURATION: Duration = Duration::from_secs(10);
```

3) For the `local_swarm_discovery` example, I pulled some code from
`iroh-cli` into a new `iroh-progress`, so that I could re-use some
progress terminal output it in the example. `iroh-progress` might not be
the right name, but what do we think of a separate crate for putting in
reusable UI-ish pieces. If we aren't on board with this, I can just copy
and paste the logic in for now & remove the crate.
edit: Removed this in favor of copy and pasting the code

4) I've added `LocalSwarmDiscovery` as a default discovery service, I'm
assuming that's what we are going for here?

## Change checklist

- [x] Self-review.
- [x] tests
- [x] Documentation updates if relevant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c-iroh _c-iroh-legacy Formerly big iroh node with all protocols feat New feature or request
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Add optional LAN discovery service
9 participants