Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: add iroh_net::ConnManager and use in iroh_gossip::net #2315

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Frando
Copy link
Member

@Frando Frando commented May 21, 2024

Description

  • Introduce a ConnManager in iroh_net. Quoting the docs:

A connection manager that ensures that only a single connection between two peers prevails.
You can start to dial peers by calling [ConnManager::dial]. Note that the method only takes a
node id; if you have more addressing info, add it to the endpoint directly with
[Endpoint::add_node_addr] before calling dial;
The [ConnManager] does not accept connections from the endpoint by itself. Instead, you
should run an accept loop yourself, and push connections with a matching ALPN into the manager
with [ConnManager::handle_connection] or [ConnManager::handle_connection_sender].
The [ConnManager] is a [Stream] that yields all connections from both accepting and dialing.
Before accepting incoming connections, the [ConnManager] makes sure that, if we are dialing
the same node, only one of the connections will prevail. In this case, the accepting side
rejects the connection if the peer's node id sorts higher than their own node id.
To make this reliable even if the dials happen exactly at the same time, a single unidirectional
stream is opened, on which a single byte is sent. This additional rountrip ensures that no
double connections can prevail.

  • Use the ConnManager in iroh_gossip::net and improve connection handling

Breaking Changes

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.
  • All breaking changes documented.

@Frando Frando changed the title Gossip net improvements refactor(iroh-gossip): improve connection handling in net module May 21, 2024
@link2xt
Copy link
Contributor

link2xt commented May 21, 2024

includes functionality to make sure that only a single connection prevails between ourselves and another peer

In plain TCP simultaneous connect from both sides results in a single connection if socket is bound to a port and the other side tries to connect to the same port: https://stackoverflow.com/questions/2231283/tcp-two-sides-trying-to-connect-simultaneously
I have looked a bit into QUIC RFC 9000, seems it cannot do anything similar because of connection IDs.

@dignifiedquire dignifiedquire added this to the v0.17.0 milestone May 22, 2024
@Frando Frando force-pushed the gossip-net-improvements branch from ad018a6 to 6229f8a Compare May 22, 2024 23:02
@Frando Frando force-pushed the gossip-net-improvements branch from fda1736 to c8265ee Compare May 22, 2024 23:15
@Frando Frando changed the title refactor(iroh-gossip): improve connection handling in net module refactor: add iroh_net::ConnManager and use in iroh_gossip::net May 22, 2024
@Frando Frando force-pushed the gossip-net-improvements branch from d1c775a to 3b28b28 Compare May 22, 2024 23:27
@Frando Frando force-pushed the gossip-net-improvements branch 2 times, most recently from 0fcb423 to 240eb7a Compare May 22, 2024 23:32
@Frando Frando force-pushed the gossip-net-improvements branch from 240eb7a to f70e519 Compare May 22, 2024 23:33
@Frando Frando force-pushed the gossip-net-improvements branch from c204d78 to 95fdddb Compare May 22, 2024 23:47
@Frando Frando marked this pull request as ready for review May 22, 2024 23:57
Copy link
Contributor

@flub flub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not yet sure about the ConnManager. I'd be more comfortable if we had some sort of "exprimental" cargo feature or something for it maybe.

impl TestEndpointBuilder {
/// Starts local relay and discovery servers and returns a builder to create readily configured endpoints.
///
/// The local relay, DNS and pkarr servers will shut down once the [`TestEndpointBuilder`] is dropped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also make clear these are listening on a random port on localhost only?

}

/// Create a new endpoint builder which already has discovery and relays configured.
pub fn create_endpoint_builder(&self, secret_key: SecretKey) -> crate::endpoint::Builder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply endpoint_builder() is probably a better name?

#[deref]
pub conn: Connection,
/// The node id of the other peer.
pub node_id: NodeId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always get the NodeId from the Connection, so this is storing duplicate information.


/// The error returned from [`ConnManager::poll_next`].
#[derive(thiserror::Error, Debug)]
#[error("Connection to node {} direction {:?} failed: {:?}", self.node_id, self.direction, self.reason)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use the short NodeId? This is done by using the alternate formatting: {:#}. I'm a bit in two minds about it.

/// This is a no-op if the a connection to the node is already active or if we are currently
/// dialing the node already.
///
/// Returns `true` if this is initiates connecting to the node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Returns `true` if this is initiates connecting to the node.
/// Returns `true` if this initiates a new connection attempt.

/// the [`ConnManager`]'s execpected ALPN before passing the connection to the sender.
///
/// If we are currently dialing the node, the connection will be dropped if the peer's node id
/// sorty higher than our node id. Otherwise, the connection will be yielded from the manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// sorty higher than our node id. Otherwise, the connection will be yielded from the manager
/// sorts higher than our node id. Otherwise, the connection will be yielded from the manager

@flub
Copy link
Contributor

flub commented May 23, 2024

Part of me feels this might belong better in the protocol running on top of the provided quic connection rather than in the endpoint layer itself. And to me iroh-net is about the endpoint layer, not the application on top.

I'm a little sceptical that an application does even need support like this. Could it not keep a map of confirmed connections, i.e. when it received data on a connection. Any time a connection receives data for the first time, whether that connection was dialed or accepted, it can check the map and reset the connection if the NodeId already exists. This is like a very opiniated and specific version of this.

@Frando
Copy link
Member Author

Frando commented May 23, 2024

Yes, I'm also not too sure anymore. I thought initially that this could be done without opening any streams, which would put it more into endpoint-layer land. Alas, an extra roundtrip, and thus opening streams, is required for consistent deduplication it seems. Which puts it more into application protocol land.

At the same time it is a quite common usecase for p2p protocols, and offering something easy-to-use would be nice. Sounds a bit like iroh-net-utils or so material.

@dignifiedquire dignifiedquire removed this from the v0.17.0 milestone May 24, 2024
/// Accept a connection.
///
/// This does not check the connection's ALPN, so you should make sure that the ALPN matches
/// the [`ConnManager`]'s execpected ALPN before passing the connection to the sender.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// the [`ConnManager`]'s execpected ALPN before passing the connection to the sender.
/// the [`ConnManager`]'s expected ALPN before passing the connection to the sender.

/// the [`ConnManager`]'s execpected ALPN before passing the connection to the sender.
///
/// If we are currently dialing the node, the connection will be dropped if the peer's node id
/// sorty higher than our node id. Otherwise, the connection will be returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// sorty higher than our node id. Otherwise, the connection will be returned.
/// sorts higher than our node id. Otherwise, the connection will be returned.

/// rejects the connection if the peer's node id sorts higher than their own node id.
///
/// To make this reliable even if the dials happen exactly at the same time, a single unidirectional
/// stream is opened, on which a single byte is sent. This additional rountrip ensures that no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where exactly this sending of a byte happens? Did not find it quickly looking through the changes.

Copy link
Member Author

@Frando Frando Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here:

stream.write_all(&[0]).await?;

@rklaehn
Copy link
Contributor

rklaehn commented Jun 28, 2024

I agree something like this is needed. But - how will this play together with the new custom protocols stuff?

The custom protocol handler uses the standard endpoint accept. So maybe this needs to be integrated more deeply into iroh-net or at least into the iroh node?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏗 In progress
Development

Successfully merging this pull request may close these issues.

5 participants