General thoughts about iroh-net #2860
-
Hi @CGamesPlay!
Cool!
There is no real limit on how long a connection can last. Theoretically, QUIC has some limits, because you can only send 2^62-1 packets, plus a few other similar limits. In practice, however, it depends on how stable the network is and whether packets keep being delivered. Iroh configures QUIC to ping regularly to keep connections alive as long as possible in the face of NAT mappings etc., though there may be some timeouts to configure to tweak this further. When a direct network path stops working, iroh should fall back to the relay path. I emphasise "should" because I've recently seen a bug report suggesting that our default timeouts might not be set up correctly for this, and we haven't addressed that yet. But the functionality for it is there, and this is an area that will improve when we switch to QUIC Multipath.
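To put the 2^62-1 figure in perspective, here is a back-of-the-envelope calculation. It assumes a constant send rate within a single packet number space (RFC 9000 caps packet numbers at 2^62-1, and a sender that reaches that value must close the connection); the rate is an illustrative input, not an iroh default.

```rust
/// Largest QUIC packet number (RFC 9000, Section 12.3): 2^62 - 1.
const MAX_PACKET_NUMBER: u64 = (1u64 << 62) - 1;

/// Seconds in a Julian year (365.25 days), used only for a rough conversion.
const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 60.0 * 60.0;

/// Rough number of years before a single packet number space is exhausted
/// when sending at a constant rate of `packets_per_second`.
fn years_until_exhausted(packets_per_second: f64) -> f64 {
    MAX_PACKET_NUMBER as f64 / packets_per_second / SECONDS_PER_YEAR
}
```

At one million packets per second this comes out at roughly 146,000 years, so in practice network stability, not packet numbers, bounds how long a connection lives.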
Yes, this is kind of a known issue. We haven't done much work to reduce this yet, though we occasionally spend some effort de-duplicating things. We plan to make all discovery mechanisms optional using cargo features. But at some point we'll have to figure out how to make this smaller.
Closing the endpoint is definitely a rough edge right now. I think it will close on drop, but that is certainly not a graceful close. The choice to make it async is an attempt to nudge folks not to rely on drop, because that is not ideal. The choice to include wait_idle in it is again an attempt at improving usability, but perhaps not a successful one yet. We've discussed this recently as well and certainly want to improve things, so feel free to open an issue explaining what you'd like to be able to do with respect to closing. I think that would be helpful.
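Since `Endpoint::close` is async and consumes the endpoint, it cannot be called from `Drop`. A common Rust pattern for this situation is an explicit, consuming shutdown method plus a `Drop` impl that only notices when that method was skipped. The sketch below is synchronous and std-only so it runs without an async runtime; `Tunnel` and its fields are hypothetical application types, not iroh APIs.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical application wrapper around an endpoint; not an iroh type.
struct Tunnel {
    closed_gracefully: bool,
    /// Counts how often Drop ran without a prior graceful close.
    ungraceful_drops: Rc<Cell<u32>>,
}

impl Tunnel {
    fn new(ungraceful_drops: Rc<Cell<u32>>) -> Self {
        Tunnel { closed_gracefully: false, ungraceful_drops }
    }

    /// Explicit shutdown that consumes `self`, mirroring the shape of
    /// `Endpoint::close`. In real code this would be an `async fn` that
    /// awaits the endpoint's close before returning.
    fn close(mut self) -> Result<(), String> {
        // ... graceful shutdown work would happen here ...
        self.closed_gracefully = true;
        // `self` is dropped when this function returns, and Drop then sees
        // `closed_gracefully == true`.
        Ok(())
    }
}

impl Drop for Tunnel {
    fn drop(&mut self) {
        // Drop cannot `.await`, so it only records that the explicit close
        // was skipped; real code might log or do non-blocking cleanup here.
        if !self.closed_gracefully {
            self.ungraceful_drops.set(self.ungraceful_drops.get() + 1);
        }
    }
}
```

The design choice is that `Drop` never tries to do the graceful work itself; it only makes a skipped close visible.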
Yeah, this is also a known issue. The plan is to improve on this before 1.0 as well, and there is already #2741 for it. To move this forward, it helps to create issues for specific APIs where the anyhow error covers several cases that you need to distinguish between when you encounter them. That is the actionable feedback we need to improve the errors. In our own usage we've usually found little need to distinguish, because in our code the decision on how to handle an error depends mostly on which API returned it. That is why errors have stayed on anyhow for now.
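The kind of feedback asked for here is easiest to see side by side. Below is a hypothetical typed error (`ConnectError` and its variants are invented for illustration, not iroh types) next to the opaque alternative, where the caller can only branch by downcasting to a concrete type it already knows. `Box<dyn Error>` stands in for `anyhow::Error`, which offers a similar `downcast_ref`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error for a connect call; iroh currently returns
/// `anyhow::Error` rather than an enum like this.
#[derive(Debug)]
enum ConnectError {
    /// No relay server could be reached; retrying later may help.
    RelayUnreachable,
    /// The peer rejected the handshake; retrying will not help.
    HandshakeRejected(String),
}

impl fmt::Display for ConnectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectError::RelayUnreachable => write!(f, "relay unreachable"),
            ConnectError::HandshakeRejected(reason) => write!(f, "handshake rejected: {reason}"),
        }
    }
}

impl Error for ConnectError {}

/// With a typed error, the caller branches directly on the failure mode.
fn should_retry(err: &ConnectError) -> bool {
    matches!(err, ConnectError::RelayUnreachable)
}

/// With an opaque error, the caller can only branch if it downcasts to a
/// concrete type it already knows about; anything else is indistinguishable.
fn should_retry_opaque(err: &(dyn Error + 'static)) -> bool {
    err.downcast_ref::<ConnectError>().map_or(false, should_retry)
}
```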
If it's an error then you have to assume the thing failed. We don't return errors that are "safe to unwrap" currently.
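The distinction discussed in this thread, between errors that can only signal a bug in the caller's own code and errors caused by the environment, can be illustrated with std address parsing alone, reusing the `[::1]:0` literal from the original report. The function names are hypothetical.

```rust
use std::net::{AddrParseError, SocketAddr};

/// A hard-coded literal can only fail to parse if the literal itself is
/// wrong, i.e. a bug in this code, so `expect` documents that assumption.
fn loopback_bind_addr() -> SocketAddr {
    "[::1]:0".parse().expect("hard-coded address literal is valid")
}

/// Input from outside the program can legitimately be malformed, so the
/// error is returned to the caller instead of being unwrapped.
fn parse_user_addr(input: &str) -> Result<SocketAddr, AddrParseError> {
    input.parse()
}
```

Only the first case is a candidate for `expect`; whether a given iroh error falls into that category is exactly what an `anyhow::Error` signature does not tell the caller.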
Sounds interesting. Would you mind creating an issue and sharing a DEBUG-level log of when this happens? If you can write a reproducer for this, that'd be great as well.
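For the loopback-bind report (binding to `[::1]:0` yet getting STUN addresses back in the `NodeAddr`), a reproducer could assert an invariant like the one below. This is a std-only sketch: the helper name is hypothetical, and collecting the advertised addresses from the endpoint is left out because the exact accessor depends on the iroh-net version.

```rust
use std::net::SocketAddr;

/// Hypothetical check for the expectation in the report: when the endpoint
/// is bound to a loopback address, every address it advertises should be
/// loopback too. Returns the advertised addresses that break that expectation.
fn unexpected_non_loopback(bind: SocketAddr, advertised: &[SocketAddr]) -> Vec<SocketAddr> {
    if !bind.ip().is_loopback() {
        // Not bound to loopback: this invariant does not apply.
        return Vec::new();
    }
    advertised
        .iter()
        .copied()
        .filter(|addr| !addr.ip().is_loopback())
        .collect()
}
```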
This is also surprising to me; I kind of assumed we do this in our tests regularly. If you could write a minimal reproducer for this and file it as an issue as well, that would be great.
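For the self-connection report, where `right_conn.closed().await` never finishes, a bounded wait turns a hang into a visible test failure while a proper reproducer is written. This std-only sketch uses a channel as a stand-in for the connection's closed notification; in async code the analogous tool would be `tokio::time::timeout` wrapped around `closed()`. `CloseOutcome` and `wait_closed` are hypothetical.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum CloseOutcome {
    Closed,
    TimedOut,
}

/// Waits for a "closed" notification, but never longer than `limit`, so a
/// missing notification surfaces as `TimedOut` instead of a hang.
fn wait_closed(closed: &Receiver<()>, limit: Duration) -> CloseOutcome {
    match closed.recv_timeout(limit) {
        Ok(()) => CloseOutcome::Closed,
        Err(RecvTimeoutError::Timeout) => CloseOutcome::TimedOut,
        // The notifying side went away; treat that as closed as well.
        Err(RecvTimeoutError::Disconnected) => CloseOutcome::Closed,
    }
}
```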
Thanks for the thoughtful input! Feedback like this certainly allows us to find out what users experience!
-
Oh, interesting. I see that it's hard-coded to send keep-alives every 1 second and to time out after 30 seconds, and that the config passed to the builder only modifies the behavior of incoming connections. So my connections won't survive more than 30 seconds of Wi-Fi downtime, no matter what I do.
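The 30-second conclusion follows from how QUIC idle timeouts work. Below is a simplified model using the 30-second value from the comment above. The effective-timeout rule (the minimum of both sides' advertised `max_idle_timeout`, or the only one advertised) is from RFC 9000, Section 10.1; the outage model is a simplification, and none of this is iroh code.

```rust
use std::time::Duration;

/// Effective idle timeout per RFC 9000, Section 10.1: the minimum of the two
/// advertised values, or the sole value if only one side advertises one.
/// `None` stands for "not advertised / zero", which disables the idle timeout.
fn effective_idle_timeout(local: Option<Duration>, peer: Option<Duration>) -> Option<Duration> {
    match (local, peer) {
        (Some(a), Some(b)) => Some(a.min(b)),
        (Some(a), None) | (None, Some(a)) => Some(a),
        (None, None) => None,
    }
}

/// Simplified model: the connection survives an outage only if the outage is
/// shorter than the effective idle timeout, because keep-alives sent during
/// the outage never reach the peer. This ignores the small extra slack RFC 9000
/// allows when the first ack-eliciting packet after the last receipt restarts
/// the timer.
fn survives_outage(outage: Duration, idle_timeout: Option<Duration>) -> bool {
    match idle_timeout {
        Some(limit) => outage < limit,
        None => true,
    }
}
```

Because the effective value is the minimum of both sides, raising the timeout on only one endpoint would not extend the survivable downtime either.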
I think you're misunderstanding what I'm getting at. Any "boneheaded" error is "safe to unwrap", because the failure indicates a bug elsewhere in my code. Here are some examples in Iroh where you can safely call `unwrap`:
-
What's the relationship between Endpoint::conn_type_stream and Connection::remote_address? It seems like remote_address should be deprecated and replaced with a method that returns a ConnectionType based on the current state of the connection.
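The proposal amounts to deriving the current path from the latest update rather than from a one-off snapshot. A hypothetical sketch of that shape: `PathKind` is a local stand-in loosely modeled on a connection-type value (its variants are assumptions, not copied from iroh-net), and `PathTracker` simply applies each update that a `conn_type_stream`-style source would yield.

```rust
use std::net::SocketAddr;

/// Local stand-in for a connection-type value; the variants are assumptions
/// for illustration, not iroh-net's definitions.
#[derive(Debug, Clone, PartialEq)]
enum PathKind {
    Direct(SocketAddr),
    Relay(String),
    Mixed(SocketAddr, String),
    NoPath,
}

/// Hypothetical tracker that always answers with the most recent state.
struct PathTracker {
    current: PathKind,
}

impl PathTracker {
    fn new() -> Self {
        PathTracker { current: PathKind::NoPath }
    }

    /// Applies one update from the stream; the newest update wins.
    fn apply(&mut self, update: PathKind) {
        self.current = update;
    }

    fn current(&self) -> &PathKind {
        &self.current
    }

    /// True if traffic may currently be going through a relay.
    fn uses_relay(&self) -> bool {
        matches!(self.current, PathKind::Relay(_) | PathKind::Mixed(_, _))
    }
}
```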
-
Hello Team Iroh!
A side project of mine is building a VPN-ish on top of QUIC (point-to-point tunneling over QUIC). One of the design goals is supporting ad hoc connections between various machines, and Iroh seemed like a perfect fit for this. My PoC previously used quinn to do handshaking between peers with self-signed certificates, and I migrated to iroh-net, which was a +695 -1185 line diff, plus I got relay server and STUN support for free. That's pretty awesome! So, I wanted to give a quick experience report. I hit a handful of problems when using the library, so this may come across as critical, but it's given in good faith, and overall my experience was actually quite good.
Here's a bit more detail about my project: you can kind of think of it like "mosh but for port forwards". Users bootstrap a peer connection over SSH or some other channel, but afterwards port forwards happen over a QUIC tunnel. This means that peer discovery isn't relevant to me, but the relay server and STUN support are both useful. When connections between peers cannot be established, I can fall back to the original reliable channel to re-bootstrap it. This leads to my main question: how "long-lived" are connections? Given all the work going into multipath, it feels like I can expect an established connection to last, well, until it times out. But will Iroh switch back to relay servers mid-connection, for example?
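The connect-then-fall-back flow described above can be sketched as a small generic helper. The closures are placeholders for application code (a QUIC connect attempt and an SSH re-bootstrap), the attempt count is an arbitrary parameter, and `Route` and `establish` are hypothetical names.

```rust
/// Hypothetical outcome of (re)establishing the tunnel.
#[derive(Debug, PartialEq)]
enum Route {
    Quic,
    Rebootstrapped,
}

/// Tries the QUIC path up to `attempts` times and falls back to
/// re-bootstrapping over the original reliable channel (e.g. SSH) only if
/// every attempt fails. Errors from the fallback are returned to the caller.
fn establish<Q, B, E>(attempts: u32, mut try_quic: Q, mut rebootstrap: B) -> Result<Route, E>
where
    Q: FnMut() -> Result<(), E>,
    B: FnMut() -> Result<(), E>,
{
    for _ in 0..attempts {
        if try_quic().is_ok() {
            return Ok(Route::Quic);
        }
    }
    rebootstrap()?;
    Ok(Route::Rebootstrapped)
}
```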
I was surprised at the number and size of dependencies that iroh pulled in. Given my design, I disabled some of the features enabling DHT/discovery. Even so, my compilation times rose substantially as a result of migrating from quinn. I haven't checked, but I am also concerned that my final binary size will similarly balloon. I would be interested in knowing if it's possible to further reduce the number of dependencies that iroh pulls in (by disabling features that I don't need).
`Endpoint::close` is async and fallible. This means you can't close your endpoint in a `Drop` impl. It also returns `anyhow::Result`, so it can fail... somehow? But it moves `self`, so the endpoint is gone even if it fails. What? I looked at the source code, and this method corresponds to quinn's `close` + quinn's `wait_idle` + an async fallible close of the magic socket. This should be split into multiple different methods.
On that note, I'm disappointed that everything is an `anyhow::Error`. It basically means that my program cannot do anything meaningful with any error returned from Iroh. But also, as a developer, it means I have no idea what errors I should even expect from Iroh. Are the error possibilities relevant to me, or can I safely call `unwrap`? I can't tell with `anyhow::Error`. I suspect this one is just "better error handling is on the roadmap", but I wanted to call it out as a sticking point for a new user of the project.
I also encountered a few things that are almost certainly bugs. First, when you use `bind_addr_v6("[::1]:0".parse().unwrap())`, iroh still returns a `NodeAddr` that includes my STUN addresses. I was surprised by this, since I used a loopback address for Iroh, so how it even reached a STUN server is beyond me. Wireshark confirmed that Iroh was not respecting my bind address and was sending traffic on my public interface. Second, if I have a connection to myself (same process; different `Endpoint`s), calling `left_conn.close(...); right_conn.closed().await` will never finish. I see `CC` frames in Wireshark, but the remote connection (`right_conn`) is never reported closed. Adding a sleep fixes this.
Overall, I'm quite pleased with the amount of work that Iroh is doing for me, and I'm excited to test it out in real-world conditions involving NAT. Kudos for the work done so far, and I hope that this report is helpful to the developers and other potential users.