-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sudo_network_unstable_watch
#91
Conversation
"event": "missedEvents" | ||
} | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it also be beneficial to have an optional even reported for cases of miss-behaving / reputation banning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The networking in general has a shit-ton of concepts. Beyond reputations, you could also think of reporting ping times, handshake and request sizes, k-buckets states, grandpa neighbor packets, block announces, etc.
This PR focuses on just open connections and substreams because 1) these concepts are easy to understand 2) we know for sure that Polkadot will never not have connections and substreams.
Co-authored-by: Alexandru Vasile <[email protected]>
Co-authored-by: Josep M Sobrepere <[email protected]>
{ | ||
"event": "substreamState", | ||
"connectionId": "...", | ||
"substreamId": "...", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question to make sure I got this right: we want to report all open-streams (ie /ipfs/ping
, /ipfs/indentify
, /[hash]/kad
) together with notification protocols, not just notification-protocols (/[hash]/tx/1
etc)?
For notification-protocols, we could easily generate a substreamID
when we establish a connection just before the handshake is done. As for getting all substreams, I believe we'd need to interface substrate somehow with yamux
implementation directly, and we'd most probably end-up either:
- rewriting libp2p-yamux wrapper over the yamux crate
- crate a shim wrapper over libp2p-yamux, but patch latest libp2p version to expose the inner yamux stream which contains the stream ID into: pub Stream( pub yamux::Stream)
And since we could change the muxer
in the future, for example changing the transport to quic, wouldn't the substreamID feel like an implementation detail? (Since we'd need to find a workaround to expose the substreamID from the transport level in that case) 🤔
Let me know if I got this right, or if we could achieve the same behavior easier 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you just need a wrapper (a "middleware") of the StreamMuxer
trait, no need to touch Yamux itself. Every type of connection implements StreamMuxer
and is only ever accessed through this trait by the rest of the stack.
However what I didn't realize is that getting the protocol name is kind of tricky, because it is done by sending messages within the substream.
Substreams themselves don't inherently have protocol names, it's just that in libp2p every single substream starts with a handshake that negotiates its protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Substreams themselves don't inherently have protocol names, it's just that in libp2p every single substream starts with a handshake that negotiates its protocol.
Which also means that this RFC needs to clarify what to do with substreams whose protocol hasn't been negotiated yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these meant to be implemented for libp2p protocols in the first place? Substrate may have no knowledge that a substream for Kademlia or PING has been opened so there would have to some tracking implemented in libp2p that would expose this information to Substrate.
I don't see the problem with protocol names though. Once the protocol has been negotiated, the name is known and can be reported. If there was a separate "opening" state, then it would be more challenging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these meant to be implemented for libp2p protocols in the first place? Substrate may have no knowledge that a substream for Kademlia or PING has been opened so there would have to some tracking implemented in libp2p that would expose this information to Substrate.
Substrate entirely wraps around all the libp2p protocols: https://github.com/paritytech/polkadot-sdk/blob/a56fd32e0affbe9cb61abb5ee3ec0a0bd15f5bf0/substrate/client/network/src/behaviour.rs#L45-L47
I think that you can do it by:
- Adding a wrapper around the
ConnectionHandler
trait that intercepts on_connection_event. - Adding a wrapper around the
NetworkBehaviour
trait so that you can wrap around theConnectionHandler
In your NetworkBehaviour
wrapper, you can modify the ToSwarm
type to include events about substreams being opened/closed, and then intercept these events in the main network events handler.
I haven't touched these traits in years now, so I might be missing something, but I think it should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you can do it
Adding a wrapper around the ConnectionHandler trait that intercepts on_connection_event.
Adding a wrapper around the NetworkBehaviour trait so that you can wrap around the ConnectionHandler
From looking at the code, my findings also point in this direction. Thanks!
I started to implement a BehaviorWrapper
a few days ago, which looked like this:
pub struct BehaviorWrapper<B: BlockT> {
behavior: Behavior<B>,
}
impl NetworkBehavior for ... { }
This was the first step to expose the ConnectionID
since the SwarmEvents from our libp2p version does not expose it at the moment.
I gave up on going that path because of two reasons:
- feat(swarm): expose
ConnectionId
and add conn duration metric libp2p/rust-libp2p#3927 which adds it in the p2p crate, and Upgrade libp2p to 0.52.4 polkadot-sdk#1631 which updates the p2p crate including the connection ID - the amount of methods I had to re-export from the underlying Behavior and the overhead of reimplementing the
NetworkBehavior
felt like bad code for the sole benefit of theConnectionID
However, I cannot think of a better way than what Pierre suggested:
pub struct BehaviorWrapper<B: BlockT> {
behavior: Behavior<B>,
}
/// Must propagate the connection ID for proper identification of
/// connection IDs + streamIDs
pub struct HandlerWrapper {
connectionID: ...,
..
}
impl NetworkBehavior for BehaviorWrapper {
type ConnectionHandler = HandlerWrapper;
fn handle_established_inbound_connection(connectionId, ..) {
HandlerWrapper::new(connectionId, is_inbound=true, ...)
}
fn handle_established_outbound_connection(connectionId, ..) {
HandlerWrapper::new(connectionId, is_inbound=false, ...)
}
...
}
impl ConnectionHandler for HandlerWrapper {
// Might have missed something here
// types = tbd / maybe other wrappers here
fn on_connection_event(..) {
// Generate a stream ID
stream_id = AtomicIndex::increment();
// Propagate up to the swarm
MetricsEventWrapper::CustomProtocolOpen { connectionId, streamId, protocolName, direction, .. }
}
This feels indeed a bit involved and I'd resume implementation once the substrate-p2p refactor is complete:
- Rework the event system of
sc-network
polkadot-sdk#1370 - Prevent opening another substream right after the connection is closed polkadot-sdk#1563
- Upgrade libp2p to 0.52.4 polkadot-sdk#1631
Since I expect most of this work will generate conflicts with those PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prefix of the function in this PR is
sudo_network
, so it's only thesudo_network
that aren't implemented.
Ah ok, I see that now; the other groups all have a singe word like chainHead_
or chainSpec_
or archive_
, but we have sudo_
, sudo_sessionKeys_
and now sudo_network_
as three more groups, which I was confused by since it's a bit inconsistent in my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit inconsistent in my mind.
Yeah it is, because ultimately the sudo
functions should be renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the PoC/prototype for this code turns out to add non-trivial complexity to sc-network
, I hope you reconsider adding this feature.
The libp2p upgrades are really painful to deal with and I would hope to reduce the surface area of libp2p in sc-network
to leave us more time to work on things other than dealing with libp2p and its upgrades. That looks to be at odds with the proposed wrapper implementation and my concern is that if this wrapper adds more code that has to be maintained during each libp2p upgrade, it just leaves us with less time to work on actually improving the Polkadot networking code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't answer that, but I didn't intend this to be implemented in Substrate quickly.
The reason for this function comes from a very pragmatic problem: most of the time, when smoldot isn't working, it's because of a connectivity issue.
When you're running a Substrate node of some kind, you're either a developer or a node operator. You can read logs, run Prometheus, open GitHub issues, etc.
When you're running smoldot, you're an end user, and you shouldn't be reading logs. The only reasonable way to help end users figure out why smoldot isn't working is put some kind of UI on top of it. This JSON-RPC function is what would power this UI.
As a heads up, I'm going to implement this in smoldot (merged or not). It's one of the milestones that I've put in the smoldot financing treasury proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to see this merge; it's unstable anyway, and probably mostly useful in Smoldot and perhaps less so in Substrate if I understand right!
I have reviewed this PR and I think that it's ready to be merged. Also, I can't wait to leverage this from polkadot-api and it seems that As @tomaka already explained: this API is mostly meant for light-clients and it lives in its own group of functions. Meaning that a simple query to |
I've encountered an issue while implementing this in smoldot due to a corner case. Let's say that you start an instance of smoldot and add the Polkadot and Kusama chains, and add the same IP address and PeerId as bootnode at the same time to both Polkadot and Kusama. When I wrote this RFC, smoldot would open networking connections for Polkadot and networking connections for Kusama. That's what is reflected in this RFC. In that example scenario, smoldot would open two connections to the same IP address and PeerId, and thus would report through the JSON-RPC function only events that concern the connection of the chain corresponding to that JSON-RPC subscription. However, smoldot now supports sharing connections. In that example scenario, smoldot would only open one connection and use the same connection for both Polkadot and Kusama. This leads to a tricky question: should smoldot, in that case, generate an event to the Kusama JSON-RPC subscription that the connection has been closed, even though it hasn't actually been closed? Doing so would be the logical thing to do, but would be misleading. The objective of this JSON-RPC function is to show to the user what smoldot is doing, and reporting events that don't match the reality (even when logically coherent) goes against this objective. An alternative would be to report every single connection to every subscription, no matter if this connection is going to be used for the chain the subscription corresponds to. This is also misleading and spammy. I don't really know right now how to solve that problem. |
I'm inclined to think that smoldot should generate an event for the Kusama JSON-RPC subscription indicating the connection is closed, focusing on the "logical connection". From the API consumer's perspective, the connection is effectively "closed" once Smoldot ceases communication with the Kusama node, regardless of the physical connection status. This approach would deem the persistence of the physical connection as an implementation detail. Given the API is primary intended to be implemented by light-clients, it would be beneficial to update the specification to clearly differentiate between "logical" and "physical" connections, emphasizing that the API's notifications pertain to the "logical connection". |
This is a revival of paritytech/smoldot#2245
This new function lets the JSON-RPC client subscribe to the networking state of the server, which allows visualizing what is happening, which in turn should be useful for debugging and teaching purposes.
One opinionated decision I took is that the JSON-RPC server might omit some events if the events are generated too quickly. For example, if a peer-to-peer connection opens then closes repeatedly 1000 times per second, we don't have to send 2000 events per second to the JSON-RPC client and instead we're allowed to not send anything.
This means that the JSON-RPC client might not be aware of everything that's happening. There's an event to notify it when that happens so that a warning or something can be shown to the user.
An alternative would be to report everything and generate a
stop
event if the queue is too large, but that's not much better, because the JSON-RPC client wouldn't be able to know what happens between the moment when thestop
happens and the moment when it resubscribes. In other words, this wouldn't solve the problem and is more complicated.There's no possible way (CAP theorem) to guarantee that the JSON-RPC client will receive events about everything that happens, unless you slow down the networking of the node to the speed of the JSON-RPC client. This isn't something that we want to happen, normally. Maybe we could imagine a privileged mode where that is the case, but I haven't included this in this PR at the moment because it seems overkill.