What is the Problem Being Solved?
While investigating the impact of long blocks on the p2p layer (#10075), we realized that cometbft assumes that channels and their reactors process messages quickly and in a non-blocking manner (cometbft/cometbft#2685). However, they do not (cometbft/cometbft#3250), and as a result the basic processing of ping/pong messages over the connection is blocked until these reactor calls complete (cometbft/cometbft#2533).
The outcome is that when EndBlock & CommitBlock in the ABCI app (cosmos) take a while, which currently happens because of performance issues in SwingSet, p2p connectivity suffers, which further hampers the recovery of the network.
Description of the Design
We don't want to change the ordering of messages received over the peer connection, even though cosmos seems mostly capable of dealing with a relaxation of the strict ordering that in-process tendermint currently offers (as with the relaxation created by our committing client). Instead, we want to decouple ping/pong message processing from channel message processing, so that ping/pong is no longer blocked behind it.
To that end, we can add a goroutine that processes the messages that have been read from the connection. The number and aggregate size of pending messages should be bounded.
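A minimal sketch of what such a queue could look like, assuming a single consumer goroutine per connection. All names here (`boundedQueue`, `msg`, `enqueue`, the error values) are hypothetical and do not correspond to existing cometbft types. The channel capacity bounds the number of pending messages, and a byte counter bounds their aggregate size; the message currently being handled still counts toward the byte bound until its handler returns.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Hypothetical names throughout: none of these exist in cometbft.
var (
	errQueueFull   = errors.New("pending message bound exceeded")
	errQueueClosed = errors.New("queue closed")
)

// msg is one channel message read from the connection.
type msg struct {
	chID    byte
	payload []byte
}

// boundedQueue hands messages from the read loop to one worker goroutine,
// preserving order and bounding both count and aggregate payload size.
type boundedQueue struct {
	mu       sync.Mutex
	items    chan msg // capacity bounds the number of queued messages
	bytes    int      // aggregate payload size, queued plus in-flight
	maxBytes int
	closed   bool
	done     chan struct{}
}

func newBoundedQueue(maxMsgs, maxBytes int, handle func(msg)) *boundedQueue {
	q := &boundedQueue{
		items:    make(chan msg, maxMsgs),
		maxBytes: maxBytes,
		done:     make(chan struct{}),
	}
	go func() { // a single consumer keeps reactor calls in arrival order
		defer close(q.done)
		for m := range q.items {
			handle(m) // may block; the read loop is unaffected
			q.mu.Lock()
			q.bytes -= len(m.payload)
			q.mu.Unlock()
		}
	}()
	return q
}

// enqueue never blocks: it fails fast when a bound would be exceeded.
func (q *boundedQueue) enqueue(m msg) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return errQueueClosed
	}
	if q.bytes+len(m.payload) > q.maxBytes {
		return errQueueFull
	}
	select {
	case q.items <- m:
		q.bytes += len(m.payload)
		return nil
	default:
		return errQueueFull
	}
}

// close stops accepting messages and waits for pending ones to be handled.
func (q *boundedQueue) close() {
	q.mu.Lock()
	if !q.closed {
		q.closed = true
		close(q.items)
	}
	q.mu.Unlock()
	<-q.done
}

func main() {
	release := make(chan struct{})
	var handled []byte
	q := newBoundedQueue(2, 64, func(m msg) {
		<-release // simulate a reactor stuck behind a slow block commit
		handled = append(handled, m.chID)
	})
	for i := byte(0); i < 4; i++ {
		fmt.Println(i, q.enqueue(msg{chID: i, payload: make([]byte, 8)}))
	}
	close(release)
	q.close()
	fmt.Println(handled)
}
```

Because `enqueue` does constant work and never waits on the handler, the read loop stays free to service ping/pong regardless of how long a reactor call takes. What the caller does when `enqueue` fails (drop the peer, or stop reading to apply backpressure) is a separate policy decision that this sketch leaves open.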
Security Considerations
A remote peer should remain incapable of consuming a large amount of memory or CPU on the local peer. Ideally, a remote peer that does not faithfully participate in the p2p network would be disconnected.
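One possible policy, sketched under assumptions: the receive routine answers pings inline, forwards channel messages to a bounded pending channel, and returns an error when that bound is hit so the caller can disconnect the peer. The names `readLoop`, `packet`, and `errPeerOverflow` are hypothetical and are not cometbft APIs.

```go
package main

import (
	"errors"
	"fmt"
)

type kind int

const (
	kindPing kind = iota
	kindPong
	kindMsg
)

// packet is a hypothetical, simplified view of what arrives on the wire.
type packet struct {
	kind    kind
	payload []byte
}

var errPeerOverflow = errors.New("peer exceeded pending message bound")

// readLoop is a stand-in for a connection's receive routine. Pings are
// answered inline, channel messages go to the bounded pending channel,
// and overflow ends the loop so the caller can disconnect the peer.
func readLoop(in <-chan packet, pending chan<- packet, sendPong func()) error {
	for p := range in {
		switch p.kind {
		case kindPing:
			sendPong() // never waits on reactors
		case kindPong:
			// a real implementation would reset its pong timeout here
		case kindMsg:
			select {
			case pending <- p:
			default:
				return errPeerOverflow
			}
		}
	}
	return nil
}

func main() {
	in := make(chan packet, 8)
	pending := make(chan packet, 2) // nothing drains it: a stalled reactor
	pongs := 0
	in <- packet{kind: kindMsg}
	in <- packet{kind: kindPing}
	in <- packet{kind: kindMsg}
	in <- packet{kind: kindPing}
	in <- packet{kind: kindMsg} // third message exceeds the bound of 2
	close(in)
	err := readLoop(in, pending, func() { pongs++ })
	fmt.Println(pongs, err) // prints "2 peer exceeded pending message bound"
}
```

Disconnecting on overflow keeps memory use bounded, but it would also drop honest peers while the local node is slow to process blocks. Pausing reads of channel messages instead, and relying on transport backpressure, is an alternative worth weighing.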
Scaling Considerations
The amount of work required to maintain a peer connection should not grow with the number of messages pending on that connection.
Test Plan
Manual testing on a patched follower node
Unit testing of edge conditions when bounds on the queue are reached, and when the connection is closed at particular times (e.g. with pending messages in the queue)
Upgrade Considerations
Requires a chain software upgrade but does not affect consensus.