Fix Peer Discovery Failing #206

Closed
wants to merge 2 commits into from
hmzakhalid
Member

@hmzakhalid hmzakhalid commented Dec 12, 2024

Peer Discovery fails through bootstrap despite successful individual connections.

  • Switched to TCP from QUIC

Summary by CodeRabbit

  • New Features

    • Integrated TCP protocol support for multiple services.
    • Enhanced libp2p dependency with additional features for improved networking capabilities.
  • Bug Fixes

    • Improved error handling and logging for network connection issues.
  • Refactor

    • Updated method signatures and event processing logic for better performance and clarity.

Contributor

coderabbitai bot commented Dec 12, 2024

Important

Review skipped

Draft detected.


Walkthrough

The changes in this pull request involve updates to the docker-compose.yml, Cargo.toml, and network_peer.rs files. The docker-compose.yml file modifies the network protocol for several services from UDP to TCP. The Cargo.toml file adds new features to the libp2p dependency. Lastly, the network_peer.rs file integrates new protocols, enhances error handling, and refines event processing logic, notably transitioning from QUIC to TCP for the NetworkPeer struct.

Changes

| File | Change Summary |
| --- | --- |
| docker-compose.yml | Updated service ports for cn1, cn2, cn3, and aggregator from udp to tcp. |
| packages/ciphernode/net/Cargo.toml | Added features "tcp", "noise", and "yamux" to the libp2p dependency. |
| packages/ciphernode/net/src/network_peer.rs | Updated imports for libp2p, transitioned NetworkPeer from QUIC to TCP, enhanced gossipsub config, improved error handling, and refactored process_swarm_event method. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant NetworkPeer
    participant Swarm
    participant Kademlia

    Client->>NetworkPeer: Connect
    NetworkPeer->>Swarm: Initialize TCP
    Swarm->>Kademlia: Setup Kademlia config
    Kademlia-->>Swarm: Configuration complete
    Swarm-->>NetworkPeer: Ready to process events
    NetworkPeer->>Client: Connection established

🐰 "In the land of code, where changes abound,
TCP hops in, making a sound.
With features anew, the network's aglow,
A rabbit's delight in the flow!
Error handling's sharper, events now refined,
Hooray for the updates, so well-designed!" 🐇


@hmzakhalid hmzakhalid marked this pull request as draft December 12, 2024 20:01
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
packages/ciphernode/net/src/network_peer.rs (1)

144-146: Review Gossipsub mesh parameters for optimal network performance

The mesh parameters in the Gossipsub configuration are set to:

  • mesh_n(3)
  • mesh_n_low(2)
  • mesh_outbound_min(1)

These values determine the mesh's resilience and message propagation efficiency. Ensure these settings align with your network size and desired performance.
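As a rough sanity check on these values, here is a minimal std-only sketch of the ordering constraints that, as far as I understand, the gossipsub v1.1 spec and the rust-libp2p config builder expect between the mesh degrees. `MeshParams` and `validate_mesh` are hypothetical names for illustration, and `mesh_n_high: 12` is an assumption (the PR does not set it; I believe 12 is the library default).

```rust
/// Hypothetical mirror of the Gossipsub mesh degree settings discussed above.
/// This is not a libp2p type; it only holds the four numbers being compared.
#[derive(Debug, Clone, Copy)]
struct MeshParams {
    mesh_n: usize,
    mesh_n_low: usize,
    mesh_n_high: usize, // ASSUMPTION: not set in this PR, library default assumed
    mesh_outbound_min: usize,
}

/// Checks two inequalities (assumed to match what the library enforces):
///   mesh_outbound_min <= mesh_n_low <= mesh_n <= mesh_n_high
///   mesh_outbound_min <= mesh_n / 2
fn validate_mesh(p: MeshParams) -> Result<(), String> {
    let ordered = p.mesh_outbound_min <= p.mesh_n_low
        && p.mesh_n_low <= p.mesh_n
        && p.mesh_n <= p.mesh_n_high;
    if !ordered {
        return Err(format!(
            "expected mesh_outbound_min <= mesh_n_low <= mesh_n <= mesh_n_high, got {p:?}"
        ));
    }
    // Written as a multiplication to avoid integer-division rounding.
    if p.mesh_outbound_min * 2 > p.mesh_n {
        return Err(format!(
            "expected mesh_outbound_min <= mesh_n / 2, got {p:?}"
        ));
    }
    Ok(())
}
```

With the values from this PR (`mesh_n: 3`, `mesh_n_low: 2`, `mesh_outbound_min: 1`) and the assumed `mesh_n_high: 12`, both inequalities hold, so the configuration looks internally consistent for a small network of a few ciphernodes plus an aggregator.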

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44aac5b and 5110189.

⛔ Files ignored due to path filters (1)
  • packages/ciphernode/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • docker-compose.yml (4 hunks)
  • packages/ciphernode/net/Cargo.toml (1 hunk)
  • packages/ciphernode/net/src/network_peer.rs (9 hunks)
🔇 Additional comments (9)
packages/ciphernode/net/src/network_peer.rs (7)

8-11: Imported modules for TCP, Noise, and Yamux look appropriate

The addition of tcp, noise, and yamux modules aligns with the transition from QUIC to TCP, enabling the necessary transports and protocols.


54-58: Swarm initialization with TCP, Noise, and Yamux is correct

The configuration correctly initializes the Swarm to use TCP transport with Noise encryption and Yamux multiplexing. This setup is essential for secure and multiplexed TCP connections.


88-89: Listening address updated to TCP protocol

Updating the listening address to /ip4/0.0.0.0/tcp/{port} reflects the switch to TCP and ensures the node listens on the correct protocol.
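For reference, a small sketch of how the two listen addresses differ as multiaddr strings. The TCP form is the one quoted above; the QUIC form (`/udp/{port}/quic-v1`) is the standard multiaddr shape for QUIC v1 and is an assumption about what the code used before this PR. `Transport` and `listen_addr` are hypothetical helpers, not names from network_peer.rs.

```rust
/// The two transports compared in this PR (hypothetical enum for illustration).
enum Transport {
    Tcp,
    Quic,
}

/// Builds the multiaddr string a node would listen on for the given transport.
/// Only the TCP form is confirmed by the review; the QUIC form is assumed.
fn listen_addr(transport: Transport, port: u16) -> String {
    match transport {
        // TCP: the port sits directly under the `tcp` protocol component.
        Transport::Tcp => format!("/ip4/0.0.0.0/tcp/{port}"),
        // QUIC runs over UDP, so the port sits under `udp`, followed by `quic-v1`.
        Transport::Quic => format!("/ip4/0.0.0.0/udp/{port}/quic-v1"),
    }
}
```

This also shows why docker-compose.yml has to change in step: a port published as udp does not forward traffic to a TCP listener on the same port number.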


98-100: Improved error handling for listen failures

Good practice in logging a warning when the swarm fails to listen on the specified address. This aids in debugging and monitoring.


119-119: Updated event processing call aligns with refactored function

Passing self to process_swarm_event matches the updated function signature and allows for more streamlined access to the swarm and other components.


Line range hint 181-264: Refactored event processing enhances code clarity

Changing process_swarm_event to accept &mut NetworkPeer simplifies access to the swarm and associated components within the function, improving code readability and maintainability.
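To illustrate the shape of this refactor, here is a minimal sketch using stand-in types. `Event` and the fields on `NetworkPeer` are hypothetical; the real struct owns a libp2p Swarm and the real handler matches on libp2p's swarm events. Only the signature pattern (a handler taking `&mut NetworkPeer`) comes from the review.

```rust
use std::collections::HashSet;

/// Stand-in for the swarm events the real code receives from libp2p.
enum Event {
    ConnectionEstablished { peer_id: String },
    ConnectionClosed { peer_id: String },
}

/// Stand-in for the PR's NetworkPeer (hypothetical fields).
#[derive(Default)]
struct NetworkPeer {
    connected: HashSet<String>,
    events_seen: usize,
}

/// Taking `&mut NetworkPeer` gives the handler access to every field through
/// one mutable borrow, instead of one parameter per component.
fn process_swarm_event(network_peer: &mut NetworkPeer, event: Event) {
    network_peer.events_seen += 1;
    match event {
        Event::ConnectionEstablished { peer_id } => {
            network_peer.connected.insert(peer_id);
        }
        Event::ConnectionClosed { peer_id } => {
            network_peer.connected.remove(&peer_id);
        }
    }
}
```

The trade-off is that the handler can now touch any part of the peer, so a narrower parameter list may still be preferable if the function only needs one or two components.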


132-135: Verify the protocol identifiers in Identify and Kademlia configurations

Ensure that the protocol strings "/ipfs/kad/1.0.0" used in the Identify and Kademlia configurations are correct and compatible with the peers in your network. Using consistent protocol identifiers is crucial for proper network communication.

Run the following script to check for consistency of protocol identifiers across the codebase:

✅ Verification successful

The initial search showed matching protocol identifiers, but let's verify there aren't any other variants or definitions that might conflict with these. This will help us ensure complete consistency across the codebase.


Protocol identifiers are consistent in the codebase

The verification shows that the protocol identifier "/ipfs/kad/1.0.0" is used consistently in both the Identify and Kademlia configurations within the same file. No conflicting or inconsistent protocol identifiers were found across the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for protocol identifiers in Identify and Kademlia configurations.

# Search for protocol identifiers in Identify configurations
rg 'identify::Config::new\("(.+?)"' packages/

# Search for protocol identifiers in Kademlia configurations
rg 'kad::Config::new\(StreamProtocol::new\("(.+?)"' packages/

Length of output: 348


Script:

#!/bin/bash
# Let's search for any other protocol identifiers in the codebase to ensure consistency
rg -i 'protocol.*"/ipfs/' packages/

# Also search for any configuration or constant definitions that might define these protocols
rg 'const.*PROTOCOL' packages/
rg 'static.*PROTOCOL' packages/

Length of output: 227

packages/ciphernode/net/Cargo.toml (1)

26-28: Included TCP, Noise, and Yamux features for libp2p dependency

Adding "tcp", "noise", and "yamux" to the libp2p features is necessary for enabling TCP transport, secure communication with Noise protocol, and multiplexing with Yamux. This change supports the switch from QUIC to TCP.

docker-compose.yml (1)

15-15: Ports updated to use TCP protocol

Changing the protocol from udp to tcp for services cn1, cn2, cn3, and aggregator ensures that the Docker services expose the correct ports for TCP communication, consistent with the changes in the application code.

Also applies to: 38-38, 60-60, 83-83

Comment on lines +216 to +219
warn!("Failed to redial peer {peer_id}: {e}");
} else {
info!("Redialing peer {peer_id}...");
}
Contributor

🛠️ Refactor suggestion

Implement backoff strategy for redialing failed peers

Currently, when a dial attempt fails, the code immediately tries to redial the peer. This could lead to rapid repeated dial attempts in case of persistent failures.

Consider adding an exponential backoff or limiting the number of redial attempts to prevent potential resource exhaustion and to comply with polite network behavior.

Apply this diff to implement a simple backoff mechanism:

+use tokio::time::{sleep, Duration};
...
+                    let backoff_delay = Duration::from_secs(5);
                     if let Err(e) = network_peer.swarm.dial(multiaddr.clone()) {
                         warn!("Failed to redial peer {peer_id}: {e}");
                     } else {
                         info!("Redialing peer {peer_id}...");
+                        sleep(backoff_delay).await;
                     }

Committable suggestion skipped: line range outside the PR's diff.
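One caveat on the diff above: it uses a fixed 5-second delay and awaits `sleep` inside the event handler, which may stall processing of other swarm events while it waits. A possible alternative, sketched here with the standard library only, is to track per-peer attempts and the earliest time the next dial is allowed, so the event loop only asks "may I dial now?" and never blocks. `RedialBackoff` and all of its fields are hypothetical names, and the base, cap, and attempt limit are illustrative values rather than anything taken from the PR.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-peer redial tracker with capped exponential backoff.
struct RedialBackoff {
    base: Duration,
    max: Duration,
    max_attempts: u32,
    /// peer id -> (attempts made so far, earliest time the next dial is allowed)
    state: HashMap<String, (u32, Instant)>,
}

impl RedialBackoff {
    fn new(base: Duration, max: Duration, max_attempts: u32) -> Self {
        Self { base, max, max_attempts, state: HashMap::new() }
    }

    /// Delay after the given attempt number: base * 2^attempt, capped at `max`.
    fn delay_for(&self, attempt: u32) -> Duration {
        let factor = 2u32.saturating_pow(attempt);
        self.base.saturating_mul(factor).min(self.max)
    }

    /// Returns true if the caller may dial now, and records the attempt.
    /// Returns false while the peer is still backing off or has used up its attempts.
    fn should_dial(&mut self, peer_id: &str, now: Instant) -> bool {
        let (attempts, next_allowed) =
            self.state.get(peer_id).copied().unwrap_or((0, now));
        if attempts >= self.max_attempts || now < next_allowed {
            return false;
        }
        let delay = self.delay_for(attempts);
        self.state.insert(peer_id.to_string(), (attempts + 1, now + delay));
        true
    }

    /// Forget a peer's history, e.g. after a connection is established.
    fn reset(&mut self, peer_id: &str) {
        self.state.remove(peer_id);
    }
}
```

In the dial-failure branch, the handler would call `should_dial(&peer_id.to_string(), Instant::now())` before `swarm.dial(...)` and skip the dial when it returns false; a timer or later event would be needed to retry once the delay has elapsed, which this sketch leaves out.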

@ryardley
Contributor

Regarding this, I have managed to get things mostly working over here: https://github.com/ryardley/libp2p-kad-gossipsub-quic-example/tree/main. The next step might be to integrate it with the net package in a different PR.

@hmzakhalid hmzakhalid closed this Dec 19, 2024
@hmzakhalid hmzakhalid deleted the hmza/swarm-network-issues branch December 19, 2024 13:12