Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(primitives) : Chunk Implementation for Parallel Sender Recovery #5224

Closed
wants to merge 12 commits into from

Conversation

Arindam2407
Copy link
Contributor

@Arindam2407 Arindam2407 commented Oct 30, 2023

Added chunk implementation to TransactionSigned::recover_signers using stream wrappers to handle parallel sender recovery using rayon. Replaced into_par_iter method handling parallel sender recovery for one signed transaction at a time. Partially resolving #5189 issue.

@Arindam2407 Arindam2407 requested a review from gakonst as a code owner October 30, 2023 02:02
@Arindam2407 Arindam2407 changed the title Added Chunk Implementation for Parallel Sender Recovery feat(primitives) : Chunk Implementation for Parallel Sender Recovery Oct 30, 2023
crates/primitives/src/transaction/mod.rs Outdated Show resolved Hide resolved
@onbjerg
Copy link
Member

onbjerg commented Oct 30, 2023

Thanks for taking this on! How does this impl compare perf wise to the current impl on main?

@onbjerg onbjerg added the C-perf A change motivated by improving speed, memory usage or disk footprint label Oct 30, 2023
@rkrasiuk
Copy link
Member

supportive of @onbjerg comments:

  • locks are unnecessary here. the better API would be to return the stream wrapper around receiver channels to guarantee the order of recovered signers
  • we need benchmarks to make sure this is an improvement

@Arindam2407

This comment was marked as outdated.

@Arindam2407

This comment was marked as outdated.

Comment on lines 868 to 870
let chunks = rayon::current_num_threads() * 8;
let chunk_size: usize = (num_txes / chunks) / 8;
let chunk_size = chunk_size.max(2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these constants seem a bit too arbitrary/random

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can definitely change that

Some(recovered_signers)
})
.join()
.unwrap()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this unwrap? even if we have to change to Result<Option<...>>

@rakita
Copy link
Collaborator

rakita commented Nov 3, 2023

@onbjerg @rkrasiuk please check the benchmark tests I have added. The chunk implementation outperforms the current into_par_iter method after around 100 transactions.

Just a note here, #[test] are run unoptimized with debug so this measurement is not valid

@Arindam2407 Arindam2407 marked this pull request as ready for review November 4, 2023 20:17
@Arindam2407
Copy link
Contributor Author

Fixed the arbitrary constants. Chunk implementation amended to pass previously failing consensus tests. into_par_iter slightly outperforms chunk impl in unoptimized tests but the overhead of work stealing with into_par_iter might increase with the number of transactions.
photo_2023-11-04_20-28-21

@gakonst
Copy link
Member

gakonst commented Nov 5, 2023

Can you re-run this with mainnet SenderRecovery stage via reth stage run senders and measure the time before/after this PR?

@Arindam2407 Arindam2407 marked this pull request as draft November 8, 2023 17:39
@Arindam2407 Arindam2407 closed this Nov 8, 2023
@Arindam2407 Arindam2407 reopened this Nov 9, 2023
@Arindam2407 Arindam2407 marked this pull request as ready for review November 9, 2023 04:03
@Arindam2407 Arindam2407 marked this pull request as draft November 22, 2023 20:38
@Arindam2407 Arindam2407 reopened this Nov 22, 2023
@gakonst
Copy link
Member

gakonst commented Nov 22, 2023

Hi @Arindam2407 wonder if you had a chance to compare performance before/after?

@Arindam2407
Copy link
Contributor Author

Hi @Arindam2407 wonder if you had a chance to compare performance before/after?

Hi @gakonst, I was not able to run the mainnet SenderRecovery stage because of my device specs. Instead, I ran optimized tests using cargo test --package reth-primitives --release to get more reliable measurement as pointed out by @rakita. I compared the performance before/after (without/with chunks) for recovering 100,000 transactions.

The test:

Additional crates used in mod tests:

use std::{fs::File, io::Write, time::Instant};
use rayon::prelude::*;

    #[test]
    fn benchmark_chunks_parallel_sender_recovery_100000(txes in proptest::collection::vec(proptest::prelude::any::<Transaction>(), *PARALLEL_SENDER_RECOVERY_THRESHOLD * 20000)) {
        let mut rng =rand::thread_rng();
        let secp = Secp256k1::new();
        let txes: Vec<TransactionSigned> = txes.into_iter().map(|mut tx| {
             if let Some(chain_id) = tx.chain_id() {
                // Otherwise we might overflow when calculating `v` on `recalculate_hash`
                tx.set_chain_id(chain_id % (u64::MAX / 2 - 36));
            }

            let key_pair = KeyPair::new(&secp, &mut rng);

            let signature =
                sign_message(B256::from_slice(&key_pair.secret_bytes()[..]), tx.signature_hash()).unwrap();

            TransactionSigned::from_transaction_and_signature(tx, signature)
        }).collect();

        let mut file = File::create("benchmark.txt")?;

        let start_chunk = Instant::now();
        let _par_chunk_senders = TransactionSigned::recover_signers(&txes, txes.len());
        let elapsed_chunk = start_chunk.elapsed();

        let start = Instant::now();
        let _par_senders = txes.into_par_iter().map(|tx| tx.recover_signer()).collect::<Option<Vec<_>>>();
        let elapsed = start.elapsed();

        write!(file, "Time taken for parallel recovery of 100000 addresses: {:.2?} (with chunks) and {:.2?} (without chunks)", elapsed_chunk, elapsed)?;

    }

The results:
benchmark_run_1
benchmark_run_2
benchmark_run_3
benchmark_run_4
benchmark_run_5

Chunk implementation takes 20-30% less time than the current implementation in all runs. It would be great if anyone else could run it and compare the performance. Since they are all optimized test runs, I think the estimates are reliable.

@Arindam2407 Arindam2407 marked this pull request as ready for review November 23, 2023 03:58
@Arindam2407 Arindam2407 requested a review from joshieDo November 23, 2023 04:10
Arindam2407 and others added 6 commits November 25, 2023 05:14
    Updating git repository `https://github.com/bluealloy/revm`
    Updating git repository `https://github.com/sigp/discv5`
    Updating git repository `https://github.com/stevefan1999-personal/rust-igd`
    Updating boyer-moore-magiclen v0.2.18 -> v0.2.19
    Updating hashbrown v0.14.2 -> v0.14.3
    Updating proc-macro2 v1.0.69 -> v1.0.70
chore(deps): weekly `cargo update`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-perf A change motivated by improving speed, memory usage or disk footprint
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants