Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(trin-execution): execute multiple blocks in memory + able to execute to merge #1337

Merged
merged 11 commits into from
Sep 8, 2024

Conversation

KolbyML
Copy link
Member

@KolbyML KolbyML commented Jul 4, 2024

What was wrong?

fixes #1333

Execution started slowing down around block 2.34 million, to resolve this issue I add revm::db:State which is a middleware between the database interface and EVM. It makes it so whole blocks can be processed in memory. Instead of as before where we would commit per every transaction. This saves a lot of reads to the database. revm::db::State also gives us reverts as well which is cool

We also lacked some metrics, this PR will add some not all proposed metrics to add like

  • transaction performance
  • block height

How was it fixed?

add the ability to execute multiple blocks in memory, this makes execution a lot faster because then you don't need to call storage as much which is slow

@KolbyML KolbyML self-assigned this Jul 4, 2024
@KolbyML KolbyML force-pushed the why-transaction-so-slow branch from 08b1c18 to a5788ec Compare July 4, 2024 12:29
@KolbyML KolbyML force-pushed the why-transaction-so-slow branch 2 times, most recently from 7959312 to 8eeb65f Compare July 16, 2024 19:34
@KolbyML KolbyML force-pushed the why-transaction-so-slow branch 6 times, most recently from 9d6d28a to 8ed4529 Compare August 13, 2024 02:23
@KolbyML KolbyML changed the title feat(Trin Execution): Add execution cache + foundation for reverts + more metrics feat(trin-execution): execute multiple blocks in memory + able to execute to merge Aug 13, 2024
@KolbyML KolbyML marked this pull request as ready for review August 13, 2024 02:33
@KolbyML KolbyML requested a review from morph-dev August 13, 2024 02:33
@KolbyML KolbyML force-pushed the why-transaction-so-slow branch from 8ed4529 to 7d01bb4 Compare August 13, 2024 02:48
@KolbyML KolbyML force-pushed the why-transaction-so-slow branch 5 times, most recently from fce4ca0 to 41775cf Compare August 29, 2024 22:11
@KolbyML
Copy link
Member Author

KolbyML commented Aug 30, 2024

@morph-dev if you can review this one now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing ExecutionPosition breaks encoding/decoding from previous version.
I think we are ok with this as I believe it's not being used by anyone other than yourself, right? (I think state bridges use only temp db and don't continue previous executions)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah only I am using this right now, in the future when people other then me use this, we would need to properly handle different versions

stop_timer(timer);
let timer = start_timer_vec(&BLOCK_PROCESSING_TIMES, &["processing_block"]);
if block.header.number == 0 {
let end =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this logic.

  • What's wrong with having fixed number of blocks to process (could even be configurable with a flag).
  • What are we trying to accomplish and why?
  • Why use 86 (what is special about it)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this logic.

  • What's wrong with having fixed number of blocks to process (could even be configurable with a flag).
  • What are we trying to accomplish and why?
  • Why use 86 (what is special about it)?

I think it was a sweet, spot for getting over the dao fork, but since we have the early exit code now, I can remove this

impl Database for EvmDB {
type Error = EVMError;

#[doc = " Get basic account information."]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need this? Wouldn't this be inherited from trait? (also below and in DatabaseRef)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the docs they were auto filled in, when I clicked the fill in defaults button

fn basic_ref(&self, address: Address) -> Result<Option<AccountInfo>, Self::Error> {
let _timer = start_timer_vec(&TRANSACTION_PROCESSING_TIMES, &["database_get_basic"]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not sure what we do in other places and if this is recommended way of doing it, but I have small concern.

My understanding is that _timer will record time when it's dropped. However, since it's not used, can compiler decide to drop it immediately (not at the end of the function)? How about future rust compilers, do we have guarantees on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I will just explicitly end it

stop_timer(timer);

if !raw_account.is_empty() {
let decoded_account: RocksAccount =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can't this be: let decoded_account = RocksAccount::decode(&mut raw_account.as_slice())?;

});
stop_timer(timer);

if wipe_storage && rocks_account.is_some() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be some overlapping logic with what is happening in the commit_accounts. Maybe move "wipe_storage" logic into AccountDB?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are 2 distinct cases.

One we are

  • deleting an account from the database
  • the other, we are clearing storage, but not deleting the account

Copy link
Member Author

@KolbyML KolbyML Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let mut trie = EthTrie::from(Arc::new(account_db), rocks_account.storage_root)?;
trie.clear_trie_from_db()?;

I guess I can move this into a function under the accountdb struct if that is what you meant, seems a little overkill though for 2 lines

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. But I think most of other logic is still reusable. For example, function like this:

fn delete_account_storage(address_hash: B256, delete_account: bool)

Could do something like this:

  1. loading and decoding the RocksAccount
    • skip to 4. if not present
  2. initializing AccountDB
  3. initialize and clear storage trie
  4. update self.trie and storage in self.db (for address_hash key)
    1. if delete_account is true, delete from both
    2. otherwise, set storage_root to EMPTY_ROOT_HASH and update both

It would also include timer for this operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this plan looks good, but I won't do number one cause I see we can save a storage fetch now in the if !storage.is_empty() {

@@ -346,6 +431,18 @@ impl State {
}
}

pub fn should_we_commit_block_execution_early(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe explain why we use these conditions.

also maybe rename changes_processed to pending_cache_size_mb or something similar

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was wrong. The bundle_size_hint is not in bytes, it is approximation on number of items that should be updated.

So maybe change it to pending_state_changes? What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that State should use era_manager directly. So, how about we have following functions:

pub async fn process_block(&mut self, block: ProcessedBlock) -> anyhow::Result<RootWithTrieDiff> {
    self.process_blocks([block]).await
}

pub async fn process_blocks(&mut self, blocks: impl IntoIter<ProcessedBlock>) -> anyhow::Result<RootWithTrieDiff> {
    ...
}

And whoever calls it can get those blocks from whereever they want. Since process_blocks accepts IntoIter, they can also be lazily fetched when needed. I think that EraManager can easily implement this trait.

What do you think?

Copy link
Collaborator

@morph-dev morph-dev Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized it might not be so easy as EraManager is async, but maybe we can make it something like:
Iterator<Item=T: Future<Result<ProcessedBlock, _>>> or something like that. I will have to look more into it, but I don't have time right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the trait implied was IntoIterator

Other then that when I tried

Iterator<Item=T: Future<Result<ProcessedBlock, _>>>

I got
the trait `Sized` is not implemented for `dyn Future<Output = std::result::Result<ProcessedBlock, anyhow::Error>> + Send

This seems a little more complex so I will wait till you have some time to look into it more

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think?

We have 2 different paradigms for executing blocks

  • (1) executing to latest
  • (2) once at latest following the head of the chain, properly handling re-orgs, receiving execution payloads from the CL client

1 is basically finalized, well I do have to implement support for withdrawls, that is like 10 lines (unless something I didn't expect happens).

2 is uncharted territory for me currently I am not 100% how it will look yet, since I haven't fully gotten their yet.

I think case 2 is more complex then case 1, because currently we are just grabbing blocks,

But in case 2 the CL client tells us what blocks to execute whenever it wants to, so it makes us more reactive, where case 1 we are driving.

So I guess this all comes down to a driving analogy, case 1 we are driving execution, case 2 the CL is driving execution.

Because case 2 is a paradigm shift it may require a refactor or 2, but I would consider both of these are separate cases, I think it will be interesting to define how we handle bridging state data in one mode verses the other.

Definitely some more work to accomplish

Copy link
Member Author

@KolbyML KolbyML Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think?

I am not against this direction necessarily, I am open to do it. Until both case 1 and 2 is implemented or I am more familiar with the scope of stage 2, I am not sure if I can comment fully on this at this stage, but I am fine with us implementing this suggestion now and then seeing whatever has to be done in the future as needed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to find a way around AsyncIterator. But there are some other stuff about current design that I don't like, so lets first discuss those (in which case we might not need to worry about AsyncIterator).

I really don't like that process_range_of_blocks accepts range of blocks, but is supposed to fetch them itself, and it can also stop early (at arbitrary point) and not finish executing all of them... Considering also what you said about (2), I think we should do something like this.

Remove process_range_of_blocks, but keep the functionality that it provides. We do this by splitting it into 3 parts

  1. Initialization (part before the loop) - will be stored in a field (e.g. execution_context)
    • this will include variables that are currently shared across loop iterations (evm, cumulative_gas_used, cumulative_gas_expected)
  2. Loop - this will be done in a process_block function (that will as before return RootWithTrieDiff)
    • unlike before, this will not always commit to the dp, instead it will only modify execution_context
  3. Commit (part after loop) - This will be a separate function, that will commit things to the database
    • it can be called manually from outside
    • the process_block can decide to commit at the start/end if should_we_commit_block_execution_early returns true
      • alternatively, process_block would never commit but there will be public function (e.g. should_commit) that caller can use to decide when to commit

I think this makes this file very flexible for future work (while having less responsibility) and giving more control to the caller.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look more into this when I wake up after the meeting, but I just wanted to note

Loop - this will be done in a process_block function (that will as before return RootWithTrieDiff)
unlike before, this will not always commit to the dp, instead it will only modify execution_context

process_block function (that will as before return RootWithTrieDiff)

With the suggested setup you wouldn't be able to return RootWithTrieDiff, as that inherently means committing to the database, so this suggestion conflicts with itself in a way

In this case step 3 commit bundle to database function would return RootWithTrieDiff

Anyways with that pointed out, I will look more into this after the meeting 🫡

Copy link
Member Author

@KolbyML KolbyML Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also getting RootWithTrieDiff is very costly extremely so.

I imagine this will look like

let execution_context = init();
for current to latest_blocks {
    let block = era_manager.get_next_block();
    process_block(&mut execution_context, block);
    if should_we_exit_early { break }
}
let RootWithTrieDiff = commit_state_bundle(execution_context);

So it doesn't seem too different then what is currently being done it is a little more encapsulated into parts which I assume is the point

keccak256(B256::from(U256::from(block.header.number))),
block.header.hash(),
)?;
if block.header.number >= 8192 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make 8192 a constant that has a name and description explaining it's value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be 256


// Commit the bundle if we have reached the limits, to prevent to much memory usage
// We won't use this during the dos attack to avoid writing empty accounts to disk
if !(block_number < 2_700_000 && block_number > 2_200_000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this check more readable with either:

if block_number < 2_200_000 || block_number > 2_700_000 {...
// or
if !(2_200_000..2_700_000).contains(block_number) {

@morph-dev morph-dev self-requested a review September 3, 2024 11:15
});
stop_timer(timer);

if wipe_storage && rocks_account.is_some() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. But I think most of other logic is still reusable. For example, function like this:

fn delete_account_storage(address_hash: B256, delete_account: bool)

Could do something like this:

  1. loading and decoding the RocksAccount
    • skip to 4. if not present
  2. initializing AccountDB
  3. initialize and clear storage trie
  4. update self.trie and storage in self.db (for address_hash key)
    1. if delete_account is true, delete from both
    2. otherwise, set storage_root to EMPTY_ROOT_HASH and update both

It would also include timer for this operation.

.expect("Clearing trie should never fail");
let mut trie = EthTrie::from(Arc::new(account_db), rocks_account.storage_root)?;
trie.clear_trie_from_db()?;
rocks_account.storage_root = keccak256([EMPTY_STRING_CODE]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be EMPTY_ROOT_HASH (also in other places)

Self {
nonce: account_info.nonce,
balance: account_info.balance,
storage_root: keccak256([EMPTY_STRING_CODE]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand now why we were overriding the storage_root. But, I would change this.
Instead of implementing From<AccountInto> for Account, I think it's better to do one of:

  1. just have a function that accepts both account_info and strorage_root: fn from_account_info(account: AccountInfo, storage_root: B256) -> Self
  2. have a function to override values, e.g:
fn override_from_account_info(&mut self, account_info: AccountInfo) {
    self.balance = account_info.balance;
    self.nonce = account_info.nonce;
    self.code_hash = account_info.code_hash;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I choose 1

address_hash: B256,
account_info: AccountInfo,
) -> anyhow::Result<()> {
let plain_state_some_account_timer = start_timer_vec(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we are using timers in a lot of places and it usually takes multiple lines making code harder to read. If I'm not mistaken, we only use it with BUNDLE_COMMIT_PROCESSING_TIMES and
TRANSACTION_PROCESSING_TIMES in this file. How about we create helper functions:

fn start_commit_timer(&self, name: &str) -> HistogramTimer { ... }

fn start_processing_timer(&self, name: &str) -> HistogramTimer { ... }

@@ -346,6 +431,18 @@ impl State {
}
}

pub fn should_we_commit_block_execution_early(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was wrong. The bundle_size_hint is not in bytes, it is approximation on number of items that should be updated.

So maybe change it to pending_state_changes? What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to find a way around AsyncIterator. But there are some other stuff about current design that I don't like, so lets first discuss those (in which case we might not need to worry about AsyncIterator).

I really don't like that process_range_of_blocks accepts range of blocks, but is supposed to fetch them itself, and it can also stop early (at arbitrary point) and not finish executing all of them... Considering also what you said about (2), I think we should do something like this.

Remove process_range_of_blocks, but keep the functionality that it provides. We do this by splitting it into 3 parts

  1. Initialization (part before the loop) - will be stored in a field (e.g. execution_context)
    • this will include variables that are currently shared across loop iterations (evm, cumulative_gas_used, cumulative_gas_expected)
  2. Loop - this will be done in a process_block function (that will as before return RootWithTrieDiff)
    • unlike before, this will not always commit to the dp, instead it will only modify execution_context
  3. Commit (part after loop) - This will be a separate function, that will commit things to the database
    • it can be called manually from outside
    • the process_block can decide to commit at the start/end if should_we_commit_block_execution_early returns true
      • alternatively, process_block would never commit but there will be public function (e.g. should_commit) that caller can use to decide when to commit

I think this makes this file very flexible for future work (while having less responsibility) and giving more control to the caller.

What do you think?

@KolbyML KolbyML force-pushed the why-transaction-so-slow branch from b7b0d56 to d65163b Compare September 5, 2024 09:57
@KolbyML
Copy link
Member Author

KolbyML commented Sep 6, 2024

@morph-dev ready for another review.

I did a refactor, so the execution loop code should be a lot more readable now and concise, going through the code it was pretty messy.

I am a little confused what you fully meant with #1337 (comment) But I did my best to isolate it into 3 parts like you said also adding the execution_context struct.

Copy link
Collaborator

@morph-dev morph-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look very good. No major issue, but bunch of smaller ones and few medium.

};
let account_db = AccountDB::new(address_hash, self.db.clone());
let trie = if account.storage_root == keccak256([EMPTY_STRING_CODE]) {
let trie = if account.storage_root == EMPTY_ROOT_HASH {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can probably be simplified to something like:

let raw_value = if account.storage == EMPTY_ROOT_HASH {
    None
} else {
    let trie = EthTrie::from(Arc::new(account_db), account.storage_root)?;
    trie.get(keccak256(B256::from(index)).as_ref())?
};
let result = match raw_value {
    Some(raw_value) => ...,
    None => ...,
};

meaning, we don't have to create empty EthTrie and get a value from it.


// Write Contract Code
let timer = start_commit_timer("contract:committing_contracts_total");
for (hash, bytecode) in plain_state.contracts {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do you know if this includes only newly created contracts?

Also, do we ever delete the contracts from the db? It might not be big deal, but we are leaking memory if not. I would at least create an issue to track it, as we can't easily do it at the moment (because db key is their hash, and we don't know if there are multiple contracts with the same hash).

Copy link
Member Author

@KolbyML KolbyML Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do you know if this includes only newly created contracts?

Yes this only returns newly created contracts.

Also, do we ever delete the contracts from the db? It might not be big deal, but we are leaking memory if not. I would at least create an issue to track it, as we can't easily do it at the moment (because db key is their hash, and we don't know if there are multiple contracts with the same hash).

Also, do we ever delete the contracts from the db

I think only if a contract is self destructed, but I don't think that is common

It might not be big deal, but we are leaking memory if not. I would at least create an issue to track it, as we can't easily do it at the moment (because db key is their hash, and we don't know if there are multiple contracts with the same hash).

Because of the way REVM is designed, I don't think this is a problem, especially with self destruct being or about to be removed

The only way to do this would be to keep the active count's of every contract which seems like a lot of work

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree it's probably too much work for little effort. I still think it's good to document the issue (e.g. creating github issue and adding link to it in the code).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1428 here is an issue for it. I don't think it will ever be fixed though as their is just so many more important things to improve, but I guess this is good for transparency so future people know this is a non-existent priority

rocks_account: RocksAccount,
delete_account: bool,
) -> anyhow::Result<Option<RocksAccount>> {
let timer_label = match delete_account {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This timer can be extracted to the caller function, where names make sense . No big deal either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the changes requested here #1337 (comment)

That would result in invalid time reporting

Ok(())
}

fn delete_account_storage(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be much clearer if this wouldn't accept and return RocksAccount.
Here is what I have in mind (I also renamed function to wipe_account_storage, but that's least important):

In commit_accounts

if let Some(account_info) = account {
    self.commit_account(address_hash, account_info)?;
} else {
    let timer = start_commit_timer("account:delete_account");
    self.wipe_account_storage(address_hash, /* delete_account= */ true)?;
    stop_timer(timer);
}

In commit_storage

...
let address_hash = keccak256(address);

if wipe_storage {
    let timer = start_commit_timer("storage:wipe_storage");
    self.wipe_account_storage(address_hash, /* delete_account= */ false)?;
    stop_timer(timer);
}

if !storage.is_empty() {
    // review comment: note that commit_storage_changes would have to load RocksAccount from db
    // this means that it's possible that we will have to load it twice
    // but I think it's quite uncommon, to have both wipe_storage and non-empty storage,
    // so I don't think it's big deal
    self.commit_storage_changes(address_hash, storage)?;
}

And this function would be (you can put timers if/where you want):

fn wipe_account_storage(&mut self, address_hash: B256, delete_account: bool) {
    // load account from db
    if let Some(raw_account) = self.db.get(address_hash)? {
        return;
    };
    let mut rocks_account = RocksAccount::decode(&mut raw_account.as_slice())?;

    // wipe storage trie and db
    if rocks_account.storage_root != EMPTY_ROOT_HASH {
        let account_db = AccountDB::new(address_hash, self.db.clone());
        let mut trie = EthTrie::from(Arc::new(account_db), rocks_account.storage_root)?;
        trie.clear_trie_from_db()?;
        rocks_account.storage_root = EMPTY_ROOT_HASH;
    }

    // update account trie and db
    if delete_account {
        self.db.delete(address_hash)?;
        let _ = self.trie.lock().remove(address_hash.as_ref());
    } else {
        self.db
            .put(address_hash, alloy_rlp::encode(&rocks_account))?;
        let _ = self.trie.lock().insert(
            address_hash.as_ref(),
            &alloy_rlp::encode(AccountStateInfo::from(&rocks_account)),
        );
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to do timers a little differently to properly record the number with the requested changes

.database
.db
.delete(keccak256(B256::from(U256::from(
block.header.number - BLOCKHASH_SERVE_WINDOW,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say we are executing block 10'256. My understanding is that transactions in that block can access block hash of latest 256 block at that point, which would include block 10'000. This line would remove that code, before block is executed. So maybe call manage_block_hash_serve_window and the end of execute_block.

Or am I missing something?

@@ -55,14 +45,15 @@ pub struct State {
pub database: EvmDB,
pub config: StateConfig,
execution_position: ExecutionPosition,
pub era_manager: Arc<Mutex<EraManager>>,
pub node_data_directory: PathBuf,
}

const GENESIS_STATE_FILE: &str = "trin-execution/resources/genesis/mainnet.json";
const TEST_GENESIS_STATE_FILE: &str = "resources/genesis/mainnet.json";

impl State {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, State as name no longer makes sense. I don't have alternative (especially considering that I don't like ExecutionContext for the other one either). Maybe TrinEth or EthExecution or something like that?

Ok(State {
execution_position,
config,
era_manager,
database,
node_data_directory,
})
}

pub fn initialize_genesis(&mut self) -> anyhow::Result<RootWithTrieDiff> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: in my opinion, some of this logic should probably be in ExecutionContext as well. Maybe this structure should read GenesisConfig and just pass it there, in which case ExecutionContext would just contain the loop below.

But this structure might not have/want to expose this publicly. It would be called internally from process_range_of_blocks when block 0 has to be processed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed ExecutionContent to BlockExecutor, I don't believe initialize_genesis belongs in BlockExecutor it doesn't require the EVM or executing blocks.

I will move the call of initialize_genesis to process_range_of_blocks though as that should be fine

/// This is a lot faster then process_block() as it executes the range in memory, but we won't
/// return the trie diff so you can use this to sync up to the block you want, then use
/// `process_block()` to get the trie diff to gossip on the state bridge
pub async fn process_range_of_blocks(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't make sense for this function to accept start as argument, as it should always match self.next_block_number(). So why not just have end as an argument?

Alternatively, it can accept number of blocks to process, but I think end is better/more useful.

process_dao_fork(&self.database)?;
}
stop_timer(timer);
let root_with_trie_diff = execution_context.commit_bundle(last_state_root)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wouldn't not pass last_state_root to execution_context. Instead, I would do a check here.

let timer = start_timer_vec(&BLOCK_PROCESSING_TIMES, &["set_block_execution_number"]);
self.execution_position.set_next_block_number(
self.database.db.clone(),
execution_context.block_number() + 1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't like that we are getting execution_context.block_number(), considering that we should what block we executed. We can maybe combine it with last_state_root. I think we can do something like this:

let mut last_executed_header = None;
for block_number in start..=end {
    ...
    execution_context.execute_block(&block)?;
    last_executed_header = block.header;
    ...
}

let Some(last_executed_header) = last_executed_header else {
    // no blocks were executed
    // either return error, since this can happen only if end<start
    // or something like this:
    return Ok(RootWithTrieDiff {
        root: self.execution_position.state_root(),
        changed_nodes: HashMap::new(),
    });
};

let root_with_trie_diff = execution_context.commit_bundle()?;
ensure!(root_with_trie_diff.root == last_executed_header.state_root, "..");

let timer = start_timer_vec(&BLOCK_PROCESSING_TIMES, &["set_block_execution_number"]);
self.execution_position.set_next_block_number(
    self.database.db.clone(),
    last_executed_header.block_number + 1,
    last_executed_header.state_root,
)?; // nit you can even rename set_next_block_number to update and only pass `last_executed_header` as argument
stop_timer(timer);

Ok(root_with_trie_diff)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no blocks were executed with the current design we have no way to get the last header.

I added an ensure at the top of the function which returns an error if end is less then start, which start is now self.execution_position.next_block_number()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I will just do last_executed_block_number because the suggestion isn't possible in certain edge cases as well

@KolbyML
Copy link
Member Author

KolbyML commented Sep 6, 2024

@morph-dev ready for another look

@KolbyML KolbyML force-pushed the why-transaction-so-slow branch from 2f1d50c to fe96d57 Compare September 7, 2024 08:40
Copy link
Collaborator

@morph-dev morph-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big issues, almost there.

@@ -109,20 +108,17 @@ impl StateBridge {
cache_contract_storage_changes: true,
block_to_trace: BlockToTrace::None,
};
let mut state = State::new(Some(temp_directory.path().to_path_buf()), state_config).await?;
let mut state =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update variable name

/// - execute blocks
/// - commit the changes and retrieve the result
pub struct BlockExecutor<'a> {
pub evm: Evm<'a, (), State<EvmDB>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is there a reason for evm to be public? Maybe make a function that returns data that is needed.

}
}

fn set_evm_environment_from_block(&mut self, block: &ProcessedBlock) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems that this function only need header, maybe don't pass entire processed_block

stop_timer(timer);
}

pub fn set_transaction_evm_context(&mut self, tx: &TransactionsWithSender) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this be private?

stop_timer(timer);
}

pub fn transact(&mut self) -> anyhow::Result<ResultAndState> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we even need this function? I think it's called only from one place, in which case I think we can just remove it and call self.evm.transact()? directly.

self.execution_position.block_execution_number(),
block.header.number
end >= start,
"End block number {} is less than start block number {}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "End block number {end} is less than start block number {start}"

.build()
.transact()?
})
pub async fn process_block(&mut self, block_number: u64) -> anyhow::Result<RootWithTrieDiff> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need an argument. Like this, it behaves exactly as process_range_of_blocks.


if !storage.is_empty() {
self.commit_storage_changes(address_hash, rocks_account, storage)?;
// review comment: note that commit_storage_changes would have to load RocksAccount
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this was meant for you, not to leave it in the code :), so please delete it

storage: Vec<(U256, U256)>,
) -> anyhow::Result<()> {
let timer = start_commit_timer("storage:apply_updates");

let account_db = AccountDB::new(address_hash, self.db.clone());
let mut rocks_account = rocks_account.unwrap_or_default();
let mut rocks_account: RocksAccount = self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be extracted into separate function, as we use it in multiple places (both reading the account from db and decoding it).

);

// If we are starting from the beginning, we need to initialize the genesis state
if start == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that you can/should put genesis logic into BlockExecutor (we can consider this initialization the execution of block zero). We would have to refactor a bit how it works, but nothing special.

In this file, you would remove this check, and handle block zero the same way any other block is handled (just call block_executor.execute_block(&block)?).

And in the block_executor, you would have something like this:

fn process_genesis(&mut self) -> anyhow::Result<()> {
    let genesis_file = ...;
    let genesis: GenesisConfig = ...;
    self.increment_balances(genesis.alloc.iter().map(|(address, alloc)| (address, alloc.balance)))
}

pub fn execute_block(&mut self, block: &ProcessedBlock) -> anyhow::Result<()> {
    ...
    // somewhere in this function, either at the start or next to DAO_FORK
    if block.header.number == 0 {
        self.process_genesis()?;
    }
    ...
}

Note: I think it does, but I'm not 100% sure that increment_balances would work on non-existing accounts. If it doesn't, you can use self.evm.db_mut().insert_account

Copy link
Member Author

@KolbyML KolbyML Sep 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        for (address, alloc_balance) in genesis.alloc {
            self.evm
                .db_mut()
                .insert_account(address, AccountInfo::from_balance(alloc_balance.balance));
        }

        // self.increment_balances(
        //     genesis
        //         .alloc
        //         .into_iter()
        //         .map(|(address, alloc)| (address, alloc.balance.to::<u128>())),
        // )

I tried both of these and they caused the wrong state root to be generated,

        for (address, alloc_balance) in genesis.alloc {
            let address_hash = keccak256(address);
            let mut account = RocksAccount::default();
            account.balance += alloc_balance.balance;
            self.evm.db().database.trie.lock().insert(
                address_hash.as_ref(),
                &alloy_rlp::encode(AccountStateInfo::from(&account)),
            )?;
            self.evm
                .db()
                .database
                .db
                .put(address_hash, alloy_rlp::encode(account))?;
        }

So I am just going to do this ^. I think this suggestion is an unintended usecase anyways, I think it is pointless to debug this for a 5 line difference.

I think the priority is shipping latest.

@KolbyML
Copy link
Member Author

KolbyML commented Sep 7, 2024

@morph-dev ready for another look

Copy link
Collaborator

@morph-dev morph-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now. Thanks

stop_timer(timer);

let rocks_account = if !raw_account.is_empty() {
let decoded_account = RocksAccount::decode(&mut raw_account.as_slice())?;
let rocks_account = if let Some(decoded_account) = fetched_rocks_account {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I find fetched_rocks_account and decoded_account a bit confusing names. Maybe just rename both to old_rocks_account or existing_rocks_account?

@@ -297,6 +281,14 @@ impl EvmDB {
stop_timer(timer);

// Write Contract Code
// TODO: Delete contract code if no accounts point to it: https://github.com/ethereum/trin/issues/1428
// This can happen in 1 of 2 cases
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think you need to document problem here as well. I think first line and link to the issue with details is enough. Up to you.

@@ -175,11 +185,50 @@ impl<'a> BlockExecutor<'a> {
})
}

fn process_genesis(&mut self, header: &Header) -> anyhow::Result<()> {
ensure!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since this is internal function, I don't think it's necessary to do this check here (since the check is already done right before the call). Up to you.

/// return the trie diff so you can use this to sync up to the block you want, then use
/// `process_block()` to get the trie diff to gossip on the state bridge
/// Processes up to end block number (inclusive) and returns the root with trie diff
/// If the state cache gets too big, we will commit the state to the database early
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe add that we also return early in that case.

@@ -156,7 +75,7 @@ impl TrinExecution {
self.node_data_directory.clone(),
);
let range_start = Instant::now();
let mut last_executed_block_number = start - 1;
let mut last_executed_block_number = u64::MAX;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think with the check at line 66 and most recent changes, you can do what I proposed few reviews ago regarding keeping track of the last executed block (instead of last_executed_block_number and last_state_root).

Something like this:

let mut last_executed_block_header: Option<Header> = None;
for block_number in start..=end {
    ...
    block_executor.execute_block(&block)?;
    last_executed_block_header = Some(block.header);

    ...
}

let last_executed_block_header = last_executed_block_header.expect("At least one block must have been executed");

let root_with_trie_diff = block_executor.commit_bundle()?;
ensure!(root_with_trie_diff.root == last_executed_block_header.state_root, ... );

self.execute_position.update(self.database.db.clone(), last_executed_block_header);

Up to you.

pub async fn process_block(&mut self, block_number: u64) -> anyhow::Result<RootWithTrieDiff> {
self.process_range_of_blocks(block_number).await
pub async fn process_next_block(&mut self) -> anyhow::Result<RootWithTrieDiff> {
self.process_range_of_blocks(self.execution_position.next_block_number())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe use self.next_block_number()?

@@ -203,15 +254,15 @@ impl<'a> BlockExecutor<'a> {
self.cumulative_gas_used += block_gas_used;

// update beneficiary
let timer = start_timer_vec(&BLOCK_PROCESSING_TIMES, &["update_beneficiary"]);
let timer = start_timer_vec(&BLOCK_PROCESSING_TIMES, &["beneficiary_timer"]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I meant the variable name, not the timer name :)

@KolbyML KolbyML merged commit d48cb58 into ethereum:master Sep 8, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

trin-execution: execute block in memory and commit it at the end
2 participants