
Fix ViemClient to support concurrent transactions #1266

Open
dcposch opened this issue Aug 12, 2024 · 3 comments


dcposch commented Aug 12, 2024

Summary

Under load / many people trying to sign up or transact simultaneously, there's contention for the next nonce.

This has resulted in production downtime.

Proposed fix

Two options, not mutually exclusive:

  • Use multiple EOAs. This lets us send transactions without nonce contention.
  • Track pending nonces. This is what we're currently doing, and we have seen unexpected behavior. We have to submit multiple pending transactions into an upcoming block from a single EOA. If a transaction fails (e.g. because a gas price spike triggers `replacement transaction underpriced`), we then have to reset the nonce and try again.

In either case (= even with multiple EOAs each submitting at most one tx per block, no pending stacked nonces), we will sometimes have transactions fail without reverting during gas price spikes and sequencer issues.

In that case, we must retry the same nonce.

Open question: what is the exact condition under which the sequencer returns `replacement transaction underpriced`?
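As a sketch of the "retry the same nonce" path: the retry loop below resubmits with the same nonce and an escalating fee. Here `sendTx` is a hypothetical callback (not ViemClient's actual API) that signs and broadcasts with a given nonce and `maxFeePerGas`; the 25% bump and attempt count are illustrative, not tuned values.

```typescript
// Hypothetical signature: signs + broadcasts one tx, returns the hash or throws
// (e.g. with "replacement transaction underpriced").
type SendTx = (nonce: number, maxFeePerGas: bigint) => Promise<string>;

// Retry a stuck nonce with escalating fees. Most nodes require roughly a
// 10% fee bump to accept a replacement; the exact threshold varies by chain,
// so we bump 25% per attempt as a conservative guess.
async function sendWithNonceRetry(
  sendTx: SendTx,
  nonce: number,
  initialFee: bigint,
  maxAttempts = 5,
): Promise<string> {
  let fee = initialFee;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await sendTx(nonce, fee);
    } catch (e) {
      // Same nonce, higher fee: replace rather than stack a new nonce.
      fee = (fee * 125n) / 100n;
    }
  }
  throw new Error(`nonce ${nonce} still stuck after ${maxAttempts} attempts`);
}
```

In production this loop would also distinguish permanent failures (reverts, invalid nonce) from fee-related rejections before retrying.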

@dcposch dcposch changed the title Update ViemClient to support more concurrent transactions Fix ViemClient to support concurrent transactions Aug 12, 2024
@autoregressive

Nonce Issue
I’m making big assumptions here, as I’m not super familiar with your systems. Generally speaking, though, you’re trying to solve the knapsack problem here. It’s likely only worth introducing parallelism (multiple EOAs) if you can reduce complexity (e.g. by splitting off txs which are not state-dependent). It may be worthwhile exploring how block builders are implemented for inspiration: https://github.com/flashbots/rbuilder

For context my team runs searchers across various evm chains.

Gas Issue

Your gas pricing issues are likely occurring for two reasons:

  1. The node is returning the base fee for block n when you actually want the base fee for block n+1. The next block's base fee is trivial to compute given the block header for block n.
  2. L1 fee miscalculation: most node implementations do not get this correct under heavy usage, and it is non-trivial to implement. https://docs.optimism.io/stack/transactions/fees
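For the first point, the n+1 base fee follows directly from EIP-1559: it moves by at most 1/8 per block, proportionally to how far block n's gas used is from the gas target. A minimal sketch, using Ethereum mainnet parameters (L2s such as Base and OP Mainnet use different elasticity and change denominators, so these constants would need adjusting per chain):

```typescript
// EIP-1559 constants for Ethereum mainnet; OP-stack chains differ.
const BASE_FEE_MAX_CHANGE_DENOMINATOR = 8n;
const ELASTICITY_MULTIPLIER = 2n;

// Computes block n+1's base fee from block n's header fields.
function nextBaseFee(baseFeePerGas: bigint, gasUsed: bigint, gasLimit: bigint): bigint {
  const gasTarget = gasLimit / ELASTICITY_MULTIPLIER;
  if (gasUsed === gasTarget) return baseFeePerGas;
  if (gasUsed > gasTarget) {
    // Block was fuller than target: base fee rises, by at least 1 wei.
    const delta =
      (baseFeePerGas * (gasUsed - gasTarget)) / gasTarget / BASE_FEE_MAX_CHANGE_DENOMINATOR;
    return baseFeePerGas + (delta > 1n ? delta : 1n);
  }
  // Block was emptier than target: base fee falls.
  const delta =
    (baseFeePerGas * (gasTarget - gasUsed)) / gasTarget / BASE_FEE_MAX_CHANGE_DENOMINATOR;
  return baseFeePerGas - delta;
}
```

With this, a client targeting the top of block n+1 can price against `nextBaseFee(...)` of block n's header instead of trusting a possibly-stale `eth_gasPrice`.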

My team has largely solved this with custom node implementations.

@dcposch
Member Author

dcposch commented Aug 30, 2024

  • First one's easy in our case. These are simple ERC20 transfers, so can be split by sending account.
  • Gas issue looks interesting. Thanks for the info!
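Splitting by sending account can be sketched as a stable hash from sender address to relayer EOA, so each EOA owns an independent nonce sequence. The relayer count and hash below are illustrative, not Daimo's actual scheme:

```typescript
// Shard transfers across relayer EOAs by sending account, so transactions
// from the same sender stay ordered on one EOA while different senders
// proceed in parallel. A cheap stable string hash; any uniform hash works.
function pickRelayer(senderAddr: string, numRelayers: number): number {
  let h = 0;
  for (const c of senderAddr.toLowerCase()) {
    h = (h * 31 + c.charCodeAt(0)) >>> 0; // keep in uint32 range
  }
  return h % numRelayers;
}
```

The key property is stability: the same sender always maps to the same EOA, so per-sender ordering is preserved without any cross-EOA nonce coordination.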

Curious why this requires a custom node implementation. Is this a forked L2 EL that provides RPC methods for accurate gas estimation?

@autoregressive

Nonce/state
You may also want to consider eth_sendRawTransactionConditional if you haven’t already. Not sure if it’s been rolled out yet on Base.
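For reference, a hedged sketch of what such a request could look like. The field names below follow the Arbitrum specification of eth_sendRawTransactionConditional; whether Base supports the method, and with which fields, is an open question, so treat this purely as an illustration:

```typescript
// Options letting the sequencer drop the tx (instead of landing a revert or
// a replacement conflict) if state has moved. Field names per Arbitrum's
// spec; other chains may differ. All quantities are hex strings.
interface ConditionalOptions {
  blockNumberMin?: string;
  blockNumberMax?: string;
  timestampMin?: string;
  timestampMax?: string;
  // address -> expected storage root, as a precondition on account state
  knownAccounts?: Record<string, string>;
}

// Build the JSON-RPC request body for a signed raw transaction.
function buildConditionalRequest(rawTx: string, opts: ConditionalOptions) {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "eth_sendRawTransactionConditional",
    params: [rawTx, opts],
  };
}
```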

Gas issue
Most node implementations return the base fee for the end of block n when you are targeting the top of block n+1.

You can tell if there are estimation issues: estimate gas or generate an access list, then define your gasLimit as a fixed ratio of the estimated gas usage (e.g. estimate/0.9). Your actual gas used should then be 90% of the limit; if it differs, something is wrong, especially for something simple like transfers. This presumes you can consistently hit the next block.
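That sanity check is simple to mechanize. A minimal sketch, assuming the estimate comes from `eth_estimateGas` and the actual usage from the transaction receipt (both outside this snippet):

```typescript
// Set gasLimit = estimate / 0.9, i.e. estimate * 10 / 9, rounded down.
function gasLimitFromEstimate(estimatedGas: bigint): bigint {
  return (estimatedGas * 10n) / 9n;
}

// Ratio of gasUsed (from the receipt) to the limit we set, to 4 decimals.
// For a well-estimated simple transfer this should sit near 0.9; a value
// drifting from that suggests a stale or wrong estimate.
function usageRatio(gasUsed: bigint, gasLimit: bigint): number {
  return Number((gasUsed * 10000n) / gasLimit) / 10000;
}
```

Logging `usageRatio` per transaction gives a cheap, continuous signal for when estimation starts to drift under load.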

Why does next block matter? Because that’s the most relevant state you’re working with - your gas params should in theory be deterministic.

An experiment to run would be attempting to land transactions on specific blocks. You’ll find that you can’t do it consistently with node providers even if you bid crazy gas - it’s due to latency and/or bad L1 fee estimates. The latency issue exacerbates everything.

In summary, what you’re getting is potentially incorrect, but also behind right out of the gate, and may be even further behind due to bad infra.

First step: run your own nodes. Then either a custom RPC endpoint (heavy on infra costs to scale, though) or the way we do it, which is directly accessing state from the backend DB the node writes to and loading it into an EVM implementation. You can achieve scale with the latter solution if architected correctly, as you won’t bog down your node.
