
Add finalizer component to TXM #13638

Conversation

amit-momin
Contributor

Description

Implements a Finalizer component that, on each new head, checks whether confirmed transactions are finalized and marks them accordingly. Finality is validated using the checks below (a rough sketch of the decision flow follows the note):

  • Is the tx's receipt block number at or below the latest finalized head?
  • Does the tx's receipt block hash match the block hash at that height in the cached longest chain?
  • If the receipt is older than the cached longest-chain history, verify its block hash via BlockByHash.

Note

  • To account for finality violations, the Confirmer can unmark a transaction as finalized if it finds the receipt's block hash missing from the longest chain.
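
For illustration, here is a minimal Go sketch of the decision flow described above. The types and helper signatures are invented for the example and are not the actual TXM implementation.

package finalizersketch

import "context"

// Illustrative stand-ins for the TXM's head and receipt models.
type Hash [32]byte

type Head struct {
	Number int64
	Hash   Hash
}

type Receipt struct {
	BlockNumber int64
	BlockHash   Hash
}

// blockByHashFn abstracts the RPC fallback (e.g. an eth_getBlockByHash lookup).
type blockByHashFn func(ctx context.Context, hash Hash) (found bool, err error)

// isFinalized applies the three checks from the description to one confirmed tx.
func isFinalized(ctx context.Context, r Receipt, latestFinalized Head, cachedChain map[int64]Head, blockByHash blockByHashFn) (bool, error) {
	// 1. The receipt's block must be at or below the latest finalized head.
	if r.BlockNumber > latestFinalized.Number {
		return false, nil
	}
	// 2. If the block is still in the cached longest chain, the hashes must match.
	if head, ok := cachedChain[r.BlockNumber]; ok {
		return head.Hash == r.BlockHash, nil
	}
	// 3. Older than the cached history: verify the block hash over RPC.
	return blockByHash(ctx, r.BlockHash)
}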

Ticket:
https://smartcontract-it.atlassian.net/browse/BCI-3486

Requires:

@amit-momin amit-momin changed the title Add a finalizer component to TXM Add finalizer component to TXM Jun 20, 2024
@amit-momin amit-momin marked this pull request as ready for review June 20, 2024 18:06
@amit-momin amit-momin requested review from a team as code owners June 20, 2024 18:06

// Check if block hashes exist for receipts on-chain older than the earliest cached head
// Transactions are grouped by their receipt block hash to minimize the number of RPC calls in case transactions were confirmed in the same block
// This check is only expected to be used in rare cases if there was an issue with the HeadTracker or if the node was down for significant time
Collaborator

That is not true. This is directly affected by the HeadTracker's HistoryDepth config, which controls how many blocks we store before the latest finalized one. This is actually a very useful observation. Initially, we were aiming to have HistoryDepth set to 0, since most of HT's users were not interested in blocks past the latest finalized block. However, based on your approach above, it makes sense to set a reasonable value for that config so we can avoid excessive RPC calls in exchange for some in-memory storage. Once we introduce HT's optimizations, we could increase it further.
CAUTION: Recently we observed on certain chains that the depth from the latest block to the finalized one is quite large, so we added a config option to return an error when this happens. We should consider such cases here and perhaps handle them gracefully, or even alert the user, as transactions might not get finalized for a long period of time.

Contributor Author

Good to know! I didn't know we were considering setting HistoryDepth=0. I'm surprised none of the other callers needed a block hash check to protect against re-orgs, but I don't really know their use cases.

For the case where the depth from the latest block to the finalized one becomes quite large, I can check with CCIP/Automation to see if they care. I could foresee warning on this adding special scenarios that become hard to debug, especially if products act on them.

@dimriou dimriou requested a review from dhaidashenko June 21, 2024 12:13
// This check is only expected to be used in rare cases if there was an issue with the HeadTracker or if the node was down for significant time
var wg sync.WaitGroup
var txMu sync.RWMutex
for receiptBlockHash, txs := range receiptBlockHashToTx {
Collaborator

Why not BatchCall instead?

Contributor Author

I would prefer using a batch call, but I was trying to avoid introducing another method in the txmgr client just for this. Otherwise, using BatchCallContext directly would have introduced some chain-specific code when specifying eth_getBlockByHash. Although, I'm now reconsidering whether this component even needs to live in common. If we're pushing to degeneralize all of this code, I should put it on the EVM side, where this isn't a problem.

Contributor Author

I moved the Finalizer to the EVM code and changed this to a batch call in the latest commit
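
For reference, a minimal sketch of what such a batched lookup could look like using go-ethereum's rpc package directly; the endpoint, function name, and result handling are illustrative assumptions, not the code from this PR:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/rpc"
)

// blockHashesExist batches eth_getBlockByHash requests for a set of receipt
// block hashes and reports which hashes the node still knows about.
func blockHashesExist(ctx context.Context, client *rpc.Client, hashes []common.Hash) (map[common.Hash]bool, error) {
	reqs := make([]rpc.BatchElem, len(hashes))
	results := make([]map[string]interface{}, len(hashes))
	for i, h := range hashes {
		reqs[i] = rpc.BatchElem{
			Method: "eth_getBlockByHash",
			Args:   []interface{}{h, false}, // false: omit full transaction bodies
			Result: &results[i],
		}
	}
	if err := client.BatchCallContext(ctx, reqs); err != nil {
		return nil, err
	}
	exists := make(map[common.Hash]bool, len(hashes))
	for i := range reqs {
		// A nil result means the node returned null, i.e. it does not have the block.
		exists[hashes[i]] = reqs[i].Error == nil && results[i] != nil
	}
	return exists, nil
}

func main() {
	client, err := rpc.Dial("http://localhost:8545") // placeholder endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	exists, err := blockHashesExist(context.Background(), client, []common.Hash{common.HexToHash("0x01")})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(exists)
}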

wg.Add(1)
go func(hash BLOCK_HASH, txs []*txmgrtypes.Tx[CHAIN_ID, ADDR, TX_HASH, BLOCK_HASH, SEQ, FEE]) {
defer wg.Done()
if head, rpcErr := f.client.HeadByHash(ctx, hash); rpcErr == nil && head.IsValid() {
Collaborator

I would also check that the block number of the requested head is lower than the fetched latestFinalizedHead, just to be on the safe side; otherwise, we could throw an invariant violation error.

Collaborator

I might be talking about a different thing, but I agree that we need to check the fetched block against the latest finalized block; however, we should use HeadTracker's LatestAndFinalizedBlock from this PR.

The main reason is that MultiNode only provides a repeatable-read guarantee for data that is older than the finalized block fetched with LatestAndFinalizedBlock. It does not provide this guarantee for blocks fetched by LatestFinalizedHead.
Why? Because of HeadTracker's FinalityTagBypass option. We plan to remove it as part of HeadTracker's performance optimization.

Contributor Author

I added a check to ensure the fetched block's number is still earlier than or equal to the latest finalized block number. Thanks for the heads up! I'll update to use LatestAndFinalizedBlock once your PR is merged. For now, I've marked this PR as dependent on yours.
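
For illustration, the added guard amounts to a comparison along these lines (a hedged sketch with invented names, not the actual change):

package finalizersketch

import "fmt"

// checkAgainstFinalized returns an error when a block fetched by hash sits above
// the latest finalized block number; a tx confirmed in such a block must not be
// marked finalized, and the mismatch signals an invariant violation.
func checkAgainstFinalized(fetchedBlockNum, latestFinalizedBlockNum int64) error {
	if fetchedBlockNum > latestFinalizedBlockNum {
		return fmt.Errorf("invariant violation: block %d fetched by hash is above the latest finalized block %d",
			fetchedBlockNum, latestFinalizedBlockNum)
	}
	return nil
}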

@@ -181,6 +181,7 @@ type DbEthTx struct {
// InitialBroadcastAt is recorded once, the first ever time this eth_tx is sent
Collaborator

Should we modify ReapTxHistory to only delete finalized txs?

Contributor Author

Good call! I was trying to avoid touching it in this PR to avoid scope creep but the changes seemed minimal enough. I made the updates in the latest commit

f.initSync.Lock()
defer f.initSync.Unlock()
if f.isStarted {
return errors.New("Finalizer is already started")
Collaborator

isStarted and initSync do not seem to be needed; StartOnce will return an error if it's called multiple times. The same applies to StopOnce.

Contributor Author

Good catch. This got carried over from another component. I removed it in the latest commit.
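
For context, the StartOnce/StopOnce pattern being referenced looks roughly like this. This is a sketch assuming the services.StateMachine helper from chainlink-common that other TXM services embed; the finalizerSketch type and its fields are invented for the example.

package finalizersketch

import (
	"context"

	"github.com/smartcontractkit/chainlink-common/pkg/services"
)

// finalizerSketch shows how embedding services.StateMachine removes the need for
// a hand-rolled isStarted flag and initSync mutex: StartOnce and StopOnce already
// guarantee a single start/stop and feed the service's health status.
type finalizerSketch struct {
	services.StateMachine
	stopCh chan struct{}
}

func (f *finalizerSketch) Start(_ context.Context) error {
	return f.StartOnce("Finalizer", func() error {
		f.stopCh = make(chan struct{})
		// The background processing loop would be kicked off here.
		return nil
	})
}

func (f *finalizerSketch) Close() error {
	return f.StopOnce("Finalizer", func() error {
		close(f.stopCh)
		return nil
	})
}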

f.lggr.Debugw("processing latest finalized head", "block num", latestFinalizedHead.BlockNumber(), "block hash", latestFinalizedHead.BlockHash(), "earliest block num in chain", earliestBlockNumInChain)

// Retrieve all confirmed transactions, loaded with attempts and receipts
confirmedTxs, err := f.txStore.FindTransactionsByState(ctx, TxConfirmed, f.chainId)
Collaborator

Depending on the ReaperThreshold config option, we might load a lot of transactions that were already finalized, which in theory could lead to out-of-memory issues. Can't we filter them at the DB level?

Collaborator

It might also be a good idea to process transactions in batches.
On release we'll have to process all confirmed transactions at once, which might be problematic. On some chains we store txs for 7 days (168h) by default, and CCIP runs all chains on a single node.

Contributor Author

Good point. I was trying to use a generic method so that this would be easier to integrate with the in-memory store later on, but that's getting ahead of myself. I've updated it to use a more targeted query. I still need to think about batching, though. You're right about the large number of txs we'd finalize when this first runs, but I also want to avoid slowing this process down in normal use.

Contributor Author

I've added limits to the batch RPC call to respect RPCDefaultBatchSize, but I don't think we need to worry about loading all of the confirmed txs into memory. I can discuss more offline in the thread.
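
As an illustration of respecting a batch-size limit such as RPCDefaultBatchSize, a generic chunking helper might look like the following (a sketch with invented names, not the PR's code):

package finalizersketch

// callInChunks splits reqs into chunks of at most batchSize elements and hands
// each chunk to call, so a single pass never exceeds the configured batch size.
func callInChunks[T any](reqs []T, batchSize int, call func(chunk []T) error) error {
	if batchSize <= 0 {
		batchSize = len(reqs)
	}
	for start := 0; start < len(reqs); start += batchSize {
		end := start + batchSize
		if end > len(reqs) {
			end = len(reqs)
		}
		if err := call(reqs[start:end]); err != nil {
			return err
		}
	}
	return nil
}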

continue
}
receipt := tx.GetReceipt()
if receipt == nil || receipt.IsZero() || receipt.IsUnmined() {
Collaborator

Shouldn't we log an error here to signal that we expect a valid receipt for a confirmed tx?

Contributor Author

Added an assumption violation log here in the latest commit

})
}

func (f *evmFinalizer) startInternal(_ context.Context) error {
Collaborator
@dimriou dimriou Jun 26, 2024

You don't need the startInternal scheme, which is something we want to deprecate eventually. It is only used for services that need to refetch the enabled addresses via EnabledAddressesForChain. You should follow the same pattern as reaper.go.

Contributor Author

I ran into some problems with health checks when I followed the Reaper pattern. I removed startInternal but kept the StartOnce pattern, which has health status handling built in.

Collaborator

You also need to remove reaper_chain_config.go from mocks.

@@ -30,6 +31,7 @@ type TxmClient[
ctx context.Context,
attempts []TxAttempt[CHAIN_ID, ADDR, TX_HASH, BLOCK_HASH, SEQ, FEE],
) (txReceipt []R, txErr []error, err error)
HeadByHash(ctx context.Context, hash BLOCK_HASH) (HEAD, error)
Collaborator

I think the EVM TXM client uses the general EVM Client interface, so this isn't needed.

return nil
}

func (f *evmFinalizer) batchCheckReceiptHashesOnchain(ctx context.Context, blockNumToReceiptsMap map[int64][]Receipt) ([]Receipt, error) {
Collaborator

This method doesn't seem to return an error in any path. You can probably drop it.

Contributor Author

Good catch! It's removed in the latest commit

@@ -36,12 +36,13 @@ type (
Tx = txmgrtypes.Tx[*big.Int, common.Address, common.Hash, common.Hash, evmtypes.Nonce, gas.EvmFee]
TxMeta = txmgrtypes.TxMeta[common.Address, common.Hash]
TxAttempt = txmgrtypes.TxAttempt[*big.Int, common.Address, common.Hash, common.Hash, evmtypes.Nonce, gas.EvmFee]
Receipt = dbReceipt // EvmReceipt is the exported DB table model for receipts
Receipt = DbReceipt // DbReceipt is the exported DB table model for receipts
Collaborator

TODO: during clean up we need to switch to evtypes.Receipt instead

Collaborator

nit: Perhaps using a straight mock for the Finalizer tests might have been a cleaner approach.

@prashantkumar1982 prashantkumar1982 added this pull request to the merge queue Aug 6, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 6, 2024
@prashantkumar1982 prashantkumar1982 added this pull request to the merge queue Aug 6, 2024
Merged via the queue into develop with commit 2312827 Aug 6, 2024
118 checks passed
@prashantkumar1982 prashantkumar1982 deleted the BCI-3486-implement-finality-check-for-the-get-transaction-status-method branch August 6, 2024 19:51
momentmaker added a commit that referenced this pull request Aug 6, 2024
* develop:
  Add finalizer component to TXM (#13638)
  auto: adjust cron contract imports (#13927)
  Set PriceMin to match pip-35 definition (#14014)
  update solana e2e test build deps (#13978)
  fix data race in syncer/launcher (#14050)
  [KS-411] Extra validation for FeedIDs in Streams Codec (#14038)
  [TT-1262] dump pg on failure (#14029)
  ks-409 fix the mock trigger to ensure events are sent (#14047)
  update readme's with information about CL node TOML config (#14028)
  [CCIP-Merge] OCR2 plugins  [CCIP-2942] (#14043)
  [BCF - 3339] - Codec and CR hashed topics support (#14016)
  common version update to head of develop (#14030)