
NONEVM-747 register unregister filters #938

Open · wants to merge 10 commits into feature/NONEVM-745-logpoller-db-models

Conversation

@dhaidashenko (Contributor) commented Nov 21, 2024:

  • Implemented Register/Unregister LogPoller methods.
  • Actual filter pruning occurs in the background to avoid UnregisterFilter calls timing out due to cascade removal of matching logs (see the sketch after this list).
  • Changed the EventSignature type to [8]byte (as specified here) so it can be used as a map key.
  • Backfill starts after a block is processed to ensure there is no gap between blocks processed by backfill and the main forward-moving loop.
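
A minimal sketch of how that unregister flow could look (the MarkFilterDeleted helper, the mutex, and the field names are assumptions for illustration, not the actual implementation):

```go
// UnregisterFilter only marks the filter for deletion; the expensive cascade
// removal of matching logs happens later in the background run loop.
func (lp *LogPoller) UnregisterFilter(ctx context.Context, name string) error {
	lp.filtersMu.Lock()
	defer lp.filtersMu.Unlock()

	filter, ok := lp.filtersByName[name]
	if !ok {
		return nil // already gone, nothing to do
	}
	if err := lp.orm.MarkFilterDeleted(ctx, filter.ID); err != nil {
		return err
	}
	delete(lp.filtersByName, name)
	lp.filtersToDelete = append(lp.filtersToDelete, filter)
	return nil
}
```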

Depends on:

}

	filtersByEventSig[filter.EventSig] = append(filtersByEventSig[filter.EventSig], filter)
	lp.filtersToBackfill = append(lp.filtersToBackfill, filter)
@reductionista (Contributor) commented Nov 22, 2024:

Ah, I was thinking we could just call this directly and wait for it to complete before returning. But now that you've added this filtersToBackfill queue, I think it may prove useful.

But let me share some of my thought process, curious to hear any thoughts you may have to add to it...

On evm, most products follow an annoying pattern of calling RegisterFilter and then immediately calling Replay and waiting for it to return. So I was hoping that in the future we can just absorb Replay into RegisterFilter, so that they only have to call a single function, but once it returns they can rely on the data being there and the plugin can start doing whatever it needs to.

For example, CCIP says they require 8 hours of log history for the plugin to work properly. So it seems like they're going to need some way of knowing that the 8 hours has been populated before they start doing their thing? I should probably ask them directly what the preferred behavior is here. But even if they do want to wait, it's possible some products wouldn't want to--so perhaps it's better that you've built it in a flexible way that could handle both (assuming we add some method that wraps this and blocks until the filter is done backfilling).
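
Something like this wrapper could offer the blocking variant without changing the async core (RegisterFilterAndWait, the backfillDone helper, and the polling scheme are hypothetical names, not part of this PR):

```go
// Hypothetical blocking wrapper: register the filter, then poll until the
// background backfill for it has completed or the context expires.
// Requires "context" and "time" from the stdlib.
func (lp *LogPoller) RegisterFilterAndWait(ctx context.Context, filter Filter) error {
	if err := lp.RegisterFilter(ctx, filter); err != nil {
		return err
	}
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			done, err := lp.backfillDone(filter.Name) // assumed helper
			if err != nil {
				return err
			}
			if done {
				return nil
			}
		}
	}
}
```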

On evm, there is a concern about having multiple go routines writing to the logs table at the same time. On Solana, I think this should be less of an issue since the db schema is set up to separate logs associated with each filter from one another. (If a log matches multiple filters, then multiple copies of it will be saved.) So backfilling for one filter will not interfere with any backfills going on for other filters simultaneously. There still may be a problem though, which I hadn't thought of until just now... since this would open the door for the latest block to be different for different filters. 🤔

Contributor:

(If a log matches multiple filters, then multiple copies of it will be saved.)

That reminds me, we'll need to include an extra deduping step in the ORM when logs are fetched.
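
A sketch of what that dedup might look like at the query level (the table and column names are assumptions about the schema, not the real one):

```go
// Hypothetical read path: since a log matching N filters is stored N times,
// collapse duplicates to one row per (block_number, log_index) when fetching.
const selectLogsDeduped = `
	SELECT DISTINCT ON (block_number, log_index) *
	FROM solana.logs
	WHERE block_number BETWEEN $1 AND $2
	ORDER BY block_number, log_index`
```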

@dhaidashenko (Contributor, Author) commented Nov 25, 2024:

My motivation for implementing backfill using a queue:

  1. If I recall correctly, you've mentioned that a job update triggered from the UI may lead to a RegisterFilter call. Thus, I've assumed that Register/Unregister should be as fast as possible, and additional processing should be done in the background.

  2. I also assumed that the main loop's log filtration and extraction of indexed fields may be time-consuming, so we cannot lock the filters for the whole iteration; we should only lock while the filters are actually accessed. That forced me to look for an alternative way to synchronize the start of a backfill with the main loop's processing of a recently added filter.

I guess we can simplify things by replacing the queue with go startBackFill(from, latestProcessedBlock). But in that case we won't be able to keep Filters as a separate type and will have to embed it into logpoller.
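
For reference, a sketch of the queue-based approach (the names and locking scheme are illustrative, not the exact implementation): the run loop drains filtersToBackfill under the lock, then backfills without holding it.

```go
// Drain the backfill queue and process each filter up to the block the main
// loop has already handled, so no gap is introduced between the two.
func (lp *LogPoller) processBackfillQueue(ctx context.Context, toBlock uint64) {
	lp.filtersMu.Lock()
	queued := lp.filtersToBackfill
	lp.filtersToBackfill = nil
	lp.filtersMu.Unlock()

	for _, filter := range queued {
		if err := lp.startFilterBackfill(ctx, filter, toBlock); err != nil {
			// On failure, requeue so the next iteration retries.
			lp.filtersMu.Lock()
			lp.filtersToBackfill = append(lp.filtersToBackfill, filter)
			lp.filtersMu.Unlock()
		}
	}
}
```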

For example, CCIP says they require 8 hours of log history for the plugin to work properly. So it seems like they're going to need some way of knowing that the 8 hours has been populated before they start doing their thing?

I was under the impression that CCIP is more flexible and can handle eventual consistency. Thus, we only need to guarantee that all logs will eventually end up in the DB. I.e., if CCIP observes sender msgs in reverse order:

  1. [4,3]
  2. [4,3,2,1]
  3. [4,3,2,1,0]

It will handle it gracefully and wait for call 3 to start processing.

Contributor Author:

Oh, now I recall the main argument for the backfill queue.
As blocks are processed in effectively random order and the Backfill operation itself is async, we do not have a "latest processed block" as in evm. Thus backfill needs to be done on every restart too. Having a backfill queue allows us to do it in one place.
Also, when RegisterFilter is called we might not yet have the latestProcessedBlock to start the backfill from.

@reductionista (Contributor) commented Nov 26, 2024:

I'm fine with keeping it as a queue, but I don't follow (and/or agree with?) either the premise or the conclusion of this statement:

As blocks are processed in effectively random order and the Backfill operation itself is async, we do not have a "latest processed block" as in evm.

Perhaps by "processed" you mean something else, but I thought we had all agreed that we must guarantee logs are always saved in order of (block_number, log_index). In part to make sure that there is a well-defined last processed block, but mostly to avoid introducing gaps in the logs. You pointed out that Awbrey's ingestor fetches them in random order from the rpc, which I agreed was a big problem. So we discussed the problem with Awbrey, and we agreed that they must be re-ordered somewhere. He made the argument that it makes more sense to do the reordering in the callback function passed to the ingestor rather than in his PR (the ingestor itself). The callback function will be part of the PR I'm starting next and will include the re-ordering, so we can ensure this guarantee.
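
The re-ordering step itself is small; a sketch assuming a []Log slice with BlockNumber and LogIndex fields (the field names are guesses):

```go
// Sort logs before saving so they are persisted in (block_number, log_index)
// order. Requires "sort" from the stdlib.
sort.Slice(logs, func(i, j int) bool {
	if logs[i].BlockNumber != logs[j].BlockNumber {
		return logs[i].BlockNumber < logs[j].BlockNumber
	}
	return logs[i].LogIndex < logs[j].LogIndex
})
```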

The design of LogPoller in the design doc is based around the assumption that we always have a well-defined "latest processed block". The only difference from evm is that, because we don't have a blocks table, we will get this by fetching the latest block number from the block_number column of the logs table instead of the blocks table. It's true that if the node crashes and the last few blocks had no logs in them, we may end up re-requesting a few empty blocks. But we shouldn't have to do anything special for that in terms of the way we're fetching logs.
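
i.e., something along these lines (the table name is an assumption):

```go
// Hypothetical query: the latest processed block on Solana is simply the
// highest block number for which any log has been saved.
const latestProcessedBlockQuery = `SELECT COALESCE(MAX(block_number), 0) FROM solana.logs`
```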

A side digression on the term "backfill"... which I keep thinking we might want to avoid using on Solana due to confusion:

On evm, the distinction between a backfill and a non-backfill fetch is whether we fetch logs from a range of block numbers or from a single block by block hash. On Solana, we don't have to worry about re-orgs, so there is no reason for the second method--we always fetch logs by block range rather than by individual blocks.

However, we do have a choice between iterating through the specific contract addresses we're tracking and fetching only their logs, or fetching all logs and post-filtering on contract address. Under normal operation, that decision will be based on a combination of how many blocks have been mined since the last time LogPoller ran and how many filters are registered. If there are a lot of filters and only a few new blocks since we last polled, it makes more sense to grab everything in a single rpc call and then filter. If, on the other hand, only a few filters are registered but we're thousands of blocks behind because the node has been down for a while, it makes more sense to first ask the rpc servers which blocks actually contain transactions associated with the addresses we're interested in, and then request only the specific blocks and/or transactions we need instead of fetching every tx of every block.
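
To make the trade-off concrete, a toy version of that decision (the thresholds and names are invented for illustration, not proposed values):

```go
// Invented heuristic: bulk-fetch every block when we are nearly caught up or
// tracking many addresses; otherwise ask the RPC which blocks mention our
// addresses and fetch only those.
func fetchEverything(blocksBehind, numFilters int) bool {
	return blocksBehind < 100 || numFilters > 50
}
```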

RegisterFilter should be the only case where we know ahead of time which of those methods will be more optimal to use: since it's only for a single filter, we should always ask first about that address rather than getting all of the blocks... regardless of how far back the starting block is or how many filters have been registered.

For now, based on the way Awbrey named the methods, I'm assuming that when we say "backfill" on Solana we mean fetching log history for a single filter rather than for all filters at once. But there may be a better term for that, so we don't confuse it with "backfill" on evm, which means something pretty different.

@reductionista (Contributor) commented Nov 26, 2024:

Oh, I think the reason I decided to clarify what we mean by "backfill" on Solana is that I got a little confused by this statement:

Backfill operation itself is async

By that, I'm guessing what you mean is that RegisterFilter is async, in that it can be called at any time by any thread while the main LogPoller loop is doing its thing. The backfill itself we could have chosen to make synchronous or async; the way I see it, you've chosen to make it async by having RegisterFilter return immediately rather than wait for it to complete. Yes?

Contributor:

I agree with your point about not wanting to lock the filters for a long time. There might have been some other workaround, but your way of handling it sounds reasonable to me. I'm not sure about the UI issue; I think most products are already doing something similar on evm (calling RegisterFilter and then Replay immediately after, which blocks until everything is backfilled). AFAIK this has worked that way for a while without causing problems. But you could be right that we should be extra careful with Solana, since these operations could be more time-consuming than on evm.

Contributor Author:

I was not sure if we had actually decided to do the ordering, so I tried to tailor the filter handling to the current Loader implementation. But it works for the "ordered" version too.
As we both agree on keeping the queue, let's move the discussion to Slack or a sync meeting. It would be nice to discuss how the ordering will work.


func (o *DSORM) DeleteFilters(ctx context.Context, filters []Filter) error {
	for _, filter := range filters {
		err := o.DeleteFilter(ctx, filter.ID)
Contributor:

We might need to create a transaction for this and pass it down to DeleteFilter.
What happens if some filters are deleted properly, but then partway through this for loop the network connection to the db momentarily goes out and returns an error? DeleteFilters would return an error at that point, with some filters deleted and some still there.

Contributor:

(Alternatively, we could do a single DELETE query with a list of all the filter IDs to delete.)
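
A minimal sketch of that alternative, assuming the table name and a data source that binds Go slices to Postgres arrays (e.g. pgx; with lib/pq the ids slice would need pq.Array):

```go
// Hypothetical single-query variant: delete all filters in one statement so
// the operation is atomic without an explicit transaction.
func (o *DSORM) DeleteFilters(ctx context.Context, filters []Filter) error {
	ids := make([]int64, 0, len(filters))
	for _, filter := range filters {
		ids = append(ids, filter.ID)
	}
	// Table name and the ds field are assumptions for illustration.
	_, err := o.ds.ExecContext(ctx,
		`DELETE FROM solana.log_poller_filters WHERE id = ANY($1)`, ids)
	return err
}
```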

Contributor Author:

Even if the db fails when only part of the filters has been deleted, it's safe for us to retry, since we delete by ID. We'll add all filters scheduled for removal back into the queue and try again.
I hesitate to wrap it in a transaction: if it times out due to a large number of filters to remove, we'll get stuck repeatedly trying to delete filters and rolling back.

Contributor:

Alright, I guess that's good enough.

	filtersByName     map[string]Filter
	filtersByAddress  map[PublicKey]map[EventSignature][]Filter
	filtersToBackfill []Filter
	filtersToDelete   []Filter // populated on start from db and pruned on first iteration of run
Contributor:

This may not matter since I don't expect the number of registered filters to be huge, but I notice there are at least 2 different copies of the same Filter struct saved in memory, and possibly 3 or 4 in some cases.

We could reduce the memory footprint by making these slices of *Filter instead of slices of Filter, so there is only one copy around. (See elsewhere for a different suggestion about making the slices into maps; this could apply to either.)
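
For reference, the suggestion would look like this (sketch only):

```go
// Storing pointers means all maps and slices share one Filter instance.
filtersByName     map[string]*Filter
filtersByAddress  map[PublicKey]map[EventSignature][]*Filter
filtersToBackfill []*Filter
filtersToDelete   []*Filter
```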

(Probably not worth worrying about for now.)

@dhaidashenko (Contributor, Author) commented Nov 25, 2024:

IMHO, using pointers decreases readability, as it gives the impression that the filter might be modified.
Unless we expect the IDL to be a large string, I'd prefer to use the struct instead of a pointer.

@cl-sonarqube-production:

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 75%)

See analysis details on SonarQube

@reductionista (Contributor):

Not sure what the Compile SOM / build-container failure is about; I assume we'll need to fix that before merging, unless it's some sort of flake or known infra issue... but the code itself looks good to me.
