
NONEVM-747 register unregister filters #938

Open · wants to merge 10 commits into feature/NONEVM-745-logpoller-db-models

Conversation

@dhaidashenko (Contributor) commented Nov 21, 2024:

  • Implemented Register/Unregister LogPoller methods.
  • Actual filter pruning occurs in the background to avoid UnregisterFilter calls timing out due to cascade removal of matching logs (see the sketch after this list).
  • Changed the EventSignature type to [8]byte (as specified here) so it can be used as a map key.
  • Backfill starts after a block is processed to ensure there is no gap between blocks processed by backfill and the main forward-moving loop.
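
A minimal sketch of how that unregister flow could look (the MarkFilterDeleted helper, the mutex, and the field names are assumptions for illustration, not the actual implementation):

```go
// UnregisterFilter only marks the filter for deletion; the expensive cascade
// removal of matching logs happens later in the background run loop.
func (lp *LogPoller) UnregisterFilter(ctx context.Context, name string) error {
	lp.filtersMu.Lock()
	defer lp.filtersMu.Unlock()

	filter, ok := lp.filtersByName[name]
	if !ok {
		return nil // already gone, nothing to do
	}
	if err := lp.orm.MarkFilterDeleted(ctx, filter.ID); err != nil {
		return err
	}
	delete(lp.filtersByName, name)
	lp.filtersToDelete = append(lp.filtersToDelete, filter)
	return nil
}
```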

Depends on:

}

	filtersByEventSig[filter.EventSig] = append(filtersByEventSig[filter.EventSig], filter)
	lp.filtersToBackfill = append(lp.filtersToBackfill, filter)
@reductionista (Contributor) commented Nov 22, 2024:

Ah, I was thinking we could just call this directly and wait for it to complete before returning. But now that you've added this filtersToBackfill queue, I think it may prove useful.

But let me share some of my thought process, curious to hear any thoughts you may have to add to it...

On evm, most products follow an annoying pattern of calling RegisterFilter and then immediately calling Replay and waiting for it to return. So I was hoping that in the future we can just absorb Replay into RegisterFilter, so that they only have to call a single function, but once it returns they can rely on the data being there and the plugin can start doing whatever it needs to.

For example, CCIP says they require 8 hours of log history for the plugin to work properly. So it seems like they're going to need some way of knowing that the 8 hours has been populated before they start doing their thing? I should probably ask them directly what the preferred behavior is here. But even if they do want to wait, it's possible some products wouldn't want to--so perhaps it's better that you've built it in a flexible way that could handle both (assuming we add some method that wraps this and blocks until the filter is done backfilling).
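
Something like this wrapper could offer the blocking variant without changing the async core (RegisterFilterAndWait, the backfillDone helper, and the polling scheme are hypothetical names, not part of this PR):

```go
// Hypothetical blocking wrapper: register the filter, then poll until the
// background backfill for it has completed or the context expires.
// Requires "context" and "time" from the stdlib.
func (lp *LogPoller) RegisterFilterAndWait(ctx context.Context, filter Filter) error {
	if err := lp.RegisterFilter(ctx, filter); err != nil {
		return err
	}
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			done, err := lp.backfillDone(filter.Name) // assumed helper
			if err != nil {
				return err
			}
			if done {
				return nil
			}
		}
	}
}
```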

On evm, there is a concern about having multiple go routines writing to the logs table at the same time. On Solana, I think this should be less of an issue since the db schema is set up to separate logs associated with each filter from one another. (If a log matches multiple filters, then multiple copies of it will be saved.) So backfilling for one filter will not interfere with any backfills going on for other filters simultaneously. There still may be a problem though, which I hadn't thought of until just now... since this would open the door for the latest block to be different for different filters. 🤔

Contributor:

(If a log matches multiple filters, then multiple copies of it will be saved.)

That reminds me, we'll need to include an extra deduping step in the ORM when logs are fetched.
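
A sketch of what that dedup might look like at the query level (the table and column names are assumptions about the schema, not the real one):

```go
// Hypothetical read path: since a log matching N filters is stored N times,
// collapse duplicates to one row per (block_number, log_index) when fetching.
const selectLogsDeduped = `
	SELECT DISTINCT ON (block_number, log_index) *
	FROM solana.logs
	WHERE block_number BETWEEN $1 AND $2
	ORDER BY block_number, log_index`
```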

@dhaidashenko (Contributor, Author) commented Nov 25, 2024:

My motivation for implementing backfill using a queue:

  1. If I recall correctly, you've mentioned that a job update triggered from the UI may lead to a RegisterFilter call. Thus, I've assumed that Register/Unregister should be as fast as possible, and additional processing should be done in the background.

  2. I also assumed that the main loop's log filtration and extraction of indexed fields may be time-consuming, so we cannot lock the filters for the whole iteration; we should only lock while the filters are actually accessed. That forced me to look for an alternative way to synchronize the start of a backfill with the main loop's processing of a recently added filter.

I guess we can simplify things by replacing the queue with go startBackFill(from, latestProcessedBlock). But in that case we won't be able to keep Filters as a separate type and will have to embed it into logpoller.
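
For reference, a sketch of the queue-based approach (the names and locking scheme are illustrative, not the exact implementation): the run loop drains filtersToBackfill under the lock, then backfills without holding it.

```go
// Drain the backfill queue and process each filter up to the block the main
// loop has already handled, so no gap is introduced between the two.
func (lp *LogPoller) processBackfillQueue(ctx context.Context, toBlock uint64) {
	lp.filtersMu.Lock()
	queued := lp.filtersToBackfill
	lp.filtersToBackfill = nil
	lp.filtersMu.Unlock()

	for _, filter := range queued {
		if err := lp.startFilterBackfill(ctx, filter, toBlock); err != nil {
			// On failure, requeue so the next iteration retries.
			lp.filtersMu.Lock()
			lp.filtersToBackfill = append(lp.filtersToBackfill, filter)
			lp.filtersMu.Unlock()
		}
	}
}
```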

For example, CCIP says they require 8 hours of log history for the plugin to work properly. So it seems like they're going to need some way of knowing that the 8 hours has been populated before they start doing their thing?

I was under the impression that CCIP is more flexible and can handle eventual consistency. Thus, we only need to guarantee that all logs will eventually end up in the DB. I.e., if CCIP observes sender msgs in reverse order:

  1. [4,3]
  2. [4,3,2,1]
  3. [4,3,2,1,0]

It will handle it gracefully and wait for call 3 to start processing.

Contributor Author:

Oh, now I recall the main argument for the backfill queue.
As blocks are processed in effectively random order and the Backfill operation itself is async, we do not have a "latest processed block" as in evm. Thus backfill needs to be done on every restart too. Having a backfill queue allows us to do it in one place.
Also, when RegisterFilter is called we might not yet have the latestProcessedBlock to start the backfill from.

@reductionista (Contributor) commented Nov 26, 2024:

I'm fine with keeping it as a queue, but I don't follow (and/or agree with?) either the premise or the conclusion of this statement:

As blocks are processed in effectively random order and the Backfill operation itself is async, we do not have a "latest processed block" as in evm.

Perhaps by "processed" you mean something else, but I thought we had all agreed that we must guarantee logs are always saved in order of (block_number, log_index). In part to make sure that there is a well-defined last processed block, but mostly to avoid introducing gaps in the logs. You pointed out that Awbrey's ingestor fetches them in random order from the rpc, which I agreed was a big problem. So we discussed the problem with Awbrey, and we agreed that they must be re-ordered somewhere. He made the argument that it makes more sense to do the reordering in the callback function passed to the ingestor rather than in his PR (the ingestor itself). The callback function will be part of the PR I'm starting next and will include the re-ordering, so we can ensure this guarantee.
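
The re-ordering step itself is small; a sketch assuming a []Log slice with BlockNumber and LogIndex fields (the field names are guesses):

```go
// Sort logs before saving so they are persisted in (block_number, log_index)
// order. Requires "sort" from the stdlib.
sort.Slice(logs, func(i, j int) bool {
	if logs[i].BlockNumber != logs[j].BlockNumber {
		return logs[i].BlockNumber < logs[j].BlockNumber
	}
	return logs[i].LogIndex < logs[j].LogIndex
})
```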

The design of LogPoller in the design doc is based around the assumption that we always have a well-defined "latest processed block". The only difference from evm is that, because we don't have a blocks table, we will get this by fetching the latest block number from the block_number column of the logs table instead of the blocks table. It's true that if the node crashes and the last few blocks had no logs in them, we may end up re-requesting a few empty blocks. But we shouldn't have to do anything special for that in terms of the way we're fetching logs.
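
i.e., something along these lines (the table name is an assumption):

```go
// Hypothetical query: the latest processed block on Solana is simply the
// highest block number for which any log has been saved.
const latestProcessedBlockQuery = `SELECT COALESCE(MAX(block_number), 0) FROM solana.logs`
```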

A side digression on the term "backfill"... which I keep thinking we might want to avoid using on Solana due to confusion:

On evm, the distinction between a backfill and a non-backfill fetch is whether we fetch logs from a range of block numbers or from a single block by block hash. On Solana, we don't have to worry about re-orgs, so there is no reason for the second method--we always fetch logs by block range rather than by individual blocks.

However, we do have a choice between iterating through the specific contract addresses we're tracking and fetching only their logs, or fetching all logs and post-filtering on contract address. Under normal operation, that decision will be based on a combination of how many blocks have been mined since the last time LogPoller ran and how many filters are registered. If there are a lot of filters and only a few new blocks since we last polled, it makes more sense to grab everything in a single rpc call and then filter. If, on the other hand, only a few filters are registered but we're thousands of blocks behind because the node has been down for a while, it makes more sense to first ask the rpc servers which blocks actually contain transactions associated with the addresses we're interested in, and then request only the specific blocks and/or transactions we need instead of fetching every tx of every block.
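
To make the trade-off concrete, a toy version of that decision (the thresholds and names are invented for illustration, not proposed values):

```go
// Invented heuristic: bulk-fetch every block when we are nearly caught up or
// tracking many addresses; otherwise ask the RPC which blocks mention our
// addresses and fetch only those.
func fetchEverything(blocksBehind, numFilters int) bool {
	return blocksBehind < 100 || numFilters > 50
}
```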

RegisterFilter should be the only case where we know ahead of time which of those methods will be more optimal to use: since it's only for a single filter, we should always ask first about that address rather than getting all of the blocks... regardless of how far back the starting block is or how many filters have been registered.

For now, based on the way Awbrey named the methods, I'm assuming that when we say "backfill" on Solana we mean fetching log history for a single filter rather than for all filters at once. But there may be a better term for that, so we don't confuse it with "backfill" on evm, which means something pretty different.

@reductionista (Contributor) commented Nov 26, 2024:

Oh, I think the reason I decided to clarify what we mean by "backfill" on Solana is that I got a little confused by this statement:

Backfill operation itself is async

By that, I'm guessing what you mean is that RegisterFilter is async, in that it can be called at any time by any thread while the main LogPoller loop is doing its thing. The backfill itself we could have chosen to make synchronous or async; the way I see it, you've chosen to make it async by having RegisterFilter return immediately rather than wait for it to complete. Yes?

Contributor:

I agree with your point about not wanting to lock the filters for a long time. There might have been some other workaround, but your way of handling it sounds reasonable to me. I'm not sure about the UI issue; I think most products are already doing something similar on evm (calling RegisterFilter and then Replay immediately after, which blocks until everything is backfilled). AFAIK this has worked that way for a while without causing problems. But you could be right that we should be extra careful with Solana, since these operations could be more time-consuming than on evm.

Contributor Author:

I was not sure if we had actually decided to do the ordering, so I tried to tailor the filter handling to the current Loader implementation. But it works for the "ordered" version too.
As we both agree on keeping the queue, let's move the discussion to Slack or a sync meeting. It would be nice to discuss how the ordering will work.


func (o *DSORM) DeleteFilters(ctx context.Context, filters []Filter) error {
	for _, filter := range filters {
		err := o.DeleteFilter(ctx, filter.ID)
Contributor:

We might need to create a transaction for this and pass it down to DeleteFilter.
What happens if some filters are deleted properly, but then partway through this for loop the network connection to the db momentarily goes out and returns an error? DeleteFilters would return an error at that point, with some filters deleted and some still there.

Contributor:

(Alternatively, we could do a single DELETE query with a list of all the filter IDs to delete.)
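
A minimal sketch of that alternative, assuming the table name and a data source that binds Go slices to Postgres arrays (e.g. pgx; with lib/pq the ids slice would need pq.Array):

```go
// Hypothetical single-query variant: delete all filters in one statement so
// the operation is atomic without an explicit transaction.
func (o *DSORM) DeleteFilters(ctx context.Context, filters []Filter) error {
	ids := make([]int64, 0, len(filters))
	for _, filter := range filters {
		ids = append(ids, filter.ID)
	}
	// Table name and the ds field are assumptions for illustration.
	_, err := o.ds.ExecContext(ctx,
		`DELETE FROM solana.log_poller_filters WHERE id = ANY($1)`, ids)
	return err
}
```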

Contributor Author:

Even if the db fails when only part of the filters has been deleted, it's safe for us to retry, since we delete by ID. We'll add all filters scheduled for removal back into the queue and try again.
I hesitate to wrap it in a transaction: if it times out due to a large number of filters to remove, we'll get stuck repeatedly trying to delete filters and rolling back.

Contributor:

Alright, I guess that's good enough.

	filtersByName     map[string]Filter
	filtersByAddress  map[PublicKey]map[EventSignature][]Filter
	filtersToBackfill []Filter
	filtersToDelete   []Filter // populated on start from db and pruned on first iteration of run
Contributor:

This may not matter since I don't expect the number of registered filters to be huge, but I notice there are at least 2 different copies of the same Filter struct saved in memory, and possibly 3 or 4 in some cases.

We could reduce the memory footprint by making these slices of *Filter instead of slices of Filter, so there is only one copy around. (See elsewhere for a different suggestion about making the slices into maps; this could apply to either.)
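
For reference, the suggestion would look like this (sketch only):

```go
// Storing pointers means all maps and slices share one Filter instance.
filtersByName     map[string]*Filter
filtersByAddress  map[PublicKey]map[EventSignature][]*Filter
filtersToBackfill []*Filter
filtersToDelete   []*Filter
```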

(Probably not worth worrying about for now.)

@dhaidashenko (Contributor, Author) commented Nov 25, 2024:

IMHO, using pointers decreases readability, as it gives the impression that the filter might be modified.
Unless we expect the IDL to be a large string, I'd prefer to use the struct instead of a pointer.

@cl-sonarqube-production:

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 75%)

See analysis details on SonarQube

@reductionista (Contributor):

Not sure what the Compile SOM / build-container failure is about; I assume we'll need to fix that before merging, unless it's some sort of flake or known infra issue... but the code itself looks good to me.
