Offchain Indexers unreliably processing data #50

Open

ethyi opened this issue Apr 5, 2023 · 3 comments
Comments

ethyi (Contributor) commented Apr 5, 2023

The offchain indexer is failing to reliably process offchain JSON URIs.
As a result, get_asset calls return NFTs without their offchain data populated; the most important missing piece is the 'files' field, which usually holds the URI to the NFT's image.

Context

NFT metadata is first indexed when the metadata account is processed via the acc geyser. Note that this is different from the mint account.
The ingester initially populates the DB with the onchain metadata and a reference to the offchain URI, and puts a placeholder value of "processing" in the metadata field until the offchain indexer fills it in.
The offchain indexer is a background task that the ingester invokes asynchronously. When a new task is submitted it is added to the "tasks" table; each row records the id that created it and its status.

The task makes a GET request to the URI with a timeout of 3 seconds. If at any point the request returns a reqwest::Error, the row in the tasks table records that a failure happened, along with a short message identifying the error. Currently, all reqwest::Error values are converted to "Error downloading batch files", which is likely the message most users will see.
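The single generic message makes failures hard to triage. Below is a minimal sketch (in Rust, as the ingester is) of what distinct per-class messages might look like; the `FetchError` enum and `task_error_message` helper are hypothetical stand-ins, not the actual reqwest error type or project code:

```rust
// Hypothetical failure taxonomy; the real code currently maps every
// reqwest::Error to the single message "Error downloading batch files".
#[derive(Debug)]
enum FetchError {
    Timeout,
    Connect,
    Status(u16),
    Decode,
}

// Produce a distinct, searchable message per failure class so the
// `tasks` table records *why* a download failed, not just that it did.
fn task_error_message(err: &FetchError) -> String {
    match err {
        FetchError::Timeout => "Timed out downloading offchain JSON".to_string(),
        FetchError::Connect => "Connection error downloading offchain JSON".to_string(),
        FetchError::Status(code) => format!("HTTP {} downloading offchain JSON", code),
        FetchError::Decode => "Invalid JSON in offchain payload".to_string(),
    }
}
```

With messages like these, error-rate metrics could be broken down by class (timeouts vs. rate-limit 429s, for example) instead of lumped together.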

  • Each NFT creates its own background task, even if many NFTs share the same JSON URI.
  • The bgtask runner purges tasks older than 1 hour.
  • If the task is already present in the database, a new task job will not be submitted, even if the metadata account is reprocessed.
    • Owning an FFF is a great way to test this, since you can dynamically change the offchain data.
  • The timeout for offchain calls is 3 seconds.
  • There is a maximum of 3 attempts before the offchain indexer drops the data entirely.
  • The bgtask runner reruns tasks for all rows in asset_data whose metadata is still "processing".
  • (*on Helius) there are more offchain failures than successes.
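The retry and purge rules listed above can be condensed into a small decision function. This is an illustrative sketch only; the constants and the `TaskAction` type are hypothetical, not the actual implementation:

```rust
const MAX_ATTEMPTS: u32 = 3; // after this the offchain data is dropped
const PURGE_AFTER_SECS: u64 = 3600; // bgtask runner purges tasks older than 1 hour

// What the runner should do with a failed task row, per the rules above.
#[derive(Debug, PartialEq)]
enum TaskAction {
    Retry, // attempts remain and the task is still fresh
    Drop,  // attempts exhausted
    Purge, // older than the purge window
}

fn next_action(attempts: u32, age_secs: u64) -> TaskAction {
    if age_secs >= PURGE_AFTER_SECS {
        TaskAction::Purge
    } else if attempts >= MAX_ATTEMPTS {
        TaskAction::Drop
    } else {
        TaskAction::Retry
    }
}
```

Making `MAX_ATTEMPTS`, the timeout, and the purge window configurable (as proposed below) would turn these hard-coded values into tunable parameters.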

Examples

3LBkptXr6N9y3S2eSRTxCega91qHd4Fd6hWyNmBdCzRu

Solutions

  • A priority background task runner with more flexibility; addresses that see a lot of activity would use it instead of the default runner.
  • Increase the number of attempts, and increase the timeout if possible.
linuskendall (Collaborator) commented
I took a bit of a look at this; a few ideas from my side:

  1. We should add a configurable timeout and a configurable number of retries; MAX_TASK_BATCH_SIZE should probably be configurable too, so we can tweak these values accordingly. That way we can experiment with different values and track the results (see 2).

  2. We should capture metrics for error rates and compare; right now it looks quite different across environments. We have one env where it's about 30% at the moment. Is the bgtask_success/bgtask_error metric sufficiently fine-grained, or is there another one we should look at?

  3. When it comes to priority, we could add a priority field to the DB and add a second ordering here: https://github.com/linuskendall/digital-asset-rpc-infrastructure/blob/fix_das_api_frontend_panic/nft_ingester/src/tasks/mod.rs#L149. If we wanted to get fancy, we could also add the ability for a bgtask runner to only process tasks with priority > N, so you could deploy dedicated priority runners alongside the others. This would require some tooling to interface with the DB and mark certain assets as priority for downloads.
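The priority ordering plus the "only process priority > N" filter could look roughly like this in-memory sketch; the `TaskRow` struct and `priority` column are hypothetical (the real code would apply the equivalent ordering in the SQL query at the linked location):

```rust
// Hypothetical row shape for the "tasks" table with a new priority column.
#[derive(Debug, Clone, PartialEq)]
struct TaskRow {
    id: u64,
    priority: i32,   // hypothetical new column; higher = more urgent
    created_at: u64, // unix seconds
}

// Keep only tasks at or above `min_priority` (a dedicated priority runner
// would set this > 0), then order: highest priority first, oldest first
// within the same priority tier.
fn order_tasks(mut rows: Vec<TaskRow>, min_priority: i32) -> Vec<TaskRow> {
    rows.retain(|r| r.priority >= min_priority);
    rows.sort_by(|a, b| {
        b.priority
            .cmp(&a.priority)
            .then(a.created_at.cmp(&b.created_at))
    });
    rows
}
```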

Potential future:

  1. Maybe we should have something like a download manager for these downloads? For example, I am wondering if we should rate-limit our fetches from a single upstream service.
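A per-host rate limit could be sketched as a simple token bucket keyed by host. `HostLimiter` below is a hypothetical illustration, not the project's code; a real async implementation would more likely wrap something like a per-host semaphore:

```rust
use std::collections::HashMap;

// Minimal per-host token bucket: at most `burst` fetches per host per window.
struct HostLimiter {
    burst: u32,
    used: HashMap<String, u32>,
}

impl HostLimiter {
    fn new(burst: u32) -> Self {
        Self { burst, used: HashMap::new() }
    }

    // Returns true if a fetch to `host` is allowed in the current window.
    fn try_acquire(&mut self, host: &str) -> bool {
        let n = self.used.entry(host.to_string()).or_insert(0);
        if *n < self.burst {
            *n += 1;
            true
        } else {
            false
        }
    }

    // Call at the start of each window to refill all buckets.
    fn reset(&mut self) {
        self.used.clear();
    }
}
```

Tasks that fail `try_acquire` would be deferred to the next window rather than counted as download failures, which matters for gateways that rate-limit by IP.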

austbot (Collaborator) commented Apr 15, 2023

This is an interesting problem, and it has to do with the number of available file descriptors, networking capacity, and the reliability of the places the JSON is hosted.

I have a timeout in there and a batch size. They aren't configurable now, but that's easy to add. The Arweave JSON URIs seem to have a lot of trouble.

It's possible that the HTTP lib we are using doesn't handle many, many connections to the same host very well.

linuskendall added a commit to rpcpool/digital-asset-rpc-infrastructure that referenced this issue Apr 17, 2023
NicolasPennie (Collaborator) commented

We would really benefit from setting a TTL on off-chain data, so we don't re-index an offchain URI until the TTL duration has passed. This would dramatically decrease the load, and it would especially help with services like Arweave or IPFS gateways that rate-limit by IP.
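The TTL idea reduces to a staleness check before submitting a re-index task. `needs_reindex` is a hypothetical helper, not actual project code:

```rust
// Skip re-indexing offchain JSON fetched within the last `ttl_secs`.
// `last_fetched_secs_ago` is None when the URI has never been fetched.
fn needs_reindex(last_fetched_secs_ago: Option<u64>, ttl_secs: u64) -> bool {
    match last_fetched_secs_ago {
        None => true,                 // never fetched: always index
        Some(age) => age >= ttl_secs, // stale only once past the TTL
    }
}
```

The ingester would consult this check when a metadata account is reprocessed, and only enqueue a new download task when it returns true.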

tahsintunan added a commit to helius-labs/digital-asset-rpc-infrastructure that referenced this issue Jun 22, 2023
* Add structured logging and reduce log noisiness (#48)

* Fix GetAssetsByCreator query (#8)

* Bump to 0.7.1

* Handle HTTP errors

Handle and log http errors separately.

* Make bgtask runner configurable

Partially addresses metaplex-foundation#50.

* Enable configurable retries

Enable configurable retries for the bgtask.

* Add purged task metric

* add new task metric

* Fixed nft ingester

* Fix the reading of task timeout

* Set slow statements to debug

* Remove task name

* Fix purge task

* acc forwarder (#11)

* acc forwarder

* version to .0.1

* Fixing unused imports

Cleaning out unused imports.

* Hopefully final fix for interval handling

* Fix interval

* fix(das-api): fix getAssetProof

* Fix before/after cursors. (#67)

* Fix edition nonce (#78)

* fix(ingester): fix compressed nft burn indexing (#24) (#80)

* fix(ingester): fix compressed nft burn indexing

* Small fix for assetId calculation

* feat(das-api): support searching assets by `jsonUri` parameter (#79)

* add dto parameter

* add dao condition

* upd asset_data fetching to ensure single presence of asset_data in query

* use asset.asset_data key for linking between asset/asset_data

* fix: breaking arg order for array queries

* chore: consistent arg order in `SearchAssets` destructuring

* fix: breaking arg count for array params in searchAssets

* fix: use IndexMap instead of BTreeMap

---------

Co-authored-by: Nicolas Pennie <[email protected]>
Co-authored-by: Linus Kendall <[email protected]>
Co-authored-by: ethyi <[email protected]>
Co-authored-by: Richard Wu <[email protected]>
Co-authored-by: Austin Adams <[email protected]>
Co-authored-by: Alex Tsymbal <[email protected]>
muhitrhn pushed a commit to muhitrhn/digital-asset-rpc-infrastructure that referenced this issue Oct 6, 2023