
[SENTINEL] provide snapshots for GDA release #285

Open
3 tasks
d10r opened this issue Feb 12, 2024 · 2 comments
Labels: Project: SENTINEL; Type: DevOps (improvements to the processes part of DevOps); Type: Task (a piece of work to be done)

Comments

@d10r
Collaborator

d10r commented Feb 12, 2024

What & Why

We provide DB snapshots for sentinels in order to facilitate sentinel operation with cheap or free rate-limited RPC providers.
The GDA release introduced a breaking change to the DB schema.

Since newly bootstrapping sentinels fetch the latest manifest.json (which contains IPFS hashes of the latest DB snapshots), we couldn't just switch the snapshot backend to the latest version.

Once the GDA sentinel is released, compatible snapshots shall be in place.
They were not published earlier because that would have broken fast-sync for pre-GDA sentinels (the pre-GDA version is still the latest public release).

Acceptance Criteria

  • A newly bootstrapped sentinel for an officially supported mainnet syncs using a snapshot not older than 1 month
  • The new manifest.json is provided by an automatically created PR
  • Some improvement of the process (what exactly is to be documented); see the list of current "smells" below
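The first criterion could be checked automatically. The hypothetical `snapshot_is_fresh` helper below is a minimal sketch that tests whether a snapshot file name is within a given number of days. It assumes that the `<timestamp>` part of `snapshots_<chainId>_<timestamp>.sqlite.gz` (the naming described under Status quo) is a Unix epoch in seconds. That format is not confirmed in this issue, so adjust the parsing if buildSnapshot.js writes something else. "1 month" is passed in as a day count, e.g. 31.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper: is a snapshot no older than <max_age_days>?
# ASSUMPTION: names look like snapshots_<chainId>_<timestamp>.sqlite.gz
# with <timestamp> in Unix epoch seconds.
# Usage: snapshot_is_fresh <file> <max_age_days> [now_epoch]
# Returns 0 if fresh, 1 if too old, 2 if the name doesn't match the pattern.
snapshot_is_fresh() {
  local name ts now max_age
  name=$(basename "$1")
  ts=${name#snapshots_*_}   # strip the "snapshots_<chainId>_" prefix
  ts=${ts%.sqlite.gz}       # strip the extension
  if [[ ! "$ts" =~ ^[0-9]+$ ]]; then
    echo "unexpected snapshot name: $name" >&2
    return 2
  fi
  now=${3:-$(date +%s)}     # "now" can be injected for testing
  max_age=$(( $2 * 86400 ))
  (( now - 10#$ts <= max_age ))
}
```

For example, `snapshot_is_fresh snapshots/snapshots_137_1707000000.sqlite.gz 31` succeeds only if that timestamp is at most 31 days in the past.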

How

Status quo

The backend for snapshot generation is set up at [email protected].
Before being disabled, it was triggered monthly by a cron job:

25 22 18 * *  . $NVM_DIR/nvm.sh; ./generate-all.sh -cgup >> "logs/snapshotgen_`date '+\%Y-\%m-\%d'`.log" 2>&1;

The bash script generate-all.sh is currently not a regular part of any repo. It is covered by the backups for ad-hoc code here, but it contains credentials, so do NOT add it to any public repo as is!

Running this script does the following:

  • clean: delete the content of snapshots/
  • get the latest sentinel version from the master branch
  • run buildSnapshot.js for all networks listed in the file networks (format: non-canonical name (used only for logging), rpc). This creates a file snapshots_<chainId>_<timestamp>.sqlite.gz for each network
  • run ipfs add for each generated snapshot file, using the SF IPFS node, writing the log to ipfs_log.txt
  • run generateManifest.js, which parses the IPFS CID of each snapshot file just added and updates a copy of the latest manifest.json with these CIDs.
  • create a PR in the sentinel repo, with the new manifest.json (example).
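The CID-parsing step that generateManifest.js performs can be approximated in a few lines of awk. This is a minimal sketch, not the actual script. It assumes ipfs_log.txt contains the default `ipfs add` output lines of the form `added <cid> <filename>`, and that file names follow the `snapshots_<chainId>_<timestamp>.sqlite.gz` pattern. `cids_from_ipfs_log` is a hypothetical name. Writing the pairs into manifest.json is left out because the manifest's structure isn't described here.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper: print "<chainId> <cid>" for every snapshot in an
# ipfs_log.txt. ASSUMPTION: log lines look like "added <cid> <filename>".
cids_from_ipfs_log() {
  awk '
    $1 == "added" && NF >= 3 {
      name = $3
      sub(/.*\//, "", name)                        # drop any leading directory
      if (name ~ /^snapshots_[0-9]+_.+\.sqlite\.gz$/) {
        split(name, parts, "_")                    # parts[2] is the chainId
        print parts[2], $2
      }
    }
  ' "$1"
}
```

Lines for other files (or non-"added" output) are ignored, so a partial log only yields the snapshots that were actually uploaded.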

Smells

Since we're touching the process, there's an opportunity to look at what's not great about it and do some incremental improvements. Some of the smells:

  • uses a mix of scripts, some included in the sentinel repo and some not included anywhere (other than in backups)
  • process not documented anywhere (I think)
  • low observability (all we see is PRs being created by the bot, or the lack thereof)
  • the networks file has to be maintained manually
  • no validation of snapshot correctness (how much do we trust what buildSnapshot.js does, and what happens between that and the IPFS upload?)
  • no content integrity check after download, since sentinels use an HTTP IPFS gateway instead of a native IPFS client
d10r added the Project: SENTINEL and Type: Task labels on Feb 12, 2024
mmd-afegbua added the Type: DevOps label on Feb 14, 2024
@mmd-afegbua
Collaborator

@d10r, any update on the IPFS server issue on Hetzner?

Also, considering the request quotas of third-party providers, how often should we run the snapshot script?

@d10r
Collaborator Author

d10r commented Feb 29, 2024

The IPFS issue is solved.
We ran the cron job monthly, and I would keep it that way. However, in the future we shouldn't re-run everything if it fails for one network; instead, that network should be skipped.
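The "skip that network" behaviour could look roughly like the loop below. This is a sketch under assumptions. The networks file is taken to hold one `<name> <rpc>` pair per line, separated by whitespace, with blank lines and lines starting with `#` ignored. `build_snapshot` is a hypothetical placeholder to be replaced with the real buildSnapshot.js call, whose arguments are not shown in this issue. Failed networks are logged and printed, and the function returns non-zero so that a cron wrapper can report partial failures instead of re-running everything.

```bash
#!/usr/bin/env bash
# ASSUMPTION: networks file = one "<name> <rpc>" pair per line,
# whitespace-separated; blank lines and lines starting with '#' are ignored.

# HYPOTHETICAL placeholder: replace the body with the real
# buildSnapshot.js invocation for one network.
build_snapshot() {
  local name="$1" rpc="$2"
  echo "build_snapshot not implemented (would build $name via $rpc)" >&2
  return 1
}

# build_all <networks-file>
# Prints the names of failed networks (one per line) to stdout and returns
# 1 if any network failed, 0 otherwise. Progress goes to stderr.
build_all() {
  local file="$1" name rpc
  local -a failed=()
  # "|| [[ -n $name ]]" also handles a last line without a trailing newline.
  while read -r name rpc _ || [[ -n "$name" ]]; do
    [[ -z "$name" || "$name" == \#* ]] && continue
    # </dev/null keeps the build from consuming the networks file on stdin.
    if build_snapshot "$name" "$rpc" < /dev/null; then
      echo "ok: $name" >&2
    else
      echo "FAILED: $name (skipped)" >&2
      failed+=("$name")
    fi
  done < "$file"
  if (( ${#failed[@]} > 0 )); then
    printf '%s\n' "${failed[@]}"
    return 1
  fi
}
```

The remaining networks still get fresh snapshots, and the printed list shows exactly which ones need attention.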

hellwolf transferred this issue from another repository on Mar 5, 2024