Automated ingestion to Staging and Production Catalogs #180

Open
1 task
smohiudd opened this issue Oct 28, 2024 · 2 comments
@smohiudd
Contributor

smohiudd commented Oct 28, 2024

Description

Based on previous work and discussion around the dataset publication and promotion process, this was the agreed workflow:
[Figure: screenshot of the agreed workflow, captured 2024-10-30 at 1:30:03 PM]

A dataset config PR would kick off a staging ingest; approval of the PR would automate the ingestion to the production catalog.
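As a rough illustration of that trigger logic, the sketch below maps a GitHub `pull_request` webhook payload to a target catalog. It relies only on the payload's `action` and `pull_request.merged` fields; the function name `target_catalog` is hypothetical, and treating "approved" as "merged" is an assumption taken from the discussion in this thread rather than a settled design.

```python
def target_catalog(event):
    """Pick the catalog to ingest into from a `pull_request` event payload.

    Assumption: production ingest fires on merge, staging ingest on open.
    Returns None for events that should not trigger any ingest.
    """
    action = event.get("action")
    pull_request = event.get("pull_request", {})
    if action == "opened":
        return "staging"
    if action == "closed" and pull_request.get("merged"):
        return "production"
    return None
```

A PR that is closed without merging returns `None`, so nothing is promoted to production in that case.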

There is an existing PR that implements most of this process; however, it also includes creating a PR in veda-config. To make the data promotion process more modular, we need to separate the ingestion automation from the veda-config PR creation. This ticket covers only automating ingestion into the staging and production catalogs, not creating PRs in veda-config.

Acceptance Criteria

  • Include a GitHub Actions workflow that publishes to the Staging and Production catalogs
@anayeaye
Contributor

This is my understanding of the events we want to automate. I'm dropping the notes here for discussion, though they probably belong in a design issue. I think automating these actions could be tackled as sequential tasks.

veda-data auto publish baseline:

When a PR is opened:

  1. if that PR contains a file in ingestion-data/staging/collections: call the ingest API and publish that collection to staging
  2. if that PR contains a file in ingestion-data/staging/dataset-config: call the workflows API and publish that dataset
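The routing rule in this list could be sketched as follows. The two directory names come from the steps above; the action labels (`publish_collection`, `publish_dataset`) and the function name are hypothetical placeholders for whatever actually calls the ingest and workflows APIs.

```python
# Directory names are taken from the issue; the action labels are illustrative only.
STAGING_ROUTES = {
    "ingestion-data/staging/collections": "publish_collection",
    "ingestion-data/staging/dataset-config": "publish_dataset",
}


def plan_staging_actions(changed_files):
    """Return (action, path) pairs for changed files under a staging directory."""
    actions = []
    for path in changed_files:
        for prefix, action in STAGING_ROUTES.items():
            # Match on the directory boundary so similarly named folders are ignored.
            if path.startswith(prefix + "/"):
                actions.append((action, path))
    return actions
```

Files outside the two staging directories produce no action, so a PR that only touches documentation would not trigger an ingest.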

When a PR is approved and merged:

  1. if the PR was a new collection-only ingest, add the collection JSON to ingestion-data/production/collections and post the collection to the production ingest API
  2. if the PR was a dataset config json,
    • modify the config for the production bucket
    • trigger a transfer
    • add the new dataset-config to ingestion-data/production/dataset-config
    • publish to the production workflows API
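A minimal sketch of the "modify the config for the production bucket" and "add to ingestion-data/production" steps, assuming the dataset config is plain JSON in which the staging bucket name appears inside string values. The helper names and the bucket names in the usage example are hypothetical; the real config schema is not described in this thread.

```python
def to_production_path(staging_path):
    """Map a file path under ingestion-data/staging/ to its production counterpart."""
    return staging_path.replace(
        "ingestion-data/staging/", "ingestion-data/production/", 1
    )


def to_production_config(config, staging_bucket, production_bucket):
    """Return a copy of `config` with the staging bucket name swapped in every string."""
    if isinstance(config, str):
        return config.replace(staging_bucket, production_bucket)
    if isinstance(config, list):
        return [
            to_production_config(item, staging_bucket, production_bucket)
            for item in config
        ]
    if isinstance(config, dict):
        return {
            key: to_production_config(value, staging_bucket, production_bucket)
            for key, value in config.items()
        }
    # Numbers, booleans, and None pass through unchanged.
    return config
```

For example, `to_production_config({"bucket": "example-staging"}, "example-staging", "example-production")` yields a new dict and leaves the staging config untouched. A plain substring swap would also rewrite any unrelated string that happens to contain the staging bucket name, so a real implementation may want to target specific keys instead.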

veda-data auto publish round 2: dashboard additions

These additional steps will be added to what was established in the baseline action.

When a PR is opened:

  1. a dataset.mdx file is generated from the collection metadata with temporary STAC API and titiler API overrides for staging (in .data.mdx)
  2. a PR is opened in veda-config, generating a dashboard preview

When a PR is approved and merged:

  1. the temporary staging URLs are removed from the .data.mdx in the veda-config PR
  2. end. Approval and merge of the veda-config PR is outside the scope of the veda-data action
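The add-then-remove handling of temporary staging overrides might look something like the sketch below. It assumes the .data.mdx file starts with YAML front matter delimited by `---` lines and, for simplicity, that the overrides are top-level `key: value` lines. The key name used in the usage note is hypothetical and is not the actual veda-config field name.

```python
def add_overrides(mdx_text, overrides):
    """Insert `key: value` lines just before the closing `---` of the front matter."""
    lines = mdx_text.split("\n")
    # The second `---` line closes the front matter.
    closing = [i for i, line in enumerate(lines) if line.strip() == "---"][1]
    extra = [f"{key}: {value}" for key, value in overrides.items()]
    return "\n".join(lines[:closing] + extra + lines[closing:])


def strip_overrides(mdx_text, keys):
    """Drop every line that starts with one of the given override keys."""
    prefixes = tuple(f"{key}:" for key in keys)
    return "\n".join(
        line for line in mdx_text.split("\n") if not line.startswith(prefixes)
    )
```

Calling `add_overrides(text, {"stagingStacUrl": "https://example.test/stac"})` when the PR opens and `strip_overrides(result, ["stagingStacUrl"])` after merge returns the original text. Because `strip_overrides` matches any line, a body line that begins with the same key would also be removed, which a real implementation would need to guard against.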

@smohiudd
Contributor Author

> if the PR was a dataset config json,
>   • modify the config for the production bucket
>   • trigger a transfer
>   • add the new dataset-config to ingestion-data/production/dataset-config
>   • publish to the production workflows API

For these production ingestion steps, we'll need to tie the transfer DAG to the dataset ingest, since the transfer must complete before item ingest can happen.
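One way to express that ordering constraint is to poll the transfer run and start the ingest only once it succeeds. The sketch below is not tied to any specific API: `start_transfer`, `get_transfer_state`, and `start_ingest` are hypothetical callables supplied by the caller, and the `"success"` / `"failed"` state strings are an assumption about what the transfer service reports.

```python
import time


def run_transfer_then_ingest(start_transfer, get_transfer_state, start_ingest,
                             poll_seconds=30, timeout_seconds=3600,
                             sleep=time.sleep):
    """Start the transfer, wait for it to finish, then start the ingest."""
    run_id = start_transfer()
    waited = 0
    while True:
        state = get_transfer_state(run_id)
        if state == "success":
            # Only now is it safe to ingest items from the production bucket.
            return start_ingest()
        if state == "failed":
            raise RuntimeError(f"transfer {run_id} failed; ingest was not started")
        if waited >= timeout_seconds:
            raise TimeoutError(f"transfer {run_id} still {state} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

If both steps run as DAGs in the same orchestrator, chaining them there (so the ingest DAG is triggered by the transfer DAG's completion) would avoid polling from the GitHub action altogether.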
