scheduled data load workflows / source data cache? #467

Open
smnorris opened this issue Mar 6, 2024 · 1 comment
Comments

@smnorris
Owner

smnorris commented Mar 6, 2024

The current workflows should be fine for replicating to the production environment.

For testing, lots of options:

a. run same scheduled loads on test as on prod
b. do not run scheduled loads, test on old data
c. run scheduled loads quarterly or similar
d. do not run scheduled loads from sources, replicate from prod on schedule
e. do not run scheduled loads to test, access latest data in prod via a foreign data wrapper (fdw)

Seems easiest to start with option b, then move to d/e at a later date.
Only caveat is that CABD and PSCIS should be refreshed on test when bcfishpass is run.
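
A minimal sketch of option b with the caveat above, assuming a loader that takes an environment name and a list of source keys. The function name `sources_to_load` and the source keys are hypothetical, not part of bcfishpass:

```python
from typing import List

# Sources the comment above says must be refreshed on test whenever
# bcfishpass is run. The lowercase keys are illustrative names only.
ALWAYS_REFRESH_ON_TEST = {"cabd", "pscis"}


def sources_to_load(env: str, all_sources: List[str]) -> List[str]:
    """Return the sources to refresh before a bcfishpass run in `env`.

    prod: every scheduled source is loaded.
    test: option b, keep old data except for CABD and PSCIS.
    """
    if env == "prod":
        return list(all_sources)
    if env == "test":
        return [s for s in all_sources if s in ALWAYS_REFRESH_ON_TEST]
    raise ValueError(f"unknown environment: {env!r}")
```

Moving to option d or e later would only change the `test` branch, for example by returning an empty list and reading from prod instead.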

@smnorris
Owner Author

With (at least) 4 different databases to replicate to, an intermediate cache of key inputs would reduce load on WFS and make data refreshes generally much faster. BC is now providing an object storage bucket, so we could create a workflow/job that dumps these layers to file, to be picked up as needed by the scheduled loads to each database:

  • CABD (not really an issue, but a cache is still more efficient)
  • PSCIS assessments/confirmations/etc
  • DRA (pending resolution of "DRA load failing" #496; DRA is already available as a file)
  • FTEN roads
  • FISS observations, or (even better) use the file-based bcfishobs output to load to bcfishpass databases
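
A rough sketch of what such a cache job could look like, assuming one cached file per layer that is re-fetched only when missing or older than a maximum age. Everything here is illustrative: the layer keys, the `.dat` file naming, the `refresh_cache` function, and the use of a local directory standing in for the object storage bucket are assumptions, and the fetch callables are placeholders for the real WFS or file downloads.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical layer keys, one per input named in the list above.
LAYERS = ["cabd", "pscis", "dra", "ften_roads", "fiss_observations"]


def is_stale(path: Path, max_age_seconds: float, now: float) -> bool:
    """True if the cached file is missing or older than max_age_seconds."""
    if not path.exists():
        return True
    return (now - path.stat().st_mtime) > max_age_seconds


def refresh_cache(
    cache_dir: Path,
    fetchers: Dict[str, Callable[[], bytes]],
    max_age_seconds: float = 86400,
    now: Optional[float] = None,
) -> List[str]:
    """Re-download stale layers into cache_dir; return the layers refreshed.

    cache_dir stands in for the bucket. Each fetcher returns the raw bytes
    of one layer dump (a placeholder for the actual source request).
    """
    now = time.time() if now is None else now
    cache_dir.mkdir(parents=True, exist_ok=True)
    refreshed = []
    for layer, fetch in fetchers.items():
        target = cache_dir / f"{layer}.dat"
        if is_stale(target, max_age_seconds, now):
            # Write to a temp file first so readers never see a partial dump.
            tmp = target.with_suffix(".tmp")
            tmp.write_bytes(fetch())
            tmp.replace(target)
            refreshed.append(layer)
    return refreshed
```

With this shape, each of the database loads reads from the cache rather than the source, so the source is requested once per refresh window regardless of how many databases are replicated to.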

In theory, we could get even more efficient: access model outputs written to file from the BC db could be loaded into the CWF dbs, rather than re-processing the models in each database.

@smnorris smnorris changed the title scheduled data load workflows scheduled data load workflows / source data cache? Jul 11, 2024