Consider ingest/updates via changes (delta) #79

Open
dr-rodriguez opened this issue Aug 13, 2024 · 0 comments

Comments

@dr-rodriguez
Collaborator

Currently, updating the database is the same as creating a new one: all records are deleted and re-ingested. This works well to ensure that deleted objects are properly handled. However, once the database reaches a certain size this becomes an expensive operation.

Instead, we may want to figure out how to handle a delta-style ingest, that is, only process the JSON documents that have been updated. This may be tricky and may require several iterations.
I do think that for Production-level databases and for testing one may want to build the entire database from scratch, so this is more about the development workflow, or about users' local copies where they may not want or need a production-ready instance. It would also apply to instances where the production-level database is so large that we only want to apply deltas.
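One way to sketch the "only process updated JSON documents" idea is to track a timestamp (or per-file hash) from the last ingest and filter the data directory against it. This is a minimal illustration, not the tool's actual API; the function name, the timestamp-tracking approach, and the directory layout are all assumptions:

```python
from pathlib import Path

def files_changed_since(data_dir: str, last_ingest: float) -> list[Path]:
    """Return JSON documents modified after the last ingest timestamp.

    Hypothetical helper: the real tool would need to persist the
    timestamp (or a per-file hash) between runs to make this reliable.
    """
    return sorted(
        p for p in Path(data_dir).rglob("*.json")
        if p.stat().st_mtime > last_ingest
    )
```

A delta ingest loop would then load and upsert only the returned files instead of wiping the database. Modification times are fragile (a fresh `git clone` resets them), which is part of why the git-based option below may be more robust.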

I can think of several aspects we may want to check out:

  1. Perform no deletions; only insert JSON files that have been produced. This is already supported, but by default saving the database writes out all JSON output. It is also unclear whether we would hit foreign key violations, particularly if reference tables have been updated.
  2. Use git diff to determine which JSON documents have changed, and do not delete or change anything else. This requires git to be installed and the data to be version controlled, both of which are likely true in development situations.
  3. Figure out a way to export only the records that have changed when saving the database. I do not know of a way to capture all changes to the DB since a connection was made; it might be architecture-dependent, and it starts getting into DB data migration tools. Several such tools exist, but our DB/Tool architecture was not built with them in mind.
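Option 2 above could be prototyped by shelling out to git and collecting the changed JSON paths. A minimal sketch, assuming git is installed and the data directory is a git working tree; the function name, default ref, and interface are illustrative, not part of the existing tool:

```python
import subprocess

def changed_json_files(repo_dir: str, since_ref: str = "HEAD~1") -> list[str]:
    """List JSON files changed in the repo since the given git ref.

    Hypothetical helper: assumes git is available on PATH and that
    repo_dir is a version-controlled data directory.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", since_ref, "--", "*.json"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.endswith(".json")]
```

The ingest would then re-process only these paths. Note that `git diff --name-only` also reports deleted files, so a real implementation would want `--diff-filter` (or `--name-status`) to distinguish updates from deletions and handle removals explicitly.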