This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

Database for workflow status #1056

Open
emthompson-usgs opened this issue Oct 31, 2022 · 2 comments
Labels
feature Major new feature

Comments

@emthompson-usgs
Member

emthompson-usgs commented Oct 31, 2022

I'm thinking that a simple SQLite database to track the command history and status would be useful. It could have one table for each of these subcommands:

  • download
  • assemble
  • process_waveforms
  • compute_station_metrics
  • compute_waveform_metrics
  • generate_report
  • generate_station_maps

And each subcommand table would have the following columns:

  • project
  • label
  • eqid
  • start_time
  • end_time
  • success
  • error_message

I think this should help keep track of when some events have had problems in projects with lots of events.
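As a minimal sketch of the layout described above, the following uses Python's standard `sqlite3` module to create one table per subcommand with the listed columns. The column types, the ISO 8601 text timestamps, and the helper names `create_status_tables` and `record_run` are assumptions made for illustration; none of them come from gmprocess.

```python
import sqlite3

# Subcommand names as listed in this comment.
SUBCOMMANDS = [
    "download",
    "assemble",
    "process_waveforms",
    "compute_station_metrics",
    "compute_waveform_metrics",
    "generate_report",
    "generate_station_maps",
]

# Column types are assumptions; timestamps are stored as ISO 8601 text.
COLUMNS = """
    project TEXT,
    label TEXT,
    eqid TEXT NOT NULL,
    start_time TEXT,
    end_time TEXT,
    success INTEGER,
    error_message TEXT
"""


def create_status_tables(conn):
    """Create one status table per subcommand (hypothetical helper)."""
    for name in SUBCOMMANDS:
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" ({COLUMNS})')
    conn.commit()


def record_run(conn, subcommand, project, label, eqid,
               start_time, end_time, success, error_message=None):
    """Append one run of a subcommand to its table (hypothetical helper)."""
    if subcommand not in SUBCOMMANDS:
        # Table names cannot be bound as SQL parameters, so validate first.
        raise ValueError(f"unknown subcommand: {subcommand}")
    conn.execute(
        f'INSERT INTO "{subcommand}" VALUES (?, ?, ?, ?, ?, ?, ?)',
        (project, label, eqid, start_time, end_time,
         int(success), error_message),
    )
    conn.commit()
```

A caller would open a connection once per project, call `create_status_tables`, and then call `record_run` at the end of each subcommand, passing the error text when a run fails.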

We could also add a subcommand to summarize the command status. One idea would be a table with rows for eqid and columns for each subcommand, with cell values for the most recent end_time (empty if the last run was not successful).
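The summary idea can be sketched independently of how the runs are stored. The example below assumes each run is available as a record with `subcommand`, `eqid`, `start_time`, `end_time`, and `success` fields, that timestamps are ISO 8601 strings (so string comparison matches time order), and that "last run" means the run with the latest `start_time`. The `Run` type and the `summarize` function are hypothetical names.

```python
from collections import namedtuple

# Hypothetical record type for one run of one subcommand on one event.
Run = namedtuple("Run", "subcommand eqid start_time end_time success")


def summarize(runs, subcommands):
    """Return {eqid: {subcommand: end_time or ""}} for the latest runs.

    A cell holds the end_time of the most recent run if that run
    succeeded, and an empty string if it failed or never ran.
    """
    runs = list(runs)

    # Keep only the most recent run per (eqid, subcommand) pair.
    latest = {}
    for run in runs:
        key = (run.eqid, run.subcommand)
        if key not in latest or run.start_time > latest[key].start_time:
            latest[key] = run

    table = {}
    for eqid in sorted({run.eqid for run in runs}):
        row = {}
        for sub in subcommands:
            run = latest.get((eqid, sub))
            row[sub] = run.end_time if run is not None and run.success else ""
        table[eqid] = row
    return table
```

With this shape, an event whose latest `download` failed shows an empty `download` cell even if an earlier attempt succeeded, which makes events needing attention easy to spot in a large project.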

emthompson-usgs added the "feature" (Major new feature) label Oct 31, 2022
@mhearne-usgs
Member

You probably only need one table, called status (or something similar). It would look just like your subcommand table, but with "command" as the first column. If this were going to be a large database (millions of records), you would want to split command out into its own table and reference it from the status table with a foreign key. I don't think this will have that many records.
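A rough sketch of the single-table variant, again with the standard `sqlite3` module: one `status` table whose first column is `command`. The column types and the helper names `open_status_db`, `log_command`, and `events_with_failures` are assumptions for illustration only.

```python
import sqlite3

# One table for all subcommands; "command" replaces the per-command tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS status (
    command TEXT NOT NULL,
    project TEXT,
    label TEXT,
    eqid TEXT NOT NULL,
    start_time TEXT,
    end_time TEXT,
    success INTEGER,
    error_message TEXT
)
"""


def open_status_db(path=":memory:"):
    """Open (or create) the status database (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_command(conn, command, project, label, eqid,
                start_time, end_time, success, error_message=None):
    """Insert one row describing a finished command (hypothetical helper)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO status VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (command, project, label, eqid, start_time, end_time,
             int(success), error_message),
        )


def events_with_failures(conn, command):
    """Return the eqids that have at least one failed run of a command."""
    rows = conn.execute(
        "SELECT DISTINCT eqid FROM status "
        "WHERE command = ? AND success = 0 ORDER BY eqid",
        (command,),
    )
    return [row[0] for row in rows]
```

Because the command name is ordinary data here rather than a table name, it can be passed as a bound parameter, and adding a new subcommand requires no schema change.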

baagaard-usgs changed the title from "Status database" to "Datatbase for workflow status" Oct 31, 2022
@baagaard-usgs
Collaborator

I strongly recommend that we consider using a workflow management tool for this feature rather than implementing something ourselves. This could either be an optional feature or something done by the user outside of gmprocess.

Apache Airflow seems like a good compromise between features and the number and complexity of dependencies. It is pure Python and pip installable. It not only keeps track of the state of tasks, but also lets a user visualize the workflow (task dependencies), monitor progress, and rerun failed pieces. From a user perspective, I would like to be able to construct the full workflow for compiling, processing, and analyzing the ground-motion records, which includes steps outside gmprocess.

emthompson-usgs changed the title from "Datatbase for workflow status" to "Database for workflow status" Oct 31, 2022