Spate is a lightweight Python-based file processing workflow composition and visualization toolkit. It offers an API to create and export workflows to various popular execution engines, as well as shells and job schedulers.
The purpose of Spate is to offer a simple way to design a dataflow instance (i.e., file-based data processing steps, and how they depend on each other) and export it to one or more execution resources (i.e., hardware and software environments) where it will run. Add jobs, remove jobs, merge workflows; Spate will take care of the details.
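For instance, two jobs added to the same workflow are chained when one's output is the other's input. The minimal sketch below illustrates this; note that the single-input use of `{{INPUT0}}` and the inference of dependencies from shared file names are assumptions extrapolated from the two-input example further down, not confirmed behavior:

```python
import spate

# two jobs in the same workflow; the second consumes the
# first's output, so they form a two-step dependency chain
workflow = spate.new_workflow("two_step_workflow")

workflow.add_job(
    inputs = "file_A",
    outputs = "file_B",
    content = "sort {{INPUT0}} > {{OUTPUT}}")

workflow.add_job(
    inputs = "file_B",
    outputs = "file_C",
    content = "uniq {{INPUT0}} > {{OUTPUT}}")
```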
Spate can export the whole workflow, or only the steps that need to be executed. Spate compares the modification times of input and output files and iteratively identifies all steps that need to be run or updated. No more worrying about re-running a whole time-consuming workflow when you update a few files; Spate will ensure that only the affected steps are executed.
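This is the same up-to-date rule that build tools such as `make` apply: a step must run if any of its outputs is missing or older than one of its inputs. A minimal, self-contained illustration of the principle (not Spate's actual code; `job_is_stale` is an illustrative name):

```python
import os

def job_is_stale(inputs, outputs):
    """Return True if a job must be (re)run: an output is
    missing, or some input is newer than the oldest output."""
    # a missing output always forces a re-run
    if not all(os.path.exists(path) for path in outputs):
        return True

    newest_input = max(os.path.getmtime(path) for path in inputs)
    oldest_output = min(os.path.getmtime(path) for path in outputs)

    # re-run when any input was modified after an output was written
    return newest_input > oldest_output
```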
This workflow will take two input files, `file_A` and `file_B`, and process them to create an output file `file_C` by running the Unix `cat` command:
```python
import spate

# create a new empty workflow
workflow = spate.new_workflow("my_workflow")

# add a simple job; here a Unix shell
# command to merge two files into one
workflow.add_job(
    inputs = ("file_A", "file_B"),  # two input files
    outputs = "file_C",  # one output file
    content = """
        cat {{INPUT0}} {{INPUT1}} > {{OUTPUT}}
        """)

# export this workflow as a shell script; to run
# it, type `./my_workflow.sh` in the shell
spate.to_shell_script(workflow, "my_workflow.sh")

# export this workflow as a SLURM sbatch script; to run it,
# type `sbatch my_workflow.slurm` on a SLURM-enabled cluster
spate.to_slurm(workflow, "my_workflow.slurm")
```
See `doc/` for additional examples.
Spate is hosted on both PyPI and GitHub, offering two options to install it:

- If using PyPI (recommended to get the latest release) and the pip package manager, type `pip install spate` on the command line.
- If using GitHub (in case you want one of the development versions), download the archive of one of the releases or commits and run `pip install <archive>`.
Environment | Status | Comment |
---|---|---|
Unix shell scripts | Stable | No parallel job execution; jobs are processed sequentially. |
Make files | Stable | |
Makeflow scripts | Experimental | Not fully tested. |
Drake scripts | Experimental | Not fully tested. |
SLURM job scheduler | Stable | Produces sbatch scripts with a directed acyclic graph of jobs. Tested with SLURM 14.11.11. |
TORQUE/PBS job scheduler | Experimental | Produces job arrays; does not handle job dependencies. Currently untested. |
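All exporters in this table follow the same calling convention as `to_shell_script` and `to_slurm` shown earlier. The sketch below targets several environments from one workflow; the exporter names for the Make and Makeflow targets are assumptions inferred from the table, not confirmed API:

```python
import spate

workflow = spate.new_workflow("my_workflow")
workflow.add_job(
    inputs = ("file_A", "file_B"),
    outputs = "file_C",
    content = "cat {{INPUT0}} {{INPUT1}} > {{OUTPUT}}")

# exporters shown earlier in this document
spate.to_shell_script(workflow, "my_workflow.sh")
spate.to_slurm(workflow, "my_workflow.slurm")

# hypothetical: names for the remaining targets are assumptions
# inferred from the table above, not confirmed API
# spate.to_makefile(workflow, "Makefile")
# spate.to_makeflow(workflow, "my_workflow.mf")
```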