The zen of ZIPPY
Wise, Aaron edited this page Feb 15, 2018 · 1 revision
Here we list some of the design principles for ZIPPY. The aim is to give you some intuition for how ZIPPY works.
- Managing the I/O handoff between modules
- ZIPPY modules should aim to present output that conforms to standards and is broadly useful. For example, though not every bam can be used for every purpose, outputting sorted, indexed bams makes a degree of modularity possible.
- For complicated use cases, it may make sense for a stage to return the same file under two different names in the get_output dictionary. For example, if your bam contains a special kind of annotation, you might want to return it as 'bam' as well as 'special_annotated_bam'. Then, if you create a stage that requires your annotated bam, it can ask for collect_input(sample, 'special_annotated_bam').
- Dependency is entirely local. Outputs and inputs are exchanged only between a stage and its direct parents or children. When a module collects its inputs, the input will come only from its direct parents. A stage can take input from more than one parent stage.
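The handoff rules above can be illustrated with a toy model. This is a minimal sketch and not ZIPPY's actual implementation: only the names get_output and collect_input come from the text, while ToyStage, AlignStage, AnnotateStage, the parents attribute, the 'alignment_log' key, and the file paths are hypothetical. Real ZIPPY stages may resolve inputs differently, for example when several parents provide the same key.

```python
class ToyStage:
    """Hypothetical stand-in for a ZIPPY stage; not the real base class."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)  # direct parents only

    def get_output(self, sample):
        # Maps output names to file paths; subclasses override this.
        return {}

    def collect_input(self, sample, key):
        # Search direct parents only, never grandparents.
        for parent in self.parents:
            outputs = parent.get_output(sample)
            if key in outputs:
                return outputs[key]
        raise KeyError(f"no direct parent of {self.name} provides {key!r}")


class AlignStage(ToyStage):
    def get_output(self, sample):
        return {
            "bam": f"{self.name}/{sample}.sorted.bam",
            "alignment_log": f"{self.name}/{sample}.log",  # assumed extra output
        }


class AnnotateStage(ToyStage):
    def get_output(self, sample):
        path = f"{self.name}/{sample}.annotated.bam"
        # The same file is exposed under a generic and a specific name.
        return {"bam": path, "special_annotated_bam": path}


align = AlignStage("align")
annotate = AnnotateStage("annotate", parents=[align])
consumer = ToyStage("consumer", parents=[annotate])

generic = consumer.collect_input("sampleA", "bam")
specific = consumer.collect_input("sampleA", "special_annotated_bam")
```

In this sketch, generic and specific are the same path, so a downstream stage that only needs some bam and one that needs the annotated bam can both consume the annotate stage. Asking consumer for 'alignment_log' raises KeyError, because that output belongs to a grandparent and is therefore out of reach.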
- Module design
- ZIPPY should expose named parameters for the required arguments to a module. Non-required arguments should be passed through ZIPPY using self.params.self.args. Parameters should not be hard-coded.
- Stages generally conform to one of two I/O patterns: 1-in-1-out (e.g., a bam goes in, and a realigned bam comes out) or many-in-1-out (e.g., many bams come in, and a merged bam comes out). The 'default' case is the 1-in-1-out pattern. When designing merge stages, you should look to examples like the MergeBamRunner. In general, when creating a merge stage, you also have to override get_samples (since you are now returning a different set of samples than you took in) as well as get_dependencies (the default get_dependencies usually assumes that your dependencies are per-sample).
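The difference between the two I/O patterns can be sketched with a toy model. This is not the MergeBamRunner and does not reproduce ZIPPY's real base class: only the method names get_samples, get_dependencies, and get_output come from the text. The classes, the parent attribute, the 'stage:sample' job identifiers, and the merged sample name are assumptions made for illustration.

```python
class ToyPerSampleStage:
    """Hypothetical 1-in-1-out stage: one output per incoming sample."""

    def __init__(self, name, samples=None, parent=None):
        self.name = name
        self.parent = parent
        self._samples = list(samples or [])

    def get_samples(self):
        # Default: emit the same samples that came in.
        if self.parent is not None:
            return self.parent.get_samples()
        return list(self._samples)

    def get_output(self, sample):
        return {"bam": f"{self.name}/{sample}.bam"}

    def get_dependencies(self, sample):
        # Default: depend only on the parent's job for this same sample.
        if self.parent is None:
            return []
        return [f"{self.parent.name}:{sample}"]


class ToyMergeStage(ToyPerSampleStage):
    """Hypothetical many-in-1-out stage."""

    merged_name = "merged"  # assumed name for the single output sample

    def get_samples(self):
        # The outgoing sample set differs from the incoming one.
        return [self.merged_name]

    def get_dependencies(self, sample):
        # The merged sample waits on every upstream per-sample job.
        return [f"{self.parent.name}:{s}" for s in self.parent.get_samples()]

    def collect_inputs(self):
        # Gather one bam per upstream sample.
        return [self.parent.get_output(s)["bam"]
                for s in self.parent.get_samples()]


align = ToyPerSampleStage("align", samples=["s1", "s2"])
realign = ToyPerSampleStage("realign", parent=align)
merge = ToyMergeStage("merge", parent=align)
```

Here realign keeps the per-sample defaults, while merge overrides both methods: it reports a single output sample and depends on all upstream jobs rather than on one job per sample.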