The zen of ZIPPY

Wise, Aaron edited this page Feb 15, 2018 · 1 revision

Here we list some of the design principles behind ZIPPY. The aim is to give you some intuition for how ZIPPY works.

  • Managing the I/O handoff between modules
    • ZIPPY modules should aim to produce output that conforms to standards and is broadly useful. For example, although not every bam can be used for every purpose, outputting sorted, indexed bams makes a degree of modularity between stages possible.
    • For complicated use cases, it may make sense for a stage to return the same file under two different names in the get_output dictionary. For example, if your bam contains a special kind of annotation, you might want to return it as 'bam' as well as 'special_annotated_bam'. Then, if you create a stage that requires your annotated bam, it can ask for collect_input(sample, 'special_annotated_bam').
    • Dependency is entirely local. Outputs and inputs are exchanged only between a stage and its direct parents: when a module collects its inputs, they come only from its direct parents, not from more distant ancestors. A stage can have more than one parent stage.
  • Module design
    • ZIPPY should contain named parameters for the required arguments to a module. Non-required arguments should be passed through ZIPPY using self.params.self.args. Parameters should not be hard-coded.
    • Stages generally conform to one of two I/O patterns: 1-in-1-out (e.g., a bam goes in, and a realigned bam comes out) or many-in-1-out (e.g., many bams come in, and a merged bam comes out). The 'default' case is the 1-in-1-out pattern. When designing merge stages, look to examples like the MergeBamRunner. In general, when creating a merge stage, you must also override get_samples (since you are now returning a different set of samples than you took in) as well as get_dependencies (which usually assumes that your dependencies are per-sample).
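
The handoff rules above can be illustrated with a small toy model. This is a minimal sketch and not ZIPPY's actual code: the Stage base class, the three stage classes, the merge_inputs helper, and the file names are all hypothetical stand-ins, and only the method names get_output, collect_input, and get_samples are taken from the text. It shows one file returned under two names, input lookup restricted to direct parents, and a merge stage overriding get_samples. Scheduling via get_dependencies is not modeled here.

```python
class Stage:
    """Toy stand-in for a pipeline stage (hypothetical, not ZIPPY's base class)."""

    def __init__(self, samples, parents=()):
        self._samples = list(samples)
        self.parents = list(parents)

    def get_samples(self):
        # Default 1-in-1-out case: the same samples come out that went in.
        return self._samples

    def get_output(self, sample):
        # Maps an output name to a file path; subclasses fill this in.
        return {}

    def collect_input(self, sample, file_type):
        # Dependency is local: only direct parents are consulted.
        for parent in self.parents:
            outputs = parent.get_output(sample)
            if file_type in outputs:
                return outputs[file_type]
        raise KeyError(f"no direct parent provides {file_type!r} for {sample!r}")


class AlignStage(Stage):
    def get_output(self, sample):
        return {"bam": f"{sample}.sorted.bam", "align_log": f"{sample}.align.log"}


class AnnotateStage(Stage):
    def get_output(self, sample):
        # The same file is published under a generic and a specific name.
        path = f"{sample}.annotated.bam"
        return {"bam": path, "special_annotated_bam": path}


class MergeStage(Stage):
    """Many-in-1-out: all incoming samples collapse into one merged sample."""

    def get_samples(self):
        # Overridden because the outgoing sample set differs from the incoming one.
        return ["merged"]

    def merge_inputs(self):
        # Hypothetical helper: gather one bam per incoming sample.
        return [self.collect_input(s, "bam") for s in self._samples]

    def get_output(self, sample):
        return {"bam": f"{sample}.merged.bam"}


samples = ["sampleA", "sampleB"]
align = AlignStage(samples)
annotate = AnnotateStage(samples, parents=[align])
merge = MergeStage(samples, parents=[annotate])

# Both names resolve to the same annotated bam from the direct parent.
generic = merge.collect_input("sampleA", "bam")
specific = merge.collect_input("sampleA", "special_annotated_bam")

# The grandparent's output is not visible from the merge stage.
try:
    merge.collect_input("sampleA", "align_log")
    grandparent_visible = True
except KeyError:
    grandparent_visible = False
```

In this sketch a downstream stage that only needs any bam asks for 'bam', while a stage that depends on the annotation asks for 'special_annotated_bam' and fails early if its parent does not provide it.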