Mark de Haan edited this page Sep 1, 2015 · 3 revisions

Loading parameters:

Parameter files are either *.csv files or Java-style *.properties files. In a *.csv file the parameter names are listed horizontally on the first line, and every subsequent line holds a new combination of values. In a *.properties file the parameters are listed one below the other, and every parameter has exactly one value.
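A minimal Python sketch of how the two formats map onto data. It assumes plain comma-separated csv files with a header line and simple key=value properties lines; real Java properties files also allow ":" separators, escapes and line continuations, which this ignores. The helpers load_csv and load_properties are hypothetical, not part of Compute.

```python
import csv
import io


def load_csv(text):
    """Each data line of a parameter csv becomes one combination of values (a dict).

    Assumes comma-separated values with the parameter names on the first line.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


def load_properties(text):
    """Each 'key=value' line gives one parameter with exactly one value.

    Simplified: blank lines and '#'/'!' comments are skipped; ':' separators,
    escapes and line continuations from the full Java format are not handled.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params
```

For example, load_csv("project,sample\np1,s1p1\n") returns [{"project": "p1", "sample": "s1p1"}], a list with one dict per line of values.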

Parameters can reference each other, e.g.:

root=/gcc/rawdata
ngsdata=${root}/ngs
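References like the one above can be expanded by substituting each ${name} with that parameter's (already expanded) value. The sketch below assumes references may point forward or backward in the file and treats undefined or circular references as errors; how Compute itself handles those cases is not described here. The resolve helper is hypothetical.

```python
import re

# Matches ${name} and captures the referenced parameter name.
REF = re.compile(r"\$\{([^}]+)\}")


def resolve(params):
    """Return a copy of params with every ${name} reference expanded."""
    resolved = {}

    def value_of(name, seen):
        if name in resolved:
            return resolved[name]
        if name in seen:
            raise ValueError(f"circular reference involving {name!r}")
        if name not in params:
            raise KeyError(f"undefined parameter {name!r}")
        # Expand nested references first, remembering the chain to catch cycles.
        value = REF.sub(lambda m: value_of(m.group(1), seen | {name}), params[name])
        resolved[name] = value
        return value

    for name in params:
        value_of(name, frozenset())
    return resolved
```

With the two lines above, resolve({"root": "/gcc/rawdata", "ngsdata": "${root}/ngs"})["ngsdata"] gives "/gcc/rawdata/ngs".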

Merging parameter files:

If csv files share parameters, those files are merged at load time on the shared parameters. If there are no shared parameters, every line of one file is combined with every line of the other.
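The merge rule can be sketched as a join on the shared parameters, which turns into a cross product when nothing is shared. This assumes that rows whose shared values do not match are simply dropped (an inner join); the merge helper and the genome column used below are hypothetical.

```python
from itertools import product


def merge(left, right):
    """Combine two lists of row dicts (one dict per csv line).

    Rows are joined on the parameters both tables share. When they share
    nothing, all() over an empty set is True, so every row of `left` is
    combined with every row of `right`.
    """
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    merged = []
    for a, b in product(left, right):
        if all(a[key] == b[key] for key in shared):
            merged.append({**a, **b})
    return merged
```

Merging a three-line worksheet with a hypothetical two-line file that only has a genome column yields 3 × 2 = 6 rows; if that file also had a project column, each worksheet line would only be paired with the lines for its own project.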

Parameter values are available at generation time:

Parameters originating from loaded *.csv and *.properties files have values that are already known at generation time. These values are inserted directly into the generated scripts.
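One way to picture this, assuming generation-time values end up as literals in the script text, is a substitution of ${name} placeholders by their loaded values. The placeholder convention and the fill_in helper are assumptions for illustration; the sketch leaves names it does not know, such as ${HOME}, for bash to expand at run time.

```python
import re


def fill_in(protocol, values):
    """Replace ${name} with its generation-time value.

    Placeholders whose name is not in `values` are left untouched, so bash
    can still expand them when the generated script runs.
    """
    def replace(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$\{(\w+)\}", replace, protocol)
```

Because the values are written in at generation time, the generated script no longer needs the parameter files when it runs.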

Run-time parameters:

It is also possible to calculate parameters at run time and pass them on to the next step. You specify this in the dependency column. The syntax is: local_parameter=<stepname_parameter-from-another-step>
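A small sketch of reading one dependency-column entry, assuming each entry holds exactly one local_parameter=... pair. The right-hand side is returned as-is because its internal format is not spelled out beyond the syntax above; parse_dependency and the example names are hypothetical.

```python
def parse_dependency(entry):
    """Split 'local_parameter=source' into (local name, source reference).

    The source refers to a parameter computed at run time by another step;
    it is returned unparsed.
    """
    local, sep, source = entry.partition("=")
    if not sep or not local.strip() or not source.strip():
        raise ValueError(f"expected local_parameter=source, got {entry!r}")
    return local.strip(), source.strip()
```

For instance, parse_dependency("inputBam=align_outputBam") returns ("inputBam", "align_outputBam"): the step declaring this entry sees the value under the local name inputBam.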

Parameters in a protocol:

Once your parameters have been loaded, you have one big table with all parameter names on the first line and a new combination of values on every subsequent line, similar to a loaded csv file. These lines of values are called targets.

Folding:

For this example we use the following worksheet.csv file:

project, sample
p1; s1p1
p1; s2p1
p2; s1p2

You want your protocol to do something for each project, so you start writing your protocol with:

#string project

The worksheet will then be folded as follows:

project, sample
p1; s1p1,s2p1
p2; s1p2

A script is then generated from your protocol for every target (line), in this case two in total.
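The folding step can be sketched as grouping the worksheet rows by the values of the #string parameters and collecting every other parameter into a list, in worksheet order. This is an illustration, not Compute's implementation; fold is a hypothetical helper, and this version keeps repeated values.

```python
def fold(rows, strings):
    """Fold rows so each target has a unique combination of the #string parameters.

    `rows` is a list of dicts (one per worksheet line); `strings` lists the
    parameters declared with #string. Every other parameter becomes a list.
    """
    targets = {}  # dicts keep insertion order, so targets follow the worksheet
    for row in rows:
        key = tuple(row[name] for name in strings)
        if key not in targets:
            # First row with this combination: copy the #string values once ...
            targets[key] = {name: row[name] for name in strings}
            # ... and start an empty list for every other parameter.
            for name in row:
                if name not in strings:
                    targets[key][name] = []
        for name, value in row.items():
            if name not in strings:
                targets[key][name].append(value)  # repeated values are kept
    return list(targets.values())
```

On the worksheet above, fold(rows, ["project"]) gives the two targets shown, with sample as ["s1p1", "s2p1"] and ["s1p2"]; fold(rows, ["sample"]) and fold(rows, ["project", "sample"]) each give three targets, one per line.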

If you want to do something for each sample, you add the following to your protocol:

#string sample

The worksheet stays unfolded, identical to the original, and three scripts are generated, one for each sample.

If you want your protocol to do something for all samples in a given project:

#string project
#list sample

In your protocol, #string variables can be used as strings and #list variables as bash arrays.
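To make the string/array distinction concrete, the sketch below writes one folded target as bash declarations: a #string parameter becomes a plain variable and a #list parameter becomes a bash array. How Compute actually emits these lines into a script is assumed here; bash_declarations is a hypothetical helper.

```python
import shlex


def bash_declarations(target, strings, lists):
    """Render one folded target as bash: #string -> variable, #list -> array.

    Values are quoted with shlex.quote so spaces or special characters
    survive when the script is run.
    """
    lines = []
    for name in strings:
        lines.append(f"{name}={shlex.quote(target[name])}")
    for name in lists:
        values = " ".join(shlex.quote(v) for v in target[name])
        lines.append(f"{name}=({values})")
    return "\n".join(lines)
```

For the first target of the project fold this yields project=p1 and sample=(s1p1 s2p1), after which the protocol can loop over the samples with for s in "${sample[@]}"; do echo "$s"; done.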

If your protocol starts with:

#string project
#string sample

Then Compute folds the worksheet so that every line holds a unique project-sample combination. In this case that is simply the original worksheet.

In the following example you would not, because sample s1p2 occurs with two different barcodes:

project, sample, barcode
p1; s1p1; b
p1; s2p1; b
p2; s1p2; b
p2; s1p2; c

This will be turned into:

project, sample, barcode
p1; s1p1; b
p1; s2p1; b
p2; s1p2; b,c

The tricky bit is when you define two #list parameters in your protocol:

#string sample
#list project
#list barcode

In this example you want one script for every sample (three in total). What will the arrays for project and barcode look like?

  • Opinion 1: Arrays may differ in length; just fold the worksheet and be done with it.
  • Opinion 2: Arrays must have the same length, and the values at each position must match across arrays, following the row order of the original worksheet.
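The two opinions can be compared on the barcode worksheet above. This sketch reads opinion 1 as keeping only the distinct values of each list (which is why lengths can differ) and opinion 2 as keeping one entry per original row, so that position i in every array comes from the same worksheet line. Both readings and the fold_lists helper are assumptions for illustration; they do not settle which behaviour Compute should have.

```python
def fold_lists(rows, string, lists, aligned):
    """One target per value of the #string parameter; each #list becomes a list.

    aligned=False (opinion 1): each list holds its distinct values, so
    lengths may differ. aligned=True (opinion 2): one entry per original
    row, so positions line up across all lists.
    """
    targets = {}
    for row in rows:
        key = row[string]
        if key not in targets:
            targets[key] = {string: key, **{name: [] for name in lists}}
        target = targets[key]
        for name in lists:
            if aligned or row[name] not in target[name]:
                target[name].append(row[name])
    return list(targets.values())
```

For sample s1p2, opinion 1 gives project=(p2) and barcode=(b c), while opinion 2 gives project=(p2 p2) and barcode=(b c). For s1p1 and s2p1 both opinions agree, because each of those samples occurs on a single line.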

De-folding in a protocol:

In some protocols you want the worksheet (which is also passed to the protocol as a parameter) unfolded, so that you can fold it again in your own way to make iterating over it easier. This feature is no longer available in the current Compute release.
