Home
Parameter files are either *.csv files, in which the parameters are listed horizontally on the first line and every subsequent line holds a new combination of values, or Java-style *.properties files, in which the parameters are listed below each other and every parameter has exactly one value. For example:
root=/gcc/rawdata
ngsdata=${root}/ngs
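To illustrate how a value such as ${root}/ngs can be expanded, here is a minimal Python sketch. It is not Compute's own loader: the function name load_properties is hypothetical, and the rule that a reference only resolves against keys defined earlier in the same file is an assumption made for this example.

```python
import re


def load_properties(text):
    """Parse `key=value` lines and expand `${key}` references to earlier keys.

    Assumption for this sketch: a reference only resolves against keys that
    were defined on an earlier line of the same text.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        # Replace each ${name} with the value of a previously defined name;
        # unknown names are left untouched.
        value = re.sub(
            r"\$\{(\w+)\}",
            lambda m: values.get(m.group(1), m.group(0)),
            value.strip(),
        )
        values[key.strip()] = value
    return values


params = load_properties("root=/gcc/rawdata\nngsdata=${root}/ngs\n")
print(params["ngsdata"])  # /gcc/rawdata/ngs
```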
If csv files share parameters, those files are merged on the shared parameters at load time. If they share no parameters, every line of one file is combined with every line of the other.
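The merge rule can be sketched in Python as a join on the shared column names, which degrades to a Cartesian product when no columns are shared. This is an illustration only, not Compute's implementation; the function merge_tables and the parameters owner and aligner are hypothetical names chosen for the example.

```python
from itertools import product


def merge_tables(left, right):
    """Join two parameter tables (lists of dicts) on their shared column names.

    With no shared columns, every row of `left` is paired with every row of
    `right`, i.e. a Cartesian product.
    """
    if not left or not right:
        return []
    shared = [name for name in left[0] if name in right[0]]
    merged = []
    for l_row, r_row in product(left, right):
        # Keep the pair only if all shared parameters agree
        # (always true when there are no shared parameters).
        if all(l_row[name] == r_row[name] for name in shared):
            merged.append({**l_row, **r_row})
    return merged


samples = [
    {"project": "p1", "sample": "s1p1"},
    {"project": "p2", "sample": "s1p2"},
]
projects = [  # hypothetical second file sharing the `project` parameter
    {"project": "p1", "owner": "alice"},
    {"project": "p2", "owner": "bob"},
]
tools = [{"aligner": "bwa"}, {"aligner": "bowtie"}]  # shares nothing

joined = merge_tables(samples, projects)  # 2 rows, matched on project
crossed = merge_tables(samples, tools)    # 2 x 2 = 4 rows
```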
Parameters originating from loaded *.csv and *.properties files have a value that is available at generation time. These values are written directly into the generated scripts.
It is also possible to calculate parameters at runtime and to pass these on to the next step. You specify this in the dependency column. The syntax is: local_parameter=<stepname_parameter-from-another-step>
Assuming your parameters have been loaded, you will have one big table containing all the parameters on the first line, and a new combination of values on every subsequent line. This looks similar to the loaded csv. These lines are called targets.
For this example we have a worksheet.csv file:
project, sample
p1; s1p1
p1; s2p1
p2; s1p2
You want your protocol to do something for each project, so you start writing your protocol with:
#string project
The worksheet will then be folded as follows:
project, sample
p1; s1p1,s2p1
p2; s1p2
Your protocol will be generated once for every target (line), in this case two scripts in total.
If you want to do something for each sample, you add the following to your protocol:
#string sample
The worksheet will stay unfolded, like the original, and three scripts will be generated, one for each sample.
If you want your protocol to do something for all samples in a given project:
#string project
#list sample
In your protocol, you can use #string variables as strings and #list variables as bash arrays.
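The folding described above can be sketched in Python as grouping worksheet rows by the #string parameters and collecting the #list parameters per group. This is a simplified model under stated assumptions, not Compute's actual code; the function name fold is hypothetical, and list values are assumed to be collected in worksheet order.

```python
def fold(rows, string_params, list_params):
    """Group worksheet rows into targets.

    Rows that agree on every #string parameter end up in the same target;
    the values of each #list parameter are collected in worksheet order.
    """
    targets = {}
    for row in rows:
        key = tuple(row[name] for name in string_params)
        target = targets.setdefault(
            key,
            {**dict(zip(string_params, key)), **{name: [] for name in list_params}},
        )
        for name in list_params:
            target[name].append(row[name])
    return list(targets.values())


worksheet = [
    {"project": "p1", "sample": "s1p1"},
    {"project": "p1", "sample": "s2p1"},
    {"project": "p2", "sample": "s1p2"},
]

# "#string project" + "#list sample": one target per project.
per_project = fold(worksheet, ["project"], ["sample"])
# "#string sample" only: one target per sample.
per_sample = fold(worksheet, ["sample"], [])

print(len(per_project), len(per_sample))  # 2 3
```

Each returned target corresponds to one generated script: two for the per-project protocol and three for the per-sample protocol.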
If your protocol starts with:
#string project
#string sample
Compute will then fold the worksheet so that every line holds a unique combination of project and sample. In this case you would end up with the original worksheet.
With the following worksheet, you will not end up with the original:
project, sample, barcode
p1; s1p1; b
p1; s2p1; b
p2; s1p2; b
p2; s1p2; c
This will be turned into:
p1; s1p1; b
p1; s2p1; b
p2; s1p2; b,c
The tricky bit is when you define two #list parameters in your protocol:
#string sample
#list project
#list barcode
In this example, you want one script for every sample (so three in total). What will the arrays for project and barcode look like?
- Opinion 1: Arrays can differ in length; just fold the worksheet and be done with it.
- Opinion 2: Arrays must have the same length, and the values at each position must correspond to the same line of the original worksheet.
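The difference between the two opinions can be made concrete with a small Python sketch that uses the project/sample/barcode worksheet from the earlier example. It is only an illustration of the two candidate behaviours, not a statement of what Compute does; the function fold_lists and its deduplicate flag are hypothetical.

```python
worksheet = [
    {"project": "p1", "sample": "s1p1", "barcode": "b"},
    {"project": "p1", "sample": "s2p1", "barcode": "b"},
    {"project": "p2", "sample": "s1p2", "barcode": "b"},
    {"project": "p2", "sample": "s1p2", "barcode": "c"},
]


def fold_lists(rows, string_param, list_params, deduplicate):
    """Collect list parameters per value of `string_param`.

    deduplicate=True  -> Opinion 1: each array holds distinct values only,
                         so arrays may differ in length.
    deduplicate=False -> Opinion 2: one entry per worksheet line, so arrays
                         have equal length and positions stay aligned.
    """
    targets = {}
    for row in rows:
        target = targets.setdefault(
            row[string_param], {name: [] for name in list_params}
        )
        for name in list_params:
            if deduplicate and row[name] in target[name]:
                continue
            target[name].append(row[name])
    return targets


opinion_1 = fold_lists(worksheet, "sample", ["project", "barcode"], deduplicate=True)
opinion_2 = fold_lists(worksheet, "sample", ["project", "barcode"], deduplicate=False)

print(opinion_1["s1p2"])  # {'project': ['p2'], 'barcode': ['b', 'c']}
print(opinion_2["s1p2"])  # {'project': ['p2', 'p2'], 'barcode': ['b', 'c']}
```

Both variants yield three targets, one per sample. They only differ for sample s1p2, where Opinion 2 lets a protocol iterate over project and barcode by a shared index.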
In some protocols you want the worksheet (which is also passed to the protocol as a parameter) to be unfolded, so that you can fold it again in your own way to make iterating over it easier. This feature is no longer present in the current Compute release.