# JSON Job Templates
This is a short introduction to creating and using job templates. A job template is typically called `job.json.tmpl` and is used to create a set of jobs that are written to another JSON file, usually called `jobs.json`. Templates are written using Jinja2 syntax.
In hps-mc, the components of a job are executed as specified by the job script and by the job parameters that are set in the `jobs.json` file. Job parameters may include `input_files`, `output_files`, `seed`, and many more. Three components are needed to create a set of job parameters:
- `job.json.tmpl`: the job (parameter) template
- `vars.json`: a file containing all the relevant parameter values
- `mkjobs.sh`: a script that creates the `jobs.json` file
- Figure out which job parameters are needed by the job script you want to run. Usually, these are (not a complete list):
  - `input_files`
  - `output_files`
  - `output_dir`
  - `detector`
  - `run_params`
  - `run_number`
  - `seed`
- Write a basic template (or copy one that is similar to what you need).

```
{
    "run_params": "{{ run_params }}",
    "run_number": {{ run_number }},
    "seed": {{ job_id + 1234 }},
    "detector": "{{ detector }}",
    "input_files": {
        "some/path/to/input/{{ run_params }}/file_{{ job_id }}.stdhep": "input1.stdhep",
        "some/path/to/input/{{ run_params }}/file_{{ job_id + 1 }}.stdhep": "input2.stdhep"
    },
    "output_files": {
        "output_file.stdhep": "output_{{ run_params }}_{{ job_id }}.stdhep"
    },
    "output_dir": "some/path/to/output/{{ run_params }}/{{ detector }}/"
}
```

Everything in double curly brackets `{{ }}` is a variable that will be replaced by a value specified in `vars.json` when you run the `mkjobs.sh` script. The `job_id` variable is a running number, starting at 1 for the first job and increasing by 1 for each job. Variables that are strings are written like this: `"{{ string }}"`. You can perform basic math operations with variables that are numbers.
- Specify the values of the variables in `vars.json`:

```json
{
    "run_params": ["1pt1"],
    "run_number": [1],
    "detector": ["someDetector"]
}
```
- Run `mkjobs.sh`. For this, you may want to change the number of jobs to be written in the script:

```shell
hps-mc-job-template -j 1 -r <number_of_jobs> -a vars.json job.json.tmpl jobs.json
```

With `number_of_jobs=2`, we have

```shell
hps-mc-job-template -j 1 -r 2 -a vars.json job.json.tmpl jobs.json
```

which produces the following `jobs.json` file:

```
{
    "run_params": "1pt1",
    "run_number": 1,
    "seed": 1235,
    "detector": "someDetector",
    "input_files": {
        "some/path/to/input/1pt1/file_1.stdhep": "input1.stdhep",
        "some/path/to/input/1pt1/file_2.stdhep": "input2.stdhep"
    },
    "output_files": {
        "output_file.stdhep": "output_1pt1_1.stdhep"
    },
    "output_dir": "some/path/to/output/1pt1/someDetector/",
    "job_id": 1
},
{
    "run_params": "1pt1",
    "run_number": 1,
    "seed": 1236,
    "detector": "someDetector",
    "input_files": {
        "some/path/to/input/1pt1/file_2.stdhep": "input1.stdhep",
        "some/path/to/input/1pt1/file_3.stdhep": "input2.stdhep"
    },
    "output_files": {
        "output_file.stdhep": "output_1pt1_2.stdhep"
    },
    "output_dir": "some/path/to/output/1pt1/someDetector/",
    "job_id": 2
}
```
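To see how the substitutions play out, here is a small Python sketch that mimics what the template expansion does for this example. It is a simplified illustration only, not the actual `hps-mc-job-template` implementation:

```python
# Simplified sketch of how job parameters are generated from the template
# above. NOT the real hps-mc-job-template tool -- it only illustrates how
# job_id drives the seed and the input/output file names.

def make_jobs(n_jobs, run_params, run_number, detector):
    """Build one job dict per job_id, mimicking the Jinja2 substitutions."""
    jobs = []
    for job_id in range(1, n_jobs + 1):  # job_id starts at 1
        jobs.append({
            "run_params": run_params,
            "run_number": run_number,
            "seed": job_id + 1234,  # mirrors {{ job_id + 1234 }}
            "detector": detector,
            "input_files": {
                f"some/path/to/input/{run_params}/file_{job_id}.stdhep": "input1.stdhep",
                f"some/path/to/input/{run_params}/file_{job_id + 1}.stdhep": "input2.stdhep",
            },
            "output_files": {
                "output_file.stdhep": f"output_{run_params}_{job_id}.stdhep",
            },
            "output_dir": f"some/path/to/output/{run_params}/{detector}/",
            "job_id": job_id,
        })
    return jobs

jobs = make_jobs(2, "1pt1", 1, "someDetector")
print(jobs[0]["seed"])  # 1235
print(jobs[1]["seed"])  # 1236
```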
If you have several jobs that differ in some but not all job parameters, you might want to employ some Jinja2 magic to create the `jobs.json` file.
- Let's assume that we want to run jobs with different run parameters where, depending on `run_params`, the `detector` and the `run_number` have to be changed. This can be done by defining additional variables at the beginning of the `job.json.tmpl` file:

```
{% set detector = {"1pt1": "1pt1GeV_detector", "2pt2": "2pt2GeV_detector", "4pt4": "4pt4GeV_detector"} %}
{% set run_number = {"1pt1": 1, "2pt2": 2, "4pt4": 4} %}
{
    "run_params": "{{ run_params }}",
    "run_number": {{ run_number[run_params] }},
    "seed": {{ job_id%njobs + 1234 }},
    "detector": "{{ detector[run_params] }}",
    "input_files": {
        "some/path/to/input/{{ run_params }}/file_{{ job_id%njobs }}.stdhep": "input1.stdhep",
        "some/path/to/input/{{ run_params }}/file_{{ job_id%njobs + 1 }}.stdhep": "input2.stdhep"
    },
    "output_files": {
        "output_file.stdhep": "output_{{ run_params }}_{{ job_id%njobs }}.stdhep"
    },
    "output_dir": "some/path/to/output/{{ run_params }}/{{ detector[run_params] }}/"
}
```
Now, the `vars.json` file can look something like this:

```json
{
    "run_params": ["1pt1", "4pt4"],
    "njobs": [2]
}
```
When running

```shell
hps-mc-job-template -j 1 -r 2 -a vars.json job.json.tmpl jobs.json
```

this will produce the following `jobs.json`:

```
{
    "run_params": "1pt1",
    "run_number": 1,
    "seed": 1235,
    "detector": "1pt1GeV_detector",
    "input_files": {
        "some/path/to/input/1pt1/file_1.stdhep": "input1.stdhep",
        "some/path/to/input/1pt1/file_2.stdhep": "input2.stdhep"
    },
    "output_files": {
        "output_file.stdhep": "output_1pt1_1.stdhep"
    },
    "output_dir": "some/path/to/output/1pt1/1pt1GeV_detector/",
    "job_id": 1
},
{
    "run_params": "1pt1",
    "run_number": 1,
    "seed": 1236,
    "detector": "1pt1GeV_detector",
    "input_files": {
        "some/path/to/input/1pt1/file_2.stdhep": "input1.stdhep",
        "some/path/to/input/1pt1/file_3.stdhep": "input2.stdhep"
    },
    "output_files": {
        "output_file.stdhep": "output_1pt1_2.stdhep"
    },
    "output_dir": "some/path/to/output/1pt1/1pt1GeV_detector/",
    "job_id": 2
},
{
    "run_params": "4pt4",
    "run_number": 4,
    "seed": 1235,
    "detector": "4pt4GeV_detector",
    "input_files": {
        "some/path/to/input/4pt4/file_1.stdhep": "input1.stdhep",
        "some/path/to/input/4pt4/file_2.stdhep": "input2.stdhep"
    },
    "output_files": {
        "output_file.stdhep": "output_4pt4_1.stdhep"
    },
    "output_dir": "some/path/to/output/4pt4/4pt4GeV_detector/",
    "job_id": 3
},
{
    "run_params": "4pt4",
    "run_number": 4,
    "seed": 1236,
    "detector": "4pt4GeV_detector",
    "input_files": {
        "some/path/to/input/4pt4/file_2.stdhep": "input1.stdhep",
        "some/path/to/input/4pt4/file_3.stdhep": "input2.stdhep"
    },
    "output_files": {
        "output_file.stdhep": "output_4pt4_2.stdhep"
    },
    "output_dir": "some/path/to/output/4pt4/4pt4GeV_detector/",
    "job_id": 4
}
```
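The `{% set %}` dictionaries in the template are plain lookup tables keyed on `run_params`. The following Python sketch shows the same lookups with the values from the example:

```python
# The {% set %} dictionaries from the example template, used as lookup
# tables: {{ run_number[run_params] }} and {{ detector[run_params] }}
# resolve to the entry matching the current run_params value.
detector = {"1pt1": "1pt1GeV_detector", "2pt2": "2pt2GeV_detector", "4pt4": "4pt4GeV_detector"}
run_number = {"1pt1": 1, "2pt2": 2, "4pt4": 4}

for run_params in ["1pt1", "4pt4"]:
    print(run_params, detector[run_params], run_number[run_params])
```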
- It is also possible to include more advanced inline calculations to determine job parameters like particle masses. An example `job.json.tmpl` for this can look like this:

```
{
    ...
    "masstotal": {% set masstotal = "%0.2f"%(mass1*mass2/mass3) %}{{ masstotal|float }},
    "mass1": {{ mass1 }},
    ...
}
```
With this `vars.json`

```
{
    ...
    "mass1": [111, 222],
    "mass2": [2],
    "mass3": [2000],
    ...
}
```

and our usual command to write the jobs to `jobs.json`, we get:

```
{
    ...
    "masstotal": 0.11,
    "mass1": 111,
    ...
},
...
{
    ...
    "masstotal": 0.22,
    "mass1": 222,
    ...
}
```
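The inline expression is ordinary `%`-style string formatting followed by a conversion back to a number via the `float` filter. A quick Python check with the first value set from the example `vars.json`:

```python
# Mirrors {% set masstotal = "%0.2f"%(mass1*mass2/mass3) %}{{ masstotal|float }}
# for the first value set of the example vars.json.
mass1, mass2, mass3 = 111, 2, 2000

masstotal_str = "%0.2f" % (mass1 * mass2 / mass3)  # 0.111 rounded to "0.11"
masstotal = float(masstotal_str)                   # the |float filter: 0.11
print(masstotal)  # 0.11
```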
- We can also use a text file with a list of input files, e.g. `input_file_list.txt`, to generate a `jobs.json` file. For example, `input_file_list.txt` might look something like this:

```
/path/to/input/files/detector1/someName_1.stdhep
/path/to/input/files/detector1/someName_2.stdhep
/path/to/input/files/detector2/someName_3.stdhep
```

Here, all input files are of the format `/path/to/input/files/detectorName/fileName_runNumber.stdhep`, which we can use in our job template that will look something like this:

```
{
    {% set splitPath = input_files['data'][0].split('/') %}
    {% set name_no_ext = splitPath[6].split('.')[0] %}
    {% set run_number = name_no_ext.split('_')[1]|int %}
    {% set detector = splitPath[5] %}
    "input_files": {
        "{{ input_files['data'][0] }}": "input.stdhep"
    },
    "output_files": {
        "output_file.root": "{{ name_no_ext }}.root"
    },
    "output_dir": "output",
    "detector": "{{ detector }}",
    "run_number": {{ run_number }}
}
```
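The `split` indices in the template can be verified in plain Python; for the first path in `input_file_list.txt` they resolve as follows:

```python
# Mirrors the {% set %} lines of the template for the first path
# in input_file_list.txt.
path = "/path/to/input/files/detector1/someName_1.stdhep"

split_path = path.split('/')
# ['', 'path', 'to', 'input', 'files', 'detector1', 'someName_1.stdhep']
# (index 0 is empty because the path starts with '/')

name_no_ext = split_path[6].split('.')[0]    # 'someName_1'
run_number = int(name_no_ext.split('_')[1])  # 1
detector = split_path[5]                     # 'detector1'

print(detector, run_number, name_no_ext)  # detector1 1 someName_1
```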
We can then generate the jobs by running

```shell
hps-mc-job-template -i <variable_name> <file_name> <n_reads> -j <job_start> <template_files> <output_file>
```

which becomes

```shell
hps-mc-job-template -i data input_file_list.txt 1 -j 1 job.json.tmpl jobs.json
```

in our case. Finally, `jobs.json` should then read:

```
{
    "input_files": {
        "/path/to/input/files/detector1/someName_1.stdhep": "input.stdhep"
    },
    "output_files": {
        "output_file.root": "someName_1.root"
    },
    "output_dir": "output",
    "detector": "detector1",
    "run_number": 1
},
{
    "input_files": {
        "/path/to/input/files/detector1/someName_2.stdhep": "input.stdhep"
    },
    "output_files": {
        "output_file.root": "someName_2.root"
    },
    "output_dir": "output",
    "detector": "detector1",
    "run_number": 2
},
{
    "input_files": {
        "/path/to/input/files/detector2/someName_3.stdhep": "input.stdhep"
    },
    "output_files": {
        "output_file.root": "someName_3.root"
    },
    "output_dir": "output",
    "detector": "detector2",
    "run_number": 3
}
```