
JSON Job Templates

sarahgaiser edited this page Jan 24, 2023 · 3 revisions

This is a short introduction to creating and using job templates. Job templates are typically called job.json.tmpl and are used to create a set of jobs that are written to another JSON file, usually called jobs.json. The templates are written using Jinja2 syntax.

General structure

In hps-mc, the components of a job are executed as specified by the job script and the job parameters, which are set via the jobs.json file. Job parameters may include input_files, output_files, seed, and many more. The three components needed to create a set of job parameters are

  • job.json.tmpl: the job (parameter) template
  • vars.json: a file containing all the relevant parameter values
  • mkjobs.sh: a script that creates the jobs.json file

Writing and using a template

Basic example

  1. Figure out which job parameters are needed by the job script you want to run. Usually, these are (not a complete list):
    • input_files
    • output_files
    • output_dir
    • detector
    • run_params
    • run_number
    • seed
  2. Write a basic template (or copy one that is similar to what you need).
    {
        "run_params": "{{ run_params }}",
        "run_number": {{ run_number }},
        "seed": {{ job_id + 1234 }},
        "detector": "{{ detector }}",
        "input_files": {"some/path/to/input/{{ run_params }}/file_{{ job_id }}.stdhep": "input1.stdhep",
                        "some/path/to/input/{{ run_params }}/file_{{ job_id+1 }}.stdhep": "input2.stdhep"},
        "output_files": {"output_file.stdhep": "output_{{ run_params }}_{{ job_id }}.stdhep"},
        "output_dir": "some/path/to/output/{{ run_params }}/{{ detector }}/"
    }
    
    Everything inside double curly braces {{ }} is a variable that will be replaced by a value specified in vars.json when you run the mkjobs.sh script. The job_id variable is a running number that starts at 1 for the first job and increases by 1 for each subsequent job. Variables that hold strings must be quoted, as in "{{ string }}". You can also perform basic math operations on variables that are numbers, as in the seed entry above.
  3. Specify the values of the variables in vars.json.
    {
        "run_params": ["1pt1"],
        "run_number": [1],
        "detector": ["someDetector"]
    }
    
  4. Run mkjobs.sh. To change the number of jobs that are written, adjust the -r option of the command in the script: hps-mc-job-template -j 1 -r <number_of_jobs> -a vars.json job.json.tmpl jobs.json. With number_of_jobs=2, this becomes hps-mc-job-template -j 1 -r 2 -a vars.json job.json.tmpl jobs.json, which produces the following jobs.json file:
    {
        "run_params": "1pt1",
        "run_number": 1,
        "seed": 1235,
        "detector": "someDetector",
        "input_files": {"some/path/to/input/1pt1/file_1.stdhep": "input1.stdhep",
                        "some/path/to/input/1pt1/file_2.stdhep": "input2.stdhep"},
        "output_files": {"output_file.stdhep": "output_1pt1_1.stdhep"},
        "output_dir": "some/path/to/output/1pt1/someDetector/",
        "job_id": 1
    },
    {
        "run_params": "1pt1",
        "run_number": 1,
        "seed": 1236,
        "detector": "someDetector",
        "input_files": {"some/path/to/input/1pt1/file_2.stdhep": "input1.stdhep",
                        "some/path/to/input/1pt1/file_3.stdhep": "input2.stdhep"},
        "output_files": {"output_file.stdhep": "output_1pt1_2.stdhep"},
        "output_dir": "some/path/to/output/1pt1/someDetector/",
        "job_id": 2
    }
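
The expansion performed by the steps above can be sketched in plain Python. This is a conceptual illustration only, not the actual hps-mc-job-template implementation; the helper name make_job and the hard-coded values are hypothetical, taken from the example above.

```python
# Conceptual sketch of how the template's substitutions turn one set of
# variable values into jobs.json entries (NOT the real hps-mc implementation;
# make_job and all literal values are illustrative, mirroring the example above).
import json


def make_job(job_id, run_params, run_number, detector):
    """Mirror the substitutions the Jinja2 template performs for one job."""
    return {
        "run_params": run_params,
        "run_number": run_number,
        "seed": job_id + 1234,  # corresponds to {{ job_id + 1234 }}
        "detector": detector,
        "input_files": {
            f"some/path/to/input/{run_params}/file_{job_id}.stdhep": "input1.stdhep",
            f"some/path/to/input/{run_params}/file_{job_id + 1}.stdhep": "input2.stdhep",
        },
        "output_files": {"output_file.stdhep": f"output_{run_params}_{job_id}.stdhep"},
        "output_dir": f"some/path/to/output/{run_params}/{detector}/",
        "job_id": job_id,
    }


# job_id runs from 1 up to the number of requested jobs (the -r option)
jobs = [make_job(job_id, "1pt1", 1, "someDetector") for job_id in (1, 2)]
print(json.dumps(jobs, indent=4))
```

Running this prints two job entries whose seeds (1235, 1236) and file indices match the jobs.json shown above.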
    

Advanced examples

If you have several jobs that differ in some but not all job parameters, you might want to employ some Jinja2 magic to create the jobs.json file.

  1. Let's assume that we want to run jobs with different run parameters, where the detector and the run_number depend on run_params. This can be done by defining additional variables at the beginning of the job.json.tmpl file.

    {% set detector = {"1pt1":"1pt1GeV_detector",
                       "2pt2":"2pt2GeV_detector",
                       "4pt4":"4pt4GeV_detector"}
    %}
    
    {% set run_number = {"1pt1":1,
                         "2pt2":2,
                         "4pt4":4}
    %}
    
    {
        "run_params": "{{ run_params }}",
        "run_number": {{ run_number[run_params] }},
        "seed": {{ (job_id - 1) % njobs + 1 + 1234 }},
        "detector": "{{ detector[run_params] }}",
        "input_files": {"some/path/to/input/{{ run_params }}/file_{{ (job_id - 1) % njobs + 1 }}.stdhep": "input1.stdhep",
                        "some/path/to/input/{{ run_params }}/file_{{ (job_id - 1) % njobs + 2 }}.stdhep": "input2.stdhep"},
        "output_files": {"output_file.stdhep": "output_{{ run_params }}_{{ (job_id - 1) % njobs + 1 }}.stdhep"},
        "output_dir": "some/path/to/output/{{ run_params }}/{{ detector[run_params] }}/"
    }
    

    Now, the vars.json file can look something like this:

    {
        "run_params":["1pt1","4pt4"],
        "njobs":[2]
    }
    

    Running hps-mc-job-template -j 1 -r 2 -a vars.json job.json.tmpl jobs.json then produces the following jobs.json:

    {
        "run_params": "1pt1",
        "run_number": 1,
        "seed": 1235,
        "detector": "1pt1GeV_detector",
        "input_files": {"some/path/to/input/1pt1/file_1.stdhep": "input1.stdhep",
                        "some/path/to/input/1pt1/file_2.stdhep": "input2.stdhep"},
        "output_files": {"output_file.stdhep": "output_1pt1_1.stdhep"},
        "output_dir": "some/path/to/output/1pt1/1pt1GeV_detector/",
        "job_id": 1
    },
    {
        "run_params": "1pt1",
        "run_number": 1,
        "seed": 1236,
        "detector": "1pt1GeV_detector",
        "input_files": {"some/path/to/input/1pt1/file_2.stdhep": "input1.stdhep",
                        "some/path/to/input/1pt1/file_3.stdhep": "input2.stdhep"},
        "output_files": {"output_file.stdhep": "output_1pt1_2.stdhep"},
        "output_dir": "some/path/to/output/1pt1/1pt1GeV_detector/",
        "job_id": 2
    },
    {
        "run_params": "4pt4",
        "run_number": 4,
        "seed": 1235,
        "detector": "4pt4GeV_detector",
        "input_files": {"some/path/to/input/4pt4/file_1.stdhep": "input1.stdhep",
                        "some/path/to/input/4pt4/file_2.stdhep": "input2.stdhep"},
        "output_files": {"output_file.stdhep": "output_4pt4_1.stdhep"},
        "output_dir": "some/path/to/output/4pt4/4pt4GeV_detector/",
        "job_id": 3
    },
    {
        "run_params": "4pt4",
        "run_number": 4,
        "seed": 1236,
        "detector": "4pt4GeV_detector",
        "input_files": {"some/path/to/input/4pt4/file_2.stdhep": "input1.stdhep",
                        "some/path/to/input/4pt4/file_3.stdhep": "input2.stdhep"},
        "output_files": {"output_file.stdhep": "output_4pt4_2.stdhep"},
        "output_dir": "some/path/to/output/4pt4/4pt4GeV_detector/",
        "job_id": 4
    }
    
  2. It is also possible to include more advanced inline calculations to determine job parameters such as particle masses. An example job.json.tmpl might look like this:

    {
        ...
        "masstotal": {% set masstotal = "%0.2f" % (mass1 * mass2 / mass3) %}{{ masstotal|float }},
        "mass1": {{ mass1 }},
        ...
    }
    

    With this vars.json

    {
        ...
        "mass1": [111,222],
        "mass2": [2],
        "mass3": [2000],
        ...
    }
    

    and our usual command to write the jobs to jobs.json we get:

    {
        ...
        "masstotal": 0.11,
        "mass1": 111,
        ...
    },
    ...
    {
        ...
        "masstotal": 0.22,
        "mass1": 222,
        ...
    }
    
  3. We can also use a text file with a list of input files, e.g. input_file_list.txt, to generate a jobs.json file. For example, input_file_list.txt might look something like this:

        /path/to/input/files/detector1/someName_1.stdhep
        /path/to/input/files/detector1/someName_2.stdhep
        /path/to/input/files/detector2/someName_3.stdhep
    

    Here, all input files follow the format /path/to/input/files/detectorName/fileName_runNumber.stdhep, which we can exploit in a job template that looks something like this:

    {
        {% set splitPath = input_files['data'][0].split('/') %}
        {% set name_no_ext = splitPath[6].split('.')[0] %}
        {% set run_number = name_no_ext.split('_')[1]|int %}
        {% set detector = splitPath[5] %}
    
        "input_files": {
            "{{ input_files['data'][0] }}": "input.stdhep"
        },
        "output_files": {
            "output_file.root": "{{ name_no_ext }}.root"
        },
        "output_dir": "output",
        "detector": "{{ detector }}",
        "run_number": {{ run_number }}
    }
    

    We can then generate the jobs by running

    hps-mc-job-template -i <variable_name> <file_name> <n_reads> -j <job_start> <template_files> <output_file>
    

    which becomes

    hps-mc-job-template -i data input_file_list.txt 1 -j 1 job.json.tmpl jobs.json
    

    in our case. Finally, jobs.json should read:

    {
        "input_files": {
            "/path/to/input/files/detector1/someName_1.stdhep": "input.stdhep"
        },
        "output_files": {
            "output_file.root": "someName_1.root"
        },
        "output_dir": "output",
        "detector": "detector1",
        "run_number": 1
    },
    {
        "input_files": {
            "/path/to/input/files/detector1/someName_2.stdhep": "input.stdhep"
        },
        "output_files": {
            "output_file.root": "someName_2.root"
        },
        "output_dir": "output",
        "detector": "detector1",
        "run_number": 2
    },
    {
        "input_files": {
            "/path/to/input/files/detector2/someName_3.stdhep": "input.stdhep"
        },
        "output_files": {
            "output_file.root": "someName_3.root"
        },
        "output_dir": "output",
        "detector": "detector2",
        "run_number": 3
    }
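
The index-based splitting in the template above only works because every path in input_file_list.txt has the same directory depth. The same logic in plain Python, using one of the example paths (a sketch for illustration; indices 5 and 6 are valid only for this exact path depth):

```python
# Plain-Python illustration of the split logic used in the job template above.
# The path is one of the example entries; indices 5 and 6 assume exactly this
# directory depth and would need adjusting for deeper or shallower paths.
path = "/path/to/input/files/detector1/someName_1.stdhep"

split_path = path.split("/")
# ['', 'path', 'to', 'input', 'files', 'detector1', 'someName_1.stdhep']
detector = split_path[5]                      # 'detector1'
name_no_ext = split_path[6].split(".")[0]     # 'someName_1'
run_number = int(name_no_ext.split("_")[1])   # 1

print(detector, name_no_ext, run_number)
```

Note that a leading "/" produces an empty first element in the split list, which is why the detector name sits at index 5 rather than 4.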
    