KingMaker is the workflow management tool for producing ntuples with the CROWN framework. The workflow management is based on law, which uses luigi as a backend.
Check out this forked repo with the lxplus setup:
git clone --recursive [email protected]:xiaohu-cern/KingMaker.git
cd KingMaker
source setup-ca.sh
source setup-lxplus.sh KingMaker
Modifications are made on top of the KIT setup, mainly including:
- initialize an x509 proxy file in the current directory so it can easily be passed to condor later, see setup-ca.sh
- adapt the env setup for lxplus in setup-lxplus.sh (start luigi listening on port 8082, patch CROWN/CMakeLists.txt to fix the git version issue of GitPython, etc.)
- direct all output to CERN EOS in lawluigi_configs/KingMaker_lxplus_law.cfg
- set up CERN condor properly in lawluigi_configs/KingMaker_lxplus_luigi.cfg
- model the whole law workflow for the lxplus env in processor-lxplus/
The first time it is run, setup-lxplus.sh will install miniconda and the relevant libraries required by KingMaker_env.yml in the current directory. It will also create and activate a new virtual env, called KingMaker in this case. This process will take tens of minutes.
The script clones CROWN from zhiyuanlcern's fork.
After checking out the repo and sourcing the setup scripts, you will need to modify the paths specified in lawluigi_configs/KingMaker_lxplus_law.cfg and lawluigi_configs/KingMaker_lxplus_luigi.cfg to your own paths.
Note: the default settings in KingMaker_lxplus_luigi.cfg will run for the mt and tt final states with no systematics.
This step makes sure you get the necessary tau analysis package:
cd CROWN
source init.sh tau
Normally, KingMaker can be run by running the ProduceSamples task. This is done using e.g.
law run ProduceSamples --local-scheduler False --analysis tau --config config --sample-list samples.txt --workers 1 --production-tag TestingCROWN
The required parameters for the task are:
- --local-scheduler False - with this setting, the luigid scheduler is used
- --analysis tau - the CROWN config to be used (in this case, the tau analysis)
- --sample-list samples.txt - path to a txt file that contains a list of nicks to be processed
- --production-tag TestingCROWN - tag for your submission; changing the tag will trigger a rebuild of the project
Additionally, some optional parameters are beneficial:
- --workers 1 - number of workers; currently, this number should not be larger than the number of tarballs to be built
- --print-status 2 - print the current status of the task
- --remove-output 2 - remove all output files
- --CROWNRun-workflow local - run everything locally instead of using HTCondor
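For example, to check the status of an earlier submission without running anything new:
law run ProduceSamples --local-scheduler False --analysis tau --config config --sample-list samples.txt --production-tag TestingCROWN --print-status 2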
Normally, KingMaker will automatically retry failed jobs. If errors persist because a root file is not accessible (server glitches, or cms.infn server issues), you can switch the problematic files to the cms.fnal redirector or comment them out in the corresponding yaml file.
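If many files are affected, the swap can be scripted. A minimal sketch, assuming the sample's filelist lives in a nick yaml like the example further below and that the FNAL redirector cmsxrootd.fnal.gov is the intended replacement (the file path and the target redirector are assumptions):

# swap_redirector.py - replace a flaky xrootd redirector in a sample filelist
import yaml  # PyYAML

NICK_YAML = "sample_database/2018/temp_nick.yaml"  # hypothetical path

with open(NICK_YAML) as f:
    sample = yaml.safe_load(f)

# rewrite every file URL to use the FNAL redirector instead of the global one
sample["filelist"] = [
    url.replace("cms-xrd-global.cern.ch", "cmsxrootd.fnal.gov")
    for url in sample["filelist"]
]

with open(NICK_YAML, "w") as f:
    yaml.safe_dump(sample, f)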
The output log files can be found at /afs/cern.ch/user/your-user-path/KingMaker/data/jobs/.
The output root files can be found at /eos/user/your-user-path/CROWN/ntuples/your-submission-tag/CROWNRun/2018/your-sample-nick/your-final-state.
Samples are tracked and handled via nicks. For each sample, a unique nick has to be used. A collection of all samples and their settings is stored in the datasets.yaml file found in the sample_database folder. Additionally, a nick.yaml is generated for each individual sample, which contains all sample settings and a filelist of all .root files belonging to this sample. An example:
campaign: RunIISummer20UL18NanoAODv2
datasetname: DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8
datatier: NANOAODSIM
dbs: /DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18NanoAODv2-106X_upgrade2018_realistic_v15_L1v1-v1/NANOAODSIM
energy: 13
era: 2018
extension: ''
filelist:
- root://cms-xrd-global.cern.ch///store/mc/RunIISummer20UL18NanoAODv2/DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_upgrade2018_realistic_v15_L1v1-v1/270000/D1972EE1-2627-2D4E-A809-32127A576CF2.root
- root://cms-xrd-global.cern.ch///store/mc/RunIISummer20UL18NanoAODv2/DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_upgrade2018_realistic_v15_L1v1-v1/50000/394775F4-CEDE-C34D-B56E-6C4839D7A027.root
- root://cms-xrd-global.cern.ch///store/mc/RunIISummer20UL18NanoAODv2/DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/NANOAODSIM/106X_upgrade2018_realistic_v15_L1v1-v1/50000/359B11BC-AE08-9B45-80A2-CC5EED138AB7.root
[....]
generators: amcatnloFXFX-pythia8
nevents: 85259315
nfiles: 85
nick: temp_nick
prepid: SMP-RunIISummer20UL18NanoAODv2-00030
sample_type: mc
status: VALID
version: 1
If a sample-specific config is not available yet, ConfigureDatasets will perform a DAS query to get a filelist for this sample.
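For reference, a minimal sketch of what such a filelist lookup can look like, using the dasgoclient CLI available in CMS environments (the surrounding Python is illustrative and not the actual ConfigureDatasets implementation; the dataset is the dbs entry from the example above):

# das_filelist.py - fetch the logical file names of a dataset via a DAS query
import subprocess

DBS = "/DYJetsToLL_0J_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer20UL18NanoAODv2-106X_upgrade2018_realistic_v15_L1v1-v1/NANOAODSIM"

# dasgoclient prints one logical file name (/store/...) per line
out = subprocess.check_output(
    ["dasgoclient", "-query", f"file dataset={DBS}"], text=True
)

# prepend an xrootd redirector to turn LFNs into readable URLs
filelist = [
    "root://cms-xrd-global.cern.ch//" + lfn
    for lfn in out.splitlines()
    if lfn.strip()
]
print(f"found {len(filelist)} files")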
Setting up KingMaker should be straightforward:
git clone --recursive [email protected]:KIT-CMS/KingMaker.git
cd KingMaker
source setup.sh <Analysis Name>
this should set up the environment specified in the luigi.cfg file (located at lawluigi_configs/<Analysis Name>_luigi.cfg), which includes all needed packages.
The environment is sourced from the conda instance located at /cvmfs/etp.kit.edu/LAW_envs/conda_envs/miniconda/ if possible.
If the relevant environment is not available this way, the environment will be set up in a local conda instance.
The environment files are located at conda_environments/<Analysis Name>_env.cfg.
In addition, other files are installed depending on the analysis.
A list of available analyses can be found in the setup.sh script or by running
source setup.sh -l
A luigid scheduler is also started if there isn't one running already.
When setting up an already cloned version, running
source setup.sh <Analysis Name>
again is sufficient.
Currently, the workflow of the KingMaker analysis consists of four distinct tasks:
- ProduceSamples - the main task, which is used to steer the production of multiple samples at once
- CROWNRun - the task used to run CROWN with a specific file
- CROWNBuild - this task is used to compile CROWN from source and create a tarball, which is used by CROWNRun
- ConfigureDatasets - this task is used to create NanoAOD filelists (if not already existent) and read out the needed configuration parameters for each sample; this determines the CROWN tarball that is used for that job
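The dependency chain between these tasks can be sketched in law as follows. This is a schematic illustration with assumed parameter names, not the actual KingMaker implementation:

# Schematic of the KingMaker task graph (illustrative only; the real
# tasks take more parameters and run as remote workflow tasks).
import law
import luigi

class ConfigureDatasets(law.Task):
    nick = luigi.Parameter()
    # creates or loads the <nick>.yaml sample config, e.g. via a DAS query

class CROWNBuild(law.Task):
    analysis = luigi.Parameter()
    # compiles CROWN and packs the executable into a tarball

class CROWNRun(law.Task):
    nick = luigi.Parameter()
    analysis = luigi.Parameter()

    def requires(self):
        # each CROWN job needs the sample settings and the matching tarball
        return {
            "dataset": ConfigureDatasets(nick=self.nick),
            "tarball": CROWNBuild(analysis=self.analysis),
        }

class ProduceSamples(law.Task):
    analysis = luigi.Parameter()
    sample_list = luigi.Parameter()

    def requires(self):
        # steer one CROWNRun per nick listed in the sample list
        with open(self.sample_list) as f:
            nicks = [line.strip() for line in f if line.strip()]
        return [CROWNRun(nick=nick, analysis=self.analysis) for nick in nicks]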
The default configuration provided in the repository should work out of the box, but some parameters may need to be changed. The configuration is spread across two files, lawluigi_configs/KingMaker_luigi.cfg and lawluigi_configs/KingMaker_law.cfg. The HTCondor settings can also be found there.
The ML_train workflow currently contains a number of the tasks necessary for the htt-ml NMSSM analysis. Non-NMSSM analyses are not yet supported. The workflow uses the PuppetMaster task to speed up status queries. It should also be noted that all created files are stored in remote storage and might be subject to file caching under certain circumstances.
The tasks are:
- CreateTrainingDataShardConfig - Task that creates configuration files from which the process-datasets for the machine learning tasks are created. Uses the write_datashard_config script. Some aspects of the process-datasets depend on the decay channel. For this reason the datasets for different channels are handled in separate tasks. Although not strictly necessary, the different run eras are also handled in separate tasks.
- CreateTrainingDataShard - Remote workflow task that creates the process-datasets for the machine learning tasks from the config files created by the CreateTrainingDataShardConfig task. The task uses the ntuples and friend trees described in the Sample setup. These dependencies are currently not checked by LAW. Also uses the create_training_datashard script. The task branches each return a root file that consists of only one fold of one process of one run era and one decay channel. These files can then be used for the machine learning tasks.
- CreateTrainingConfig - Task that creates configuration files used for the machine learning tasks. Uses the write_datashard_config and the create_combined_config scripts. The resulting config files contain information about which processes are used in the training, the combined event weights of the classes, and multiple hyperparameters that influence the training process. The training task is able to merge the datasets of the different run eras for a single training. For this reason the config file for such a training is a combination of the config files of the different run eras. At this point the separate run era tasks are merged and only the different decay channels are still treated as separate tasks.
- RunTraining - Remote workflow task that performs the neural network training, using GPU resources if possible. Uses the keras_training script. The hyperparameters of this training are provided by the config files of the CreateTrainingConfig task. The get_processes script is used to select the signal and background processes for the individual training groups. The script can currently only handle the NMSSM groups. Each branch task returns a set of files for one fold of one training group of one decay channel. Each set includes the trained .h5 model, the preprocessing object as a .pickle file, and a graph of the loss as a .pdf and .png. This task can run all trainings in parallel without splitting them into separate tasks, so the tasks of the different decay channels are merged here.
- RunAllTrainings - Task to run all possible trainings.
Normally, the ML_train workflow can be run by running the RunTraining task. This is done using e.g.
law run RunTraining --workers 2
There are a number of parameters to be set in the luigi and law config files:
- The run era. Can be either 2016, 2017, 2018 or all_eras.
- The decay channels. A list of channels that can include tt, et and mt.
- The masses of the heavy additional Higgs boson. A list that can include 240, 280, 320, 360, 400, 450, 500, 550, 600, 700, 800, 900, 1000 and heavier.
- The groups of masses of the light additional Higgs boson. A list that can include 1, 2, 3, 4, 5, 6 and 7.
Note: Only valid combinations of heavy and light masses are used, as determined by the valid_batches function of RunTraining.
By default these parameters are already set to show the proper syntax. If the RunAllTrainings task is used, all of the parameters are set, ignoring the config file.
- Optional: The production_tag. Can be any string. Used to differentiate the runs. Default is a unique timestamp.
Command line arguments:
- --workers - the number of tasks that are handled simultaneously. Due to the usage of the PuppetMaster task in this workflow, this parameter should be set to 2 times the maximum number of tasks that will be run in parallel. For RunAllTrainings this means 3 (eras) x 3 (channels) x 2 = 18. Default is 1.
- --print-status -1 - return the current status of all tasks involved in the workflow. Shows only the status of the PuppetMaster tasks for all requirements, which is a summary of the puppeteered task statuses.
- --remove-output -1 - remove all PuppetMaster output files as well as the output files of the explicitly called task (like RunTraining). Output files of puppeteered tasks are currently not removed and have to be cleaned up by hand.
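For example, to run all trainings with full parallelism (18 workers, following the rule above) and afterwards check their status:
law run RunAllTrainings --workers 18
law run RunAllTrainings --print-status -1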
A number of necessary scripts are not yet implemented. This includes the ML testing scripts as well as the conversion of the .h5 network files to a different format.
Analyses apart from KingMaker itself are still being worked on.
A collection of additional functions and tasks has been added to the central framework.
PuppetMaster is a task to speed up the gathering of task statuses when many files are involved; it is most relevant for remote file storage.
It acts as the task it is puppeteering during the status query, dynamically adds its task to the scheduler at runtime, and checks whether the given task is still the same as during previous (successful) executions by comparing their output targets. It only prints the full representation of the puppet task during status queries if the fulltask parameter is set.
It is used by giving the required task to the puppet before returning it in a task's requires function. If multiple tasks of the same kind are used in a workflow, the identifier parameter has to be used to distinguish them.
Example:
return PuppetMaster(puppet_task=Task(**requirements), identifier=[channel])
For workflow tasks, only the workflow_requires function should require a PuppetMaster task, as the individual branches require the actual task data of the puppet. In addition, the workflow_requires function should be skipped in remote branch tasks, as the PuppetMaster output files are only stored locally. This can be done by adding
if self.is_branch():
    return None
to the top of the workflow_requires function.
Some syntax has to be altered to receive the correct input data in non-workflow tasks. Instead of using self.input()["Name"], self.requires()["Name"].give_puppet_outputs() should be used.
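Putting these pieces together, a minimal sketch of the pattern (PrepareInputs, PrepareWorkflow and TrainModel are hypothetical placeholder tasks and the import path of PuppetMaster is assumed; only puppet_task, identifier, give_puppet_outputs and is_branch are taken from the description above):

# Illustrative sketch of the PuppetMaster pattern, not actual framework code.
import law
import luigi

from framework import PuppetMaster  # import path assumed

class PrepareInputs(law.Task):  # hypothetical puppeteered task
    channel = luigi.Parameter()

class PrepareWorkflow(law.LocalWorkflow):
    channel = luigi.Parameter()

    def workflow_requires(self):
        # PuppetMaster outputs are stored locally only, so remote branch
        # tasks must skip the workflow requirements entirely
        if self.is_branch():
            return None
        reqs = super().workflow_requires()
        # hand the real requirement to the puppet, tagged by an identifier
        reqs["inputs"] = PuppetMaster(
            puppet_task=PrepareInputs(channel=self.channel),
            identifier=[self.channel],
        )
        return reqs

class TrainModel(law.Task):
    channel = luigi.Parameter()

    def requires(self):
        return {
            "data": PuppetMaster(
                puppet_task=PrepareInputs(channel=self.channel),
                identifier=[self.channel],
            )
        }

    def run(self):
        # non-workflow tasks ask the puppet for the real outputs,
        # instead of the usual self.input()["data"]
        data = self.requires()["data"].give_puppet_outputs()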
An example of how all of this should be used can be found in the ML_train workflow.
The C++-based ROOT Workflow for N-tuples (CROWN) framework is a fast way of converting CMS NanoAOD samples into analysis N-tuples.
Installing the framework is easy (not much more than a git clone); check the Installation Guide.
A small introduction on how to run the framework can be found in the Running the framework Guide.
The full documentation can be found at https://crown.readthedocs.io/en/latest/.