dglazier edited this page Jun 11, 2020 · 14 revisions

ROOT provides a flexible interface (TMVA) to a number of multivariate analysis (MVA) methods, now more commonly referred to as machine learning. As well as its own classifiers, TMVA lets you use popular machine learning libraries such as R or Keras.

chanser provides an additional layer that simplifies steering TMVA and lets you include the trained classifiers as cuts in an event-loop analysis.

Machine learning has helped popularise Python and Jupyter notebooks, so chanser is configured via PyROOT to run from Python. This applies to all chanser analyses, not just chanser_mva.

Please see the chanser versions of the TMVA tutorials TMVAClassifier and TMVAClassifierApplication in $CHANSER/tutorials/tmva.

We will not go into detail here on the different types of classifier; please see the TMVA documentation for that. We assume you already know which classifier you would like to use.

Training a TMVA classifier (or two)

To include a classifier we must first choose one or more, train them, and then include each one as an MVASignalIDAction.

The darkest art of machine learning is the middle step (in principle ROOT and chanser_mva simplify the first and last steps): you will need to work out what to train your classifier with, i.e. how to obtain a sample of signal events and a sample of background events. Here we outline some possibilities, but great care must be taken with each, and it is ultimately the analyser's own responsibility to make sure that what they are doing is sensible and works the way they expect.
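
One possibility, assuming you have simulated events in which the true reaction is known, is to label each event by truth matching and split the events into a signal sample and a background sample. The sketch below shows only that bookkeeping in plain C++; the Event struct and its isSignalTruth flag are hypothetical stand-ins, not chanser or TMVA classes.

```cpp
#include <utility>
#include <vector>

// Hypothetical event record: classifier input variables plus a truth label
// that is only available for simulated data.
struct Event {
  std::vector<double> inputs;  // classifier input variables
  bool isSignalTruth = false;  // hypothetical truth-matching flag
};

// Split labelled events into a signal sample and a background sample.
// Returns {signal, background}.
std::pair<std::vector<Event>, std::vector<Event>>
splitByTruth(const std::vector<Event>& events) {
  std::vector<Event> signal;
  std::vector<Event> background;
  for (const Event& ev : events) {
    if (ev.isSignalTruth) {
      signal.push_back(ev);
    } else {
      background.push_back(ev);
    }
  }
  return {signal, background};
}
```

For example, three events labelled {true, false, true} give a signal sample of two events and a background sample of one. With real data no truth label exists, which is exactly why choosing the training samples needs so much care.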

Including a TMVA classifier (or two) in your chanser analysis workflow

The basic purpose of an MVA classifier is to combine many input variables into a single output value. For signal identification this output reflects how likely an event is to be signal rather than background. In our analysis we can then cut on this output to accept or reject an event as signal.
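
To make the idea concrete, here is a toy stand-in for a trained classifier: a fixed weighted sum of the inputs passed through a logistic function, giving a single response between 0 and 1, followed by a cut on that response. The ToyClassifier struct, its weights and bias, and the cut value are illustrative assumptions; a real TMVA method (MLP, BDT, ...) has its own internal form and output range, and in chanser the trained classifier is loaded into an MVASignalIDAction instead.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy classifier (hypothetical): logistic function of a weighted sum of the
// inputs. The weights and bias are illustrative, not the result of training.
struct ToyClassifier {
  std::vector<double> weights;
  double bias;

  // Combine many input variables into a single response in (0, 1).
  double response(const std::vector<double>& inputs) const {
    if (inputs.size() != weights.size()) {
      throw std::invalid_argument("input size does not match weights");
    }
    double sum = bias;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
      sum += weights[i] * inputs[i];
    }
    return 1.0 / (1.0 + std::exp(-sum));
  }
};

// Accept the event as signal if the classifier response is above the cut.
bool acceptAsSignal(const ToyClassifier& clf,
                    const std::vector<double>& inputs, double cut) {
  return clf.response(inputs) > cut;
}
```

With weights {1, -1} and zero bias, the inputs {3, 0} give a response of about 0.95 and pass a cut at 0.5, while {0, 3} give about 0.05 and are rejected. Moving the cut trades signal efficiency against background rejection.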

The input variables used will in general be related to particle tracks, for example times, energy deposits, Cherenkov photons, etc. These can be accessed from the DST banks via the clas12root region_particles. To select which track variables we wish to use we create particle data classes; see the section on particle data for more details.
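
As an illustration of what a particle data class does conceptually (choosing which track variables become classifier inputs), the sketch below copies a few per-track quantities into a fixed-order vector alongside their names. The TrackInfo fields, the variable names and both functions are hypothetical; real chanser particle data classes are written against the clas12root region_particle interface described in the particle data section. The point it illustrates is that the same variables, in the same order, must be produced for training and again for application.

```cpp
#include <string>
#include <vector>

// Hypothetical per-track quantities, standing in for values that would be
// read from the DST banks.
struct TrackInfo {
  double time;           // e.g. timing information
  double energyDeposit;  // e.g. calorimeter energy deposit
  double nphe;           // e.g. number of Cherenkov photoelectrons
};

// Hypothetical names of the selected input variables, in a fixed order.
std::vector<std::string> inputNames() {
  return {"Time", "Edep", "Nphe"};
}

// Fill the inputs in exactly the order given by inputNames(), so that
// training and application see identical variable layouts.
std::vector<double> toInputs(const TrackInfo& t) {
  return {t.time, t.energyDeposit, t.nphe};
}
```

Keeping the names and the filling order in one place makes it harder for the training and application stages to drift apart.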

We then run our chanser final state with the desired particle data to produce output trees which can be used for training. But be aware: are you using Event Builder PID for your particle iterator? Are you applying your own ParticleCuts class when making these tree outputs? Both will affect the training data, so be sure of exactly what you want to train with.
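
The effect of such a selection can be seen with a toy calculation: whatever cut is applied when writing the trees also reshapes the composition of the training sample. The sketch below applies a hypothetical preselection to a labelled toy sample and computes the signal fraction before and after; the Candidate struct, its pidVariable and the cut are invented purely for illustration and are not chanser's ParticleCuts.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical labelled candidate: one variable a preselection might cut on,
// plus a truth label (simulation only).
struct Candidate {
  double pidVariable;
  bool isSignal;
};

// Fraction of signal candidates in a sample (0 for an empty sample).
double signalFraction(const std::vector<Candidate>& sample) {
  if (sample.empty()) return 0.0;
  std::size_t nsig = 0;
  for (const Candidate& c : sample) {
    if (c.isSignal) ++nsig;
  }
  return static_cast<double>(nsig) / static_cast<double>(sample.size());
}

// Keep only the candidates passing the preselection, as would happen if the
// cut were applied when producing the training trees.
std::vector<Candidate> preselect(
    const std::vector<Candidate>& sample,
    const std::function<bool(const Candidate&)>& pass) {
  std::vector<Candidate> kept;
  for (const Candidate& c : sample) {
    if (pass(c)) kept.push_back(c);
  }
  return kept;
}
```

In a toy sample of two signal and two background candidates the signal fraction is 0.5; if a cut such as pidVariable > 0.5 removes one background candidate, the classifier is trained on a sample with a signal fraction of 2/3 instead, and it only ever sees background that already passed that cut.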

Once training has been completed we can load a trained classifier into an MVASignalIDAction and re-run over the data. The MVASignalIDAction needs only the same particle data classes that were used to produce the training data, the name of the training output directory and the name of the particular classifier you would like to use. You may use multiple classifiers in your analysis by creating multiple MVASignalIDActions. This looks like:

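MVASignalIDAction mva_mlp{"mva1","MLP","Electron:Proton:Pip:Pim"}; //training directory "mva1", classifier "MLP"
mva_mlp.SetParticleOut(new MyParticleOutEvent); //default particle data class for all particles
mva_mlp.SetParticleOut("e-",new MVA_El_Data); //particle data class just for electrons
FS->RegisterPostTopoAction(mva_mlp); //register the action with the final state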

The resulting workflow is included as an example in finalstates/Pi2 (note: to use MVAParticleOutEvent you can set $CHANSER_CLASSES to $CHANSER/finalstates/Pi2/mvaClasses).

Create Training Data

Train some classifiers

Run FinalState analysis with TMVA classifier