Train some classifiers
Training can be run with the chanser_mva interface or via a Jupyter notebook (see tutorials/tmva).
Warning: if the training data contains events with variables that are NaN or infinite, the training will fail. You can try to purge them at this point by filtering the trees, but it is better to put a check in the MVAParticleEventOut. You can also search ROOT trees for such values with selections like the TMath::IsNaN test shown in the signal-filtering step below.
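As a rough illustration of the two kinds of guard mentioned here, the sketch below is plain standalone C++. Both `allFinite` and `finiteGuard` are hypothetical helpers, not part of chanser or ROOT. The first shows the sort of test one might apply to the variables before an event is written out. The second only builds a selection string from the TMath calls used in the filter strings on this page, with `TMath::Abs` added so that negative infinity is rejected too. It assumes the tree selection parser accepts these calls.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical helper (not part of chanser): true only if every value is
// finite, i.e. neither NaN nor +/- infinity.
bool allFinite(const std::vector<double>& values) {
  for (double v : values) {
    if (!std::isfinite(v)) return false;
  }
  return true;
}

// Hypothetical helper: appends a NaN and infinity guard for each listed
// branch to an existing selection string, joining terms with "&&".
std::string finiteGuard(std::string cut,
                        const std::vector<std::string>& branches) {
  for (const auto& b : branches) {
    if (!cut.empty()) cut += "&&";
    cut += "TMath::IsNaN(" + b + ")==0&&TMath::Abs(" + b +
           ")!=TMath::Infinity()";
  }
  return cut;
}
```

A string built this way, for example `finiteGuard("DTCuts2==4", {"PipDeltaTime"})`, could then be passed wherever a selection string is expected, such as the filter argument of the tree copies made later in the script.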
Here we use chanser_mva and a ROOT script, RunTrainSignalID.C, which we can run with
chanser_mva RunTrainSignalID.C
The code includes the following parts:
Get a tree of particle data that includes the Final variables. I use the chanser::FiledTree class to simplify tree usage. Note that the path to the data is the output directory given in CreateTrainingData + $USERNAME/CANO_FILENAME/particleData/ParticleVariables_0.root
auto full = FiledTree::Read("particle",
"/work/dump/mvatraining/dglazier/Pi2_Pi2_Training__/particleData/ParticleVariables_0.root");
Turn off all Pi2 final variables apart from MissMass2, which we will cut on to define signal and background.
full->Tree()->SetBranchStatus("Pi2*",0);
full->Tree()->SetBranchStatus("Pi2MissMass2",1);
Make a copy of the full data set filtered on signal-like events, in this case with a cut on missing mass squared and DeltaTimeCuts, "TMath::Abs(Pi2MissMass2)<0.1&&DTCuts2==4". You may also filter out events with NaNs and infinities by appending conditions like "&&TMath::IsNaN(ElectronDeltaTime)==0&&TMath::Infinity()!=(PipDeltaTime)".
auto signal = FiledTree::RecreateCopyFull(full->Tree(),
"/work/dump/mvatraining/dglazier/Pi2_Pi2_Training__/particleData/Signal.root",
"TMath::Abs(Pi2MissMass2)<0.1&&DTCuts2==4");
Similarly for background: here I select events outwith the exclusive peak (but still within some limit, to better discriminate events that are closer to signal) and do not enforce particle cuts.
auto background = FiledTree::RecreateCopyFull(full->Tree(),
"/work/dump/mvatraining/dglazier/Pi2_Pi2_Training__/particleData/Background.root",
"TMath::Abs(Pi2MissMass2)<2&&TMath::Abs(Pi2MissMass2)>0.2");
Now we create our signalID training interface and give it a name. If we try changing the signal/background cuts above, we can just give different names here, then use them all in the same data processing by adding them as different actions. The name string will be appended to the output directory; this (OutDir + name) must be specified when using MVASignalIDAction.
auto train = TrainSignalID("mva2");
Give an output directory. Note that the name given above will also be appended for this instance of the training.
train.SetOutDir("");
Set any branches in the tree that you wish to ignore in the training. At this point we want to make sure only particle data branches are left in the tree, and remove everything else if we did not do so when filtering events.
train.IgnoreBranches("Pi2MissMass2:DTCuts2:EBCuts:Topo:NPerm:ElectronRegion");
Provide the trees we created from the ROOT file above. Note that, as they are FiledTrees, I must use the .Tree() function.
train.AddSignalTree(signal->Tree());
train.AddBackgroundTree(background->Tree());
Set how many events you would like to use from each tree for training and testing.
train.SetNTrainTest(10000,10000);
train.PrepareTrees();
Now I must decide which classifiers to use and how they should be configured. You will need to read further into TMVA to understand this.
//Standard TMVA Factory Method Booking
//The option string is split into adjacent literals, which C++ joins into one
train.BookMethod(TMVA::Types::kBDT,
"BDT","!H:!V:NTrees=850:MinNodeSize=2.5%:MaxDepth=3:"
"BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:"
"BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20");
At this point I could book as many different classifiers as I like, but for clarity I just use one here.
Now run the training and show the response distributions and ROC curve.
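TMVA option strings such as the one above are colon-separated lists, which become hard to read and edit once several classifiers are booked. One possible way to keep them manageable is a minimal sketch like the following, where `joinOptions` is a hypothetical helper and not part of chanser or TMVA. It assembles the string from one entry per option.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins individual option entries with ':' to form a
// TMVA-style option string. Entries are copied as given, with no validation.
std::string joinOptions(const std::vector<std::string>& opts) {
  std::string out;
  for (const auto& o : opts) {
    if (!out.empty()) out += ":";
    out += o;
  }
  return out;
}
```

For instance, `joinOptions({"!H", "!V", "NTrees=850", "MaxDepth=3"})` yields a string that could be passed as the last argument of a BookMethod call like the one above.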
train.DoTraining();
train.DrawROCCurve();
train.DrawResponses();