add description for another features and todo list
tata-antares committed Mar 4, 2016
1 parent e409a42 commit 8cedf1c
Showing 1 changed file with 72 additions and 29 deletions.
101 changes: 72 additions & 29 deletions description.ipynb
@@ -4,13 +4,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Previous analysis\n",
"## [Previous analysis](http://arxiv.org/pdf/1504.07670v3.pdf)\n",
"\n",
"http://arxiv.org/pdf/1504.07670v3.pdf\n",
"## Jet Parton\n",
"\n",
"JetParton is a truth-level feature. These are: \n",
"\n",
"* +-5 = b\n",
"* +- 4 = c\n",
"* +-3 = s\n",
"* +-2 = u\n",
"* +- 1 = d\n",
"* 0 = gluon.\n",
"\n",
"## Features\n",
"\n",
"SV - secondary vertex\n",
"We should not use the absolute JetP and JetPt. The fact that JetPT looks different for b-jets in the training is simply because we used some top events which have higher jet PT. In the real analysis, we want to measure in bins of JetPT the yields of b, c, and light. So we don't want to have the classifier using PT as a feature (or P).\n",
"\n",
"### SV - secondary vertex\n",
"\n",
"* `SVM`: SV mass;\n",
"* `SVMC`: SV corrected mass;\n",
@@ -20,39 +31,71 @@
"* `SVN`: number of tracks in the SV (>= 2);\n",
"* `SVQ`: abs value of the net charge of tracks in the SV;\n",
"* `SVFDChi2`: FD chi2 of the SV from the PV - we used log of this in our algorithm;\n",
"* `SVSumIPChi2`: sum of IPChi2 of all tracks in the SV - we used log of this in our algorithm."
"* `SVSumIPChi2`: sum of IPChi2 of all tracks in the SV - we used log of this in our algorithm.\n",
"\n",
"### Jet features (used by [CMS](http://arxiv.org/pdf/1409.3072v1.pdf)):\n",
"\n",
"* `JetQ`: This is a pt-weighted \"jet charge\" observable (each particle contributes its electric charge but weighted by the particle's PT relative to the jet PT). It roughly on average corresponds to the initial quark or gluon charge (which is either +- 1/3, +- 2/3 or 0).\n",
"\n",
"* `JetSigma1` and `JetSigma2`: These are the major and minor jet \"widths\" (the jet is a cone, but the energy is distributed within the cone unevenly, so one can define elliptical axes and define the spread along them).\n",
"\n",
"* `JetMult`: Number of particles in the jet. In MC this is the best quark-gluon discriminant. In reality it should be a power one, but we know that the MC is overly optimistic. \n",
"\n",
"* `JetPTHard`: Fraction of jet PT carried by highest PT particle in the jet.\n",
"\n",
"* `JetPTD`: This is another pt weighted variable. \n",
"\n",
"* `JetNDis`: I added this feature mainly for b vs c vs light. It's possible to have displaced tracks in the jet that do not enter the SV. This can especially be true for b jets. This feature may help in identifying b and c jets.\n",
"\n",
"### Check if they help:\n",
"\n",
"* `d['jetM'] = numpy.sqrt(d['JetE'] ** 2 - d['jetP'] ** 2 )`\n",
"* `d['SV_jet_M_rel'] = d['SVM'] / d['jetM']`\n",
"* `d['SV_jet_MC_rel'] = d['SVMC'] / d['jetM’]`\n",
"\n",
"### Hard to calibrate:\n",
"\n",
"* JetPx,\n",
"* JetPy, \n",
"* JetPz, \n",
"* JetE\n",
"\n",
"## Training TODO\n",
"\n",
"Compare separately on:\n",
"\n",
"* SV features\n",
"* jet features\n",
"* SV+jet features, how this helps"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" total used free shared buffers cached\r\n",
"Mem: 100721 100197 523 0 605 45839\r\n",
"-/+ buffers/cache: 53752 46968\r\n",
"Swap: 0 0 0\r\n"
]
}
],
"cell_type": "markdown",
"metadata": {},
"source": [
"!free -m"
"### Additional features"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
"cell_type": "markdown",
"metadata": {},
"source": [
"`data['log_SVFDChi2'] = numpy.log(data['SVFDChi2'].values)`\n",
"\n",
"`data['log_SVSumIPChi2'] = numpy.log(data['SVSumIPChi2'].values)`\n",
"\n",
"`data['SVM_diff'] = numpy.log(data['SVMC'] ** 2 - data['SVM']**2)`\n",
"\n",
"`data['SV_theta'] = (data['SVMC'] ** 2 - data['SVM']**2) / data['SVPT']`\n",
"\n",
"`data['SVM_rel'] = numpy.log(data['SVM'] / data['SVMC'] + 0.01)`\n",
"\n",
"`data['SV_R_FD_rel'] = numpy.tanh(data['SVR'] / data['SVFDChi2'])`\n",
"\n",
"`data['SV_Q_N_rel'] = 1. * data['SVQ'] / data['SVN']`\n",
"\n",
"`data = data.drop(['SVFDChi2', 'SVSumIPChi2'], axis=1)`"
]
}
],
"metadata": {
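The derived SV columns listed in the "Additional features" cell can be collected into a single function. The following is only a sketch: the function name `add_sv_features` and the toy numbers are made up for illustration, and it assumes `SVMC` is strictly larger than `SVM` in every row, since otherwise the logarithm in `SVM_diff` is not finite.

```python
import numpy
import pandas


def add_sv_features(data):
    """Return a copy of `data` with the derived SV columns from the notebook.

    Hypothetical helper: the notebook applies these lines directly to `data`.
    """
    data = data.copy()
    # The chi2 variables span orders of magnitude, so their logs are used.
    data['log_SVFDChi2'] = numpy.log(data['SVFDChi2'].values)
    data['log_SVSumIPChi2'] = numpy.log(data['SVSumIPChi2'].values)
    # Difference of squared corrected and uncorrected SV mass.
    mass_sq_diff = data['SVMC'] ** 2 - data['SVM'] ** 2
    data['SVM_diff'] = numpy.log(mass_sq_diff)
    data['SV_theta'] = mass_sq_diff / data['SVPT']
    # The 0.01 offset keeps the log finite when SVM is zero.
    data['SVM_rel'] = numpy.log(data['SVM'] / data['SVMC'] + 0.01)
    data['SV_R_FD_rel'] = numpy.tanh(data['SVR'] / data['SVFDChi2'])
    data['SV_Q_N_rel'] = 1. * data['SVQ'] / data['SVN']
    # The raw chi2 columns are replaced by their logs.
    return data.drop(['SVFDChi2', 'SVSumIPChi2'], axis=1)


# Toy input with invented values, only to show the call.
toy = pandas.DataFrame({
    'SVM': [1.2, 2.5], 'SVMC': [1.9, 4.8], 'SVPT': [3.0, 7.5],
    'SVR': [0.4, 1.1], 'SVN': [2, 4], 'SVQ': [0, 1],
    'SVFDChi2': [150.0, 900.0], 'SVSumIPChi2': [40.0, 310.0],
})
features = add_sv_features(toy)
```

Returning a copy instead of modifying `data` in place makes it easier to compare the classifier with and without the derived columns.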

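The "Check if they help" features (jet mass and SV mass relative to jet mass) can be sketched in the same way. This sketch assumes the momentum column is named `JetP`, as in the Features text; the helper name `add_jet_mass_features`, the clipping of slightly negative `E^2 - P^2` values, and the toy numbers are additions for illustration and are not in the notebook.

```python
import numpy
import pandas


def add_jet_mass_features(d):
    """Return a copy of `d` with jet mass and SV-to-jet mass ratios."""
    d = d.copy()
    # Invariant mass squared; rounding can push it slightly below zero,
    # so it is clipped at zero before the square root (an assumption).
    mass_sq = (d['JetE'] ** 2 - d['JetP'] ** 2).clip(lower=0)
    d['jetM'] = numpy.sqrt(mass_sq)
    d['SV_jet_M_rel'] = d['SVM'] / d['jetM']
    d['SV_jet_MC_rel'] = d['SVMC'] / d['jetM']
    return d


# Toy input with invented values.
toy_jets = pandas.DataFrame({
    'JetE': [13.0, 25.0], 'JetP': [12.0, 24.0],
    'SVM': [2.5, 3.5], 'SVMC': [4.0, 5.6],
})
jets = add_jet_mass_features(toy_jets)
```

A jet with zero mass after clipping would give an infinite ratio, so such rows would need separate handling in a real analysis.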
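To make the `JetQ` and `JetPTHard` descriptions concrete, here is a sketch of how such observables could be computed from the particles in one jet. The notebook does not say how the stored values were produced, so the linear PT weighting, the function names, and the inputs below are assumptions.

```python
import numpy


def jet_charge(charges, pts, jet_pt):
    """PT-weighted jet charge: sum of q_i * pt_i, divided by the jet PT.

    Assumes a linear PT weight; the actual definition may differ.
    """
    charges = numpy.asarray(charges, dtype=float)
    pts = numpy.asarray(pts, dtype=float)
    return float(numpy.sum(charges * pts) / jet_pt)


def pt_hard_fraction(pts, jet_pt):
    """Fraction of the jet PT carried by the highest-PT particle."""
    return float(numpy.max(pts) / jet_pt)


# Invented three-particle jet: charges in units of e, PT in arbitrary units.
charges = [+1, -1, 0]
pts = [10.0, 5.0, 5.0]
q = jet_charge(charges, pts, jet_pt=20.0)
hard = pt_hard_fraction(pts, jet_pt=20.0)
mult = len(pts)  # analogue of JetMult
```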