Releases: IftachSadeh/ANNZ
ANNZ v2.3.2
Bug fixes for ROOT v6.22 (see #7).
ANNZ v2.3.1
- Updated `py/ANNZ.py` and `scripts/annz_evalWrapper.py` for python-3.6 compatibility.
- Fixed a bug in the `Makefile`; ROOT shared libraries are now linked after the local objects.
- Added the `isReadOnlySys` option, usable for evaluation only. When using the python wrapper, one may set `isReadOnlySys = True` in order to avoid writing anything to disk during evaluation.
- Fixed an issue of unnecessary excess memory consumption following the validation of XML files.
- Added `minPdfWeight` functionality to the new version of PDF generation using the random walk alg.
ANNZ v2.3.0
For users:
- Changed the optimization method for generating regression PDFs. The new default method (denoted in the output as `PDF_0`) is now generated based on a simple random walk alg. The previous versions of the PDF are now denoted as `PDF_1` and `PDF_2`. While currently available, the deprecated PDFs are not guaranteed to be supported in the future. In order to derive the deprecated PDFs, set `glob.annz["nPDFs"] = 3` and `glob.annz["addOldStylePDFs"] = True`.
- (1) Two new job options corresponding to `PDF_0` have been added: `max_optimObj_PDF` and `nOptimLoops` (see `README.md` and `scripts/annz_rndReg_advanced.py` for details).
  (2) The default value of `excludeRangePdfModelFit` has been changed from `0.1` to `0`.
  (3) Added several job options for plotting, to control the extent of the underflow and overflow regions in the regression target: `underflowZ`, `overflowZ`, `underflowZwidth`, `overflowZwidth`, `nUnderflowBins` and `nOverflowBins` (see `src/myANNZ.cpp` for details).
  (4) Added a variable, `nZclosBins`, to control the number of bins used for optimization-metric calculations in regression (see `src/myANNZ.cpp` for details).
  (5) ROOT scripts are no longer stored by default for each plot. Set `savePlotScripts` to choose otherwise.
- Added a wrapper class, which allows calling the evaluation phase for regression/classification directly from python. This can be used to integrate ANNZ directly within pipelines. The python interface is defined in `py/ANNZ.py`, with a full example given in `scripts/annz_evalWrapper.py` (see `README.md` for details).
- Bug fix in a few python scripts, where the example for the `weightInp_wgtKNN` option had previously been set to numerically insignificant values.
- Changed the interface to turn off colour output (see `README.md`).
For developers:
- Major revamp of the `Makefile`, including adding a step of precompilation of the shared `include/commonInclude.hpp` header.
- Reorganization of shared namespaces.
- Created a new `Manager` class as part of `include/myANNZ.hpp` and `src/myANNZ.cpp`.
- The new random walk alg for generating regression PDFs is implemented in `ANNZ::getRndMethodBestPDF()`, which has been completely revamped. The old version of this function has been renamed to `ANNZ::getOldStyleRndMethodBestPDF()`. It is now used in order to derive `PDF_1` and `PDF_2`.
- Added a wrapper class for e.g., python integration, implemented in `include/Wrapper.hpp`, `src/Wrapper.cpp` and `py/ANNZ.py`.
- Completely rewrote `ANNZ::doEvalReg()` to comply with pipeline integration. Added new interfaces for regression evaluation, as implemented in `src/ANNZ_regEval.cpp`.
ANNZ v2.2.2
- Added the option to not store the full values of PDFs in the output of optimization/evaluation, by setting `glob.annz["doStorePdfBins"] = False`. In this case, only the average metrics of a PDF are included in the output.
- Added the `sampleFrac_errKNN` option, to allow sub-sampling of the input dataset for the KNN uncertainty calculation (similar to e.g., `sampleFracInp_wgtKNN` and `sampleFracInp_inTrain`).
- Added metric plots of the distribution of the KNN error estimator vs. the true bias. The plots are added to the output by setting `glob.annz["doKnnErrPlots"] = True`.
- Added support for input ROOT files with different tree names.
- Added support for ROOT version `6.8.*`.
- Other minor modifications and bug fixes.
ANNZ v2.2.1
- Fixed a bug with using general math expressions for the `weightVarNames_wgtKNN` and `weightVarNames_inTrain` variables.
- Modified the `Makefile` to explicitly include `rpath` in `LDFLAGS`, which may be needed for pre-compiled versions of ROOT.
- Modified `subprocess.check_output()` in `examples/scripts/annz_qsub.py` and `fitsFuncs.py` for Python 3.x.
- Fixed a bug which caused a segmentation fault in some cases during reweighting.
- Other minor modifications and bug fixes.
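Under Python 3, `subprocess.check_output()` returns `bytes` rather than `str`, which is the usual source of incompatibility in scripts written for Python 2. The sketch below shows one portable way of getting text output; it is not necessarily the change that was made in `annz_qsub.py` or `fitsFuncs.py`, and the helper name `run_and_capture` is hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run a command and return its standard output as a str (hypothetical helper).

    In Python 3, subprocess.check_output() returns bytes by default.
    universal_newlines=True makes it decode the output to str; this keyword
    is also accepted by Python 2.7, unlike text=True (Python 3.7+ only).
    """
    return subprocess.check_output(cmd, universal_newlines=True)


if __name__ == "__main__":
    out = run_and_capture([sys.executable, "-c", "print('hello')"])
    print(type(out).__name__, repr(out))
```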
ANNZ v2.2.0
- Added a bias correction procedure for MLMs, which may be switched off using `glob.annz["doBiasCorMLM"] = False` (see `README.md` and `scripts/annz_rndReg_advanced.py` for details).
- Added the option to generate error estimations (using the KNN method) for a general input dataset. An example script is provided as `scripts/annz_rndReg_knnErr.py` (a detailed description is given in `README.md`).
- Added the `userWeights_metricPlots` job option, which can be used to set weight expressions for the performance plots of regression (see `README.md` for details).
- Changed the binning scheme for the performance plots of auxiliary variables (defined using `glob.annz["addOutputVars"]`). Instead of equal-width bins, the plots now include bins which are defined as each having the same number of objects (equal-quantile binning). This reduces, e.g., statistical fluctuations in computations of the bias, scatter and other parameters, as a function of the variables used for the training.
- Changed the default number of training cycles for ANNs from `5000` to a (more reasonable) randomized choice in the range `[500,2000]` (`ANNZ::generateOptsMLM()`). The option may be set to any other value by the user, using the `NCycles` setting. E.g., during training, set `glob.annz["userMLMopts"] = "ANNZ_MLM=ANN::HiddenLayers=N,N+3:NCycles=3500"`.
- Fixed a minor bug in `ANNZ::Train_binnedCls()`, which caused a mismatch of job options for some configurations of binned classification.
- Added a version tag to all intermediate option files, with a format as e.g., `[versionTag]=ANNZ_2.1.3`.
- Minor change to the selection criteria for `ANNZ_best` in randomized regression.
- Other minor modifications and bug fixes.
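Equal-quantile binning, as described above for the auxiliary-variable plots, can be sketched in a few lines of Python. This is a generic illustration, assuming the bin edges are read off the sorted sample at evenly spaced ranks; it is not the ANNZ implementation, and the function names are hypothetical.

```python
def equal_quantile_edges(values, n_bins):
    """Return n_bins + 1 edges so that each bin holds (roughly) the same number
    of objects; edges are read off the sorted sample at evenly spaced ranks."""
    data = sorted(values)
    if n_bins < 1 or not data:
        raise ValueError("need n_bins >= 1 and a non-empty sample")
    edges = [data[0]]
    for i in range(1, n_bins):
        # Object sitting at the i-th quantile boundary of the sorted sample.
        edges.append(data[(i * len(data)) // n_bins])
    edges.append(data[-1])
    return edges


def count_per_bin(values, edges):
    """Count objects per bin; bins are [lo, hi), except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    last = len(edges) - 2
    for v in values:
        for b in range(len(edges) - 1):
            if edges[b] <= v < edges[b + 1] or (b == last and v == edges[b + 1]):
                counts[b] += 1
                break
    return counts


if __name__ == "__main__":
    sample = [x * x for x in range(12)]  # a skewed toy distribution
    q_edges = equal_quantile_edges(sample, 3)
    w_edges = [0.0, 121 / 3.0, 2 * 121 / 3.0, 121.0]  # equal-width bins, for comparison
    print(q_edges, count_per_bin(sample, q_edges))
    print(count_per_bin(sample, w_edges))
```

For this skewed toy sample, the equal-quantile edges `[0, 16, 64, 121]` put four objects in each bin, whereas the equal-width bins hold 7, 2 and 3 objects. Sparsely populated bins like the last two are where statistical fluctuations in the bias and scatter become large. Ties in real data can make the equal-quantile counts only approximately equal.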
ANNZ v2.1.2
- Improved the selection criteria for `ANNZ_best` in randomized regression. The optimization is now based on `glob.annz["optimCondReg"]="sig68"` or `"bias"` (the `"fracSig68"` option is deprecated).
- Significant speed improvement for the KNN weights and `inTrainFlag` calculations in `CatFormat::addWgtKNNtoTree()`.
- Modified `CatFormat::addWgtKNNtoTree()` and `CatFormat::inputToSplitTree_wgtKNN()` so that both training and testing objects are used together as the reference dataset when deriving KNN weights. This new option is on by default, and may be turned off by setting `glob.annz["trainTestTogether_wgtKNN"] = False`.
- For developers: internal interface change (not backward compatible).
  - What used to be `CatFormat::addWgtKNNtoTree(TChain * aChainInp, TChain * aChainRef, TString outTreeName)` has been changed to `CatFormat::addWgtKNNtoTree(TChain * aChainInp, TChain * aChainRef, TChain * aChainEvl, TString outTreeName)`.
- Cancelled the `splitTypeValid` option, which was not very useful and was confusing for users. From now on, input datasets may only be divided into two subsets, one for training and one for testing. The user may define the training/testing samples in one of two ways (see `scripts/annz_rndReg_advanced.py` for details):
  - Automatic splitting: set a list of input files in `inAsciiFiles`, and use `splitType` to specify the method for splitting the sample. Allowed values for the latter are `serial`, `blocks` or `random`. For example:
    `glob.annz["splitType"] = "random"`
    `glob.annz["inAsciiFiles"] = "boss_dr10_0.csv;boss_dr10_1.csv"`
  - Splitting by file: set a list of input files for training in `splitTypeTrain`, and a list of input files for testing in `splitTypeTest`. For example:
    `glob.annz["splitType"] = "byInFiles"`
    `glob.annz["splitTypeTrain"] = "boss_dr10_0.csv"`
    `glob.annz["splitTypeTest"] = "boss_dr10_1.csv;boss_dr10_2.csv"`
- Added plotting for the evaluation mode of regression (single regression, randomized regression and binned classification). If the regression target is detected as part of the evaluated dataset, the nominal performance plots are created. For instance, for the `scripts/annz_rndReg_quick.py` script, the plots will be created in `output/test_randReg_quick/regres/eval/plots/`.
- Fixed a bug in the plotting routine `ANNZ::doMetricPlots()`, when adding user-defined cuts for variables not already present in the input trees.
- Simplified the interface for string variables in cut and weight expressions.
  - For example, given a set of input parameters, `glob.annz["inAsciiVars"] = "D:MAG_AUTO_G;D:MAG_AUTO_R;D:MAG_AUTO_I;D:Z_SPEC;C:FIELD"`, one can now use cuts and weights of the form:
    `glob.annz["userCuts_train"] = " (FIELD == \"FIELD_0\") || (FIELD == \"FIELD_1\")"`
    `glob.annz["userCuts_valid"] = " (FIELD == \"FIELD_1\") || (FIELD == \"FIELD_2\")"`
    `glob.annz["userWeights_train"] = "1.0*(FIELD == \"FIELD_0\") + 2.0*(FIELD == \"FIELD_1\")"`
    `glob.annz["userWeights_valid"] = "1.0*(FIELD == \"FIELD_1\") + 0.1*(FIELD == \"FIELD_2\")"`
    Here, training is only done using `FIELD_0` and `FIELD_1`; validation is weighted such that galaxies from `FIELD_1` have ten times the weight of galaxies from `FIELD_2`, etc.
  - The same rules also apply for the weight and cut options of the KNN re-weighting method (`cutInp_wgtKNN`, `cutRef_wgtKNN`, `weightRef_wgtKNN` and `weightInp_wgtKNN`), and for the corresponding variables of the evaluation compatibility test (`cutInp_inTrain`, `cutRef_inTrain`, `weightRef_inTrain` and `weightInp_inTrain`). Examples for the re-weighting and for the compatibility test using these variables are given in `scripts/annz_rndReg_advanced.py`.
- `ANNZ_PDF_max_0` is no longer calculated by default. This may be turned back on by setting `glob.annz["addMaxPDF"] = True`.
- Other minor modifications and bug fixes.
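Two of the changes above lend themselves to a short Python illustration: the `serial`/`blocks`/`random` split types, and building string-based weight expressions such as the `userWeights_valid` example. The semantics given here for the three split types are assumptions made for illustration (the notes above only list their names), and the helpers `split_train_test` and `field_weight_expr` are hypothetical, not part of ANNZ.

```python
import random


def split_train_test(objects, split_type, seed=1234):
    """Divide objects into (train, test) lists.

    Assumed semantics (hypothetical, for illustration only):
      serial - alternate objects: even positions -> train, odd -> test
      blocks - first half -> train, second half -> test
      random - each object goes to train or test with equal probability
    """
    if split_type == "serial":
        return objects[0::2], objects[1::2]
    if split_type == "blocks":
        half = (len(objects) + 1) // 2
        return objects[:half], objects[half:]
    if split_type == "random":
        rng = random.Random(seed)  # seeded, so the split is reproducible
        train, test = [], []
        for obj in objects:
            (train if rng.random() < 0.5 else test).append(obj)
        return train, test
    raise ValueError("unknown split type: %s" % split_type)


def field_weight_expr(weights, var="FIELD"):
    """Build a weight expression such as
    1.0*(FIELD == "FIELD_1") + 0.1*(FIELD == "FIELD_2")
    from a {field_name: weight} mapping (insertion order is kept)."""
    return " + ".join(
        '%s*(%s == "%s")' % (w, var, name) for name, w in weights.items()
    )


if __name__ == "__main__":
    objs = list(range(6))
    print(split_train_test(objs, "serial"))
    print(split_train_test(objs, "blocks"))
    print(field_weight_expr({"FIELD_1": 1.0, "FIELD_2": 0.1}))
```

Because the helper builds the expression with single-quoted Python format strings, the double quotes around the field names need no backslash escaping. The result matches the `userWeights_valid` string in the example above.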
ANNZ v2.1.1
- Fixed a bug in generating a name for an internal `TF1` function in `ANNZ::setupKdTreeKNN()`.
- Fixed a bug in the plotting routine `ANNZ::doMetricPlots()`, when adding user-requested variables which are not floats.
- Added the option `glob.annz["optimWithMAD"] = False`. If set to `True`, then the MAD (median absolute deviation) is used instead of the 68th percentile of the bias (`sigma_68`). This affects only the selection of the "best" MLM and the PDF optimization procedure in randomized regression. See `scripts/generalSettings.py`.
- Added the option `glob.annz["optimWithScaledBias"] = False`. If set to `True`, then instead of the bias, `delta == zReg-zTrg`, the expression `deltaScaled == delta/(1+zTrg)` is used, where `zReg` is the estimated result of the MLM/PDF and `zTrg` is the true (target) value. This affects only the selection of the "best" MLM and the PDF optimization procedure in randomized regression. E.g., one can set this parameter in order to minimize the value of `deltaScaled` instead of the value of `delta`, or correspondingly the scatter of `deltaScaled` instead of that of `delta`. The selection criterion for prioritizing the bias or the scatter remains the parameter `glob.annz["optimCondReg"]`. The latter can take the values `bias` (for `delta` or `deltaScaled`), `sig68` (for the scatter of `delta` or of `deltaScaled`), and `fracSig68` (for the outlier fraction of `delta` or of `deltaScaled`). See `scripts/generalSettings.py`.
- Added the option `glob.annz["plotWithScaledBias"] = False`. If set to `True`, then instead of the bias, `delta == zReg-zTrg`, the expression `delta/(1+zTrg)` is used. This affects only the figures generated with the plotting routine `ANNZ::doMetricPlots()`, and does not change any of the optimization/output of the code. See `scripts/generalSettings.py`.
- Added an option to set the PDF bins in randomized regression by the width of the bins, instead of by the number of bins. That is, one can now set e.g., `glob.annz["pdfBinWidth"] = 0.01` instead of e.g., `glob.annz["nPDFbins"] = 100`. Assuming the regression range is `[minValZ,maxValZ] = [0,1.5]`, the first option will lead to 150 PDF bins of width 0.01, while the second will result in 100 bins of width 0.015. The two options are mutually exclusive (the user should define only one or the other).
- For developers: changed the internal keyword interface in `Utils::getInterQuantileStats()` for requesting a MAD calculation. To add the calculation, the keyword changed from `medianAbsoluteDeviation` to `getMAD`; to retrieve the result of the calculation, it changed from `quant_medianAbsoluteDeviation` to `quant_MAD`.
- Other minor modifications.
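The quantities behind `optimWithMAD`, `optimWithScaledBias` and the `pdfBinWidth`/`nPDFbins` choice can be illustrated with a small, dependency-free Python sketch. It assumes that `sigma_68` is half the width of the interval between the 16th and 84th percentiles of the bias (the notes above only call it the 68th percentile of the bias). The function names are hypothetical, and this is not the code in `Utils::getInterQuantileStats()`.

```python
import math


def percentile(values, q):
    """Linearly interpolated percentile (0 <= q <= 100) of a non-empty sample."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def median(values):
    return percentile(values, 50.0)


def bias(z_reg, z_trg, scaled=False):
    """delta = zReg - zTrg, or deltaScaled = delta / (1 + zTrg) if scaled."""
    return [(r - t) / (1.0 + t) if scaled else r - t for r, t in zip(z_reg, z_trg)]


def sigma_68(deltas):
    """Assumed definition: half-width of the range holding the central 68% of deltas."""
    return 0.5 * (percentile(deltas, 84.0) - percentile(deltas, 16.0))


def mad(deltas):
    """Median absolute deviation: median(|delta - median(delta)|)."""
    m = median(deltas)
    return median([abs(d - m) for d in deltas])


def pdf_bins(min_z, max_z, n_bins=None, bin_width=None):
    """Return (number of bins, bin width); exactly one of n_bins/bin_width must be set."""
    if (n_bins is None) == (bin_width is None):
        raise ValueError("set exactly one of n_bins or bin_width")
    if bin_width is not None:
        n_bins = int(round((max_z - min_z) / bin_width))
    return n_bins, (max_z - min_z) / n_bins


if __name__ == "__main__":
    deltas = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy bias values with one outlier
    print("MAD:", mad(deltas), "sigma_68:", sigma_68(deltas))
    print(bias([0.3, 1.5], [0.1, 0.5], scaled=True))
    print(pdf_bins(0.0, 1.5, bin_width=0.01))  # 150 bins of width ~0.01
    print(pdf_bins(0.0, 1.5, n_bins=100))      # 100 bins of width ~0.015
```

For the toy sample `[1, 2, 3, 4, 100]`, the MAD is 1, whereas `sigma_68` under the assumed definition is about 18.5. This shows why the MAD is less sensitive to a single outlier. The `pdf_bins` helper reproduces the 150-bin and 100-bin cases for the range `[0,1.5]` quoted above.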
ANNZ v2.1.0
- Removed unnecessary dictionary generation from the `Makefile`.
- Changed `std::map` to `std::unordered_map` in the main containers of the `OptMaps()` and `VarMaps()` classes (constitutes a slight performance boost).
- Nominally, the name of the original input file is no longer tracked (it was stored in the ROOT trees with the name defined in `origFileName` in `myANNZ::Init()`). This may be switched back on by setting `glob.annz["storeOrigFileName"] = True`.
- Added the option to use an entire input file as signal or background for single/randomized classification, in addition to (or instead of) defining a cut based on one of the input parameters. In order to use this option, one must define the variables `inpFiles_sig` and `inpFiles_bck`. An example is given in `scripts/annz_rndCls_advanced.py`.
- Added a bias correction for randomized regression PDFs. This option is now active by default, and may be turned off by setting `glob.annz["doBiasCorPDF"] = False`.
- Other minor modifications.
ANNZ v2.0.6
- Did some code optimization for tree-looping operations.
- Added the script `annz_rndReg_weights.py`: this shows how one may derive the weights based on the KNN method (using `useWgtKNN`), and/or the `inTrainFlag` quality flag, without training/evaluating any MLMs.
- Added a plot-reference guide (`thePlotsExplained.pdf`).
- Added the option `doGausSigmaRelErr` (now set to `True` by default) to estimate the scatter of the relative uncertainty of regression solutions by a Gaussian fit, instead of by the RMS or the 68th percentile of the distribution of the relative uncertainty. This only affects the plotting output of regression problems (`ANNZ::doMetricPlots()`).
- Added support for general math expressions for the `weightVarNames_wgtKNN` and `weightVarNames_inTrain` variables.
- Nominally, the `inTrainFlag` quality flag is binary, and may only take values of either `0` or `1`. Have now added the option of setting `maxRelRatioInRef_inTrain < 0`. In this case, the `maxRelRatioInRef_inTrain` parameter is ignored. As a result, the `inTrainFlag` may take floating-point values between zero and one.
- Added a transformation of the input parameters used for the kd-tree during the nominal uncertainty calculation in regression. The variables after the transformation span the range `[-1,1]`. The transformations are performed by default, and may be turned off by setting `glob.annz["doWidthRescale_errKNN"] = False`. Similarly, added the same transformations for the kd-tree during the `glob.annz["useWgtKNN"] = True` and `glob.annz["addInTrainFlag"] = True` setups. These may be turned off using the flags `doWidthRescale_wgtKNN` and `doWidthRescale_inTrain`, respectively.
- Added support for ROOT file inputs, which may be used instead of ascii inputs (example given in `scripts/annz_rndReg_advanced.py`).
- Other minor modifications.
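The notes above only state that the transformed kd-tree variables span `[-1,1]`. A simple linear min-max rescaling is one transformation with that property, and the sketch below assumes it for illustration; ANNZ may use a different transformation. The motivation for any such rescaling is that variables with large numerical ranges would otherwise dominate the Euclidean distances used in a nearest-neighbour search. The function names and toy columns are hypothetical.

```python
def rescale_to_unit_range(columns):
    """Linearly map each input variable (one list per variable) onto [-1, 1].

    x' = 2 * (x - min) / (max - min) - 1; a constant variable is mapped to 0.
    """
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        if hi == lo:
            scaled.append([0.0] * len(col))
        else:
            scaled.append([2.0 * (x - lo) / (hi - lo) - 1.0 for x in col])
    return scaled


def euclidean(a, b):
    """Euclidean distance, as used e.g. by a kd-tree nearest-neighbour search."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


if __name__ == "__main__":
    mags = [18.0, 21.0, 24.0]   # hypothetical magnitude column (wide range)
    colours = [0.0, 0.5, 1.0]   # hypothetical colour column (narrow range)
    m_s, c_s = rescale_to_unit_range([mags, colours])
    # Before rescaling, the distance between objects 0 and 1 is dominated by the magnitude.
    print(euclidean((mags[0], colours[0]), (mags[1], colours[1])))
    print(euclidean((m_s[0], c_s[0]), (m_s[1], c_s[1])))
```

After rescaling, both toy variables contribute equally to the distance between the first two objects, instead of the magnitude difference accounting for almost all of it.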