-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add sonic-ize particle transformer to CMSSW_14_1_0_pre0 #15
base: master
Are you sure you want to change the base?
add sonic-ize particle transformer to CMSSW_14_1_0_pre0 #15
Conversation
enum InputIndexes { | ||
kChargedCandidates = 0, | ||
kNeutralCandidates = 1, | ||
kVertices = 2, | ||
kChargedCandidates4Vec = 3, | ||
kNeutralCandidates4Vec = 4, | ||
kVertices4Vec = 5 | ||
}; | ||
|
||
constexpr static unsigned n_features_cpf_ = 16; | ||
constexpr static unsigned n_pairwise_features_cpf_ = 4; | ||
constexpr static unsigned n_features_npf_ = 8; | ||
constexpr static unsigned n_pairwise_features_npf_ = 4; | ||
constexpr static unsigned n_features_sv_ = 14; | ||
constexpr static unsigned n_pairwise_features_sv_ = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't copy/reimplement constants. move the ones from https://github.com/cms-sw/cmssw/blob/master/RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4ONNXJetTagsProducer.cc into a helper class or base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to /interface/tensor_config.h
max_n_cpf = std::clamp(max_n_cpf, (unsigned int)0, (unsigned int)25); | ||
max_n_npf = std::clamp(max_n_npf, (unsigned int)0, (unsigned int)25); | ||
max_n_vtx = std::clamp(max_n_vtx, (unsigned int)0, (unsigned int)5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these magic numbers should be named constants, both here and in the direct inference producer. (also, you can put the use of clamp()
into the direct inference producer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made code change, please check
vdata.push_back(features.c_pf_features.at(count).btagPf_trackEtaRel); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackPtRel); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackPPar); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackDeltaR); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackPParRatio); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackSip2dVal); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackSip2dSig); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackSip3dVal); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackSip3dSig); | ||
vdata.push_back(features.c_pf_features.at(count).btagPf_trackJetDistVal); | ||
vdata.push_back(features.c_pf_features.at(count).ptrel); | ||
vdata.push_back(features.c_pf_features.at(count).drminsv); | ||
vdata.push_back(features.c_pf_features.at(count).vtx_ass); | ||
vdata.push_back(features.c_pf_features.at(count).puppiw); | ||
vdata.push_back(features.c_pf_features.at(count).chi2); | ||
vdata.push_back(features.c_pf_features.at(count).quality); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than duplicate these lists of features, extract the make_inputs()
functions from https://github.com/cms-sw/cmssw/blob/master/RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4ONNXJetTagsProducer.cc into a helper class. the similar example for ParticleNet is https://github.com/cms-sw/cmssw/blob/master/RecoBTag/FeatureTools/interface/deep_helpers.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hard coded features are moved to interface/tensor_filler.h and src/tensor_filler.cc. The remaining part of make_inputs() in the producer is different between SONIC and original version so I left them in the producer.
timeout = cms.untracked.uint32(300), | ||
mode = cms.string("Async"), | ||
modelName = cms.string("particletransformer_AK4"), # double check | ||
modelConfigPath = cms.FileInPath("RecoBTag/Combined/data/models/particletransformer_AK4/config.pbtxt"), # this is SONIC, not currently in the CMSSW, so the models/ will be copied to this location privately |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should prepare a corresponding PR to https://github.com/cms-data/RecoBTag-Combined with the SONIC files, following the setup from cms-data/RecoBTag-Combined#53 where the "modelfile" used by the direct inference producer becomes a symbolic link to the version-numbered file in the Triton model repository path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR is updated on cms-data/RecoBTag-Combined
…rface/ and src/ tensor_filler and tensor_config
std::vector<float> inputs_parT(const btagbtvdeep::SecondaryVertexFeatures& sv_features, parT::InputIndexes idx); | ||
|
||
template<class parT_features> | ||
void parT_tensor_filler(float*& ptr, parT::InputIndexes idx , const parT_features pf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass complex objects by reference: const parT_features& pf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, code is modified
} | ||
|
||
template<class parT_features> | ||
void parT_tensor_filler(std::vector<float>& vdata, parT::InputIndexes idx , const parT_features pf) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass complex objects by reference: const parT_features& pf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, code is modified
template<class parT_features> | ||
void parT_tensor_filler(float*& ptr, parT::InputIndexes idx , const parT_features pf) { | ||
std::vector<float> inputs; | ||
inputs = inputs_parT(pf, idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capture returned temporary by const reference: const std::vector<float>& inputs = inputs_parT(pf, idx);
(delete previous line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, code is modified
template<class parT_features> | ||
void parT_tensor_filler(std::vector<float>& vdata, parT::InputIndexes idx , const parT_features pf) { | ||
std::vector<float> inputs; | ||
inputs = inputs_parT(pf, idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capture returned temporary by const reference: const std::vector<float>& inputs = inputs_parT(pf, idx);
(delete previous line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, code is modified
n_cpf_ = std::max((unsigned int)1, n_cpf); | ||
n_npf_ = std::max((unsigned int)1, n_npf); | ||
n_sv_ = std::max((unsigned int)1, n_vtx); | ||
n_cpf_ = std::clamp(n_cpf_, (unsigned int)1, (unsigned int)25); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can just put features.c_pf_features.size()
in here directly instead of first assigning it above (i.e. delete L142-144).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, code is modified
n_cpf_ = std::clamp(n_cpf_, (unsigned int)1, (unsigned int)25); | ||
n_npf_ = std::clamp(n_npf_, (unsigned int)1, (unsigned int)25); | ||
n_sv_ = std::clamp(n_sv_, (unsigned int)1, (unsigned int)5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
25, 25, and 5 should be moved to named constants in the parT
namespace in tensor_configs.h
, so that they can be shared with the SONIC version rather than copying the hardcoded values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, code is modified, also the sonic version's code is modified
@@ -185,110 +170,63 @@ void ParticleTransformerAK4ONNXJetTagsProducer::make_inputs(btagbtvdeep::Particl | |||
unsigned offset = 0; | |||
|
|||
// c_pf candidates | |||
auto max_c_pf_n = std::min(features.c_pf_features.size(), (std::size_t)n_cpf_); | |||
const auto max_c_pf_n = std::min(features.c_pf_features.size(), (std::size_t)n_cpf_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a specific reason to make these const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these variables are just used once in the code and they are not passed into any function, so they don't have any risk of being modified... in my understanding it does not matter of I add const or not. If we in general do not make a variable constant if it is not necessary, I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more that unnecessary changes to existing lines of code are discouraged.
// Loop through the n cpf, and in the case that n_cpf is smaller than max_n_cpf, add padding values to all features | ||
if (igroup == parT::kChargedCandidates) { | ||
unsigned int n_cpf = features.c_pf_features.size(); | ||
n_cpf = std::clamp(n_cpf, (unsigned int)0, (unsigned int)25); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, you can just put features.c_pf_features.size()
directly inside the clamp()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, why is the minimum 0 here, but 1 in the direct inference producer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code modified. In the direct inference, when making input for each jet, the clamped number is for forming the input_sizes_
for each jet, clamp minimum at 1, for example, for charged particle flow candidate, it keeps at least a row of n_features_cpf * 0
as input in data_
. In the sonic producer, since I do not use data_
for each jet, but chain up all jet's cpf in vdata
, I have to know when 0 cpf happens in a jet, so I manually add if (max_n_cpf == 0) vdata.insert(vdata.end(), parT::n_features_cpf, 0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well I changed the code and remove the "padding" because I don't think we currently have a way to not do padding when sending event info to server... and no need to have clamp 0 and 25 and then clamp on 1 again like in the old code. Now it directly clamp on 1 and 25.
if (max_n_cpf == 0) | ||
vdata.insert(vdata.end(), parT::n_features_cpf, 0); // Add at least 1 row of 0 for a jet that has 0 cpf/npf/sv. | ||
} | ||
else if (igroup == parT::kNeutralCandidates) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a fair amount of duplicated structure in these successive blocks. you could try to introduce a function that would perform this operation given the necessary inputs for each case: igroup
, features.x
, max value for clamp, max_n_x
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, should I do similar things to the direct inference code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code changed.
also: there's now another ParT version... cms-sw#44641 |
kVertices4Vec=5 | ||
}; | ||
|
||
const std::map<unsigned int, InputFeatures> InputIndexes{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a method to loop through all the enum values, this is somewhat redundant (requires adding the same information in two places: the enum definition and this map). I checked with core software and they recommend something like the following:
enum InputFeatures {
kBegin=0,
kChargedCandidates=kBegin,
kNeutralCandidates=1,
kVertices=2,
kChargedCandidates4Vec=3,
kNeutralCandidates4Vec=4,
kVertices4Vec=5,
kEnd=6
};
then you can just loop from kBegin
to kEnd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay that will help...
@@ -180,116 +161,26 @@ void ParticleTransformerAK4ONNXJetTagsProducer::get_input_sizes( | |||
} | |||
|
|||
void ParticleTransformerAK4ONNXJetTagsProducer::make_inputs(btagbtvdeep::ParticleTransformerAK4Features features) { | |||
float* ptr = nullptr; | |||
//float* ptr = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete commented-out code
PR description:
Modified files:
Configuration/ProcessModifiers/python/allSonicTriton_cff.py
RecoBTag/ONNXRuntime/python/pfParticleTransformerAK4_cff.py
Create files:
Configuration/ProcessModifiers/python/particleTransformerAK4SonicTriton_cff.py
RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4SonicJetTagsProducer.cc
PR validation:
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Before submitting your pull requests, make sure you followed this checklist: