add sonic-ize particle transformer to CMSSW_14_1_0_pre0 #15

y19y19 · 2024-04-02T18:51:51Z

PR description:

Modified files:
Configuration/ProcessModifiers/python/allSonicTriton_cff.py
RecoBTag/ONNXRuntime/python/pfParticleTransformerAK4_cff.py

Create files:
Configuration/ProcessModifiers/python/particleTransformerAK4SonicTriton_cff.py
RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4SonicJetTagsProducer.cc

PR validation:

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Before submitting your pull requests, make sure you followed this checklist:

verify that the PR is really intended for the chosen branch
verify that changes follow CMS Naming, Coding, And Style Rules
verify that the PR passes the basic test procedure suggested in the CMSSW PR instructions

kpedro88 · 2024-04-02T19:10:59Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4SonicJetTagsProducer.cc

+  enum InputIndexes {
+    kChargedCandidates = 0,
+    kNeutralCandidates = 1,
+    kVertices = 2,
+    kChargedCandidates4Vec = 3,
+    kNeutralCandidates4Vec = 4,
+    kVertices4Vec = 5
+  };
+
+  constexpr static unsigned n_features_cpf_ = 16;
+  constexpr static unsigned n_pairwise_features_cpf_ = 4;
+  constexpr static unsigned n_features_npf_ = 8;
+  constexpr static unsigned n_pairwise_features_npf_ = 4;
+  constexpr static unsigned n_features_sv_ = 14;
+  constexpr static unsigned n_pairwise_features_sv_ = 4;


don't copy/reimplement constants. move the ones from https://github.com/cms-sw/cmssw/blob/master/RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4ONNXJetTagsProducer.cc into a helper class or base class.

move to /interface/tensor_config.h

kpedro88 · 2024-04-02T19:13:08Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4SonicJetTagsProducer.cc

+  max_n_cpf = std::clamp(max_n_cpf, (unsigned int)0, (unsigned int)25);
+  max_n_npf = std::clamp(max_n_npf, (unsigned int)0, (unsigned int)25);
+  max_n_vtx = std::clamp(max_n_vtx, (unsigned int)0, (unsigned int)5);


these magic numbers should be named constants, both here and in the direct inference producer. (also, you can put the use of clamp() into the direct inference producer)

made code change, please check

kpedro88 · 2024-04-02T19:14:37Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4SonicJetTagsProducer.cc

+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackEtaRel);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackPtRel);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackPPar);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackDeltaR);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackPParRatio);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackSip2dVal);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackSip2dSig);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackSip3dVal);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackSip3dSig);
+          vdata.push_back(features.c_pf_features.at(count).btagPf_trackJetDistVal);
+          vdata.push_back(features.c_pf_features.at(count).ptrel);
+          vdata.push_back(features.c_pf_features.at(count).drminsv);
+          vdata.push_back(features.c_pf_features.at(count).vtx_ass);
+          vdata.push_back(features.c_pf_features.at(count).puppiw);
+          vdata.push_back(features.c_pf_features.at(count).chi2);
+          vdata.push_back(features.c_pf_features.at(count).quality);


rather than duplicate these lists of features, extract the make_inputs() functions from https://github.com/cms-sw/cmssw/blob/master/RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4ONNXJetTagsProducer.cc into a helper class. the similar example for ParticleNet is https://github.com/cms-sw/cmssw/blob/master/RecoBTag/FeatureTools/interface/deep_helpers.h

The hard coded features are moved to interface/tensor_filler.h and src/tensor_filler.cc. The remaining part of make_inputs() in the producer is different between SONIC and original version so I left them in the producer.

kpedro88 · 2024-04-02T19:18:05Z

RecoBTag/ONNXRuntime/python/pfParticleTransformerAK4_cff.py

+        timeout = cms.untracked.uint32(300),
+        mode = cms.string("Async"),
+        modelName = cms.string("particletransformer_AK4"), # double check
+        modelConfigPath = cms.FileInPath("RecoBTag/Combined/data/models/particletransformer_AK4/config.pbtxt"), # this is SONIC, not currently in the CMSSW, so the models/ will be copied to this location privately


you should prepare a corresponding PR to https://github.com/cms-data/RecoBTag-Combined with the SONIC files, following the setup from cms-data/RecoBTag-Combined#53 where the "modelfile" used by the direct inference producer becomes a symbolic link to the version-numbered file in the Triton model repository path

PR is updated on cms-data/RecoBTag-Combined

…rface/ and src/ tensor_filler and tensor_config

kpedro88 · 2024-04-19T21:22:35Z

RecoBTag/ONNXRuntime/interface/tensor_fillers.h

+  std::vector<float> inputs_parT(const btagbtvdeep::SecondaryVertexFeatures& sv_features, parT::InputIndexes idx);
+
+  template<class parT_features>
+  void parT_tensor_filler(float*& ptr, parT::InputIndexes idx , const parT_features pf) {


pass complex objects by reference: const parT_features& pf

agree, code is modified

kpedro88 · 2024-04-19T21:22:38Z

RecoBTag/ONNXRuntime/interface/tensor_fillers.h

+  }
+
+  template<class parT_features>
+  void parT_tensor_filler(std::vector<float>& vdata, parT::InputIndexes idx , const parT_features pf) {


pass complex objects by reference: const parT_features& pf

agree, code is modified

kpedro88 · 2024-04-19T21:24:07Z

RecoBTag/ONNXRuntime/interface/tensor_fillers.h

+  template<class parT_features>
+  void parT_tensor_filler(float*& ptr, parT::InputIndexes idx , const parT_features pf) {
+    std::vector<float> inputs;
+    inputs = inputs_parT(pf, idx);


capture returned temporary by const reference: const std::vector<float>& inputs = inputs_parT(pf, idx); (delete previous line)

agree, code is modified

kpedro88 · 2024-04-19T21:24:32Z

RecoBTag/ONNXRuntime/interface/tensor_fillers.h

+  template<class parT_features>
+  void parT_tensor_filler(std::vector<float>& vdata, parT::InputIndexes idx , const parT_features pf) {
+    std::vector<float> inputs;
+    inputs = inputs_parT(pf, idx);


capture returned temporary by const reference: const std::vector<float>& inputs = inputs_parT(pf, idx); (delete previous line)

agree, code is modified

kpedro88 · 2024-04-19T21:37:05Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4ONNXJetTagsProducer.cc

-  n_cpf_ = std::max((unsigned int)1, n_cpf);
-  n_npf_ = std::max((unsigned int)1, n_npf);
-  n_sv_ = std::max((unsigned int)1, n_vtx);
+  n_cpf_ = std::clamp(n_cpf_, (unsigned int)1, (unsigned int)25);


can just put features.c_pf_features.size() in here directly instead of first assigning it above (i.e. delete L142-144).

agree, code is modified

kpedro88 · 2024-04-19T21:37:55Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4ONNXJetTagsProducer.cc

+  n_cpf_ = std::clamp(n_cpf_, (unsigned int)1, (unsigned int)25);
+  n_npf_ = std::clamp(n_npf_, (unsigned int)1, (unsigned int)25);
+  n_sv_ = std::clamp(n_sv_, (unsigned int)1, (unsigned int)5);


25, 25, and 5 should be moved to named constants in the parT namespace in tensor_configs.h, so that they can be shared with the SONIC version rather than copying the hardcoded values.

agree, code is modified, also the sonic version's code is modified

kpedro88 · 2024-04-19T21:38:35Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4ONNXJetTagsProducer.cc

@@ -185,110 +170,63 @@ void ParticleTransformerAK4ONNXJetTagsProducer::make_inputs(btagbtvdeep::Particl
  unsigned offset = 0;

  // c_pf candidates
-  auto max_c_pf_n = std::min(features.c_pf_features.size(), (std::size_t)n_cpf_);
+  const auto max_c_pf_n = std::min(features.c_pf_features.size(), (std::size_t)n_cpf_);


is there a specific reason to make these const?

these variables are just used once in the code and they are not passed into any function, so they don't have any risk of being modified... in my understanding it does not matter of I add const or not. If we in general do not make a variable constant if it is not necessary, I will remove it.

It's more that unnecessary changes to existing lines of code are discouraged.

kpedro88 · 2024-04-19T21:45:17Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4SonicJetTagsProducer.cc

+      // Loop through the n cpf, and in the case that n_cpf is smaller than max_n_cpf, add padding values to all features
+      if (igroup == parT::kChargedCandidates) {
+        unsigned int n_cpf = features.c_pf_features.size();
+        n_cpf = std::clamp(n_cpf, (unsigned int)0,  (unsigned int)25);


as above, you can just put features.c_pf_features.size() directly inside the clamp()

also, why is the minimum 0 here, but 1 in the direct inference producer?

code modified. In the direct inference, when making input for each jet, the clamped number is for forming the input_sizes_ for each jet, clamp minimum at 1, for example, for charged particle flow candidate, it keeps at least a row of n_features_cpf * 0 as input in data_. In the sonic producer, since I do not use data_ for each jet, but chain up all jet's cpf in vdata, I have to know when 0 cpf happens in a jet, so I manually add if (max_n_cpf == 0) vdata.insert(vdata.end(), parT::n_features_cpf, 0);

well I changed the code and remove the "padding" because I don't think we currently have a way to not do padding when sending event info to server... and no need to have clamp 0 and 25 and then clamp on 1 again like in the old code. Now it directly clamp on 1 and 25.

kpedro88 · 2024-04-19T21:48:19Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4SonicJetTagsProducer.cc

+        if (max_n_cpf == 0) 
+          vdata.insert(vdata.end(), parT::n_features_cpf, 0); // Add at least 1 row of 0 for a jet that has 0 cpf/npf/sv.
+      }
+      else if (igroup == parT::kNeutralCandidates) {


there's a fair amount of duplicated structure in these successive blocks. you could try to introduce a function that would perform this operation given the necessary inputs for each case: igroup, features.x, max value for clamp, max_n_x, etc.

I can do that, should I do similar things to the direct inference code?

If possible, yes

code changed.

kpedro88 · 2024-04-19T21:49:50Z

also: there's now another ParT version... cms-sw#44641
hopefully this one is faster to convert now that the basic process has been done once already

kpedro88 · 2024-04-29T20:36:07Z

RecoBTag/ONNXRuntime/interface/tensor_configs.h

+    kVertices4Vec=5
+  };
+
+  const std::map<unsigned int, InputFeatures> InputIndexes{


as a method to loop through all the enum values, this is somewhat redundant (requires adding the same information in two places: the enum definition and this map). I checked with core software and they recommend something like the following:

enum InputFeatures { kBegin=0, kChargedCandidates=kBegin, kNeutralCandidates=1, kVertices=2, kChargedCandidates4Vec=3, kNeutralCandidates4Vec=4, kVertices4Vec=5, kEnd=6 };

then you can just loop from kBegin to kEnd.

okay that will help...

kpedro88 · 2024-04-29T20:37:59Z

RecoBTag/ONNXRuntime/plugins/ParticleTransformerAK4ONNXJetTagsProducer.cc

@@ -180,116 +161,26 @@ void ParticleTransformerAK4ONNXJetTagsProducer::get_input_sizes(
 }

 void ParticleTransformerAK4ONNXJetTagsProducer::make_inputs(btagbtvdeep::ParticleTransformerAK4Features features) {
-  float* ptr = nullptr;
+  //float* ptr = nullptr;


delete commented-out code

y19y19 added 2 commits April 2, 2024 14:31

add SONIC-ize particleTransformer

fcb4d88

add particle transformer modofier

7fabe9f

kpedro88 reviewed Apr 2, 2024

View reviewed changes

move common input for particle transformer with/without sonic to inte…

0301af1

…rface/ and src/ tensor_filler and tensor_config

kpedro88 reviewed Apr 19, 2024

View reviewed changes

respond to code comments

a5b07c8

kpedro88 reviewed Apr 29, 2024

View reviewed changes

y19y19 added 5 commits May 3, 2024 10:44

remove InputIndexes and loop through enum

35d7a18

code patch

2b6404d

remove unnecessary comments

28d81de

resolve comments to make variables inline and use array instead of map

7a4032e

update format

df4fe14

add sonic-ize particle transformer to CMSSW_14_1_0_pre0 #15

Are you sure you want to change the base?

add sonic-ize particle transformer to CMSSW_14_1_0_pre0 #15

Conversation

y19y19 commented Apr 2, 2024

PR description:

PR validation:

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

y19y19 Apr 10, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kpedro88 commented Apr 19, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

y19y19 Apr 10, 2024 •

edited

Loading