
potential speedup in HLT: AXOL1TLCondition::evaluateCondition consider caching the model? #46740

Open · slava77 opened this issue Nov 19, 2024 · 19 comments

slava77 (Contributor) commented Nov 19, 2024

Looking at a profile of HLT in 14_1_X with callgrind, on MC running only MC_ReducedIterativeTracking_v22, I see that 78% of L1TGlobalProducer::produce is spent in l1t::AXOL1TLCondition::evaluateCondition
https://github.com/cms-sw/cmssw/blob/CMSSW_14_1_0_pre5/L1Trigger/L1TGlobal/src/AXOL1TLCondition.cc#L100

In my test L1TGlobalProducer::produce takes 9.7% of the time; in the full menu it's apparently around 0.9% (updated from the initial 2.5% estimate; see notes below).

Of all the time spent in l1t::AXOL1TLCondition::evaluateCondition

  • hls4mlEmulator::ModelLoader::load_model() is 54%
  • hls4mlEmulator::ModelLoader::~ModelLoader() is 30%
  • GTADModel_emulator_v4::predict() is 15%
  • the rest is around 1%

IIUC, the load and destruction of the model happen 10 times per event; I don't see any dependence on the current event variables.
Some kind of caching may be useful to get HLT to run a bit faster (seems like 0.6% or so, updated from the initial 1.5% estimate).
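
For illustration, the pattern seen in the profile corresponds roughly to the sketch below (simplified, not the actual AXOL1TLCondition code; the ModelLoader usage and header path are assumptions based on hls4mlEmulatorExtras):

    #include <memory>
    #include <string>
    #include "hls4ml/emulator.h"  // hls4mlEmulator::ModelLoader / Model (header path assumed)

    // Simplified sketch of the per-call pattern: each call constructs a
    // ModelLoader, load_model() dlopen's the model .so, and the destructor
    // dlclose's it again, even though the model never changes within a job.
    bool evaluateOnce(std::string const& modelVersion) {
      hls4mlEmulator::ModelLoader loader(modelVersion);
      std::shared_ptr<hls4mlEmulator::Model> model = loader.load_model();
      // ... fill the inputs, model->predict(), read out the anomaly score ...
      return true;
    }  // loader destroyed -> shared library unloaded; repeated ~10 times per event

Caching would amount to keeping the loader/model pair alive across calls, so the load and destruction happen once per job (or per stream) instead of per call.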

cmsbuild (Contributor) commented Nov 19, 2024

cms-bot internal usage

cmsbuild (Contributor) commented

A new Issue was created by @slava77.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

mmusich (Contributor) commented Nov 19, 2024

@aloeliger @artlbv FYI

aloeliger (Contributor) commented

@quinnanm

aloeliger (Contributor) commented

I can also take a look at this when I get some time; frankly, I've been meaning to for a while.

mmusich (Contributor) commented Nov 19, 2024

In my test L1TGlobalProducer::produce takes 9.7% of the time; in the full menu it's apparently around 2.5%

in a more "realistic" setup it's more like sub-percent

[screenshot: timing measurement from a more realistic HLT setup]

(details from #45631 (comment))

makortel (Contributor) commented

assign l1

makortel (Contributor) commented

type performance-improvements

cmsbuild (Contributor) commented

New categories assigned: l1

@aloeliger,@epalencia you have been requested to review this Pull request/Issue and eventually sign? Thanks

slava77 (Contributor, Author) commented Nov 19, 2024

in a more "realistic" setup it's more like sub-percent

My 2.5% comes from a more recent test (but on GRun menu) by @bdanzi
The odd part there is that, e.g., in the first iteration L1TGlobalProducer takes under 1% in the first 6 jobs, but around 6% in the last two.

mmusich (Contributor) commented Nov 19, 2024

My 2.5% comes from a more recent test (but on GRun menu) by @bdanzi

I'd rather trust the manual measurement than what comes from the timing server, tbh.

slava77 (Contributor, Author) commented Nov 20, 2024

Essentially all of the cost is in the dlopen and dlclose calls:
https://github.com/cms-hls4ml/hls4mlEmulatorExtras/blob/v1.1.3/src/hls4ml/emulator.cc#L12-L22
create_model has an orders-of-magnitude lower cost.

Perhaps just having the .so file loaded in the constructor or at beginJob is enough.
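
A minimal sketch of that idea, assuming the ModelLoader can simply be kept alive together with the model it returns (the struct name is hypothetical):

    #include <memory>
    #include <string>
    #include "hls4ml/emulator.h"  // header path assumed

    // Keep the loader and the model together: the dlopen'd library stays
    // resident for as long as this object lives, and dlclose only happens
    // when it is destroyed (e.g. at end of job) instead of after every call.
    struct CachedAxoModel {
      explicit CachedAxoModel(std::string const& modelVersion)
          : loader(modelVersion), model(loader.load_model()) {}

      hls4mlEmulator::ModelLoader loader;            // owns the dlopen handle
      std::shared_ptr<hls4mlEmulator::Model> model;  // loaded once, predict() reused per event
    };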

artlbv (Contributor) commented Nov 21, 2024

IIUC, the load and destruction of the model happens 10 times per event; I don't see any dependence on the current event variables.

The "problem" is that the model is evaluated (and loaded) for every threshold and BX separately, even though it is the same model. If caching is possible that would help of course and in principle the model(s) are known at the beginning of the job as they are fixed in the L1 menu.

IMO it would be good to have some common approach to this loading of HLS4ML models within L1/CMSSW, as e.g. here it seems to be done differently:
https://github.com/cms-sw/cmssw/blob/CMSSW_14_2_X/L1Trigger/Phase2L1ParticleFlow/interface/JetId.h#L50-L53

makortel (Contributor) commented

The "problem" is that the model is evaluated (and loaded) for every threshold and BX separately, even though it is the same model. If caching is possible that would help of course and in principle the model(s) are known at the beginning of the job as they are fixed in the L1 menu.

Given that the interface of hls4mlEmulator::Model is not const, the best one can do is to call ModelLoader::load_model() once per module per stream (in the constructor of an edm::stream module, using a StreamCache with an edm::global module, or in the constructor of an edm::one module).

This is how the hls4mlEmulator::Model member in L1Trigger/Phase2L1ParticleFlow/interface/JetId.h,

    unique_ptr<float[]> fDY_;
    tensorflow::Session *sessionRef_;
    std::shared_ptr<hls4mlEmulator::Model> modelRef_;
    };

is used in

    fBJetId_ = std::make_unique<JetId>(
        cfg.getParameter<std::string>("NNInput"), cfg.getParameter<std::string>("NNOutput"), cache, fNParticles_);

(now if the hls4mlEmulator::Model interface were const and thread-efficient, we could avoid the per-stream copies of hls4mlEmulator::Model)
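
For concreteness, a minimal sketch of the per-stream variant (hypothetical module and parameter names; the prepare_input/predict/read_result calls in the comment assume the hls4mlEmulator::Model interface):

    #include <memory>
    #include <string>

    #include "FWCore/Framework/interface/Event.h"
    #include "FWCore/Framework/interface/stream/EDProducer.h"
    #include "FWCore/ParameterSet/interface/ParameterSet.h"
    #include "hls4ml/emulator.h"  // header path assumed

    // Hypothetical stream module: one loader/model pair per stream, created in
    // the constructor, so dlopen/dlclose happen once per stream instead of
    // once per threshold and bunch crossing in every event.
    class AxoModelCachingProducer : public edm::stream::EDProducer<> {
    public:
      explicit AxoModelCachingProducer(edm::ParameterSet const& cfg)
          : loader_(cfg.getParameter<std::string>("modelVersion")), model_(loader_.load_model()) {}

      void produce(edm::Event&, edm::EventSetup const&) override {
        // Reuse model_ for every threshold/BX of this event, e.g.
        // model_->prepare_input(...); model_->predict(); model_->read_result(...);
      }

    private:
      hls4mlEmulator::ModelLoader loader_;            // keeps the shared library open
      std::shared_ptr<hls4mlEmulator::Model> model_;  // per-stream copy (interface is not const)
    };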

mmusich (Contributor) commented Nov 22, 2024

assign hlt

  • to keep it on the radar

cmsbuild (Contributor) commented

New categories assigned: hlt

@Martin-Grunewald,@mmusich you have been requested to review this Pull request/Issue and eventually sign? Thanks

jpearkes (Contributor) commented Dec 10, 2024

Hi @slava77,

For these tests, is the --timing flag being used in hltGetConfiguration? From tests we were running last year, we saw a big slowdown without the --timing flag, e.g.

    hltGetConfiguration /dev/CMSSW_13_3_0/GRun --data --output none --globaltag 132X_dataRun3_HLT_for2024TSGStudies_v1 --process jannicke_test1 --timing > hlt.py

I am attaching screenshots from a presentation here. The "Dec" results are from when the --timing flag was not being used; the "New" results are from when it was. The difference in throughput was on the order of 50%.
[two screenshots: throughput comparison between the "Dec" and "New" results]

mmusich (Contributor) commented Dec 10, 2024

@jpearkes

For these tests is the --timing flag being used in hltGetConfiguration?

As far as I understand, these measurements were derived using the timing server, so the flag should have been used by default.
Irrespective of how the timing was run, there is an opportunity to make the code more efficient; see the suggestion at #46740 (comment). Is it something that is going to be picked up by the AXO team?

jpearkes (Contributor) commented

@mmusich We were using the timing server for the tests too, and at least around this time last year it was not enabled by default.

Completely agreed that it is an opportunity to make the code more efficient. Since the timing was no longer an issue after running with the --timing flag, the AXO team deprioritized it. We could bump it up in priority if needed, though. @quinnanm and @aloeliger are our main emulator contacts from both AXO and CICADA.
