Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReplaceProducer doesn't add outputs if outputs of old and new producer are different #279

Open
moritzmolch opened this issue Jul 26, 2024 · 2 comments · May be fixed by #280
Open

ReplaceProducer doesn't add outputs if outputs of old and new producer are different #279

moritzmolch opened this issue Jul 26, 2024 · 2 comments · May be fixed by #280
Assignees
Labels
bug Something isn't working

Comments

@moritzmolch
Copy link

I want to replace a ProducerGroup with ReplaceProducer for a scope (in this example the mm scope). However, when I do that the outputs of the new producer are not showing up in the list of all outputs (the list of branch names in the output file of the respective scope.

The old producer is

BasicBJetQuantities = ProducerGroup(
    name="BasicBJetQuantities",
    call=None,
    input=None,
    output=None,
    scopes=["mt", "et", "tt", "em", "mm", "ee"],
    subproducers=[
        NumberOfBJets,
        NumberOfBJets_boosted,
    ],
)  # outputs nbtag, nbtag_boosted

with outputs nbtag and nbtag_boosted and the new producer is

BasicBJetQuantitiesMuMu = ProducerGroup(
    name="BasicBJetQuantitiesMuMu",
    call=None,
    input=None,
    output=None,
    scopes=["mm"],
    subproducers=[
        NumberOfBJets,
    ],
)  # outputs nbtag

with the output nbtag. The modification in the main config file is declared with

configuration.add_modification_rule(
    ["mm"],
    ReplaceProducer(
        producers=[jets.BasicBJetQuantities, jets.BasicBJetQuantitiesMuMu],
        samples=available_sample_types,
    ),
)

The problem is now that I cannot find nbtag as branch in the output file of the mm scope, although it is an output of the new producer. I also declared it with the configuration.add_outputs explicitly as output variable in the mm scope.

Also, the debug output of the cmake compilation of the configuration shows that the output nbtag is removed. The variable nbtag_boosted is not appearing here as I didn't declare it as an output of the mm scope in the main configuration, so I guess this part is okay. However, there is no line saying that nbtag is added back:

...
[DEBUG   ] [rules.py           :303] scopes: ['mm']
[DEBUG   ] [rules.py           :304] Producers to be replaced: [ProducerGroup: BasicBJetQuantities, ProducerGroup: BasicBJetQuantitiesMuMu]
[DEBUG   ] [rules.py           :309] ReplaceProducer: Replace ProducerGroup: BasicBJetQuantities from producer in scope mm with ProducerGroup: BasicBJetQuantitiesMuMu
[DEBUG   ] [rules.py           :387] ReplaceProducer: Removing nbtag from outputs in scope mm
[CRITICAL] [rules.py           :130] Applying rule ProducerRule - replace ProducerGroup: BasicJetQuantities with ProducerGroup: BasicJetQuantitiesMuMu for ['ggh_htautau', 'ggh_hbb', 'vbf_htautau', 'vbf_hbb', 'rem_htautau', 'rem_hbb', 'embedding', 'embedding_mc', 'singletop', 'ttbar', 'diboson', 'dyjets', 'wjets', 'data', 'electroweak_boson', 'nmssm_Ybb', 'nmssm_Ytautau'] in scopes ['mm'] for sample dyjets
...

I think the problem lies in this else branch of the update_outputs method of the ReplaceProducer class:

else:
for scope in scopes:
for removed_output in removed_outputs[scope]:
if removed_output in outputs_to_be_updated[scope]:
log.debug(
"ReplaceProducer: Removing {} from outputs in scope {}".format(
removed_output, scope
)
)
outputs_to_be_updated[scope].remove(removed_output)
for added_output in added_outputs[scope]:
if added_output in outputs_to_be_updated[scope]:
log.debug(
"ReplaceProducer: Adding {} from outputs in scope {}".format(
added_output, scope
)
)
outputs_to_be_updated[scope].add(added_output)

Line 394 should be changed from

if added_output in outputs_to_be_updated[scope]:

to

if added_output not in outputs_to_be_updated[scope]:

right? If I apply this change, I get the following output, which is what I would expect:

[DEBUG   ] [rules.py           :303] scopes: ['mm']
[DEBUG   ] [rules.py           :304] Producers to be replaced: [ProducerGroup: BasicBJetQuantities, ProducerGroup: BasicBJetQuantitiesMuMu]
[DEBUG   ] [rules.py           :309] ReplaceProducer: Replace ProducerGroup: BasicBJetQuantities from producer in scope mm with ProducerGroup: BasicBJetQuantitiesMuMu
[DEBUG   ] [rules.py           :387] ReplaceProducer: Removing nbtag from outputs in scope mm
[DEBUG   ] [rules.py           :395] ReplaceProducer: Adding nbtag from outputs in scope mm
[CRITICAL] [rules.py           :130] Applying rule ProducerRule - replace ProducerGroup: BasicFatJetQuantities with ProducerGroup: BasicFatJetQuantitiesMuMu for ['ggh_htautau', 'ggh_hbb', 'vbf_htautau', 'vbf_hbb', 'rem_htautau', 'rem_hbb', 'embedding', 'embedding_mc', 'singletop', 'ttbar', 'diboson', 'dyjets', 'wjets', 'data', 'electroweak_boson', 'nmssm_Ybb', 'nmssm_Ytautau'] in scopes ['mm'] for sample dyjets

Finally, nbtag is then also a branch of the output file, so this change should fix the problem.

I think that this bug only affects producers, which have different sets of outputs. In that case, the if-else query goes to the if added_outputs == removed_outputs branch, which does not modify the output set.

@moritzmolch moritzmolch added the bug Something isn't working label Jul 26, 2024
@moritzmolch moritzmolch changed the title ReplaceProducer doesn't add outputs if outputs of old and new producer are different [BUG] ReplaceProducer doesn't add outputs if outputs of old and new producer are different Jul 26, 2024
moritzmolch added a commit to moritzmolch/CROWN that referenced this issue Jul 26, 2024
@moritzmolch
Copy link
Author

I added the proposed fix as a pull request: #280
Sorry for the spam, should've packed it in one thread. 😅

@harrypuuter
Copy link
Contributor

Hi,

thanks for the report. This is the way the ReplaceProducer is supposed to work (or at least sort of). There are still the RemoveProducer and AppendProducer functions that should be used if the outputs are different after replacement. ReplaceProducer should only be used, if the new producer has the same outputs as the old one, to avoid confusion and unintended Replacements. I agree that the current implementation is not ideal, and you case should raise an error with a hint to use a Remove/Append combination instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants