Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in require behaviour #793

Open
znicholls opened this issue Oct 24, 2023 · 6 comments
Open

Regression in require behaviour #793

znicholls opened this issue Oct 24, 2023 · 6 comments

Comments

@znicholls
Copy link
Collaborator

require_data is not a drop in replacement for require_variable. This leads to a regression in behaviour with no easy fix for users.

See script below for demonstration.

Script
import numpy as np
import pandas as pd
import pyam


test = pd.DataFrame(
    np.ones((8, 3)),
    columns=[2010, 2015, 2020],
    index=pd.MultiIndex.from_tuples(
        [
            (
                "scenario_a",
                "model_a",
                "Emissions|CO2|Waste",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_a",
                "model_a",
                "Emissions|CO2|Other",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_b",
                "model_a",
                "Emissions|CO2|Waste",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_b",
                "model_a",
                "Emissions|CO2|Industrial",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_a",
                "model_b",
                "Emissions|CO2|Other",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_a",
                "model_b",
                "Emissions|CO2|Industrial",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_b",
                "model_b",
                "Emissions|CO2|AFOLU",
                "World",
                "GtC / yr",
            ),
            (
                "scenario_b",
                "model_b",
                "Emissions|CO2|Industrial",
                "World",
                "GtC / yr",
            ),
        ],
        names=[
            "scenario",
            "model",
            "variable",
            "region",
            "unit",
        ],
    ),
)
test = pyam.IamDataFrame(test)

if pyam.__version__.startswith("2"):
    matches_requirements = test.require_data(
        variable=["Emissions|CO2|Other", "Emissions|CO2|Waste"], exclude_on_fail=True
    )
    print("3 scenarios fail (the ones that don't have BOTH requirement)")
    print(test.exclude)
    assert test.exclude.sum() == 1

else:
    matches_requirements = test.require_variable(
        variable=["Emissions|CO2|Other", "Emissions|CO2|Waste"], exclude_on_fail=True
    )
    print("Only 1 scenario fails (the one that doesn't have EITHER requirement)")
    print(test.meta)
    assert test.meta["exclude"].sum() == 1
Behaviour with pyam-iamc 2.0 and require_data
% pip list | grep pyam-iamc && python scratch.py
pyam-iamc               2.0.0

3 scenarios fail (the ones that don't have BOTH requirement)
model    scenario  
model_a  scenario_a    False
         scenario_b     True
model_b  scenario_a     True
         scenario_b     True
dtype: bool
Traceback (most recent call last):
  File ".../scratch.py", line 85, in <module>
    assert test.exclude.sum() == 1
           ^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Behaviour with pyam-iamc 1.9 and require_variable
% pip list | grep pyam-iamc && python scratch.py
pyam-iamc          1.9.0

Only 1 scenario fails (the one that doesn't have EITHER requirement)
                    exclude
model   scenario           
model_a scenario_a    False
        scenario_b    False
model_b scenario_a    False
        scenario_b     True

I think the basic difference is that require_variable did an OR requirement (any match was marked as a match). require_data is an AND requirement (all requirements had to match in order to be marked as match).

@danielhuppmann
Copy link
Member

I assume that I did think through these changes a while back, but can't recollect my thoughts right now...

But from a first-principles point of view, I do think that checking all items in a list is more intuitive than any for a "requirement".

Question to me is what your use case is? Are you trying to ensure that at least one of "Waste" or "Other" is present?

@znicholls
Copy link
Collaborator Author

But from a first-principles point of view, I do think that checking all items in a list is more intuitive than any for a "requirement".

Me too

Question to me is what your use case is? Are you trying to ensure that at least one of "Waste" or "Other" is present?

Trying to get this to behave iiasa/climate-assessment#47 @jkikstra wrote the code that uses this and I assume was trying to do a check for one or both of them being there, but I don't actually know (see this comment for the function which calls it iiasa/climate-assessment#47 (comment))

@znicholls
Copy link
Collaborator Author

cc @phackstock

@danielhuppmann
Copy link
Member

Still not sure what the actual use case is from that comment, but I guess we could add a kwarg how={"all", "any"}, default 'all', inspired by pandas.dropna().

As for implementation, I guess if any data is present after applying the filters (=kwargs of require_data()) , then the "any"-requirement is satisfied for the filters.

@znicholls
Copy link
Collaborator Author

Here's the line where it's used: https://github.com/iiasa/climate-assessment/blob/485f3d24fc646ad8d77c65ac5e787a27dc79db04/src/climate_assessment/checks.py#L788

Up to @jkikstra and @phackstock whether it's easier to add the feature back into pyam or just hack a workaround into climate-assessment

@phackstock
Copy link
Contributor

No strong feelings from my side either way. I would say in the interest of time it's better to build a workaround in climate-assessment.

jkikstra pushed a commit to iiasa/climate-assessment that referenced this issue Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants