FutureWarning: ChainedAssignmentError #894

Open
quant12345 opened this issue Dec 15, 2024 · 8 comments

Comments

@quant12345
Contributor

quant12345 commented Dec 15, 2024

I see several warnings (for example, here is one):

tests/test_plotting.py::test_line_color_fill_between_interpolate
  D:\a\pyam\pyam\tests\test_plotting.py:153: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
  You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
  A typical example is when you are setting values in a column of a DataFrame, like:
  
  df["col"][row_indexer] = value
  
  Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.
  
  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
    df.data.loc[len(df.data) - 1] = newdata

I wanted to fix this, but I can't find where exactly the chained indexing happens. It's clearly not on line 153, but somewhere higher up, probably when the class holding the data is created.

I also tried (doesn't help):

df.data.loc[len(df.data) - 1, :] = newdata

I tried searching the repository for possible chained indexing and printing the data, but mostly it was dictionaries, or in the end it didn't apply to the tests of interest.

@danielhuppmann
Member

My guess is that this warning does indeed result from line 153. pandas is sometimes overly careful with these FutureWarnings.

Note that df.data is an attribute of the IamDataFrame and I think that this test-implementation is from the early days of pyam when df.data was the actual pandas.DataFrame. So this test should probably be refactored entirely.
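
To illustrate why an assignment through such an attribute can be flagged, here is a minimal sketch. It assumes a hypothetical `Wrapper` class (not pyam's actual implementation) whose `data` property builds a fresh pandas.DataFrame on every access. Setting values on that temporary object cannot reach the stored data, which is the situation the chained-assignment warning describes.

```python
import warnings

import pandas as pd


class Wrapper:
    """Hypothetical container; `data` returns a new DataFrame on each access."""

    def __init__(self, frame):
        self._frame = frame.copy()

    @property
    def data(self):
        # A fresh, independent object every time the attribute is read.
        return self._frame.copy()


w = Wrapper(pd.DataFrame({"year": [2005, 2010], "value": [1.0, 2.0]}))

with warnings.catch_warnings():
    # Silence the chained-assignment warning that newer pandas versions may emit.
    warnings.simplefilter("ignore")
    # The assignment targets the temporary returned by `w.data`, not `w._frame`.
    w.data.loc[len(w.data) - 1] = [2010, 3.5]

# The stored value is untouched.
unchanged = w.data.loc[1, "value"] == 2.0
print(unchanged)
```

In this sketch the write is silently lost, so a test that relies on it would be plotting the original data.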

@quant12345
Contributor Author

quant12345 commented Dec 17, 2024

I also tried printing the row before and after the assignment. It did not change:

newdata = [
        "test_model1",
        "test_scenario1",
        "World",
        "Primary Energy|Coal",
        "EJ/y",
        2010,
        3.50,
    ]

print('before')
print(df.data.loc[len(df.data) - 1])

df.data.loc[len(df.data) - 1] = newdata

print('after')
print(df.data.loc[len(df.data) - 1])

And this also gives zeros:

print(df.filter(model="test_model1", scenario="test_scenario1"))

If I'm not mistaken, df.data dumps data into a dataframe and they will be independent of df?

Update

I tried creating a pyam.IamDataFrame from the new data and then combining the two with pyam.concat.

code:
@pytest.mark.mpl_image_compare(**MPL_KWARGS)
def test_line_color_fill_between_interpolate(plot_df):
    # designed to create the sawtooth behavior at a midpoint with missing data
    df = pyam.IamDataFrame(plot_df.data.copy())
    fig, ax = plt.subplots(figsize=(8, 8))
    newdata1 = [
        "test_model1",
        "test_scenario1",
        "World",
        "Primary Energy|Coal",
        "EJ/y",
        2010,
        3.50,
    ]
    newdata2 = [
        "test_model1",
        "test_scenario1",
        "World",
        "Primary Energy|Coal",
        "EJ/y",
        2012,
        3.50,
    ]
    newdata3 = [
        "test_model1",
        "test_scenario1",
        "World",
        "Primary Energy|Coal",
        "EJ/y",
        2015,
        3.50,
    ]
    columns = ['model', 'scenario', 'region', 'variable', 'unit', 'year', 'value']

    new_data = pd.DataFrame([newdata1, newdata2, newdata3], columns=columns)
    new_pyam = pyam.IamDataFrame(new_data)
    df = pyam.concat([df, new_pyam])
    df.plot(ax=ax, color="model", fill_between=True, legend=True)
    return fig

Tests fail: FAILED tests/test_plotting.py::test_line_color_fill_between_interpolate - ValueError: Duplicate rows in data.

How do you solve the duplicate problem in pyam.concat?

@danielhuppmann
Member

If I'm not mistaken, df.data dumps data into a dataframe and they will be independent of df?

Correct.

How do you solve the duplicate problem in pyam.concat?

You are creating a new IamDataFrame that has duplicate entries with the initial dataframe, so concat() raises an error - as it should. Why are you trying to concat overlapping data?
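
One way to see the overlap before combining is to check the identifying columns with plain pandas. This is only a sketch: the `KEYS` list and the sample values are assumptions for illustration, and it does not use pyam's own duplicate check.

```python
import pandas as pd

# Assumed identifying columns of a long-format timeseries table.
KEYS = ["model", "scenario", "region", "variable", "unit", "year"]

base = pd.DataFrame(
    [
        ["test_model1", "test_scenario1", "World", "Primary Energy|Coal", "EJ/y", 2010, 1.0],
        ["test_model1", "test_scenario1", "World", "Primary Energy|Coal", "EJ/y", 2020, 2.0],
    ],
    columns=KEYS + ["value"],
)
new = pd.DataFrame(
    [
        # Same identifiers as the first row of `base`, different value.
        ["test_model1", "test_scenario1", "World", "Primary Energy|Coal", "EJ/y", 2010, 3.5],
        # A year that does not exist in `base`.
        ["test_model1", "test_scenario1", "World", "Primary Energy|Coal", "EJ/y", 2012, 3.5],
    ],
    columns=KEYS + ["value"],
)

combined = pd.concat([base, new], ignore_index=True)
# keep=False marks every row that shares its identifiers with another row.
overlap = combined[combined.duplicated(subset=KEYS, keep=False)]
print(overlap)
```

Here only the two 2010 rows are reported, so those are the entries that would have to be resolved before combining.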

@quant12345
Contributor Author

quant12345 commented Dec 18, 2024

Need to try dumping the data into a dataframe, adding what you need, and converting it back to pyam.IamDataFrame.

Just need to decide which columns are considered for duplicates, all except value?

@danielhuppmann I opened PR #900, but the tests fail on macos-13 py3.12 and ubuntu-latest py3.12:

FAILED tests/test_plotting.py::test_line_color_fill_between_interpolate - Failed: Error: Image files did not match.
  RMS Value: 4.395753712151479
  Expected:  
    /tmp/tmpiqvpod4m/tests.test_plotting.test_line_color_fill_between_interpolate/baseline.png
  Actual:    
    /tmp/tmpiqvpod4m/tests.test_plotting.test_line_color_fill_between_interpolate/result.png
  Difference:
    /tmp/tmpiqvpod4m/tests.test_plotting.test_line_color_fill_between_interpolate/result-failed-diff.png
  Tolerance: 
    2

@danielhuppmann
Member

The duplicates are in the rows of df._data, not the columns.

@quant12345
Contributor Author

I don't convert plot_df.data.copy() to pyam.IamDataFrame right away. I first set the values and then remove duplicate rows, comparing only the columns in columns_ = ['model', 'scenario', 'region', 'variable', 'unit', 'year']:

df = df.drop_duplicates(subset=columns_).reset_index(drop=True)

and then convert it to pyam.IamDataFrame:
df = pyam.IamDataFrame(df)

What could be the problem here? Locally the test passes for me, it breaks in Actions py3.12.
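
One thing worth checking is which of the duplicate rows survives. The sketch below uses made-up values and a shortened key list to show that pandas' `drop_duplicates` keeps the first occurrence by default, so a row appended later is the one that gets dropped unless `keep="last"` is passed. Whether this explains the image mismatch is an assumption, not something confirmed in this thread.

```python
import pandas as pd

# Shortened, illustrative key list; the thread uses six identifying columns.
KEYS = ["model", "year"]

frame = pd.DataFrame(
    {
        "model": ["test_model1", "test_model1"],
        "year": [2010, 2010],
        "value": [1.0, 3.5],  # original value first, newly added value second
    }
)

# Default keep="first": the original row wins and the new value is discarded.
first = frame.drop_duplicates(subset=KEYS).reset_index(drop=True)

# keep="last": the newly added row wins.
last = frame.drop_duplicates(subset=KEYS, keep="last").reset_index(drop=True)

print(first.loc[0, "value"], last.loc[0, "value"])
```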

@danielhuppmann
Member

Tests are failing on py3.12 because on the latest supported version, we are using pytest-mpl to check that the generated figures are indeed as expected, see the Actions workflow

# run tests with Matplotlib & CodeCov on latest Python version

Figure created from your branch: [image: test_line_color_fill_between_interpolate]

Original figure: [image: test_line_color_fill_between_interpolate]

I'm not sure what exactly you are trying to do, but you are clearly changing the actual data used when generating the plot.

Look at the test readme for more information on running the tests with figure comparison: https://github.com/IAMconsortium/pyam/blob/main/tests/README.md

@quant12345
Contributor Author

quant12345 commented Dec 18, 2024

And as you said, pyam.IamDataFrame used to be like a pandas dataframe. In the tests, the data is changed and supplemented, if I understand correctly? That's why I don't convert the dataframe to pyam.IamDataFrame first, but I do what the code did in the past and only then convert the data to pyam.IamDataFrame. Thanks for the tip, I also found the commit where this was done.

In my code, if I don’t change anything at all, the tests pass and the picture is the same:

df = pyam.IamDataFrame(plot_df.data.copy())
df.plot(ax=ax, color="model", fill_between=True, legend=True)

Then it is not at all clear what these lines do:

df.data.loc[len(df.data) - 1] = newdata
df.data.loc[len(df.data)] = newdata
df.data.loc[len(df.data) + 1] = newdata
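
For reference, this is what the same three statements would do on a plain pandas.DataFrame with a default integer index. It is a sketch with made-up values and it says nothing about an object whose `data` attribute is rebuilt on each access. The first statement overwrites the last row, and the other two append rows, because `.loc` creates a row when the label does not exist yet.

```python
import pandas as pd

frame = pd.DataFrame({"year": [2005, 2010, 2015], "value": [1.0, 2.0, 3.0]})
newdata = [2010, 3.5]

# Label 2 exists: the last row is overwritten.
frame.loc[len(frame) - 1] = newdata

# Label 3 does not exist: a new row is appended, so the length becomes 4.
frame.loc[len(frame)] = newdata

# The length is now 4, so the label is 5: another row is appended and label 4 is skipped.
frame.loc[len(frame) + 1] = newdata

print(frame)
```

On a real DataFrame the result has five rows with index labels 0, 1, 2, 3 and 5, three of which carry the new values.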
