Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate legacy reporting #159

Merged
merged 55 commits into from
Apr 5, 2024
Merged

Migrate legacy reporting #159

merged 55 commits into from
Apr 5, 2024

Conversation

glatterf42
Copy link
Member

@glatterf42 glatterf42 commented Mar 25, 2024

This PR takes the current version of the "legacy reporting" from message_data's ssp_dev branch and plugs it into message_ix_models/report/legacy.

No changes were made to the code in order to preserve comparability to other versions of the legacy reporting that might exist in unknown locations. This is why the files in message_ix_models/report/legacy had to be specifically excluded from code improvement tools such as mypy and ruff.

Since we want to confirm that the legacy reporting runs with the latest version of the global model baseline snapshot on zenodo, I added information about this snapshot to SOURCES in util/pooch. Some tests apart from tests/report/test_legacy::test_legacy_report might not work due to this change. In particular, ss_reporter and unpack_snapshot_data seem to only handle snapshot-0 for now.
On my local system, running test_load almost crashed my computer trying to unpack all snapshot-1 data. It resulted in a Java heap space OutOfMemoryError.
On a tangent here, ss_reporter is a name where I'd vividly wish for a change, e.g. to simulated_solution_reporter. All these shortened names are bad in themselves, but even more so when this creates references to literal nazi organizations.

I didn't dare running test_legacy_report. I'm not sure the current setup is the best with load_snapshots a session-scoped fixture parametrized for the snapshots and yielding scenarios with them, but I'm not sure what else to do. test_legacy_report definitely still needs to be adapted to cover all function parameters/options, if that's our goal. We should at least find one working combination of those and bake it into the tests, I think.

Further TODOs that can be expected:

  • The units for the latest snapshot version might have changed; make sure the legacy_report() can handle the new units
  • The DB receiving the scenarios might need to be taught the hierarchy between the data contained in the scenarios

Before starting this PR, I started migrating the legacy reporting while cleaning it up at the same time. This is nowhere near completion, but the point I got to can be seen at https://github.com/iiasa/message-ix-models/tree/modernize/legacy-reporting.

How to review

TBA

PR checklist

  • Continuous integration checks all ✅
  • Add or expand tests; coverage checks both ✅
  • Add, expand, or update documentation.
  • Update doc/whatsnew.

@glatterf42 glatterf42 added the enh New features or functionality label Mar 25, 2024
Copy link

codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.6%. Comparing base (f08a344) to head (2a0e7e6).
Report is 4260 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##            main    #159     +/-   ##
=======================================
- Coverage   75.7%   74.6%   -1.1%     
=======================================
  Files         93      94      +1     
  Lines       6379    6428     +49     
=======================================
- Hits        4831    4800     -31     
- Misses      1548    1628     +80     

see 11 files with indirect coverage changes

@khaeru
Copy link
Member

khaeru commented Mar 25, 2024

Thanks for getting this started!

Quick thoughts on only a few items:

This is why the files in message_ix_models/report/legacy had to be specifically excluded from code improvement tools such as mypy and ruff.

I am undecided on whether we should also exclude them from coverage measurement. The legacy reporting is very big; in terms of line count it outweighs the entire existing message-ix-models code base. This combined with the low (previously: zero) test coverage would mean a big hit for the coverage of the package overall.

On my local system, running test_load almost crashed my computer trying to unpack all snapshot-1 data. It resulted in a Java heap space OutOfMemoryError.

In theory, the unpacking of the data goes parameter-by-parameter from the Excel file into the Gzipped CSVs, and shouldn't hold more than one in memory at a time. We could try forcing the backend to clear its cache after each one.

Also note that the repo contains a pre-unpacked version of snapshot-0, which saves time on CI. I think this might get unwieldy if we do this for every snapshot, but perhaps could do that in this case.

If this gets to be a big set of tasks, we could do it in its own PR and then rebase this one.

simulated_solution_reporter

😬 Good catch, 100% we should change this.

We should at least find one working combination of those and bake it into the tests, I think.

Agreed, 1 and no more for this initial PR. These can later be expanded as (internal) users need to ensure stable behaviour for other combinations of settings (though, again, we should prefer to expand the features of the genno-based reporting to provide what is needed).

@glatterf42
Copy link
Member Author

The tests are taking so long that I'll probably temporarily disable all but one version, too. And we might want to consider moving these tests to some kind of nightly workflow instead of testing on every push to every PR.

@khaeru
Copy link
Member

khaeru commented Mar 26, 2024

…we might want to consider moving these tests to some kind of nightly workflow instead of testing on every push to every PR.

Indeed, see #139

@glatterf42 glatterf42 force-pushed the migrate/legacy-reporting branch 2 times, most recently from e65b427 to 3f87d74 Compare March 27, 2024 07:12
@glatterf42
Copy link
Member Author

On main, we currently don't have the .xlsx snapshot files committed. With their most recent location update, I hope we finally register the existing unpacked data correctly, at least for snapshot-0. snapshot-1 might take significantly longer still since it's calling ixmp.Scenario.read_excel() directly without checking existince of other files first.
If this is indeed true, we might want to use message_ix_models.model.snapshot.read_excel() for snapshot-1, too.
Also, we should consider removing the .xlsx files from our commit history again to keep that smaller (or are they kept from main's history automatically if we delete them here before merging?).

Either way, the snapshot-0 data still contain "Equation and variable data" that snapshot-1 is still missing. They were added through this commit, which describes the command with which they were created.

@glatterf42
Copy link
Member Author

glatterf42 commented Mar 29, 2024

This PR is going to create a headache because we are again including a demand.csv.gz and DEMAND.csv.gz and this time, we need both of them. report/legacy/pp_utils.py relies on DEMAND being there, unless we rewrite it.

Unfortunately, this duplicate name is not what's causing the current test failure. Testing locally, the check if name in parameters in model/snapshot::read_excel (correctly, I think) eliminates DEMAND from the list of parameters it wants to add.
So I'm not quite sure why read_excel(add_units=True) doesn't add this unit anymore. Most likely sets.xlsx simply doesn't contain this unit, while the whole snapshot xlsx file does.

@glatterf42 glatterf42 force-pushed the migrate/legacy-reporting branch 7 times, most recently from 376f880 to 489e179 Compare April 3, 2024 13:26
@glatterf42
Copy link
Member Author

Regarding test coverage, I waited with a decision to see how large the final drop would be. Even with the existing test (which is a single call to the legacy report() function, admittedly), this would lead to almost 30% reduced coverage, so I'm opting to exclude the migrated code from coverage instead.

@glatterf42
Copy link
Member Author

The main issue about reduced test coverage seems to be that the functions in model/snapshot.py like unpack() and read_excel() don't count as being tested since they are only run as part of a fixture. I'm not sure how we could tell codecov that this is intentional -- or maybe it shouldn't be, after all: maybe fixtures should only be very reliable code because the other tests depend on this code reliably running fine.

@khaeru
Copy link
Member

khaeru commented Apr 4, 2024

Towards getting this merged, can we please reword or drop the following commits?

8ba1a389 remove debug output to study failing scenario
e8c2afe7 Add debug output to study failing scenario

I will meanwhile go through and leave some comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, we may want to write tests that refer to the output of the legacy reporting—for instance, to check improvements to the genno-based reporting.

There would be different ways to do that:

  • On one's own machine, run the legacy reporting on snapshot-1 (either directly or via the tests) and either:
    • Pick up the generated file.
    • Export the time series data stored on the Scenario.
  • As part of this workflow, add actions/upload-artifact to scoop up the generated file.

I don't think we need to do either for this PR, just making a note of our options

@glatterf42 glatterf42 force-pushed the migrate/legacy-reporting branch from b1d58ee to d8a6fa9 Compare April 4, 2024 10:54
@glatterf42 glatterf42 force-pushed the migrate/legacy-reporting branch from 7abdff4 to 2ad3837 Compare April 5, 2024 08:44
@glatterf42 glatterf42 force-pushed the migrate/legacy-reporting branch from 2ad3837 to 2a0e7e6 Compare April 5, 2024 09:44
@glatterf42 glatterf42 merged commit ad7d356 into main Apr 5, 2024
22 of 23 checks passed
@glatterf42 glatterf42 deleted the migrate/legacy-reporting branch April 5, 2024 10:14
khaeru added a commit that referenced this pull request Apr 5, 2024
khaeru added a commit that referenced this pull request Apr 8, 2024
@khaeru khaeru mentioned this pull request Apr 22, 2024
2 tasks
@glatterf42 glatterf42 mentioned this pull request Aug 22, 2024
4 tasks
@khaeru khaeru added the report-legacy Legacy reporting label Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enh New features or functionality report-legacy Legacy reporting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants