Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove technology_type column from tools.costs #269

Merged
merged 4 commits into from
Dec 16, 2024
Merged

Conversation

measrainsey
Copy link
Contributor

@measrainsey measrainsey commented Dec 12, 2024

Small PR to remove the technology_type column from the costs tool

This PR should close:

This column was carried over from the legacy costs calculation work. While I think it may be helpful to understand how technologies are grouped, this column doesn't really serve any purpose in the code base, as the only mentions of it in the code is it being deleted.

Additionally, when others have been adding technologies to the different modules, there has been confusion about what they should put in this column (and it also becomes added work/input).

So I've decided to remove the usage of this column from the costs tool. Shouldn't affect the tool really in any way.

But might cause some merge errors in #235.

How to review

For @khaeru and/or @glatterf42 : Read the diff and note that the CI checks all pass.

PR checklist

  • Continuous integration checks all ✅
  • Add or expand tests; coverage checks both
  • Add, expand, or update documentation.
  • Update doc/whatsnew.

@measrainsey measrainsey added the costs `.tools.costs`/cost data preparation label Dec 12, 2024
@measrainsey measrainsey self-assigned this Dec 12, 2024
@measrainsey measrainsey changed the title Costs/tech type Remove technology_type column from tools.costs Dec 12, 2024
@measrainsey measrainsey marked this pull request as ready for review December 12, 2024 11:04
Copy link

codecov bot commented Dec 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.6%. Comparing base (44087e4) to head (59c4ace).

Additional details and impacted files
@@           Coverage Diff           @@
##            main    #269     +/-   ##
=======================================
- Coverage   77.6%   76.6%   -1.0%     
=======================================
  Files        211     211             
  Lines      16079   16079             
=======================================
- Hits       12481   12332    -149     
- Misses      3598    3747    +149     
Files with missing lines Coverage Δ
message_ix_models/tools/costs/decay.py 100.0% <100.0%> (ø)

... and 8 files with indirect coverage changes

Copy link
Member

@glatterf42 glatterf42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me. One suggestion in-line, but no need to pick it up if you don't want to.

.reset_index(drop=True)
.drop(columns=["technology_type"])
reduction_joined = reduction_energy._append(reduction_module).reset_index(
drop=True
)
else:
reduction_joined = reduction_energy.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In such cases, I prefer this kind of syntax:

reduction_joined = reduction_energy.copy() if module == "energy" else reduction_energy._append(reduction_module).reset_index(drop=True)

I think this is an improvement as it reduces the number of lines requiring tests and more clearly states that all you do is define reduction_joined, just slightly differently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but what is ._append() here? I thought pandas deprecated and removed methods like DataFrame.append() in v2.0 (2023), and I don't see this one in the documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah interesting - it seems that function may be a "private" function that isn't supported by pandas 🙈 i'll push a commit to switch to using concat or something else like that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @glatterf42 for the suggestion!

the if...else statement is actually not on module == "energy" but on if os.path.exists(ffile). if the file exists, then the code reads in the file as a dataframe, deletes some rows in another dataframe, and then concats the two dataframes together as the output. if the file doesn't exist, then the output is just the second dataframe copied over. i thought about condensing the code like you suggested but seems like the if part requires a few more steps? 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I misunderstood what was going on. I guess its technically possible to still make this one line, but I don't know if it will be easier to read. Up to you if you want to try this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's alright, i would prefer to skip editing this part for this PR and move forward with approving/merging -- thank you! :)

@glatterf42 glatterf42 linked an issue Dec 12, 2024 that may be closed by this pull request
@khaeru khaeru merged commit d55fbf9 into main Dec 16, 2024
30 checks passed
@khaeru khaeru deleted the costs/tech-type branch December 16, 2024 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
costs `.tools.costs`/cost data preparation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cost reduction CSV has unnecessary technology_type column
3 participants