Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor variable scaling, pressure level scalings only applied in specific circumstances #52

Draft
wants to merge 12 commits into
base: develop
Choose a base branch
from

Conversation

sahahner
Copy link
Member

@sahahner sahahner commented Dec 27, 2024

Solve the problem explained in issue #7 by refactoring the variable scalings into a general variable scaling and a pressure level scaling.
@mc4117 , @pinnstorm and me came up with a new structure. This PR implements this.

This is first draft. Feedback very welcome!

  • allow several variable level scaling (i.e. pressure level and model level)
  • implement/update tests
  • decide: do we want to allow scaling by variable_ref and variable_name, i.e. scale q_50 by q and q_50?

@sahahner sahahner linked an issue Dec 27, 2024 that may be closed by this pull request
2 tasks
@b8raoult
Copy link
Collaborator

Please consider using the knowledge about variables that come from the dataset metadata. See https://github.com/ecmwf/anemoi-transform/blob/7cbf5f3d4baa37453022a5a97e17cc71a5b8ceeb/src/anemoi/transform/variables/__init__.py#L47

@sahahner sahahner linked an issue Dec 30, 2024 that may be closed by this pull request
@sahahner
Copy link
Member Author

Please consider using the knowledge about variables that come from the dataset metadata. See https://github.com/ecmwf/anemoi-transform/blob/7cbf5f3d4baa37453022a5a97e17cc71a5b8ceeb/src/anemoi/transform/variables/__init__.py#L47

We have given this some thought, and after wanting to use the information from the dataset in the beginning, I have opted for allowing the definition of our own groups here to use different scaling for self-defined groups.
Also, I was also told that it is possible to build datasets without information about the variable types and therefore not to rely on that metadata.
If you have strong opinions on this I am happy to discuss it again.

data_indices,
).get_variable_scaling()

# Instantiate the pressure level scaling class with the training configuration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is possible but I wonder if here we could instantiate from a list of scalars rather than specific ones? I think this is how it is done for the validation metrics

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this would be useful, as we want to allow more scalar methods for model levels/tendency/etc.
I will have a look.


# Instantiate the pressure level scaling class with the training configuration
pressurelevelscaler = instantiate(
config.training.pressure_level_scaler,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this config location is wrong? It should be training.variable_loss_scaling.pressure_level_scalar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, in the config file, the pressure_level_scalar should be defined directly in training as before.

@sahahner sahahner changed the title pressure level scalings only applied in specific circumstances refactor variable scaling, pressure level scalings only applied in specific circumstances Jan 2, 2025
@FussyDuck
Copy link

FussyDuck commented Jan 2, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

✅ sahahner
❌ pinnstorm
❌ mc4117
You have signed the CLA already but the status is still pending? Let us recheck it.

@HCookie HCookie self-requested a review January 6, 2025 14:36
@abstractmethod
def get_variable_scaling(self) -> np.ndarray: ...

def get_variable_group(self, variable_name: str) -> tuple[str, str, int]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should put this function in an utils. I only say this because I want to use it in another application outside of the loss scalings

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is very similar to the function from the plotting function. Will make sure to have only one function that can be used in both cases.

@JPXKQX
Copy link
Member

JPXKQX commented Jan 8, 2025

Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the additional_scalers: field, but including not only the scalers per variable, but also the node_loss_weight,... The positive aspect I see is that there would be more homogeneity in the scalers defined in the metrics/loss fields.

@mc4117
Copy link
Member

mc4117 commented Jan 9, 2025

Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the additional_scalers: field, but including not only the scalers per variable, but also the node_loss_weight,... The positive aspect I see is that there would be more homogeneity in the scalers defined in the metrics/loss fields.

Seems like a good idea! Would you like to add this in this PR?

@JPXKQX
Copy link
Member

JPXKQX commented Jan 9, 2025

Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the additional_scalers: field, but including not only the scalers per variable, but also the node_loss_weight,... The positive aspect I see is that there would be more homogeneity in the scalers defined in the metrics/loss fields.

Seems like a good idea! Would you like to add this in this PR?

I’m not sure what the best approach is. On the one hand, adding more work to this PR would increase its complexity, which might make it more logical to address this refactor in a future PR. On the other hand, this PR already introduces some changes to the configs, and the future PRs would also involve changes to the configs. From this, it might be better to have 1 PR and communicate all the changes to users at once. What do you think?

@pinnstorm
Copy link
Member

Hi, I would like to know what you think about making all scalers explicit in the config file. Something similar to the additional_scalers: field, but including not only the scalers per variable, but also the node_loss_weight,... The positive aspect I see is that there would be more homogeneity in the scalers defined in the metrics/loss fields.

Seems like a good idea! Would you like to add this in this PR?

I’m not sure what the best approach is. On the one hand, adding more work to this PR would increase its complexity, which might make it more logical to address this refactor in a future PR. On the other hand, this PR already introduces some changes to the configs, and the future PRs would also involve changes to the configs. From this, it might be better to have 1 PR and communicate all the changes to users at once. What do you think?

I'm happy for it to be included in this PR! Not sure if @sahahner or @mc4117 have other views?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pressure Level Scalings only applied in specific circumstances Loss scalings
7 participants