Added icon to nwp providers #72

Open
wants to merge 7 commits into
base: main

Conversation

gabrielelibardi

Pull Request

Description

I added ICON to the NWP providers. Specifically, the changes should allow loading an xarray object lazily from a list of .zarr or .zarr.zip paths downloaded from https://huggingface.co/datasets/openclimatefix/dwd-icon-eu.
This pull request addresses issue #66 (comment).
In principle this should also work when one passes remote paths pointing directly at the .zarr.zip files. However, because many requests are made to the Hugging Face server in a short time, this may result in a 429 error. There are ways around this, as mentioned in the issue, but they have not been implemented yet.
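
One generic way to soften 429 responses is to retry with exponential backoff. The following is only a minimal sketch and is not necessarily one of the workarounds discussed in the issue. The helper name `call_with_backoff` and its parameters are hypothetical, and it assumes the rate-limit error surfaces as `urllib.error.HTTPError`; the library that actually reads the remote files may raise a different exception type.

```python
import random
import time
from urllib.error import HTTPError


def call_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() and retry with exponential backoff on HTTP 429.

    `fetch` is any zero-argument callable (hypothetical stand-in for the
    code that opens one remote file).
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except HTTPError as err:
            # Only retry rate-limit errors, and give up after max_retries.
            if err.code != 429 or attempt == max_retries:
                raise
            # Wait base_delay * 2**attempt seconds plus a little random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1 * base_delay)
            sleep(delay)
```

The `sleep` argument is injectable so the waiting behaviour can be checked without actually pausing.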

Fixes #

How Has This Been Tested?

See test_load_icon_eu, added to ocf-data-sampler/tests/load/test_load_nwp.py.

  • [X] Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings


from ocf_data_sampler.load.nwp.providers.utils import open_zarr_paths

def transform_to_channels(nwp : xr.Dataset):
Member

Am I right in thinking that the input here is an xarray Dataset with multiple data variables, one for each NWP variable, and that we want to go from that to a DataArray (i.e. a single data variable with an extra channel dimension)?

I think a simpler approach might be to do something like what is done here: https://github.com/openclimatefix/ocf_datapipes/blob/main/ocf_datapipes/load/nwp/providers/gfs.py#L26. There we call to_array() on the Dataset to convert it to a DataArray, and then rename the variable dimension that to_array() creates to channel.

But I may have misunderstood the intention/need for this function
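
For illustration, the to_array() approach suggested above might look like the sketch below. It uses a toy Dataset; the variable names (`t_2m`, `u_10m`) and dimension names are made up and are not taken from the PR or from the ICON data.

```python
import numpy as np
import xarray as xr

# Toy stand-in for an NWP Dataset: one data variable per NWP variable.
# Variable and dimension names are hypothetical.
ds = xr.Dataset(
    {
        "t_2m": (("step", "latitude", "longitude"), np.zeros((2, 3, 4))),
        "u_10m": (("step", "latitude", "longitude"), np.ones((2, 3, 4))),
    }
)

# to_array() stacks the data variables along a new leading dimension,
# named "variable" by default, which is then renamed to "channel".
da = ds.to_array().rename({"variable": "channel"})
```

The resulting DataArray has a `channel` coordinate holding the original variable names, so a single variable can still be selected with `da.sel(channel="t_2m")`.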

Author

You are perfectly right. I deleted this and now use to_array() instead. Thanks for pointing it out!

@Sukh-P
Member

Sukh-P commented Oct 28, 2024

Thanks for creating this PR and the great work already done on trying to support ICON data in this library!

Something to note: if this is merged as is, people may assume this library already supports ICON data. However, without normalisation constants added and ICON listed as an NWP provider here, creating samples from it won't work. So my suggestion is that either this is added in this PR or in a subsequent PR, or the outstanding work is clearly documented in a GitHub issue or the README. Thanks!

@gabrielelibardi
Author

> Thanks for creating this PR and the great work already done on trying to support ICON data in this library!
>
> Something to note: if this is merged as is, people may assume this library already supports ICON data. However, without normalisation constants added and ICON listed as an NWP provider here, creating samples from it won't work. So my suggestion is that either this is added in this PR or in a subsequent PR, or the outstanding work is clearly documented in a GitHub issue or the README. Thanks!

I can compute the std and mean constants. Do you have a script that does this for the other NWP providers? How large a sample do you take?

@Sukh-P
Member

Sukh-P commented Nov 1, 2024

> I can compute the std and mean constants. Do you have a script that does this for the other NWP providers? How large a sample do you take?

Thanks, that would be great! I don't think we have a script on GitHub currently, so I just created this gist to share some example code showing how I have calculated some of the normalisation stats previously. In the example I used 200 samples, and I think that would be fine for this too.
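
As a rough illustration of the idea (this is not the code from the gist), per-channel normalisation constants could be estimated from a random subset of the data along these lines. The function name `estimate_channel_stats` and the dimension names `init_time` and `channel` are assumptions, and a "sample" is interpreted here as one randomly chosen init time, which may differ from how samples were drawn in the gist.

```python
import numpy as np
import xarray as xr


def estimate_channel_stats(da, n_samples=200, time_dim="init_time", seed=0):
    """Estimate per-channel mean and std from randomly chosen time steps.

    Assumes `da` has a "channel" dimension and a time dimension named
    `time_dim` (both names are hypothetical).
    """
    rng = np.random.default_rng(seed)
    n_times = da.sizes[time_dim]
    # Draw distinct time indices; use every time step if fewer than n_samples exist.
    idx = rng.choice(n_times, size=min(n_samples, n_times), replace=False)
    sample = da.isel({time_dim: np.sort(idx)})
    # Reduce over every dimension except "channel".
    reduce_dims = [d for d in sample.dims if d != "channel"]
    mean = sample.mean(dim=reduce_dims)
    std = sample.std(dim=reduce_dims)
    return mean, std
```

Both results are DataArrays indexed by `channel`, so they can be converted to a plain mapping with, for example, `dict(zip(mean["channel"].values, mean.values))`.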

2 participants