Demo version #1
Conversation
update project cov dir
):
    """Select a time slice from a Dataset or DataArray."""
    used_duration = history_duration is not None and forecast_duration is not None
    used_intervals = interval_start is not None and interval_end is not None
Honestly just out of curiosity: when is interval slicing used? I've never seen it, so I'm assuming GSPs?
We actually do use it in datapipes. I want to propose that we get rid of [history, forecast]_duration in favour of interval_[start, end]. There are some cases where interval makes more sense, and it dates back to this issue in datapipes. If we did this we could also get rid of live_delay_minutes in the satellite config, which is slightly redundant.
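For the common case the switch is mechanical; here is a minimal sketch of the correspondence (the helper name and the sign convention, with history extending backwards from t0, are my assumptions, not an agreed API):

```python
import pandas as pd

def durations_to_interval(history_duration: pd.Timedelta, forecast_duration: pd.Timedelta):
    """Hypothetical helper: express a (history, forecast) duration pair as the
    equivalent (interval_start, interval_end) pair relative to t0.
    History extends backwards from t0, so interval_start is negative."""
    return -history_duration, forecast_duration

# e.g. 60 min of history and 120 min of forecast
start, end = durations_to_interval(pd.Timedelta("60min"), pd.Timedelta("120min"))
```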
Cool, thanks for pointing me!
This reverts commit 709178a.
if "target_time_utc" in da.coords:
    example[NWPBatchKey.nwp_target_time_utc] = da.target_time_utc.values.astype(float)

# TODO: Do we need this at all? Especially since it is only present in UKV data
I feel like this either needs to include lat lon or not exist, otherwise it's kind of strange that only ukv gets positional features? Unless I'm misunderstanding the purpose of this.
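To make the "include lat lon or don't do it at all" option concrete, a small sketch of attaching positional features uniformly for every NWP source. The dict keys and the `coords` mapping (a stand-in for `da.coords` as name -> array, to keep the sketch self-contained) are my inventions, not the real NWPBatchKey names:

```python
import numpy as np

def add_positional_features(example: dict, coords: dict) -> dict:
    """Hypothetical sketch: attach positional features for every NWP source,
    not just UKV. `coords` maps coordinate name -> numpy array."""
    if "target_time_utc" in coords:
        example["nwp_target_time_utc"] = coords["target_time_utc"].astype(float)
    # Include spatial coordinates whenever they are present
    for name in ("latitude", "longitude"):
        if name in coords:
            example[f"nwp_{name}"] = coords[name]
    return example

example = add_positional_features(
    {}, {"latitude": np.array([51.5]), "longitude": np.array([-0.1])}
)
```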
if "nwp" in dataset_dict:

    nwp_numpy_modalities = dict()

    for nwp_key, da_nwp in dataset_dict["nwp"].items():
this is so vague I hesitate to even call it a suggestion, but I feel like this unpacking happens over and over in every other function, which makes me wonder if it would be, well, maybe not easier, but clearer to just store the nwp sources unpacked and prefixed with nwp, or something?
It will mess with checking key by key, so we would have to revert to a for loop iterating over the keys, and the construction of all the specific parts the nwp functions need would have to be moved out somewhere, but I feel like it would be so much neater if we could get it down to something (VERY vaguely) like:
def is_nwp(key):
    # assuming keys of the form "nwp/ecmwf", "sat", etc.
    split_key = key.split("/")
    return split_key[0] == "nwp"

def return_spatial_slice(da, config, key):
    if is_nwp(key):
        return spatial_slice_nwp(da, config, **get_additional_nwp_kwargs(da, config))
    else:
        return spatial_slice(da, config)

for key, dataset in datasets_dict.items():
    sub_config = config[key]
    da_slice = return_spatial_slice(dataset, sub_config, key)
    slices.append(da_slice)
It may well be that this is a me-problem (I am known to have a massive sore spot over the separate processing of nwps and non-nwps), but thought I'd throw it in
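For concreteness, here is a tiny runnable sketch of the flat, prefixed storage idea; the `/` separator and the key names are my invention, and plain strings stand in for the DataArrays:

```python
# Hypothetical flat layout: NWP sources stored alongside other modalities,
# distinguished only by an "nwp/" key prefix.
datasets_dict = {"nwp/ecmwf": "da_ecmwf", "nwp/ukv": "da_ukv", "sat": "da_sat"}

def is_nwp(key: str) -> bool:
    """Return True for keys under the (assumed) 'nwp/' prefix."""
    return key.split("/")[0] == "nwp"

# Select only the NWP entries from the flat dict
nwp_keys = [k for k in datasets_dict if is_nwp(k)]
```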
Could even have the dictionary contain more strongly typed elements, e.g.
from typing import NamedTuple

import xarray as xr

class DAContainer(NamedTuple):
    key: str
    """??? Something different to da.name?"""
    da: xr.DataArray
    """The actual xarray DataArray containing the data"""
# Note the stronger typing in the dictionary parameter, which I guess is something like
# {
#     "nwp": [DAContainer("ECMWF_INDIA", xr.DataArray(...)), DAContainer(...)],
#     "sat": [DAContainer("", xr.DataArray(...)), ...],
# }
def process_and_combine_datasets(dataset_dict: dict[str, list[DAContainer]], config: Configuration) -> NumpyBatch:
    if "nwp" in dataset_dict:
        nwp_numpy_modalities = dict()
        for dac in dataset_dict["nwp"]:
            provider = config.input_data.nwp[dac.key].nwp_provider
            da_nwp = (dac.da - NWP_MEANS[provider]) / NWP_STDS[provider]
            ...
I haven't looked at this carefully enough, but it might just be the case that you can use the DataArray "name" attribute to avoid this altogether; I don't know what the "nwp_key" is meant to represent other than this. Then you could have
def process_and_combine_datasets(dataset_dict: dict[str, list[xr.DataArray]], config: Configuration) -> NumpyBatch:
    if "nwp" in dataset_dict:
        nwp_numpy_modalities = dict()
        for da in dataset_dict["nwp"]:
            provider = config.input_data.nwp[da.name].nwp_provider
            da_nwp = (da - NWP_MEANS[provider]) / NWP_STDS[provider]
            ...
    ...
for nwp_key, da_nwp in dataset_dict["nwp"].items():
    # Standardise
    provider = config.input_data.nwp[nwp_key].nwp_provider
    da_nwp = (da_nwp - NWP_MEANS[provider]) / NWP_STDS[provider]
This probably requires a whole other conversation re: whether we're keeping the whole statistics-in-consts thing or moving them into the data attributes or something, but it could be worth thinking about moving this into a separate normalisation function. It would also provide space for different handling of variables such as prate that don't mix well with standard normalisation.
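Something along these lines, perhaps; the statistics here are made-up stand-ins for the real consts, and log-scaling prate is just one illustration of special handling, not the agreed approach:

```python
import numpy as np

# Hypothetical per-provider statistics; in the real code these live in consts
NWP_MEANS = {"ukv": 10.0}
NWP_STDS = {"ukv": 2.0}

def normalise_nwp(values, provider, channel=None):
    """Sketch of a standalone normalisation step. Variables like prate get a
    log transform instead of standardisation (illustrative choice only)."""
    if channel == "prate":
        return np.log1p(values)
    return (values - NWP_MEANS[provider]) / NWP_STDS[provider]
```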
def fill_time_periods(time_periods: pd.DataFrame, freq: pd.Timedelta):
    datetimes = []
    for _, row in time_periods.iterrows():
Feels like this could be vectorised instead of looping through all the rows in the df, and that would be more efficient? Happy to try to do that if you agree
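Roughly what I have in mind; the column names `start_dt`/`end_dt` are guesses, and this still builds one `date_range` per period rather than using `iterrows`, so it's a sketch rather than a drop-in replacement:

```python
import numpy as np
import pandas as pd

def fill_time_periods_vectorised(time_periods: pd.DataFrame, freq: pd.Timedelta) -> pd.DatetimeIndex:
    """Hypothetical rework of fill_time_periods without iterrows().
    Snaps each period start up to the freq grid, then expands each period."""
    starts = pd.to_datetime(time_periods["start_dt"]).dt.ceil(freq)
    ends = pd.to_datetime(time_periods["end_dt"])
    ranges = [pd.date_range(s, e, freq=freq) for s, e in zip(starts, ends)]
    if not ranges:
        return pd.DatetimeIndex([])
    return pd.DatetimeIndex(np.concatenate(ranges))
```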
end_buffer = minutes(0)

# This is the max staleness we can use considering the max step of the input data
max_possible_staleness = (
More concretely on the find_valid_t0_times function perhaps being too long: I think this part about the staleness could be moved into a get_max_staleness function. Happy to do that if you think it's a good idea
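Something like this, maybe; the inputs and the exact formula here are assumptions based on the snippet above, not the real implementation:

```python
import pandas as pd

def get_max_staleness(
    max_step: pd.Timedelta,
    forecast_duration: pd.Timedelta,
    end_buffer: pd.Timedelta = pd.Timedelta(0),
) -> pd.Timedelta:
    """Hypothetical extraction of the staleness computation from
    find_valid_t0_times: the latest init time we can tolerate, given the
    largest forecast step available in the input data."""
    return max_step - forecast_duration - end_buffer
```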
if modality_name not in {"gsp"}:
    raise NotImplementedError(f"Modality {modality_name} not supported")
this is very probably just a for-now thing, but in case it isn't: it should probably support "pv" as well for pvsite. But again, we'll probably add it when we add the pvsite pipeline
return contiguous_t0_periods


def _find_contiguous_t0_periods_nwp(
I might be missing something, but is this ever used? It looks like an attempt to hide stuff like max_staleness calculation and such under the hood, is that what it is?
@all-contributors
I've put up a pull request to add @dfulu! 🎉
@all-contributors
I've put up a pull request to add @AUdaltsova! 🎉
@all-contributors please add @Sukh-P for code.
I've put up a pull request to add @Sukh-P! 🎉
@all-contributors please add @peterdudfield for code.
I've put up a pull request to add @peterdudfield! 🎉
This pull request provides the minimal demo version of this library. Still lots to do.
TODO:
TODO list #7