Check for zeros in the data #159
Comments
Yes, I would add it to the filter here: nwp-consumer/src/nwp_consumer/internal/service/consumer.py Lines 410 to 424 in 372359c
Hi, I'm someone who is just starting in the field of open source development.
You can definitely contribute. Are you familiar with Python and xarray?
I'm familiar with Python and common libraries like pandas, but not with xarray.
Thanks, good @GAMinsect, you might need to learn a bit of xarray. You're welcome to give it a go.
After 2 weeks of working on it in my free time, here's my implementation:
def _dataQualityFilter(ds: xr.Dataset) -> bool:
    """Filter out data that is not of sufficient quality."""
    if ds == xr.Dataset():
        return False
    zeroCount = 0
    elementCount = 0
    # Carry out a basic data quality check
    for data_var in ds.data_vars:
        if ds[f"{data_var}"].isnull().any():
            log.warn(
                event=f"Dataset has NaNs in variable {data_var}",
                initTime=str(ds.coords["init_time"].values[0])[:16],
                variable=data_var,
            )
        data = ds[data_var].data
        elementCount += data.size
        zeroCount += (data == 0).sum()
    if zeroCount / elementCount > 0.2:
        raise ValueError("In your dataset more than 20% of your data are 0's")
    return True
@peterdudfield is the code fine?
I'll let @devsjc review, if that's OK.
Detailed Description
It would be great to have a check in place that detects zeros in the data. A large number of zeros is normally a sign of an error.
Context
Possible Implementation
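One way the check could look, as a minimal sketch rather than the project's actual code. It assumes the data is available as NumPy arrays keyed by variable name; the names `zero_fractions`, `variables_with_too_many_zeros` and `MAX_ZERO_FRACTION` are hypothetical, and the 20% threshold is simply the figure suggested in the comments. The zero fraction is computed per variable, NaNs are not counted as zeros, and empty arrays are handled without dividing by zero.

```python
from __future__ import annotations

from typing import Mapping

import numpy as np

# Assumed threshold: the 20% figure proposed in the comments, not a project setting.
MAX_ZERO_FRACTION = 0.2


def zero_fractions(variables: Mapping[str, np.ndarray]) -> dict[str, float]:
    """Return the fraction of exactly-zero elements for each variable."""
    fractions: dict[str, float] = {}
    for name, values in variables.items():
        arr = np.asarray(values)
        if arr.size == 0:
            # An empty array contains no zeros; avoid a division by zero.
            fractions[name] = 0.0
            continue
        # NaN == 0 evaluates to False, so NaNs are not counted as zeros.
        fractions[name] = float(np.count_nonzero(arr == 0)) / arr.size
    return fractions


def variables_with_too_many_zeros(
    variables: Mapping[str, np.ndarray],
    max_zero_fraction: float = MAX_ZERO_FRACTION,
) -> list[str]:
    """Return the names of variables whose zero fraction exceeds the threshold."""
    return [
        name
        for name, fraction in zero_fractions(variables).items()
        if fraction > max_zero_fraction
    ]


# Made-up example values, for illustration only.
example = {
    "temperature": np.array([280.1, 281.3, 0.0, 279.8]),  # 1 of 4 values is zero
    "cloud_cover": np.zeros(4),                           # every value is zero
    "wind_speed": np.array([3.2, 4.1, 5.0, 2.7]),         # no zeros
}
print(variables_with_too_many_zeros(example))
```

With an xarray dataset `ds`, the input mapping could be built as `{name: ds[name].values for name in ds.data_vars}`. Reporting per variable rather than raising on a dataset-wide total is a design choice: it shows which variable is suspect and leaves it to the caller to decide whether to log, skip or fail.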