Best practices for publishing numerical weather prediction data in WIS2 #158

Open
6a6d74 opened this issue Aug 1, 2024 · 8 comments

@6a6d74
Collaborator

6a6d74 commented Aug 1, 2024

At Met Office we're trying to work out the best way to migrate our existing NWP GRIB bulletins from GTS into WIS2.

In GTS, the bulletins are packaged by physical parameter. This allows data users to select which GRIB files to retrieve based on the parameter. The view from the Met Office team is that this is still a valid requirement: the files are big, so there's value in downloading only those that are of interest. So the question is: how do we cater for users who want to download only specific parameters from a model run?

Options are listed below.

Option 1: Magic "filenames"
Users somehow are able to parse the "data-id" attribute in the WIS Notification Message to identify that a given data item relates to a specific parameter.
Pros: No changes required to WIS2.
Cons: Embedding metadata in filenames is poor practice - how do users know how to parse the filename to extract the parameter, what controlled vocab is used for the parameter, etc.? A file-naming convention would need to be established that defines this, and every model data publisher would need to implement it. Maybe the GTS file-naming convention could be reused - but then we're binding GTS into WIS2, which I think is not helpful.

Option 2: Fine-grained dataset definition and subscription topics
Allow users to subscribe to topics that align with specific parameters so they only receive notification messages about model data containing the parameters they're interested in.
Pros: No changes required to WIS2. Simple for users - all the filtering is done "server-side", i.e., a user only subscribes to what they want.
Cons: Complex for data publishers: (i) notifications for model output must be published on multiple topics, (ii) all those topics need to be registered, and (iii) because the subscription topic is defined in the discovery metadata, a proliferation of topics implies a proliferation of discovery metadata records, all of which need to be managed. This would lead toward fine-grained discovery metadata records - and a problem similar to WIS1, where we had too many metadata records.

It is possible to put multiple subscription end-points/topics in one discovery metadata file, but how would a user know which topic relates to which parameter? We'd be back to a problem similar to magic filenames (see Option 1).
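To illustrate the trade-off in Option 2, here is a minimal sketch of per-parameter topics. The topic string is only illustrative: the centre-id `xx-example`, the parameter names, and the idea of a trailing parameter level are all assumptions made for this example and are not part of the agreed topic hierarchy. The matcher only mimics the MQTT `+` and `#` wildcards; in practice the broker does this.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter ('+' and '#') matches a topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard matches everything from here on
        if i >= len(topic_levels):
            return False
        if level not in ("+", topic_levels[i]):
            return False  # literal level differs and is not a single-level wildcard
    return len(filter_levels) == len(topic_levels)


# Illustrative topic; the final per-parameter level is hypothetical.
BASE = ("origin/a/wis2/xx-example/data/core/weather/prediction/"
        "forecast/short-range/deterministic/global")
PARAMETERS = ["air_temperature", "wind_speed", "relative_humidity"]

# Publisher side: one topic per parameter has to be registered and published to.
published_topics = [f"{BASE}/{name}" for name in PARAMETERS]

# User side: subscribe to one parameter from any centre.
subscription = ("origin/a/wis2/+/data/core/weather/prediction/"
                "forecast/short-range/deterministic/global/air_temperature")
matched = [t for t in published_topics if topic_matches(subscription, t)]
```

The user's side stays simple, but the publisher's list of topics grows with every parameter, and each topic has to be described somewhere in discovery metadata.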

Option 3: Put parameter information in the WIS Notification Message
Use an attribute in the WIS Notification Message to indicate physical parameter (or a list of parameters). Users would receive notifications for data from an entire model run, but they can do client-side filtering to identify which notification messages relate to data for the parameters they're interested in, and only download those files.
Pros: No changes required to WIS2 - the WIS Notification Message is extensible so we can add extra attributes, but this would benefit from standardisation.
Cons: Slightly more complex for data publishers - they need to add the parameter attribute into the notification message. Slightly more complex for users - they need to do client-side filtering to select data based on the parameter attribute.

Standardising the parameter attribute in the WIS Notification Message would be beneficial.
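As a rough sketch of what client-side filtering under Option 3 could look like: the messages below are trimmed to the fields used here, and `properties.parameters` is a hypothetical attribute standing in for whatever would eventually be standardised. The parameter names, data ids and URLs are made up. Treating a message without the attribute as "download it" is also an assumption of this sketch.

```python
# Hypothetical notification messages, trimmed to the fields used below.
# "parameters" is NOT an existing WIS2 Notification Message attribute.
notifications = [
    {
        "properties": {"data_id": "example/run-00z/file-001",
                       "parameters": ["air_temperature"]},
        "links": [{"rel": "canonical",
                   "href": "https://example.org/file-001.grib2"}],
    },
    {
        "properties": {"data_id": "example/run-00z/file-002",
                       "parameters": ["wind_speed", "wind_from_direction"]},
        "links": [{"rel": "canonical",
                   "href": "https://example.org/file-002.grib2"}],
    },
    {
        # Publisher did not include the optional attribute.
        "properties": {"data_id": "example/run-00z/file-003"},
        "links": [{"rel": "canonical",
                   "href": "https://example.org/file-003.grib2"}],
    },
]


def wanted_urls(messages, wanted):
    """Return download URLs for messages that carry any wanted parameter."""
    wanted = set(wanted)
    urls = []
    for msg in messages:
        params = msg.get("properties", {}).get("parameters")
        # Without the attribute we cannot tell what the file holds,
        # so this sketch falls back to downloading it.
        if params is None or wanted.intersection(params):
            urls.extend(link["href"] for link in msg.get("links", [])
                        if link.get("rel") == "canonical")
    return urls


urls = wanted_urls(notifications, ["wind_speed"])
```

A user who does not want to filter simply ignores the attribute and downloads every canonical link.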

Option 4: Sidecar files with metadata
Publish an "index" file alongside the GRIB files which says what data is in each GRIB file. NOAA and ECMWF already do this - but not in the same way. STAC (stacspec.org) provides a widely adopted and simple mechanism to provide structured metadata about file-based data assets (and note that the basic/mandatory content of a STAC item is pretty similar to that of a WIS Notification Message). Data publishers could publish a STAC Collection for the model run with a set of STAC items each of which refer to a GRIB file (or other data asset).
Pros: No changes required to WIS2 - the sidecar files are just more data items! Sidecar files contain more file-level metadata allowing users to be even more selective about what they download. Adopting a STAC-based approach would enable easy integration into a broad community of ecosystems (see this example for NOAA's HRRR data - stactools-package, stac-explorer). This would provide a generalised approach that enables data users to select only the data they need - whether downloading for local use or using directly in the cloud.
Cons: A new standardisation effort, probably within ET-Data, would be needed within WMO to adopt this approach - which would take time. Data publishers and data users would need to update their workflows to adopt it.
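For Option 4, a sidecar could look something like the sketch below: one STAC-Item-shaped record per GRIB file. Only the top-level layout follows the STAC Item structure. The `nwp:parameters` property, the item ids, the parameter names and the URLs are assumptions made up for this example, not an existing STAC extension or any centre's index format.

```python
def make_item(item_id, href, run_time, parameters):
    """Build a minimal STAC-Item-shaped dict describing one GRIB file."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": None,  # a real item would carry the model domain here
        "properties": {
            "datetime": run_time,
            "nwp:parameters": parameters,  # hypothetical property name
        },
        "links": [],
        "assets": {"data": {"href": href}},
    }


items = [
    make_item("run-00z-file-001", "https://example.org/file-001.grib2",
              "2024-08-01T00:00:00Z", ["air_temperature"]),
    make_item("run-00z-file-002", "https://example.org/file-002.grib2",
              "2024-08-01T00:00:00Z", ["wind_speed", "relative_humidity"]),
]


def select_hrefs(stac_items, wanted):
    """Return asset hrefs for items listing at least one wanted parameter."""
    wanted = set(wanted)
    return [
        asset["href"]
        for item in stac_items
        if wanted.intersection(item["properties"].get("nwp:parameters", []))
        for asset in item["assets"].values()
    ]


hrefs = select_hrefs(items, ["relative_humidity"])
```

The selection logic is much the same as filtering on the notification message; the difference is that the sidecar can carry richer file-level metadata (levels, steps, and so on) without growing every notification.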

Recommendation: Option 3 seems like the best balance to me. It would be optional for data publishers - if they didn't include a parameter attribute everything still works - albeit that users can't distinguish which data files relate to which parameters and would have to download everything. Users don't have to do client-side filtering - they have the option to filter or just download everything. Standardising the parameter attribute would drive consistency across the WIS2 ecosystem - which would help system designers/vendors provide server- and client-side systems with the parameter filtering capability.

Option 4 might be a good longer-term aim.

Personally, I think Option 1 is a poor choice and Option 2 creates a mess of too many topics and datasets.

@6a6d74
Collaborator Author

6a6d74 commented Aug 1, 2024

If we can agree the best way to publish NWP data into WIS2, the Guide should be updated with this information - plus any other changes that might be necessary (e.g., if we include an optional parameter attribute in the WIS2 Notification Message).

@golfvert
Collaborator

golfvert commented Aug 1, 2024

I had a side discussion with Tom about this. France is having the same issue. We are putting our NWP products in "packages", to make the file size "decent". In our case, it is a mix between steps and parameters.
I guess each NWP centre may want to define packages in its own way.

First, I don't think that domain-specific solutions should be in the Guide, following the logic that WIS2 is "pipes".
Having a common way to present/describe the packages (if they choose to do packages) by all NWP centres is important, though.

I'd say too that Option 3 is the right option.
In the Guide, we can suggest that adding a section in the properties is the preferred way to enable client-side filtering.

Then, we ask the WIPPS team (where the sublevel of the TH and metadata aspects were agreed) to discuss and agree what the filtering should look like.
This is then included in a cookbook (https://github.com/wmo-im/wis2-cookbook) or something like that.

In short, we define the overall approach, they agree on the specifics. The result is not in the guide.

@tomkralidis
Collaborator

+1 for Option 3. WNM properties.parameter (or actually properties.parameters[] for granules providing > 1 parameter) is the least disruptive update and a valuable one. It's likely that this can be used for other domains as well, and can be "filterable" from a Global Replay Service perspective.

Given this would benefit multiple domains, I propose we add it to the WNM proper as an optional element.

@amilan17
Member

amilan17 commented Aug 2, 2024

@sebvi @wmo-im/tt-nwpmd

@amilan17
Member

amilan17 commented Aug 2, 2024

@6a6d74 @golfvert Please see the decision in the TT-NWPMD for how to provide index files: wmo-im/tt-nwpmd#13

"NWPMD meeting on 2023.09.14

TT-WISMD agree to include a link to the index file in the notification message. The index file itself won't be cached. TT-NWPMD won't define the format of the index file and agreed that each Centre can use its own format of the index file. The ticket wmo-im/tt-nwpmd#13 is closed."

@6a6d74
Collaborator Author

6a6d74 commented Aug 15, 2024

The Met Office team will look into implementing Option 3, and will involve IBL in that discussion.

@6a6d74
Collaborator Author

6a6d74 commented Aug 15, 2024

> @6a6d74 @golfvert Please see the decision in the TT-NWPMD for how to provide index files: wmo-im/tt-nwpmd#13
>
> "NWPMD meeting on 2023.09.14
>
> TT-WISMD agree to include a link to the index file in the notification message. The index file itself won't be cached. TT-NWPMD won't define the format of the index file and agreed that each Centre can use its own format of the index file. The ticket wmo-im/tt-nwpmd#13 is closed."

Good to see that TT-NWPMD/TT-WISMD have discussed index files. Noting that TT-NWPMD won't define the index file format, there's still an outstanding need for standardisation before we, the WWW community, could adopt this approach for operational weather prediction.

@sebvi

sebvi commented Aug 19, 2024

At the time we discussed index files, there was no consensus on what the format could be, and we felt that spending time discussing it would slow down our work on defining the THs for weather. Agreeing on a common format is always difficult, as many NMSs already have their own way of indexing and are not necessarily keen on changing because it means development and allocating resources. At ECMWF, we provide indexes in the format produced by ecCodes.
