Kerchunk / VirtualiZarr way to open radar files #187
Sounds great! Happy to chat.
Thanks @aladinor for this initiative! I've quickly skimmed the documentation of kerchunk and VirtualiZarr and am still trying to get my head around it, but I already have some questions.
1 - furuno, 2 - sigmet/nexrad, 3 - rainbow
The acknowledged standards are CfRadial1/2 (NetCDF4) and ODIM_H5 (HDF5), so I do not see issues for these files.
Okay @kmuehlbauer I'm very pleased to have your thoughts on all this, but you've really come in with the hardest questions / most tricky file formats here!!
Yes. This is a limitation of VirtualiZarr right now, which originally comes from the kerchunk definition of references, and would be enshrined in the proposed Zarr chunk manifest specification. This format is sufficient for HDF5, netCDF, TIFF, GRIB and FITS, but possibly not for your filetypes! Note also that my understanding is that this is motivated by what's supported when reading from cloud object storage, i.e. HTTP range requests. If we cannot think of how to read data efficiently from your file formats using HTTP range requests then perhaps there is not much point in trying to kerchunk/virtualize them...
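To make the reference model concrete, here is a minimal sketch of what a chunk manifest boils down to: each chunk key maps to a path, a byte offset and a byte length, and reading a chunk is a single contiguous read. The toy file layout and the `read_chunk` helper are assumptions for illustration only, not the kerchunk or VirtualiZarr API, and a local `seek`/`read` stands in for an HTTP range request.

```python
import os
import tempfile


def read_chunk(manifest, key):
    """Return the raw bytes of one chunk using only (path, offset, length)."""
    entry = manifest[key]
    with open(entry["path"], "rb") as f:
        f.seek(entry["offset"])          # stand-in for an HTTP range request
        return f.read(entry["length"])


# Toy "legacy" file (hypothetical layout): 16-byte header, then two 8-byte chunks.
header = b"H" * 16
chunk0 = b"A" * 8
chunk1 = b"B" * 8
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(header + chunk0 + chunk1)
    path = tmp.name

# Hypothetical manifest: one contiguous byte range per chunk.
manifest = {
    "0": {"path": path, "offset": 16, "length": 8},
    "1": {"path": path, "offset": 24, "length": 8},
}

first = read_chunk(manifest, "0")
second = read_chunk(manifest, "1")
os.remove(path)
```

A format fits this model only if every chunk can be described by one such contiguous range.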
So multiple variables are compressed into a single chunk? That already is outside of the Zarr model, where compression is always defined per-array.
I don't think so... Again, does cloud object storage actually support reading with a strided pattern? If not then I suppose you would either have to read all 3 variables' data to get at one, or issue many, many HTTP requests, one for each 100 bytes. Both of those sound very inefficient. You might want to ask about these things on the chunk manifest Zarr spec proposal issue.
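A rough back-of-the-envelope sketch of that trade-off, assuming (hypothetically) 3 variables interleaved in 100-byte pieces over 1000 records. The function name and the numbers are illustrative and not taken from any real radar format.

```python
def strided_ranges(n_records, n_vars, item_bytes, var_index):
    """Byte ranges (offset, length) needed to read one variable
    out of records in which all variables are interleaved."""
    record_bytes = n_vars * item_bytes
    return [
        (r * record_bytes + var_index * item_bytes, item_bytes)
        for r in range(n_records)
    ]


# Assumed layout: 1000 records, 3 variables, 100 bytes per variable per record.
ranges = strided_ranges(n_records=1000, n_vars=3, item_bytes=100, var_index=1)

# Option A: one range request per 100-byte piece.
n_requests_strided = len(ranges)
bytes_wanted = sum(length for _, length in ranges)

# Option B: a single request for everything, two thirds of which is discarded.
bytes_if_read_everything = 1000 * 3 * 100
```

Under these assumptions, option A needs 1000 requests for 100 kB of useful data, while option B transfers 300 kB in one request.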
I'm not quite sure I follow. Do you mean that a single chunk does not correspond to a fixed byte length?
Again, I'm not sure I follow. As long as, given a byte range, one can read those bytes and apply a known decoding step to get the array bytes out, you are good.
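As a sketch of "byte range plus known decoding step": below, a zlib-compressed chunk of float32 values is embedded in a larger blob, and knowing only its offset, its length and the codec is enough to recover the values. The blob layout and the `decode_chunk` helper are made up for illustration.

```python
import struct
import zlib

# Build a toy blob: header, one zlib-compressed chunk of 4 little-endian float32, trailer.
values = [1.5, 2.5, 3.5, 4.5]
raw = struct.pack("<4f", *values)
compressed = zlib.compress(raw)
blob = b"HEADER--" + compressed + b"TRAILER"

# What a reference would record for this chunk (hypothetical numbers for this toy blob).
offset, length = 8, len(compressed)


def decode_chunk(buf, offset, length, count):
    """Slice the byte range, then apply the known decoding step (zlib, float32)."""
    chunk = buf[offset:offset + length]
    return list(struct.unpack(f"<{count}f", zlib.decompress(chunk)))


decoded = decode_chunk(blob, offset, length, count=4)
```

The chunk's compressed length does not need to be fixed across chunks; it only has to be known for each one.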
You should be fine with these formats.
I mentioned this to @d-v-b in the zarr call just now, and we thought that essentially the VirtualiZarr effort is trying to make zarr into a "superformat", sort of a superset of other formats such as netCDF. However, although you might imagine altering the proposed chunk manifest to accommodate more formats (e.g. by adding a
Thanks @TomNicholas, I thought so. So we can still engage here for the formats which fit. And for the other formats it's a clear message to radar manufacturers, weather services and data providers about what their formats should look like if cloud readiness is the aim. The good thing is that with the new standard FM301/CfRadial2 the WMO wisely chose an HDF5/NetCDF-based format.
Thanks, @kmuehlbauer and @TomNicholas, for bringing this all up in the conversation. We can start by looking at the cloud-amenable radar formats, see if we can create a backend for VirtualiZarr, and then see what possible solutions we can find for the other formats. Please let me know your thoughts.
Hi everyone,
Handling historical radar datasets can often be overwhelming. To simplify this process, I propose we adopt the concepts from Kerchunk / VirtualiZarr to create reference files. By leveraging these tools, we can read multiple radar data files in a Zarr-like manner, significantly enhancing our capabilities for big data historical analysis.
Proposed Approach:
Benefits:
I've previously discussed this idea with @TomNicholas, @kmuehlbauer, and @mgrover1. I'd like to start a discussion thread and possibly arrange a meeting to explore this further.
I look forward to your feedback and thoughts on this proposal. Let's collaborate to make historical radar data analysis more efficient and accessible for everyone!