Roundtrip of geotiff/nc to zarr to geotiff/nc as test ground to find what info needs to be saved #50

Open
felixcremer opened this issue Jun 12, 2024 · 3 comments

Comments

@felixcremer

In one of the last meetings we talked about a roundtrip between Python and Julia to test the implementations. I propose that we also look at a roundtrip between data formats, to see what GeoZarr needs in order to make that roundtrip lossless, and therefore what is needed to save GeoTIFF or NetCDF-like datasets in a Zarr file.
We could start with a GeoTIFF file, open it in the software of choice, save it as GeoZarr, open that again, and save it back to GeoTIFF. After this roundtrip we should get the same GeoTIFF file that we started with. The same should be done starting from a NetCDF file.
This way we would learn what has to be stored in a GeoZarr to represent the different data models without loss of information.
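
One way to make "without loss of information" checkable is to compare the metadata read from the source file with the metadata read back after the roundtrip. Below is a minimal sketch in plain Python, assuming each file's metadata has already been read into a flat dict by whatever library is used. The function name `metadata_diff` and the example keys and values are hypothetical, not part of any GeoZarr tooling:

```python
def metadata_diff(before, after):
    """Compare two flat metadata dicts and report what a roundtrip changed."""
    lost = sorted(k for k in before if k not in after)
    added = sorted(k for k in after if k not in before)
    changed = sorted(k for k in before if k in after and before[k] != after[k])
    return {"lost": lost, "added": added, "changed": changed}


# Hypothetical metadata, standing in for what a GeoTIFF reader might return.
source = {
    "crs": "EPSG:32633",
    "transform": (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0),
    "nodata": 0,
    "dtype": "uint16",
}
# Hypothetical result after GeoTIFF -> Zarr -> GeoTIFF:
# the nodata value was dropped and the dtype was widened.
roundtripped = {
    "crs": "EPSG:32633",
    "transform": (500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0),
    "dtype": "int32",
}

report = metadata_diff(source, roundtripped)
print(report)
```

Every entry under `lost` or `changed` would point at something the Zarr encoding has to carry for the roundtrip to succeed.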

What are good example files for this kind of test?
Are there relevant collections of small test files for netcdf and geotiff available?
For NetCDF I found https://www.unidata.ucar.edu/software/netcdf/examples/files.html

For GeoTIFF I found https://github.com/GeoTIFF/test-data
Can we make a selection of these files to capture most of the common use cases?

@rbavery

rbavery commented Jul 13, 2024

Could GeoZarr also map to a STAC Collection of geotiffs in addition to a group of geotiffs without STAC metadata?

I think showing a roundtrip from a query of a commonly used STAC collection like https://stacindex.org/catalogs/earth-search#/43bjKKcJQfxYaT1ir3Ep6uENfjEoQrjkzhd2?t=3 would be a useful example. GeoTIFFs don't encode necessary information like the time a scene was captured, so an external metadata file like a STAC JSON is needed. This example scene highlights the many non-coordinate attributes it would be useful to capture, like cloud cover percentage and sun azimuth.
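
To illustrate the kind of per-scene metadata meant here, below is a minimal sketch that pulls non-coordinate attributes out of a STAC item represented as a plain dict. The item is a made-up stand-in with illustrative values, not a real Earth Search record. The field names `eo:cloud_cover` and `view:sun_azimuth` come from the STAC EO and View extensions, and the helper name `scene_attributes` is hypothetical:

```python
def scene_attributes(item, keys=("datetime", "eo:cloud_cover", "view:sun_azimuth")):
    """Collect selected STAC properties so they can be stored alongside the array."""
    props = item.get("properties", {})
    # Keep only the keys that are present; absent ones are skipped, not filled in.
    return {k: props[k] for k in keys if k in props}


# Made-up STAC item with illustrative values.
item = {
    "type": "Feature",
    "id": "example-scene",
    "properties": {
        "datetime": "2024-07-01T10:00:00Z",
        "eo:cloud_cover": 12.5,
        "view:sun_azimuth": 150.0,
        "platform": "example-platform",
    },
}

attrs = scene_attributes(item)
print(attrs)
```

Attributes like these have no home in the GeoTIFF itself, so a roundtrip test would need to decide where they live in the Zarr store and how they get written back out.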

Existing libraries stackstac, odc-stac, and xpystac handle opening STAC item collections as xarray DataArrays or Datasets, but when these are saved to Zarr or NetCDF it's easy to lose the geospatial metadata and difficult to roundtrip back to GeoTIFF.

It would be great if there was a tool (instead of stackstac vs odc-stac vs xpystac) that understood how to convert STAC queries to GeoZarr.

The STAC Collection linked above can be accessed with pystac_client. This tutorial covers this access pattern.

@mdsumner

mdsumner commented Sep 4, 2024

I would suggest using an OISST NetCDF (unproblematically a regular grid on a clean 0, 360, -90, 90 extent) with degenerate rectilinear coords expressing that extent on cell centres (at 0.25 degree increments).

For GeoTIFF, I would suggest the NSIDC 25km daily sea ice concentration imagery on either pole, described by a geotransform (equivalent to an outer edge extent), again representable by degenerate rectilinear coords on cell centres with a pole-specific CRS. I can put this together if that's helpful 🙏. I think other combinations of situations proliferate from those two cases, and I can see real world examples that would work.
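
The relation between a geotransform (outer edge extent) and cell-centre coordinates can be sketched as follows. This assumes a north-up grid with no rotation terms and the GDAL ordering of the six geotransform coefficients. The function names are hypothetical, and the example grid is simply the 0.25 degree global extent mentioned for OISST, written as a geotransform for illustration:

```python
def centres_from_geotransform(gt, width, height):
    """Return (xs, ys) cell-centre coordinates for an unrotated geotransform.

    gt follows the GDAL ordering:
    (x_origin, x_res, rotation, y_origin, rotation, y_res),
    where the origin is the outer corner of the top-left cell.
    """
    x0, dx, rx, y0, ry, dy = gt
    if rx != 0 or ry != 0:
        raise ValueError("rotated grids cannot be expressed as 1-D coordinates")
    # Step half a cell inwards from the outer edge to reach each centre.
    xs = [x0 + (i + 0.5) * dx for i in range(width)]
    ys = [y0 + (j + 0.5) * dy for j in range(height)]
    return xs, ys


def geotransform_from_centres(xs, ys):
    """Invert the above, assuming regularly spaced centres (at least 2 per axis)."""
    dx = (xs[-1] - xs[0]) / (len(xs) - 1)
    dy = (ys[-1] - ys[0]) / (len(ys) - 1)
    # Step half a cell outwards from the first centre to recover the outer corner.
    return (xs[0] - dx / 2, dx, 0.0, ys[0] - dy / 2, 0.0, dy)


# 0.25 degree global grid on 0..360, -90..90, top row first.
gt = (0.0, 0.25, 0.0, 90.0, 0.0, -0.25)
xs, ys = centres_from_geotransform(gt, width=1440, height=720)
print(xs[0], xs[-1], ys[0], ys[-1])
```

Going from centres back to a geotransform only works when the spacing is regular, which is why degenerate rectilinear coords can stand in for an extent in these two cases.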

@rbavery

rbavery commented Oct 8, 2024

I agree with this approach and think having test cases that operate on data that does and does not assume extent on cell center would be valuable. Most satellite imagery I work with does not use cell center.
