
Commit

Merge pull request #59 from fema-ffrd/feature/zmeta
Zarr Metadata
thwllms authored Jul 25, 2024
2 parents 000829f + 497bdeb commit 6b45130
Showing 19 changed files with 19,205 additions and 43 deletions.
16 changes: 16 additions & 0 deletions docs/source/API.rst
@@ -0,0 +1,16 @@
API
===
.. toctree::
   :maxdepth: 1

   RasGeomHdf
   RasPlanHdf
   RasHdf

:code:`rashdf` provides two primary classes for reading data from
HEC-RAS geometry and plan HDF files: :code:`RasGeomHdf` and :code:`RasPlanHdf`.
Both of these classes inherit from the :code:`RasHdf` base class, which
inherits from the :code:`h5py.File` class.

Note that :code:`RasPlanHdf` inherits from :code:`RasGeomHdf`, so all of the
methods available in :code:`RasGeomHdf` are also available in :code:`RasPlanHdf`.
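
As a quick orientation, here is a minimal sketch of both classes in use; the
file and mesh names are hypothetical, and the methods shown are among those
listed in the class documentation::

from rashdf import RasGeomHdf, RasPlanHdf

# Geometry data from a HEC-RAS geometry HDF file
with RasGeomHdf("BigRiver.g01.hdf") as geom_hdf:
    reaches = geom_hdf.river_reaches()

# A plan HDF file exposes the same geometry methods, plus results output
with RasPlanHdf("BigRiver.p01.hdf") as plan_hdf:
    reaches = plan_hdf.river_reaches()
    cells = plan_hdf.mesh_cells_timeseries_output("BigRiverMesh1")
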
91 changes: 91 additions & 0 deletions docs/source/Advanced.rst
@@ -0,0 +1,91 @@
Advanced
========
:code:`rashdf` provides convenience methods for generating
Zarr metadata for HEC-RAS HDF5 files. This is particularly useful
for working with stochastic ensemble simulations, where many
HEC-RAS HDF5 files are generated for different model realizations,
forcing scenarios, or other sources of uncertainty.

To illustrate this, consider a set of HEC-RAS HDF5 files stored
in an S3 bucket, where each file represents a different simulation
of a river model. We can generate Zarr metadata for each simulation
and then combine the metadata into a single Kerchunk metadata file
that includes a new "sim" dimension. This combined metadata file
can then be used to open a single Zarr dataset that includes all
simulations.

The cell timeseries output for a single simulation might look
something like this::

>>> from rashdf import RasPlanHdf
>>> plan_hdf = RasPlanHdf.open_uri("s3://bucket/simulations/1/BigRiver.p01.hdf")
>>> plan_hdf.mesh_cells_timeseries_output("BigRiverMesh1")
<xarray.Dataset> Size: 66MB
Dimensions: (time: 577, cell_id: 14188)
Coordinates:
* time (time) datetime64[ns] 5kB 1996-01-14...
* cell_id (cell_id) int64 114kB 0 1 ... 14187
Data variables:
Water Surface (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
Cell Cumulative Precipitation Depth (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
Attributes:
mesh_name: BigRiverMesh1

Note that the example below requires installation of the optional
libraries :code:`kerchunk`, :code:`zarr`, :code:`dask`, :code:`fsspec`, and :code:`s3fs`::

from rashdf import RasPlanHdf
from kerchunk.combine import MultiZarrToZarr
import json

# Example S3 URL pattern for HEC-RAS plan HDF5 files
s3_url_pattern = "s3://bucket/simulations/{sim}/BigRiver.p01.hdf"

zmeta_files = []
sims = list(range(1, 11))

# Generate Zarr metadata for each simulation
for sim in sims:
    s3_url = s3_url_pattern.format(sim=sim)
    plan_hdf = RasPlanHdf.open_uri(s3_url)
    zmeta = plan_hdf.zmeta_mesh_cells_timeseries_output("BigRiverMesh1")
    json_file = f"BigRiver.{sim}.p01.hdf.json"
    with open(json_file, "w") as f:
        json.dump(zmeta, f)
    zmeta_files.append(json_file)

# Combine the Zarr metadata files into a single Kerchunk metadata file
# with a new "sim" dimension
mzz = MultiZarrToZarr(zmeta_files, concat_dims=["sim"], coo_map={"sim": sims})
mzz_dict = mzz.translate()

with open("BigRiver.combined.p01.json", "w") as f:
    json.dump(mzz_dict, f)

Now, we can open the combined dataset with :code:`xarray`::

import xarray as xr

ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {"fo": "BigRiver.combined.p01.json"},
    },
    chunks="auto",
)

The resulting combined dataset includes a new :code:`sim` dimension::

<xarray.Dataset> Size: 674MB
Dimensions: (sim: 10, time: 577, cell_id: 14606)
Coordinates:
* cell_id (cell_id) int64 117kB 0 1 ... 14605
* sim (sim) int64 80B 1 2 3 4 5 6 7 8 9 10
* time (time) datetime64[ns] 5kB 1996-01-14...
Data variables:
Cell Cumulative Precipitation Depth (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
Water Surface (sim, time, cell_id) float32 337MB dask.array<chunksize=(10, 228, 14606), meta=np.ndarray>
Attributes:
mesh_name: BigRiverMesh1
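
From here, standard :code:`xarray` operations apply across the ensemble. The
lines below are a minimal sketch; the variable and dimension names follow the
combined dataset shown above::

# Water surface for a single simulation
ws_sim3 = ds["Water Surface"].sel(sim=3)

# Ensemble mean and standard deviation across the "sim" dimension
ws_mean = ds["Water Surface"].mean(dim="sim")
ws_std = ds["Water Surface"].std(dim="sim")

# The variables are lazy dask arrays; call .compute() to materialize results
ws_mean = ws_mean.compute()
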
3 changes: 2 additions & 1 deletion docs/source/RasGeomHdf.rst
@@ -1,5 +1,6 @@
RasGeomHdf
==========

.. currentmodule:: rashdf
.. autoclass:: RasGeomHdf
:show-inheritance:
@@ -21,6 +22,6 @@ RasGeomHdf
get_geom_structures_attrs,
get_geom_2d_flow_area_attrs,
cross_sections_elevations,
cross_sections
cross_sections,
river_reaches

10 changes: 9 additions & 1 deletion docs/source/RasPlanHdf.rst
@@ -13,6 +13,10 @@ RasPlanHdf
mesh_max_ws_err,
mesh_max_iter,
mesh_last_iter,
mesh_cells_summary_output,
mesh_faces_summary_output,
mesh_cells_timeseries_output,
mesh_faces_timeseries_output,
reference_lines,
reference_lines_names,
reference_points,
@@ -31,4 +35,8 @@ RasPlanHdf
cross_sections_flow,
cross_sections_wsel,
steady_flow_names,
steady_profile_xs_output
steady_profile_xs_output,
zmeta_mesh_cells_timeseries_output,
zmeta_mesh_faces_timeseries_output,
zmeta_reference_lines_timeseries_output,
zmeta_reference_points_timeseries_output
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -30,6 +30,8 @@
templates_path = ["_templates"]
exclude_patterns = []

master_doc = "index"


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
24 changes: 6 additions & 18 deletions docs/source/index.rst
@@ -11,6 +11,12 @@ HDF5 files. It is a wrapper around the :code:`h5py` library, and provides an int
convenience functions for reading key HEC-RAS geometry data, output data,
and metadata.

.. toctree::
   :maxdepth: 2

   API
   Advanced

Installation
============
With :code:`pip`::
@@ -82,21 +88,3 @@ credentials)::
'Simulation Start Time': datetime.datetime(1996, 1, 14, 12, 0),
'Time Window': [datetime.datetime(1996, 1, 14, 12, 0),
datetime.datetime(1996, 2, 7, 12, 0)]}


API
===
.. toctree::
:maxdepth: 1

RasGeomHdf
RasPlanHdf
RasHdf

:code:`rashdf` provides two primary classes for reading data from
HEC-RAS geometry and plan HDF files: :code:`RasGeomHdf` and :code:`RasPlanHdf`.
Both of these classes inherit from the :code:`RasHdf` base class, which
inherits from the :code:`h5py.File` class.

Note that :code:`RasPlanHdf` inherits from :code:`RasGeomHdf`, so all of the
methods available in :code:`RasGeomHdf` are also available in :code:`RasPlanHdf`.
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -12,11 +12,11 @@ classifiers = [
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
version = "0.5.0"
version = "0.6.0"
dependencies = ["h5py", "geopandas>=1.0,<2.0", "pyarrow", "xarray"]

[project.optional-dependencies]
dev = ["pre-commit", "ruff", "pytest", "pytest-cov", "fiona"]
dev = ["pre-commit", "ruff", "pytest", "pytest-cov", "fiona", "kerchunk", "zarr", "dask", "fsspec", "s3fs"]
docs = ["sphinx", "numpydoc", "sphinx_rtd_theme"]

[project.urls]
5 changes: 4 additions & 1 deletion src/rashdf/base.py
@@ -19,6 +19,7 @@ def __init__(self, name: str, **kwargs):
Additional keyword arguments to pass to h5py.File
"""
super().__init__(name, mode="r", **kwargs)
self._loc = name

@classmethod
def open_uri(
@@ -49,7 +50,9 @@ def open_uri(
import fsspec

remote_file = fsspec.open(uri, mode="rb", **fsspec_kwargs)
return cls(remote_file.open(), **h5py_kwargs)
result = cls(remote_file.open(), **h5py_kwargs)
result._loc = uri
return result
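# A minimal usage sketch (the bucket path is hypothetical; fsspec_kwargs and
# h5py_kwargs are the pass-through keyword-argument dicts unpacked above, and
# "anon" is an example fsspec/s3fs option):
#
#   plan_hdf = RasPlanHdf.open_uri(
#       "s3://bucket/simulations/1/BigRiver.p01.hdf",
#       fsspec_kwargs={"anon": True},
#   )
#   plan_hdf._loc  # "s3://bucket/simulations/1/BigRiver.p01.hdf"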

def get_attrs(self, attr_path: str) -> Dict:
"""Convert attributes from a HEC-RAS HDF file into a Python dictionary for a given attribute path.