Using teehr to support event-based evaluations #247

Open · jarq6c opened this issue Sep 10, 2024 · 4 comments

jarq6c commented Sep 10, 2024

Here's an example of a single site evaluation of rainfall-driven runoff events: https://github.com/jarq6c/little_hope/blob/main/teehr-events/single_site.ipynb

The goal of this evaluation was to isolate likely periods of streamflow rise and recession associated with discrete rainfall events (i.e., not snowmelt, reservoir releases, glacial dam breaks, effluent releases, etc.) and to compute the median event peak streamflow bias (signed error) of the retrospective simulation.

The steps were as follows:

  1. Select a USGS site (02146470)
  2. Use the Network Linked Data Index API (NLDI) to get the associated comid/nwm feature id
  3. Use teehr to retrieve NWM v3.0 simulation data and USGS observations
  4. Process the observations to a continuous time series
  5. Use hydrotools event detection to generate a dataframe of start and end times for each event in the observations (see the sketch after this list)
  6. Characterize each event in terms of peak flow
  7. Compute peak flow bias for each event
  8. Bootstrap 95% confidence intervals around the median peak flow bias
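
To make steps 5 through 8 concrete, here is a minimal sketch using the hydrotools event-detection decomposition API. It assumes `obs` and `sim` are pandas Series of observed and simulated streamflow on a continuous DatetimeIndex; the halflife/window values are illustrative, not recommendations:

# Sketch of steps 5-8: event detection, per-event peaks, peak bias, and a
# bootstrapped CI around the median bias.
import numpy as np
from scipy.stats import bootstrap
from hydrotools.events.event_detection import decomposition as ev

# 5. Detect events in the observations (DataFrame of start/end times).
events = ev.list_events(obs, halflife="6h", window="7D")

# 6.-7. Characterize each event by peak flow and compute signed peak bias.
events["obs_peak"] = [obs.loc[s:e].max() for s, e in zip(events["start"], events["end"])]
events["sim_peak"] = [sim.loc[s:e].max() for s, e in zip(events["start"], events["end"])]
events["peak_bias"] = events["sim_peak"] - events["obs_peak"]

# 8. Bootstrap a 95% confidence interval around the median peak bias.
ci = bootstrap(
    (events["peak_bias"].to_numpy(),),
    np.median,
    n_resamples=2000,
    confidence_level=0.95,
).confidence_interval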

I was planning to use the NLDI to get basin boundaries to compute mean areal precipitation (MAP) from AORC. I'm also interested in comparing the AORC MAP to the data from the USGS rain gauge that's in this catchment (351104080521845). The ultimate goal of acquiring the rain data is to compute storm total precipitation and event-based runoff efficiencies.
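
For the basin boundary step, a minimal sketch of the NLDI call (the URL follows the documented NLDI linked-data pattern for the site above; the host may differ by deployment):

# Fetch the upstream basin boundary for the USGS site from the NLDI API.
import geopandas as gpd

url = "https://api.water.usgs.gov/nldi/linked-data/nwissite/USGS-02146470/basin"
basin = gpd.read_file(url)  # GeoJSON polygon, usable for grid weights and MAP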

I'm still trying to figure out how to use teehr to acquire AORC data. Any insight or recommendations on how to improve this process would be welcome.

mgdenno (Contributor) commented Sep 12, 2024

@jarq6c To address your second question first: we have tools to fetch and process the gridded precipitation data that is available with the retrospective (which, as I understand it, is the AORC data). Everything in this post references the latest release version from main (v0.3.28). It is a two-step process: first, you create a weights file that specifies the fraction of the entire basin that each grid cell covers.

There is an example of weights-file generation here: https://rtiinternational.github.io/teehr/user_guide/notebooks/loading/grid_loading_example.html#generate-the-weights-file and one of retrospective gridded data processing here: https://rtiinternational.github.io/teehr/user_guide/notebooks/loading/load_gridded_retrospective.html
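
As a toy illustration of what the weights file encodes (this is not teehr's API; the basin polygon and grid here are made up):

# Each record pairs a grid cell with the fraction of the basin it covers;
# MAP is then the weighted sum of the cells' precipitation values.
from shapely.geometry import box, Polygon

basin = Polygon([(0.3, 0.3), (2.6, 0.4), (2.4, 2.7), (0.5, 2.5)])  # made-up basin
weights = []
for row in range(3):                  # made-up 3x3 grid of unit cells
    for col in range(3):
        cell = box(col, row, col + 1, row + 1)
        frac = cell.intersection(basin).area / basin.area
        if frac > 0:
            weights.append((row, col, frac))
# sum(w for _, _, w in weights) == 1.0 when the grid fully covers the basin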

Does this help or were you looking at a different source?

As far as comparing AORC to a rain gauge goes, this should work, but one dataset would need to be defined as the primary and the other as the secondary. My initial thought is to make the gauge data primary and the AORC secondary, but there may be other implications to this decision that I am overlooking right now. It might also depend on how else those data sources will be evaluated.

mgdenno (Contributor) commented Sep 12, 2024

@jarq6c As for the first part of your question regarding the event-based evaluation steps: I think in v0.4-beta, with the merge of #250, which allows chaining of metric calculations, we have just about everything except the event detection piece.

Basically, we can now do something like the following (note: the month field is a stand-in for an event id). This exact setup is untested, but we have tested that the chaining works, so it should work.

# Assumes a teehr v0.4-beta Evaluation instance named `eval`, plus the
# Bootstrappers and metrics classes from that release.
cb = Bootstrappers.CircularBlock(
    seed=50,
    reps=500,
    block_size=10,
    quantiles=[0.05, 0.95]
)
avg_cb = metrics.Average(bootstrap=cb)
avg_cb.output_field_name = "avg_cb"

# First query: relative bias per location and month (month standing in
# for an event id). Second query: bootstrapped average per location.
eval.metrics.query(
    order_by=["primary_location_id", "month"],
    group_by=["primary_location_id", "month"],
    include_metrics=[
        metrics.RelativeBias()
    ]
).query(
    order_by=["primary_location_id"],
    group_by=["primary_location_id"],
    include_metrics=[avg_cb]
).to_pandas()

A few other notes and comments on the specific numbered items:

  1. I am not familiar with this API, but I will check it out. We parsed the RouteLink file to get the crosswalk table data and stored it in S3, so we can get the relationship data from there. Perhaps we should look at the API instead?
  2. We have tools to do this in both v0.3.28 and v0.4.0b.
  3. We do not currently have gap filling, but we should probably put it on the feature list.
  4. We do not currently have event detection, but we should definitely add it. From a workflow perspective I think it is just another user-defined field on the joined timeseries table, but the calculation is more complicated than the current ones: it needs to group all the timeseries into "single timeseries" (based on location, variable, units, reference_time, etc., i.e. all the fields that make a timeseries unique), then sort each one to identify the continuous blocks of data that go together and constitute an "event" (see the sketch after this list). My sense is that all event detection methods need that grouping step, and only the actual algorithm varies between methods. Correct?
  5. through 8. I think we can handle these as shown above. (I also realize that what I showed with the two steps is not quite what you were describing, but the chaining is what is needed regardless of which metrics are calculated in each step.)
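
A minimal sketch of that grouping step (the field names are assumptions based on the joined timeseries description above, not an existing teehr function; `joined_df` is a pandas DataFrame):

# Group the joined timeseries into unique single timeseries, then sort each
# by time so an event detector can find continuous blocks of data.
unique_keys = ["location_id", "variable_name", "unit_name", "reference_time"]

for keys, group in joined_df.groupby(unique_keys):
    series = (
        group.sort_values("value_time")
        .set_index("value_time")["value"]
    )
    # hand `series` to the chosen event detection algorithm here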

jarq6c (Author) commented Sep 13, 2024

  • The NLDI API might be useful for introducing more catchment or physically-based "metadata." For example, I've used it to generate crosswalks, get catchment boundaries, find upstream/downstream reaches, and find the closest stream reach to a dam (given coordinates).
  • My expectation is that yes, all of these types of "event detection" methods will operate on singular location-specific time series. The actual algorithm employed will change depending on catchment characteristics, variable (stage/discharge/precip), and event type.
  • I think the example makes sense in form. A theoretical "event" adaptation might look like this:
bs = Bootstrappers.ClassicalBootstrap(
    seed=50,
    reps=2000,
    quantiles=[0.025, 0.975]
)
med_bs = metrics.Median(bootstrap=bs)
med_bs.output_field_name = "med_bs"

# Add an 'event_id' column to the joined timeseries (hypothetical API;
# "Eckhardt" refers to Eckhardt recursive digital filter baseflow separation).
add_event_id_column(algorithm="Eckhardt", field_name="event_id")

# First query: relative bias per location and event.
# Second query: bootstrapped median of the event biases per location.
eval.metrics.query(
    order_by=["primary_location_id", "event_id"],
    group_by=["primary_location_id", "event_id"],
    include_metrics=[
        metrics.RelativeBias()
    ]
).query(
    order_by=["primary_location_id"],
    group_by=["primary_location_id"],
    include_metrics=[med_bs]
).to_pandas()
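
In this adaptation, the first query would yield one relative bias per event per location, and the second would collapse those into a bootstrapped median per location, matching steps 7 and 8 of the original workflow.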

mgdenno (Contributor) commented Sep 13, 2024

This issue is also related to "Add `event_id` column" #227

@mgdenno mgdenno added this to the v0.5 Release milestone Nov 19, 2024