Using teehr to support event-based evaluations #247

Open · jarq6c opened this issue Sep 10, 2024 · 4 comments

jarq6c commented Sep 10, 2024

Here's an example of a single site evaluation of rainfall-driven runoff events: https://github.com/jarq6c/little_hope/blob/main/teehr-events/single_site.ipynb

The goal of this evaluation was to isolate likely periods of streamflow rise and recession associated with discrete rainfall events (i.e., not snowmelt, reservoir releases, glacial dam breaks, effluent releases, etc.) and to compute the median event peak streamflow bias (signed error) of the retrospective simulation.

The steps were as follows:

  1. Select a USGS site (02146470)
  2. Use the Network Linked Data Index API (NLDI) to get the associated comid/nwm feature id
  3. Use teehr to retrieve NWM v3.0 simulation data and USGS observations
  4. Process the observations to a continuous time series
  5. Use hydrotools event detection to generate a dataframe of start and end times for each event in the observations (see the sketch after this list)
  6. Characterize each event in terms of peak flow
  7. Compute peak flow bias for each event
  8. Bootstrap 95% confidence intervals around the median peak flow bias
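
To make steps 5 through 8 concrete, here is a minimal sketch using the hydrotools event-detection decomposition API. It assumes `obs` and `sim` are pandas Series of observed and simulated streamflow on a continuous DatetimeIndex; the halflife/window values are illustrative, not recommendations:

# Sketch of steps 5-8: event detection, per-event peaks, peak bias, and a
# bootstrapped CI around the median bias.
import numpy as np
from scipy.stats import bootstrap
from hydrotools.events.event_detection import decomposition as ev

# 5. Detect events in the observations (DataFrame of start/end times).
events = ev.list_events(obs, halflife="6h", window="7D")

# 6.-7. Characterize each event by peak flow and compute signed peak bias.
events["obs_peak"] = [obs.loc[s:e].max() for s, e in zip(events["start"], events["end"])]
events["sim_peak"] = [sim.loc[s:e].max() for s, e in zip(events["start"], events["end"])]
events["peak_bias"] = events["sim_peak"] - events["obs_peak"]

# 8. Bootstrap a 95% confidence interval around the median peak bias.
ci = bootstrap(
    (events["peak_bias"].to_numpy(),),
    np.median,
    n_resamples=2000,
    confidence_level=0.95,
).confidence_interval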

I was planning to use the NLDI to get basin boundaries to compute mean areal precipitation (MAP) from AORC. I'm also interested in comparing the AORC MAP to the data from the USGS rain gauge that's in this catchment (351104080521845). The ultimate goal of acquiring the rain data is to compute storm total precipitation and event-based runoff efficiencies.
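
For the basin boundary step, a minimal sketch of the NLDI call (the URL follows the documented NLDI linked-data pattern for the site above; the host may differ by deployment):

# Fetch the upstream basin boundary for the USGS site from the NLDI API.
import geopandas as gpd

url = "https://api.water.usgs.gov/nldi/linked-data/nwissite/USGS-02146470/basin"
basin = gpd.read_file(url)  # GeoJSON polygon, usable for grid weights and MAP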

I'm still trying to figure out how to use teehr to acquire AORC data. Any insight or recommendations on how to improve this process would be welcome.

mgdenno (Contributor) commented Sep 12, 2024

@jarq6c To address your second question first: we have tools to fetch and process the gridded precipitation data that is available with the retrospective (which, as I understand it, is the AORC data). Everything in this post references the latest release version from main (v0.3.28). It is a two-step process: first, you create a weights file that specifies the fraction of the entire basin that each grid cell covers.

There is an example of weights-file generation here: https://rtiinternational.github.io/teehr/user_guide/notebooks/loading/grid_loading_example.html#generate-the-weights-file and one of retrospective gridded data processing here: https://rtiinternational.github.io/teehr/user_guide/notebooks/loading/load_gridded_retrospective.html
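
As a toy illustration of what the weights file encodes (this is not teehr's API; the basin polygon and grid here are made up):

# Each record pairs a grid cell with the fraction of the basin it covers;
# MAP is then the weighted sum of the cells' precipitation values.
from shapely.geometry import box, Polygon

basin = Polygon([(0.3, 0.3), (2.6, 0.4), (2.4, 2.7), (0.5, 2.5)])  # made-up basin
weights = []
for row in range(3):                  # made-up 3x3 grid of unit cells
    for col in range(3):
        cell = box(col, row, col + 1, row + 1)
        frac = cell.intersection(basin).area / basin.area
        if frac > 0:
            weights.append((row, col, frac))
# sum(w for _, _, w in weights) == 1.0 when the grid fully covers the basin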

Does this help or were you looking at a different source?

As far as comparing AORC to a rain gauge goes, this should work, but one dataset would need to be defined as the primary and the other as the secondary. My initial thought is to make the gauge data primary and the AORC secondary, but there may be other implications to this decision that I am overlooking right now. It might also depend on how else those data sources will be evaluated.

mgdenno (Contributor) commented Sep 12, 2024

@jarq6c As for the first part of your question regarding the event-based evaluation steps: I think in v0.4-beta, with the merge of #250, which allows chaining of metric calculations, we have just about everything except the event detection piece.

Basically, we can now do something like the following (note: the month field is a stand-in for an event id). This exact setup is untested, but we have tested that the chaining works, so it should work.

# Assumes a teehr v0.4-beta Evaluation instance named `eval`, plus the
# Bootstrappers and metrics classes from that release.
cb = Bootstrappers.CircularBlock(
    seed=50,
    reps=500,
    block_size=10,
    quantiles=[0.05, 0.95]
)
avg_cb = metrics.Average(bootstrap=cb)
avg_cb.output_field_name = "avg_cb"

# First query: relative bias per location and month (month standing in
# for an event id). Second query: bootstrapped average per location.
eval.metrics.query(
    order_by=["primary_location_id", "month"],
    group_by=["primary_location_id", "month"],
    include_metrics=[
        metrics.RelativeBias()
    ]
).query(
    order_by=["primary_location_id"],
    group_by=["primary_location_id"],
    include_metrics=[avg_cb]
).to_pandas()

A few other notes and comments on the specific numbered items:

  1. I am not familiar with this API, but I will check it out. We parsed the RouteLink file to get the crosswalk table data and stored it in S3, so we can get the relationship data from there. Perhaps we should look at the API instead?
  2. We have tools to do this in both v0.3.28 and v0.4.0b.
  3. We do not currently have gap filling, but we should probably put it on the feature list.
  4. We do not currently have event detection, but we should definitely add it. From a workflow perspective I think it is just another user-defined field on the joined timeseries table, but the calculation is more complicated than the current ones: it needs to group all the timeseries into "single timeseries" (based on location, variable, units, reference_time, etc., i.e. all the fields that make a timeseries unique), then sort each one to identify the continuous blocks of data that go together and constitute an "event" (see the sketch after this list). My sense is that all event detection methods need that grouping step, and only the actual algorithm varies between methods. Correct?
  5. through 8. I think we can handle these as shown above. (I also realize that what I showed with the two steps is not quite what you were describing, but the chaining is what is needed regardless of which metrics are calculated in each step.)
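
A minimal sketch of that grouping step (the field names are assumptions based on the joined timeseries description above, not an existing teehr function; `joined_df` is a pandas DataFrame):

# Group the joined timeseries into unique single timeseries, then sort each
# by time so an event detector can find continuous blocks of data.
unique_keys = ["location_id", "variable_name", "unit_name", "reference_time"]

for keys, group in joined_df.groupby(unique_keys):
    series = (
        group.sort_values("value_time")
        .set_index("value_time")["value"]
    )
    # hand `series` to the chosen event detection algorithm here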

jarq6c (Author) commented Sep 13, 2024

  • The NLDI API might be useful for introducing more catchment or physically-based "metadata." For example, I've used it to generate crosswalks, get catchment boundaries, find upstream/downstream reaches, and find the closest stream reach to a dam (given coordinates).
  • My expectation is that yes, all of these types of "event detection" methods will operate on singular location-specific time series. The actual algorithm employed will change depending on catchment characteristics, variable (stage/discharge/precip), and event type.
  • I think the example makes sense in form. A theoretical "event" adaptation might look like this:
bs = Bootstrappers.ClassicalBootstrap(
    seed=50,
    reps=2000,
    quantiles=[0.025, 0.975]
)
med_bs = metrics.Median(bootstrap=bs)
med_bs.output_field_name = "med_bs"

# Add an 'event_id' column to the joined timeseries (hypothetical API;
# "Eckhardt" refers to Eckhardt recursive digital filter baseflow separation).
add_event_id_column(algorithm="Eckhardt", field_name="event_id")

# First query: relative bias per location and event.
# Second query: bootstrapped median of the event biases per location.
eval.metrics.query(
    order_by=["primary_location_id", "event_id"],
    group_by=["primary_location_id", "event_id"],
    include_metrics=[
        metrics.RelativeBias()
    ]
).query(
    order_by=["primary_location_id"],
    group_by=["primary_location_id"],
    include_metrics=[med_bs]
).to_pandas()
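
In this adaptation, the first query would yield one relative bias per event per location, and the second would collapse those into a bootstrapped median per location, matching steps 7 and 8 of the original workflow.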

mgdenno (Contributor) commented Sep 13, 2024

This issue is also related to "Add `event_id` column" #227

@mgdenno mgdenno added this to the v0.5 Release milestone Nov 19, 2024