Commit
Merge pull request #6 from mhpi/update
Code release.
Showing 134 changed files with 6,658 additions and 3,491 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,141 +1,90 @@ | ||
This repository contains deep learning code used to model hydrologic systems, from soil moisture to streamflow, from projection to forecast. | ||
|
||
# Citations | ||
|
||
If you find our code to be useful, please cite the following papers: | ||
|
||
Feng, DP, K. Fang and CP. Shen, [Enhancing streamflow forecast and extracting insights using continental-scale long-short term memory networks with data integration], Water Resources Research (2020), https://doi.org/10.1029/2019WR026793 | ||
|
||
Fang, K., CP. Shen, D. Kifer and X. Yang, [Prolongation of SMAP to Spatio-temporally Seamless Coverage of Continental US Using a Deep Learning Neural Network], Geophysical Research Letters, doi: 10.1002/2017GL075619, preprint accessible at: arXiv:1707.06611 (2017) https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2017GL075619 | ||
|
||
Shen, CP., [A trans-disciplinary review of deep learning research and its relevance for water resources scientists], Water Resources Research. 54(11), 8558-8593, doi: 10.1029/2018WR022643 (2018) https://doi.org/10.1029/2018WR022643 | ||
|
||
Major code contributor: Kuai Fang (PhD., Penn State), and smaller contribution from Dapeng Feng (PhD Student, Penn State) | ||
|
||
A new release is expected in early July 2020, together with a video code walkthrough. | ||
Computational benchmark: training of CAMELS data (w/ or w/o data integration) with 671 basins, 10 years, 300 epochs, in ~1 hour with GPU. | ||
|
||
# Example | ||
Two examples with sample data are included: | ||
- [train an LSTM network to learn SMAP soil moisture](example/train-lstm.py) | ||
- [estimate uncertainty of an LSTM network](example/train-lstm-mca.py) | ||
|
||
A demo of a temporal test is [here](example/demo-temporal-test.ipynb) | ||
|
||
# License | ||
Non-Commercial Software License Agreement | ||
|
||
By downloading the hydroDL software (the “Software”) you agree to | ||
the following terms of use: | ||
Copyright (c) 2020, The Pennsylvania State University (“PSU”). All rights reserved. | ||
|
||
1. PSU hereby grants to you a perpetual, nonexclusive and worldwide right, privilege and | ||
license to use, reproduce, modify, display, and create derivative works of Software for all | ||
non-commercial purposes only. You may not use Software for commercial purposes without | ||
prior written consent from PSU. Queries regarding commercial licensing should be directed | ||
to The Office of Technology Management at 814.865.6277 or [email protected]. | ||
2. Neither the name of the copyright holder nor the names of its contributors may be used | ||
to endorse or promote products derived from this software without specific prior written | ||
permission. | ||
3. This software is provided for non-commercial use only. | ||
4. Redistribution and use in source and binary forms, with or without modification, are | ||
permitted provided that redistributions must reproduce the above copyright notice, license, | ||
list of conditions and the following disclaimer in the documentation and/or other materials | ||
provided with the distribution. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE | ||
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | ||
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF | ||
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN | ||
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) | ||
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
POSSIBILITY OF SUCH DAMAGE. | ||
|
||
|
||
# Database description | ||
## Database Structure | ||
``` | ||
├── CONUS | ||
│ ├── 2000 | ||
│ │ ├── [Variable-Name].csv | ||
│ │ ├── ... | ||
│ │ ├── timeStr.csv | ||
│ │ └── time.csv | ||
│ ├── ... | ||
│ ├── 2017 | ||
│ │ └── ... | ||
│ ├── const | ||
│ │ ├── [Constant-Variable-Name].csv | ||
│ │ └── ... | ||
│ └── crd.csv | ||
├── CONUSv4f1 | ||
│ └── ... | ||
├── Statistics | ||
│ ├── [Variable-Name]_stat.csv | ||
│ ├── ... | ||
│ ├── const_[Constant-Variable-Name]_stat.csv | ||
│ └── ... | ||
├── Subset | ||
│ ├── CONUS.csv | ||
│ └── CONUSv4f1.csv | ||
└── Variable | ||
├── varConstLst.csv | ||
└── varLst.csv | ||
``` | ||
### 1. Dataset folders (*CONUS*, *CONUSv4f1*) | ||
A dataset folder contains all data, both training and testing, for time-dependent variables and constant variables. | ||
The example data structure has two dataset folders, *CONUS* and *CONUSv4f1*. The data are saved in: | ||
|
||
- **year/[Variable-Name].csv**: | ||
|
||
A csv file of size [#grid, #time], where each column is one grid and each row is one time step. This file stores the data of one time-dependent variable for the current year. For example, *CONUS/2010/SMAP_AM.csv* holds the SMAP data of 2010 over the CONUS. | ||
|
||
Most time-dependent variables come from NLDAS, which includes two forcing products (FORA, FORB) and simulation products from three land surface models (NOAH, MOS, VIC). Variables are named as *[variable]\_[product]\_[layer]*, and a reference for each variable can be found in the [NLDAS document](https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/README.NLDAS2.pdf). For example, *SOILM_NOAH_0-10* refers to the soil moisture product simulated by the NOAH model at 0-10 cm. | ||
|
||
Besides NLDAS, SMAP data are saved in the same format but are always used as the target. In the level 3 database, there are two SMAP csv files, which are only available after 2015: *SMAP_AM.csv* and *SMAP_PM.csv*. | ||
|
||
-9999 refers to NaN. | ||
|
||
- **year/time.csv** & **timeStr.csv** | ||
|
||
A csv file of the dates in the current year folder, of size [#date]. *time.csv* records the Matlab datenum and *timeStr.csv* records the date in the format yyyy-mm-dd. | ||
|
||
Note that each year folder starts on April 1st and ends before the next April 1st. For example, the data in folder 2010 actually run from 2010-04-01 to 2011-03-31. The reason is that SMAP launched on April 1st. | ||
|
||
- **const/[Constant Variable Name].csv** | ||
|
||
A csv file of size [#grid] for one constant variable. | ||
|
||
- **crd.csv** | ||
|
||
Coordinates of all grids. The first column is latitude and the second column is longitude. Each row refers to one grid. | ||
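
The file conventions above (headerless csv files, -9999 as the missing-value marker, Matlab datenum in *time.csv*) can be sketched in Python as follows. This is a minimal illustration, not the loader used by the repository: the helper names `read_variable_csv` and `datenum_to_datetime` are hypothetical, the in-memory sample values are made up, and the csv files are assumed to be comma-separated without a header row.

```python
import io
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

MISSING = -9999  # missing-value marker stated in the text


def read_variable_csv(path_or_buffer):
    """Read a headerless variable csv and map the -9999 marker to NaN.

    Hypothetical helper; assumes comma-separated values without a header.
    """
    values = pd.read_csv(path_or_buffer, header=None).to_numpy(dtype=float)
    # np.where returns a new array, so the input is never modified in place
    return np.where(values == MISSING, np.nan, values)


def datenum_to_datetime(datenum):
    """Convert a Matlab datenum to a Python datetime.

    Matlab counts days from year 0, Python ordinals from year 1,
    which gives the fixed offset of 366 days.
    """
    whole_days = int(datenum)
    fraction = datenum - whole_days
    return (datetime.fromordinal(whole_days)
            + timedelta(days=fraction)
            - timedelta(days=366))


# Made-up stand-ins for [Variable-Name].csv and one entry of time.csv
demo = read_variable_csv(io.StringIO("0.21,0.25,-9999\n0.30,-9999,0.28\n"))
first_day = datenum_to_datetime(734229.0)  # datenum of 2010-04-01
print(demo)
print(first_day)
```

With real data, the path of a file such as *CONUS/2010/SMAP_AM.csv* would be passed instead of the in-memory buffer.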
|
||
### 2. Statistics folder | ||
|
||
Stores the statistics of variables, which are used for data normalization during training. Files are named as: | ||
- Time-dependent variables -> [variable name]_stat.csv | ||
- Constant variables -> const_[variable name]_stat.csv | ||
|
||
Each file contains four statistics of the variable: | ||
- 90th percentile | ||
- 10th percentile | ||
- mean | ||
- standard deviation (std) | ||
|
||
During training, we normalize data by (data - mean) / std. | ||
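
As a small sketch of the normalization formula above: the function names `normalize` and `denormalize` are hypothetical, the statistics are made-up values, and the order of the four numbers (90th percentile, 10th percentile, mean, std) is assumed to follow the list above rather than taken from an actual statistics file.

```python
import numpy as np


def normalize(data, mean, std):
    """Apply (data - mean) / std, as described in the text."""
    return (data - mean) / std


def denormalize(norm, mean, std):
    """Invert the normalization to recover values in physical units."""
    return norm * std + mean


# Made-up statistics; assumed order: 90th percentile, 10th percentile, mean, std
stat = np.array([0.40, 0.10, 0.25, 0.05])
mean, std = stat[2], stat[3]

x = np.array([0.20, 0.25, 0.30])
z = normalize(x, mean, std)
x_back = denormalize(z, mean, std)
print(z)
print(x_back)
```

The inverse transform is what would be applied to model outputs before comparing them with observations.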
|
||
### 3. Subset folder | ||
A subset refers to a subset of grids from the complete dataset (CONUS or Global). For example, a subset may contain only the grids in Pennsylvania. Every subset (including the full CONUS or Global dataset) has a *[subset name].csv* file in the *Subset* folder. *[subset name].csv* is written as: | ||
- line 1 -> root dataset | ||
- line 2 to end -> indices of the subset grids in the root dataset (starting from 1) | ||
|
||
An index of -1 means all grids, for example the CONUS dataset. | ||
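
The subset file layout described above could be parsed along these lines. This is a sketch under assumptions: `read_subset` is a hypothetical helper, each index is assumed to sit on its own line as a plain integer, and the sample contents are made up.

```python
import io

import numpy as np


def read_subset(file_obj):
    """Parse a subset file: line 1 is the root dataset, the rest are indices.

    Returns (root_name, indices), where indices are converted from the
    1-based values in the file to 0-based values for NumPy indexing.
    indices is None when the file holds the single value -1 (all grids).
    """
    lines = [line.strip() for line in file_obj if line.strip()]
    root_name = lines[0]
    indices = np.array([int(value) for value in lines[1:]], dtype=int)
    if indices.size == 1 and indices[0] == -1:
        return root_name, None
    return root_name, indices - 1


# Made-up stand-ins for two [subset name].csv files
root, idx = read_subset(io.StringIO("CONUS\n3\n1\n"))
root_all, idx_all = read_subset(io.StringIO("CONUS\n-1\n"))
print(root, idx)
print(root_all, idx_all)
```

Returning `None` for the "all grids" case lets the caller skip indexing entirely instead of building a full index array.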
|
||
### 4. Variable folder | ||
Stores csv files that each contain a list of variables, used as input to the training code. Time-dependent variables and constant variables should be stored separately. For example: | ||
- varLst.csv -> a list of time-dependent variables used as training predictors. | ||
- varConstLst.csv -> a list of constant variables used as training predictors. | ||
This repository contains deep learning code used to model hydrologic systems, from soil moisture to streamflow, from projection to forecast.
|
||
This released code depends on our hydroDL repository. Please follow our original GitHub repository, where we will release new updates occasionally: | ||
https://github.com/mhpi/hydroDL | ||
# Citations | ||
|
||
If you find our code to be useful, please cite the following papers: | ||
|
||
Feng, DP., Lawson, K., and CP. Shen, Mitigating prediction error of deep learning streamflow models in large data-sparse regions with ensemble modeling and soft data, Geophysical Research Letters (2021, Accepted) arXiv preprint https://arxiv.org/abs/2011.13380 | ||
|
||
Feng, DP, K. Fang and CP. Shen, Enhancing streamflow forecast and extracting insights using continental-scale long-short term memory networks with data integration, Water Resources Research (2020), https://doi.org/10.1029/2019WR026793 | ||
|
||
Shen, CP., A trans-disciplinary review of deep learning research and its relevance for water resources scientists, Water Resources Research. 54(11), 8558-8593, doi: 10.1029/2018WR022643 (2018) https://doi.org/10.1029/2018WR022643 | ||
|
||
Major code contributor: Dapeng Feng (PhD Student, Penn State) and Kuai Fang (PhD., Penn State) | ||
|
||
# Examples | ||
The environment we use is specified in the file `repoenv.yml`. To create the same conda environment, please run: | ||
```Shell | ||
conda env create -f repoenv.yml | ||
``` | ||
Activate the installed environment before running the code: | ||
```Shell | ||
conda activate mhpihydrodl | ||
``` | ||
You can also use the `Environment Setup_Tutorial.pdf` document as a reference to set up your environment and resolve some frequently encountered problems. | ||
There may be small compatibility issues with our code when using very recent PyTorch versions. Feel free to contact us if you find a bug or an issue you cannot solve. | ||
|
||
|
||
Several examples related to the above papers are presented here. **Click the title link** to see each example. | ||
## [1. Train an LSTM data integration model to make streamflow forecasts](example/StreamflowExample-DI.py) | ||
The dataset used is the NCAR CAMELS dataset. Download CAMELS from [this link](https://ral.ucar.edu/solutions/products/camels). | ||
Please download both the forcing and observation data, `CAMELS time series meteorology, observed flow, meta data (.zip)`, and the basin attributes, `CAMELS Attributes (.zip)`. | ||
Put the two unzipped folders under the same directory, like `your/path/to/Camels/basin_timeseries_v1p2_metForcing_obsFlow`, and `your/path/to/Camels/camels_attributes_v2.0`. Later, set the directory path `your/path/to/Camels` | ||
as the variable `rootDatabase` inside the code. | ||
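
Before training, it can help to confirm that the directory assigned to `rootDatabase` has the layout described above. The check below is only a sketch: `missing_camels_folders` is a hypothetical helper that is not part of the example script, and a temporary directory stands in for `your/path/to/Camels`.

```python
import os
import tempfile

# Folder names taken from the paths given in the text
EXPECTED_FOLDERS = (
    "basin_timeseries_v1p2_metForcing_obsFlow",
    "camels_attributes_v2.0",
)


def missing_camels_folders(root_database):
    """Return the expected CAMELS folders that are absent under root_database."""
    return [
        name for name in EXPECTED_FOLDERS
        if not os.path.isdir(os.path.join(root_database, name))
    ]


# Demonstration on a temporary directory that holds only the first folder
with tempfile.TemporaryDirectory() as root_database:
    os.makedirs(os.path.join(root_database, EXPECTED_FOLDERS[0]))
    missing = missing_camels_folders(root_database)

print(missing)
```

An empty list means both unzipped folders were found under the given path.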
|
||
Computational benchmark: training of CAMELS data (w/ or w/o data integration) with 671 basins, 10 years, 300 epochs, in ~1 hour with GPU. | ||
|
||
Related papers: | ||
Feng et al. (2020). [Enhancing streamflow forecast and extracting insights using long‐short term memory networks with data integration at continental scales](https://doi.org/10.1029/2019WR026793). Water Resources Research. | ||
|
||
## [2. Train LSTM and CNN-LSTM models for prediction in ungauged regions](example/PUR/trainPUR-Reg.py) | ||
The dataset used is also NCAR CAMELS. Follow the instructions in the first example above to download and unzip the dataset. Use [this code](example/PUR/testPUR-Reg.py) to test your saved models after training has finished. | ||
|
||
Related papers: | ||
Feng et al. (2021, Accepted). Mitigating prediction error of deep learning streamflow models in large data-sparse regions with ensemble modeling and soft data. Geophysical Research Letters. | ||
Feng et al. (2020). [Enhancing streamflow forecast and extracting insights using long‐short term memory networks with data integration at continental scales](https://doi.org/10.1029/2019WR026793). Water Resources Research. | ||
|
||
## [3. Train an LSTM model to learn SMAP soil moisture](example/demo-LSTM-Tutorial.ipynb) | ||
The example dataset is embedded in this repo and can be found [here](example/data). | ||
You can also use [this script](example/train-lstm.py) to train the model if you don't want to work with Jupyter Notebook. | ||
|
||
Related papers: | ||
Fang et al. (2017), [Prolongation of SMAP to Spatio-temporally Seamless Coverage of Continental US Using a Deep Learning Neural Network](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2017GL075619), Geophysical Research Letters. | ||
|
||
## [4. Estimate uncertainty of an LSTM network](example/train-lstm-mca.py) | ||
Related papers: | ||
Fang et al. (2020). [Evaluating the potential and challenges of an uncertainty quantification method for long short-term memory models for soil moisture predictions](https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2020WR028095), Water Resources Research. | ||
# License | ||
Non-Commercial Software License Agreement | ||
|
||
By downloading the hydroDL software (the “Software”) you agree to | ||
the following terms of use: | ||
Copyright (c) 2020, The Pennsylvania State University (“PSU”). All rights reserved. | ||
|
||
1. PSU hereby grants to you a perpetual, nonexclusive and worldwide right, privilege and | ||
license to use, reproduce, modify, display, and create derivative works of Software for all | ||
non-commercial purposes only. You may not use Software for commercial purposes without | ||
prior written consent from PSU. Queries regarding commercial licensing should be directed | ||
to The Office of Technology Management at 814.865.6277 or [email protected]. | ||
2. Neither the name of the copyright holder nor the names of its contributors may be used | ||
to endorse or promote products derived from this software without specific prior written | ||
permission. | ||
3. This software is provided for non-commercial use only. | ||
4. Redistribution and use in source and binary forms, with or without modification, are | ||
permitted provided that redistributions must reproduce the above copyright notice, license, | ||
list of conditions and the following disclaimer in the documentation and/or other materials | ||
provided with the distribution. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | ||
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE | ||
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | ||
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF | ||
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | ||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN | ||
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) | ||
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
POSSIBILITY OF SUCH DAMAGE. |
18 changes: 9 additions & 9 deletions
example/.ipynb_checkpoints/demo-temporal-test-checkpoint.ipynb