Preprocess/ndvi #77

Open: wants to merge 71 commits into base: master
71 commits
a0d552b
initial commit
tommylees112 Jul 23, 2019
1ec976e
remove specifics from esa-cci-landcover
tommylees112 Jul 23, 2019
bcbcc43
fix flake errors
tommylees112 Jul 23, 2019
6f4b0b8
add merge_file functionality
tommylees112 Jul 23, 2019
71682e8
add print statement to merge_files to ensure user knows what's happening
tommylees112 Jul 23, 2019
b3f6d7d
create a function for working with categorical data like landcover
tommylees112 Jul 23, 2019
845cb21
moving the get_modal to different branch
tommylees112 Jul 23, 2019
2a3dd2d
remove an old merge conflict from mac environment
tommylees112 Jul 23, 2019
016aefc
update to fix tests
tommylees112 Jul 24, 2019
b956533
fix flake error
tommylees112 Jul 24, 2019
dce8d69
rename latlon
tommylees112 Jul 25, 2019
46469d4
add ability to specify years_to_process
tommylees112 Jul 25, 2019
affc1f3
mypy
tommylees112 Jul 25, 2019
9fdc252
update the ability to run specific years
tommylees112 Jul 25, 2019
02c63f2
print message
tommylees112 Jul 25, 2019
a2f8bec
update the ability to run specific years
tommylees112 Jul 25, 2019
5add043
update the ability to run specific years
tommylees112 Jul 25, 2019
ed83dce
update the ability to run specific years
tommylees112 Jul 25, 2019
c598150
catch error preprocessing some ndvi timestamps
tommylees112 Jul 25, 2019
d0e491b
add pandas
tommylees112 Jul 25, 2019
5bfd2dc
fix the latlon decoded too -
tommylees112 Jul 25, 2019
6ac23e2
save to netcdf -
tommylees112 Jul 25, 2019
6c3f9e9
fix filename test
tommylees112 Jul 25, 2019
73484d8
fix mypy
tommylees112 Jul 25, 2019
25dd83f
merge local
tommylees112 Oct 28, 2019
5e41972
add preprocess function
tommylees112 Oct 28, 2019
c689319
update the preprocess function
tommylees112 Oct 28, 2019
580461c
process ndvi only
tommylees112 Oct 28, 2019
e0636b9
settings for vscode
tommylees112 Oct 28, 2019
abc5867
change default cleanup
tommylees112 Oct 28, 2019
3c40feb
update notebook exploring the ndvi values
tommylees112 Oct 28, 2019
4f6e250
update scripts with extra imports
tommylees112 Oct 29, 2019
43aba25
explore ndvi data notebook
tommylees112 Oct 29, 2019
a552af6
update engineer script
tommylees112 Oct 29, 2019
fb01d5d
add naming conventions to regrid function
tommylees112 Oct 29, 2019
6e91ff7
add naming conventions to regrid function
tommylees112 Oct 29, 2019
917c257
update engineer script
tommylees112 Oct 30, 2019
167eefa
add ignore timesteps
tommylees112 Oct 30, 2019
213b5c3
add pandas to base preprocessor
tommylees112 Oct 31, 2019
02ec650
update the missing timesteps
tommylees112 Oct 31, 2019
d78f27a
automatically detect errored timesteps
tommylees112 Oct 31, 2019
3c4397c
notebook update
tommylees112 Oct 31, 2019
5391715
update models script
tommylees112 Nov 1, 2019
61d7118
parsimonious doesn't need ignore_vars
tommylees112 Nov 1, 2019
4656ddf
update models
tommylees112 Nov 1, 2019
bd09f10
update models
tommylees112 Nov 1, 2019
46edb7c
merge remote master to local
tommylees112 Nov 3, 2019
8b8b872
update ignore vars
tommylees112 Nov 3, 2019
0765358
add early stopping
tommylees112 Nov 3, 2019
e97dc6d
don't run the explainer for now
tommylees112 Nov 3, 2019
cea19f8
update region to include africa
tommylees112 Nov 13, 2019
02ad003
script update
tommylees112 Nov 16, 2019
e83948f
preprocess ndvi
tommylees112 Nov 16, 2019
558cc0e
preprocess ndvi with new reference grid
tommylees112 Nov 16, 2019
08bc89c
update regrid path
tommylees112 Nov 16, 2019
f710798
update regrid path
tommylees112 Nov 16, 2019
cab8aa3
process recent decade
tommylees112 Nov 16, 2019
d7d54d7
update engineer to vci target var
tommylees112 Nov 18, 2019
51257f5
update scripts
tommylees112 Nov 19, 2019
f04b3b1
add soil water levels (moisture)
tommylees112 Nov 19, 2019
2ce41fb
preprocess era5
tommylees112 Nov 19, 2019
d95ad4a
update scripts
tommylees112 Nov 19, 2019
d38e40d
update regrid path
tommylees112 Nov 19, 2019
e27129d
update descriptions of NDVI preprocessor
tommylees112 Nov 19, 2019
b0cfc82
update n months to use in preprocess
tommylees112 Nov 19, 2019
1d4e76f
update gleam
tommylees112 Nov 19, 2019
0333363
add model for iterative experiments
tommylees112 Nov 22, 2019
910cfa9
update
tommylees112 Nov 22, 2019
751bd23
remove the print mse each batch
tommylees112 Nov 26, 2019
113c9ef
experiment_0 modelling
tommylees112 Nov 26, 2019
567aa8c
esa cci preprocess
tommylees112 Nov 27, 2019
15 changes: 15 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,15 @@
+{
+    // Use IntelliSense to learn about possible attributes.
+    // Hover to view descriptions of existing attributes.
+    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
+    "version": "0.2.0",
+    "configurations": [
+        {
+            "name": "Python: Current File",
+            "type": "python",
+            "request": "launch",
+            "program": "${file}",
+            "console": "integratedTerminal"
+        }
+    ]
+}
7 changes: 7 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,7 @@
+{
+    "python.pythonPath": "/Users/tommylees/miniconda3/envs/crop/bin/python",
+    "python.testing.pytestArgs": ["tests"],
+    "python.testing.unittestEnabled": false,
+    "python.testing.nosetestsEnabled": false,
+    "python.testing.pytestEnabled": true
+}
2 changes: 1 addition & 1 deletion notebooks/draft/05_tl_event_detect.ipynb
@@ -3619,7 +3619,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.2"
+   "version": "3.7.0"
   }
  },
  "nbformat": 4,
2 changes: 1 addition & 1 deletion notebooks/draft/06_tl_indices.ipynb
@@ -43,7 +43,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.2"
+   "version": "3.7.0"
   }
  },
  "nbformat": 4,
324 changes: 324 additions & 0 deletions notebooks/draft/20_tl_s5_explore.ipynb

Large diffs are not rendered by default.

905 changes: 905 additions & 0 deletions notebooks/draft/21_tl_ndvi_explore.ipynb

Large diffs are not rendered by default.

303 changes: 303 additions & 0 deletions notebooks/draft/22_tl_explore_ndvi_regrid.ipynb

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions scripts/drafts/eng_utils.py
@@ -15,6 +15,8 @@
 import pandas as pd
 from netCDF4 import num2date

+from pathlib import Path
+from typing import List

 # ------------------------------------------------------------------------------
 # Selcting the Same Timeslice
25 changes: 21 additions & 4 deletions scripts/engineer.py
@@ -5,8 +5,8 @@
 from src.engineer import Engineer


-def engineer(experiment='one_month_forecast', process_static=True,
-             pred_months=12):
+def engineer_VCI(experiment='one_month_forecast', process_static=True,
+                 pred_months=12):
     # if the working directory is alread ml_drought don't need ../data
     if Path('.').absolute().as_posix().split('/')[-1] == 'ml_drought':
         data_path = Path('data')
@@ -15,7 +15,23 @@ def engineer(experiment='one_month_forecast', process_static=True,

     engineer = Engineer(data_path, experiment=experiment, process_static=process_static)
     engineer.engineer(
-        test_year=2018, target_variable='VCI',
+        test_year=[y for y in range(2011, 2019)], target_variable='VCI',
         pred_months=pred_months, expected_length=pred_months,
     )

+
+def engineer_NDVI(experiment='one_month_forecast', process_static=True,
+                  pred_months=12):
+    # if the working directory is alread ml_drought don't need ../data
+    if Path('.').absolute().as_posix().split('/')[-1] == 'ml_drought':
+        data_path = Path('data')
+    else:
+        data_path = Path('../data')
+
+    engineer = Engineer(data_path, experiment=experiment,
+                        process_static=process_static)
+    engineer.engineer(
+        test_year=[y for y in range(2011, 2019)], target_variable='ndvi',
+        pred_months=pred_months, expected_length=pred_months,
+    )
+
@@ -30,5 +46,6 @@ def engineer_static():


 if __name__ == '__main__':
-    engineer(pred_months=12)
+    # engineer_NDVI(pred_months=12)
+    engineer_VCI(pred_months=3)
     # engineer_static()
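Each entry point in scripts/engineer.py (and again in scripts/models.py) repeats the same check: if the current directory is named `ml_drought`, data lives in `data`, otherwise in `../data`. A minimal sketch of that check factored into one function follows; `get_data_path` and its `cwd`/`project_dir` parameters are hypothetical names, not part of the project.

```python
from pathlib import Path
from typing import Optional


def get_data_path(cwd: Optional[Path] = None,
                  project_dir: str = 'ml_drought') -> Path:
    """Hypothetical helper: return the data directory relative to ``cwd``.

    Mirrors the check repeated in the scripts: when run from the project
    root (a directory named ``project_dir``) the data lives in ``data``;
    otherwise (e.g. from ``scripts/``) it is assumed to be ``../data``.
    """
    cwd = Path('.') if cwd is None else cwd
    # Same test as the scripts: compare the last path component by name
    if cwd.absolute().as_posix().split('/')[-1] == project_dir:
        return Path('data')
    return Path('../data')
```

With this, `engineer_VCI` and `engineer_NDVI` could each begin with `data_path = get_data_path()`. Note that the check only looks at the directory name, so a clone checked out under a different name (for example `ml_drought-master`) falls through to `../data` even when run from the root.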
31 changes: 16 additions & 15 deletions scripts/export.py
@@ -43,14 +43,15 @@ def export_era5():
     }

     era5_variables = [
-        '10m_u_component_of_wind', '10m_v_component_of_wind', 'volumetric_soil_water_layer_1',
-        'volumetric_soil_water_layer_2', 'volumetric_soil_water_layer_3',
-        'volumetric_soil_water_layer_4', 'surface_pressure', 'surface_sensible_heat_flux',
-        'surface_latent_heat_flux', 'soil_temperature_level_1', '2m_temperature',
-        'mean_eastward_turbulent_surface_stress', 'mean_northward_turbulent_surface_stress',
-        'surface_net_solar_radiation_clear_sky', 'surface_net_thermal_radiation_clear_sky',
-        'vertical_integral_of_divergence_of_moisture_flux', 'potential_evaporation',
-        'evaporation'
+        # '10m_u_component_of_wind', '10m_v_component_of_wind',
+        'volumetric_soil_water_layer_1', 'volumetric_soil_water_layer_2',
+        'volumetric_soil_water_layer_3', 'volumetric_soil_water_layer_4',
+        # 'surface_pressure', 'surface_sensible_heat_flux',
+        # 'surface_latent_heat_flux', 'soil_temperature_level_1', '2m_temperature',
+        # 'mean_eastward_turbulent_surface_stress', 'mean_northward_turbulent_surface_stress',
+        # 'surface_net_solar_radiation_clear_sky', 'surface_net_thermal_radiation_clear_sky',
+        # 'vertical_integral_of_divergence_of_moisture_flux', 'potential_evaporation',
+        # 'evaporation'
     ]

     for variable in era5_variables:
@@ -172,10 +173,10 @@ def export_kenya_boundaries():

 if __name__ == '__main__':
     export_era5()
-    export_vhi()
-    export_chirps()
-    export_era5POS()
-    export_gleam()
-    export_esa()
-    export_s5()
-    export_kenya_boundaries()
+    # export_vhi()
+    # export_chirps()
+    # export_era5POS()
+    # export_gleam()
+    # export_esa()
+    # export_s5()
+    # export_kenya_boundaries()
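After the export.py changes, `export_era5()` is the only active export, and it requests only the four `volumetric_soil_water_layer_*` variables (see the commit "add soil water levels (moisture)"). Switching variables on and off by commenting out list entries is easy to lose track of. A minimal sketch of an alternative, assuming a hypothetical `select_variables` helper that is not part of the project, keeps the full ERA5 list intact and filters it by name.

```python
from typing import List

# The full ERA5 variable list as it stood in scripts/export.py before this PR
ALL_ERA5_VARIABLES: List[str] = [
    '10m_u_component_of_wind', '10m_v_component_of_wind', 'volumetric_soil_water_layer_1',
    'volumetric_soil_water_layer_2', 'volumetric_soil_water_layer_3',
    'volumetric_soil_water_layer_4', 'surface_pressure', 'surface_sensible_heat_flux',
    'surface_latent_heat_flux', 'soil_temperature_level_1', '2m_temperature',
    'mean_eastward_turbulent_surface_stress', 'mean_northward_turbulent_surface_stress',
    'surface_net_solar_radiation_clear_sky', 'surface_net_thermal_radiation_clear_sky',
    'vertical_integral_of_divergence_of_moisture_flux', 'potential_evaporation',
    'evaporation',
]


def select_variables(variables: List[str], keyword: str) -> List[str]:
    """Hypothetical helper: keep only the variables whose name contains keyword,
    preserving their original order."""
    return [v for v in variables if keyword in v]


# Reproduces the subset that is left uncommented in this PR
soil_water = select_variables(ALL_ERA5_VARIABLES, 'volumetric_soil_water')
```

Here `soil_water` holds the same four layers the PR leaves active, and the commented-out variables remain one argument change away.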
67 changes: 42 additions & 25 deletions scripts/models.py
@@ -1,11 +1,14 @@
 import sys
 sys.path.append('..')

-from pathlib import Path
-from src.analysis import all_shap_for_file
 from src.models import (Persistence, LinearRegression,
                         LinearNetwork, RecurrentNetwork,
                         EARecurrentNetwork, load_model)
+from src.analysis import all_shap_for_file
+from pathlib import Path


+# NOTE: p84.162 == 'vertical integral of moisture flux'
+
+
 def parsimonious(
@@ -17,14 +20,17 @@ def parsimonious(
     else:
         data_path = Path('../data')

-    predictor = Persistence(data_path, experiment=experiment)
+    predictor = Persistence(
+        data_path, experiment=experiment
+    )
     predictor.evaluate(save_preds=True)


 def regression(
     experiment='one_month_forecast',
     include_pred_month=True,
-    surrounding_pixels=1
+    surrounding_pixels=None,
+    ignore_vars=None
 ):
     # if the working directory is alread ml_drought don't need ../data
     if Path('.').absolute().as_posix().split('/')[-1] == 'ml_drought':
@@ -35,19 +41,21 @@ def regression(
     predictor = LinearRegression(
         data_path, experiment=experiment,
         include_pred_month=include_pred_month,
-        surrounding_pixels=surrounding_pixels
+        surrounding_pixels=surrounding_pixels,
+        ignore_vars=ignore_vars,
     )
-    predictor.train()
+    predictor.train(early_stopping=5)
     predictor.evaluate(save_preds=True)

     # mostly to test it works
-    predictor.explain(save_shap_values=True)
+    # predictor.explain(save_shap_values=True)


 def linear_nn(
     experiment='one_month_forecast',
     include_pred_month=True,
-    surrounding_pixels=1
+    surrounding_pixels=None,
+    ignore_vars=None
 ):
     # if the working directory is alread ml_drought don't need ../data
     if Path('.').absolute().as_posix().split('/')[-1] == 'ml_drought':
@@ -59,19 +67,21 @@ def linear_nn(
         layer_sizes=[100], data_folder=data_path,
         experiment=experiment,
         include_pred_month=include_pred_month,
-        surrounding_pixels=surrounding_pixels
+        surrounding_pixels=surrounding_pixels,
+        ignore_vars=ignore_vars,
     )
     predictor.train(num_epochs=50, early_stopping=5)
     predictor.evaluate(save_preds=True)
     predictor.save_model()

-    _ = predictor.explain(save_shap_values=True)
+    # _ = predictor.explain(save_shap_values=True)


 def rnn(
     experiment='one_month_forecast',
     include_pred_month=True,
-    surrounding_pixels=1
+    surrounding_pixels=None,
+    ignore_vars=None
 ):
     # if the working directory is alread ml_drought don't need ../data
     if Path('.').absolute().as_posix().split('/')[-1] == 'ml_drought':
@@ -84,20 +94,22 @@ def rnn(
         data_folder=data_path,
         experiment=experiment,
         include_pred_month=include_pred_month,
-        surrounding_pixels=surrounding_pixels
+        surrounding_pixels=surrounding_pixels,
+        ignore_vars=ignore_vars,
     )
     predictor.train(num_epochs=50, early_stopping=5)
     predictor.evaluate(save_preds=True)
     predictor.save_model()

-    _ = predictor.explain(save_shap_values=True)
+    # _ = predictor.explain(save_shap_values=True)


 def earnn(
     experiment='one_month_forecast',
     include_pred_month=True,
     surrounding_pixels=None,
-    pretrained=True
+    pretrained=True,
+    ignore_vars=None
 ):
     # if the working directory is alread ml_drought don't need ../data
     if Path('.').absolute().as_posix().split('/')[-1] == 'ml_drought':
@@ -111,22 +123,27 @@ def earnn(
             data_folder=data_path,
             experiment=experiment,
             include_pred_month=include_pred_month,
-            surrounding_pixels=surrounding_pixels
+            surrounding_pixels=surrounding_pixels,
+            ignore_vars=ignore_vars,
         )
-        predictor.train(num_epochs=50, early_stopping=5)
+        predictor.train(num_epochs=50, early_stopping=10)
         predictor.evaluate(save_preds=True)
         predictor.save_model()
     else:
-        predictor = load_model(data_path / f'models/{experiment}/ealstm/model.pt')
+        predictor = load_model(
+            data_path / f'models/{experiment}/ealstm/model.pt')

-    test_file = data_path / f'features/{experiment}/test/2018_3'
-    assert test_file.exists()
-    all_shap_for_file(test_file, predictor, batch_size=100)
+    # test_file = data_path / f'features/{experiment}/test/2018_3'
+    # assert test_file.exists()
+    # all_shap_for_file(test_file, predictor, batch_size=100)


 if __name__ == '__main__':
-    # parsimonious()
-    # regression()
-    # linear_nn()
-    # rnn()
-    earnn(pretrained=True)
+    ignore_vars = None
+    ignore_vars = ['ndvi', 'p84.162', 'sp', 'tp']
+
+    parsimonious()
+    regression(ignore_vars=ignore_vars)
+    linear_nn(ignore_vars=ignore_vars)
+    # rnn(ignore_vars=ignore_vars)
+    # earnn(pretrained=False, ignore_vars=ignore_vars)
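The models.py changes pass `early_stopping=5` to `LinearRegression.train` and raise the EA-LSTM's value from 5 to 10. The trainer itself is not part of this diff. Assuming the argument is a patience (the number of consecutive epochs without a new best validation loss that are tolerated before training halts), the stopping rule can be sketched as follows; `early_stopping_epoch` is a hypothetical name used only for illustration.

```python
from typing import List, Optional


def early_stopping_epoch(val_losses: List[float], patience: int) -> Optional[int]:
    """Return the 0-based epoch at which patience-based early stopping would
    halt training, or None if every epoch in val_losses would run."""
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best validation loss: reset the patience counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


losses = [1.0, 0.8, 0.9, 0.85, 0.95]
stop_at_3 = early_stopping_epoch(losses, patience=3)
stop_at_5 = early_stopping_epoch(losses, patience=5)
```

With these losses, `patience=3` stops at epoch 4 (three epochs after the best loss at epoch 1), while `patience=5` runs every epoch. This illustrates the trade-off behind raising the EA-LSTM's value to 10: it tolerates longer plateaus at the cost of more training epochs.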