Refactor Prescriptors and project to handle new prescription #89

Merged: 19 commits, Jun 7, 2024

Commits
1f1001a  Refactored prescriptors to be more user-oriented vs. train oriented. … (danyoungday, May 30, 2024)
9db5c04  Added saving loading and frompretrained to prescriptor (danyoungday, May 31, 2024)
a184505  Updated heuristics (danyoungday, May 31, 2024)
33059f2  Removed references to ESP (danyoungday, May 31, 2024)
7acd0d2  Implemented saving and loading for heuristics (danyoungday, May 31, 2024)
1e5c73b  ignore esp files (danyoungday, May 31, 2024)
8bc07d0  Updated experiments to work with new prescriptor architecture (danyoungday, May 31, 2024)
b917f17  reran training with fixed distance calculation then reran experiments (danyoungday, Jun 3, 2024)
1cb7620  Modified app to use new prescriptors and performed minor refactoring … (danyoungday, Jun 3, 2024)
d0596b3  Merge branch 'refactor-app' into hf-prescriptors (danyoungday, Jun 3, 2024)
f5ae448  renamed indices so that we don't have duplicate columns (danyoungday, Jun 4, 2024)
babf912  Fixed test to download data and use it (danyoungday, Jun 4, 2024)
926ffe0  Linted to reach threshold (danyoungday, Jun 4, 2024)
7b53ef1  Merge branch 'main' into hf-prescriptors (danyoungday, Jun 6, 2024)
3bdde8c  Added some documentation to show where prescriptor logic is used in t… (danyoungday, Jun 6, 2024)
0f58c24  Modified ELUCData by consolidating yucky classes into single clean class (danyoungday, Jun 7, 2024)
04c26e6  Updated documentation for new data file (danyoungday, Jun 7, 2024)
2f34f9d  Refactored data and encoder, updated all things to work with new data (danyoungday, Jun 7, 2024)
8a96046  Linted files to reach 9.7 (danyoungday, Jun 7, 2024)
8 changes: 6 additions & 2 deletions use_cases/eluc/.gitignore
@@ -7,10 +7,14 @@ experiments/predictor_significance
 # Ignores figures for paper
 experiments/figures

-prescriptors/esp
+# Ignores trained prescriptors and seeds
+prescriptors/*/trained_prescriptors
+prescriptors/*/training_runs
+prescriptors/*/seeds
-prescriptors/trained_models
+prescriptors/nsga2/transfer_prescriptors.ipynb
+app/data/app_data.csv
 data/*.zip
 data/processed/*.csv
 *.nc

2 changes: 1 addition & 1 deletion use_cases/eluc/.pylintrc
@@ -3,7 +3,7 @@ ignore=prescriptors/esp

 recursive=y

-fail-under=9.65
+fail-under=9.7

 jobs=0
Expand Down
3 changes: 3 additions & 0 deletions use_cases/eluc/Dockerfile
@@ -22,6 +22,9 @@ RUN pip install --no-cache-dir --upgrade pip && \
 # Copy source files over
 COPY . .
+
+# Python setup script - downloads data and processes it
+RUN python -m app.process_data
danyoungday (Collaborator, Author): Change our Dockerfile to download the data from HuggingFace before we start the app, so that we don't have to push the CSV to the git repo.

Member: How fast (or slow) is it?

danyoungday (Collaborator, Author, Jun 6, 2024): It takes about a minute. We could also upload the preprocessed app dataset to HuggingFace, which would remove most of this time.


# Expose Flask (Dash) port
EXPOSE 4057

Expand Down
35 changes: 12 additions & 23 deletions use_cases/eluc/app/app.py
@@ -18,44 +18,35 @@
import dash_bootstrap_components as dbc

from data import constants
from data.eluc_data import ELUCEncoder
import app.constants as app_constants
from app import utils
from prescriptors.nsga2.torch_prescriptor import TorchPrescriptor

app = Dash(__name__,
external_stylesheets=[dbc.themes.BOOTSTRAP, dbc.icons.BOOTSTRAP],
prevent_initial_callbacks="initial_duplicate")
server = app.server

df = pd.read_csv(app_constants.DATA_FILE_PATH, index_col=app_constants.INDEX_COLS)
df.rename(columns={col + ".1": col for col in app_constants.INDEX_COLS}, inplace=True)
COUNTRIES_DF = regionmask.defined_regions.natural_earth_v5_0_0.countries_110.to_dataframe()

# Prescriptor list should be in order of least to most change
# Load pareto df
pareto_df = pd.read_csv(app_constants.PARETO_CSV_PATH)
pareto_df = pareto_df.sort_values(by="change", ascending=True)
pareto_df.sort_values(by="change", inplace=True)
prescriptor_list = list(pareto_df["id"])

encoder = ELUCEncoder.from_json(app_constants.PRESCRIPTOR_PATH / "fields.json")
# TODO: Stop hard-coding candidate params -> make cand config file?
candidate_params = {
"in_size": len(constants.CAO_MAPPING["context"]),
"hidden_size": 16,
"out_size": len(constants.RECO_COLS)
}
prescriptor = TorchPrescriptor(None, encoder, None, 1, candidate_params)
# Load prescriptors
prescriptor_manager = utils.load_prescriptors()
danyoungday (Collaborator, Author): Just use a single PrescriptorManager object now that can hold all the individual LandUsePrescriptors. This also lets us add Heuristics in the future, since they implement Prescriptor.
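The dispatch pattern described in this comment can be sketched as follows. This is a hypothetical minimal version: only the names PrescriptorManager and prescribe appear in the PR, while the toy prescriptor subclass and its behavior are invented for illustration.

```python
import pandas as pd

class Prescriptor:
    """Minimal stand-in for the shared Prescriptor interface."""
    def prescribe(self, context_df: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError

class ToyPrescriptor(Prescriptor):
    """Illustrative prescriptor that nudges every context value up by 0.1."""
    def prescribe(self, context_df: pd.DataFrame) -> pd.DataFrame:
        return context_df + 0.1

class PrescriptorManager:
    """Holds many prescriptors keyed by candidate id and dispatches to them."""
    def __init__(self, prescriptors: dict[str, Prescriptor]):
        self.prescriptors = prescriptors

    def prescribe(self, cand_id: str, context_df: pd.DataFrame) -> pd.DataFrame:
        # Look up the candidate by id and delegate to its prescribe method.
        return self.prescriptors[cand_id].prescribe(context_df)

manager = PrescriptorManager({"1_1": ToyPrescriptor()})
prescribed = manager.prescribe("1_1", pd.DataFrame({"crop": [0.5]}))
```

Because heuristics would implement the same Prescriptor interface, the manager can hold them alongside LandUsePrescriptors without special-casing.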


# Load predictors
predictors = utils.load_predictors()

# Cells
min_lat = df.index.get_level_values("lat").min()
max_lat = df.index.get_level_values("lat").max()
min_lon = df.index.get_level_values("lon").min()
max_lon = df.index.get_level_values("lon").max()
min_time = df.index.get_level_values("time").min()
max_time = df.index.get_level_values("time").max()
min_lat = df["lat"].min()
max_lat = df["lat"].max()
min_lon = df["lon"].min()
max_lon = df["lon"].max()
min_time = df["time"].min()
max_time = df["time"].max()

lat_list = list(np.arange(min_lat, max_lat + app_constants.GRID_STEP, app_constants.GRID_STEP))
lon_list = list(np.arange(min_lon, max_lon + app_constants.GRID_STEP, app_constants.GRID_STEP))
@@ -478,9 +469,7 @@ def select_prescriptor(_, presc_idx, year, lat, lon):
presc_id = prescriptor_list[presc_idx]
context = df.loc[year, lat, lon][constants.CAO_MAPPING["context"]]
context_df = pd.DataFrame([context])
prescribed = prescriptor.prescribe_land_use(context_df,
cand_id=presc_id,
results_dir=app_constants.PRESCRIPTOR_PATH)
prescribed = prescriptor_manager.prescribe(presc_id, context_df)
# Prescribed gives it to us in diff format, we need to recompute recommendations
for col in constants.RECO_COLS:
prescribed[col] = context[col] + prescribed[f"{col}_diff"]
@@ -544,7 +533,7 @@ def compute_land_change(sliders, year, lat, lon, locked):
warnings.append(html.P("WARNING: Negative values detected. Please lower the value of a locked slider."))

# Compute total change
change = prescriptor.compute_percent_changed(context_actions_df)
change = prescriptor_manager.compute_percent_changed(context_actions_df)
danyoungday (Collaborator, Author): compute_percent_changed is now part of PrescriptorManager.
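As an illustration only — the PR does not show the real implementation — percent changed could plausibly be computed by summing the absolute land-use diffs per row and returning them in a "change" column, which matches how the app reads the result (`change['change'].iloc[0]`):

```python
import pandas as pd

def compute_percent_changed(context_actions_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: total absolute change across all *_diff columns."""
    diff_cols = [c for c in context_actions_df.columns if c.endswith("_diff")]
    change = context_actions_df[diff_cols].abs().sum(axis=1)
    return pd.DataFrame({"change": change})

context_actions_df = pd.DataFrame({
    "crop": [0.4], "crop_diff": [0.02],
    "range": [0.3], "range_diff": [-0.05],
})
change = compute_percent_changed(context_actions_df)
print(f"{change['change'].iloc[0] * 100:.2f}")  # prints 7.00, formatted as in the app
```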


return warnings, f"{change['change'].iloc[0] * 100:.2f}"

11 changes: 4 additions & 7 deletions use_cases/eluc/app/constants.py
@@ -5,13 +5,13 @@

 from data.constants import LAND_USE_COLS

-DATA_FILE_PATH = Path("data/processed/app_data.csv")
+DATA_FILE_PATH = Path("app/data/app_data.csv")

 APP_START_YEAR = 2012

 GRID_STEP = 0.25

-INDEX_COLS = ["time", "lat", "lon"]
+INDEX_COLS = ["time_idx", "lat_idx", "lon_idx"]

 NO_CHANGE_COLS = ["primf", "primn", "urban"]
 CHART_COLS = LAND_USE_COLS + ["nonland"]

@@ -31,12 +31,9 @@
 CHART_TYPES = ["Treemap", "Pie Chart"]

 PREDICTOR_PATH = Path("predictors/trained_models")
-PRESCRIPTOR_PATH = Path("prescriptors/nsga2/trained_prescriptors/demo")
+PRESCRIPTOR_PATH = Path("prescriptors/trained_models")

-# Pareto front
-PARETO_CSV_PATH = PRESCRIPTOR_PATH / "pareto.csv"
-PARETO_FRONT_PATH = PRESCRIPTOR_PATH / "pareto_front.png"
-
-FIELDS_PATH = PRESCRIPTOR_PATH / "fields.json"
+PARETO_CSV_PATH = Path("app/data/pareto.csv")

 DEFAULT_PRESCRIPTOR_IDX = 1  # By default we select the second prescriptor that minimizes change
11 changes: 11 additions & 0 deletions use_cases/eluc/app/data/pareto.csv
@@ -0,0 +1,11 @@
id,parents,NSGA-II_rank,distance,ELUC,change
1_1,"(None, None)",1,inf,-0.011886149,0.0040596884414631
49_0,"('45_27', '21_80')",1,0.20109607207578,-2.1743317,0.0533331221986483
61_92,"('26_38', '60_66')",1,0.0427627827542644,-4.3928847,0.0690307965236667
85_51,"('82_16', '53_65')",1,0.0506403998177242,-8.327706,0.1045782392704721
82_16,"('51_17', '76_44')",1,0.0606549652913774,-11.905426,0.1340041432727491
70_30,"('59_5', '51_15')",1,0.0230698188448444,-13.6480255,0.1685300249153988
67_70,"('39_79', '59_5')",1,0.032665984351135,-14.625315,0.1968721749125319
36_3,"('30_85', '34_74')",1,0.0577219544984151,-15.683007,0.2284119445886663
62_84,"('58_38', '56_3')",1,0.0310138542659899,-16.641254,0.2609490730066026
54_67,"('43_94', '43_94')",1,inf,-17.59739,0.2924915519940842
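Each row of pareto.csv is one candidate on the Pareto front: moving down the file, more land-use change buys a larger ELUC reduction. The app sorts by `change` to get the least-to-most-change ordering it relies on. A minimal illustration using three of the rows above (columns trimmed, rows deliberately shuffled so the sort matters):

```python
import io
import pandas as pd

# Three rows of the pareto front shown above, out of order.
csv_text = """id,ELUC,change
49_0,-2.1743317,0.0533331221986483
1_1,-0.011886149,0.0040596884414631
61_92,-4.3928847,0.0690307965236667
"""
pareto_df = pd.read_csv(io.StringIO(csv_text)).sort_values(by="change")
prescriptor_list = list(pareto_df["id"])  # least to most change: 1_1 first
```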
4 changes: 2 additions & 2 deletions use_cases/eluc/app/process_data.py
@@ -13,9 +13,9 @@ def main():
     """
     # Subsets the dataset so train_df is from start_year-1 to test year which we discard.
     # Then we take the app data as the test df which is from the app start year to the end of the dataset.
-    dataset = ELUCData(start_year=APP_START_YEAR-1, test_year=APP_START_YEAR)
+    dataset = ELUCData.from_hf(start_year=APP_START_YEAR-1, test_year=APP_START_YEAR)
     test_df = dataset.test_df
-    save_dir = Path("data/processed")
+    save_dir = Path("app/data")
     save_dir.mkdir(exist_ok=True)
     test_df.to_csv(save_dir / "app_data.csv")
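The subsetting that process_data.py describes — train from start_year up to the test year (discarded), app data from the test year onward — can be sketched like this. ELUCData.from_hf's real logic is not shown in the diff, so this split function is an assumption:

```python
import pandas as pd

APP_START_YEAR = 2012  # from app/constants.py

def split_by_year(df: pd.DataFrame, start_year: int, test_year: int):
    """Hypothetical sketch of the year-based split: rows in [start_year, test_year)
    become train data (discarded by the app); rows from test_year on become test."""
    df = df[df["time"] >= start_year]
    train_df = df[df["time"] < test_year]
    test_df = df[df["time"] >= test_year]
    return train_df, test_df

data = pd.DataFrame({"time": [2010, 2011, 2012, 2013]})
train_df, test_df = split_by_year(data, APP_START_YEAR - 1, APP_START_YEAR)
```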

Expand Down
27 changes: 25 additions & 2 deletions use_cases/eluc/app/utils.py
@@ -8,6 +8,11 @@

 import app.constants as app_constants
 from data import constants
+
+from prescriptors.prescriptor_manager import PrescriptorManager
+from prescriptors.nsga2.land_use_prescriptor import LandUsePrescriptor
+
+from predictors.predictor import Predictor
 from predictors.neural_network.neural_net_predictor import NeuralNetPredictor
 from predictors.sklearn.sklearn_predictor import LinearRegressionPredictor, RandomForestPredictor

@@ -258,9 +263,27 @@ def create_pareto(pareto_df: pd.DataFrame, presc_id: int) -> go.Figure:
                        " Average ELUC: %{y} tC/ha<extra></extra>")
     return fig

-def load_predictors() -> dict:
+def load_prescriptors() -> tuple[list[str], PrescriptorManager]:
+    """
+    Loads in prescriptors from disk, downloads from HuggingFace first if needed.
+    TODO: Currently hard-coded to load specific prescriptors from pareto path.
+    :return: dict of prescriptor name -> prescriptor object.
+    """
+    prescriptors = {}
+    pareto_df = pd.read_csv(app_constants.PARETO_CSV_PATH)
+    pareto_df = pareto_df.sort_values(by="change")
+    for cand_id in pareto_df["id"]:
+        cand_path = f"danyoung/eluc-{cand_id}"
+        cand_local_dir = app_constants.PRESCRIPTOR_PATH / cand_path.replace("/", "--")
+        prescriptors[cand_id] = LandUsePrescriptor.from_pretrained(cand_path, local_dir=cand_local_dir)
+
+    prescriptor_manager = PrescriptorManager(prescriptors, None)
+
+    return prescriptor_manager
danyoungday (Collaborator, Author): Reads the hard-coded pareto and downloads the appropriate models from HuggingFace.
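The repo-id-to-local-directory convention used in load_prescriptors (HuggingFace repo ids contain a "/", which is replaced with "--" on disk) can be isolated as a small helper. The helper name is invented, but the mapping itself is taken directly from the code above:

```python
from pathlib import Path

def local_dir_for(cand_id: str, base: Path = Path("prescriptors/trained_models")) -> Path:
    """Map a candidate id to the local cache directory for its HuggingFace repo."""
    cand_path = f"danyoung/eluc-{cand_id}"  # e.g. danyoung/eluc-1_1
    # "/" is not allowed in a single directory name, so swap it for "--".
    return base / cand_path.replace("/", "--")

cand_local_dir = local_dir_for("1_1")
```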


+def load_predictors() -> dict[str, Predictor]:
     """
-    Loads in predictors from disk.
+    Loads in predictors from disk, downloads from HuggingFace first if needed.
+    TODO: Currently hard-coded to load specific predictors. We need to make this able to handle any amount!
     :return: dict of predictor name -> predictor object.
     """