-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Prescriptors and project to handle new prescription #89
Conversation
…Still have to test them
@@ -22,6 +22,9 @@ RUN pip install --no-cache-dir --upgrade pip && \ | |||
# Copy source files over | |||
COPY . . | |||
|
|||
# Python setup script - downloads data and processes it | |||
RUN python -m app.process_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change our dockerfile to download the data from HuggingFace before we start the app so that we don't have to push the csv to the git repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How fast (or slow) is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It takes about a minute. We could also upload the preprocessed app dataset to huggingface which would remove most of this time.
} | ||
prescriptor = TorchPrescriptor(None, encoder, None, 1, candidate_params) | ||
# Load prescriptors | ||
prescriptor_manager = utils.load_prescriptors() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use a single PrescriptorManager object now that can hold all the individual LandUsePrescriptors. This allows us to in the future add Heuristics too since they implement Prescriptor
@@ -544,7 +533,7 @@ def compute_land_change(sliders, year, lat, lon, locked): | |||
warnings.append(html.P("WARNING: Negative values detected. Please lower the value of a locked slider.")) | |||
|
|||
# Compute total change | |||
change = prescriptor.compute_percent_changed(context_actions_df) | |||
change = prescriptor_manager.compute_percent_changed(context_actions_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compute_percent_changed is now part of PrescriptorManager
use_cases/eluc/app/data/pareto.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now we hard code which prescriptors to use and what their places on the pareto are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine for now.
Just wondering: could that be a file we push to HF and that the app downloads when it starts? That way editing this file and restarting the app would be enough to update the presriptors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can go in the special repository that handles the app on the huggingface space.
|
||
prescriptor_manager = PrescriptorManager(prescriptors, None) | ||
|
||
return prescriptor_manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reads the hard-coded pareto and downloads the appropriate models from HuggingFace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reran experiments with new prescriptors and removed references to ESP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New Prescriptor logic. Handles abstract prescribe
method as well as save
and load
. Additionally has a special torch_prescribe
method to be used during evolution so that we can avoid the overhead of converting the entire evaluation dataset from pandas to PyTorch.
eluc_df, change_df = self.prescriptor.predict_metrics(context_actions_df) | ||
prescriptor = LandUsePrescriptor(candidate, self.encoder, self.batch_size) | ||
context_actions_df = prescriptor.torch_prescribe(self.context_df, self.encoded_context_dl) | ||
eluc_df, change_df = prescriptor_manager.predict_metrics(context_actions_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where we use the new prescriptor logic during evolution. We create a LandUsePrescriptor
wrapped around whichever Candidate
we are evaluating. We also create a dummy PrescriptorManager
that wraps our Predictor
. Then we can call torch_prescribe
on the evaluation dataset (which is in the PyTorch format so we don't have to convert) and then predict our metrics using the PrescriptorManager
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment would be useful in the code itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
||
if not Path(local_dir).exists() or not Path(local_dir).is_dir(): | ||
hf_args["local_dir"] = local_dir | ||
snapshot_download(repo_id=path_or_url, **hf_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be rewritten according to our Persistor
logic. Currently it's hard copy/pasted from Predictor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. We also need to dissociate the prescriptor interface from the way the models are persisted and loaded. For another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wraps a dict
of Prescriptor
objects and a single Predictor
so we can compare them and use them in the app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, good progress!
@@ -22,6 +22,9 @@ RUN pip install --no-cache-dir --upgrade pip && \ | |||
# Copy source files over | |||
COPY . . | |||
|
|||
# Python setup script - downloads data and processes it | |||
RUN python -m app.process_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How fast (or slow) is it?
use_cases/eluc/app/data/pareto.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine for now.
Just wondering: could that be a file we push to HF and that the app downloads when it starts? That way editing this file and restarting the app would be enough to update the presriptors
eluc_df, change_df = self.prescriptor.predict_metrics(context_actions_df) | ||
prescriptor = LandUsePrescriptor(candidate, self.encoder, self.batch_size) | ||
context_actions_df = prescriptor.torch_prescribe(self.context_df, self.encoded_context_dl) | ||
eluc_df, change_df = prescriptor_manager.predict_metrics(context_actions_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment would be useful in the code itself
|
||
if not Path(local_dir).exists() or not Path(local_dir).is_dir(): | ||
hf_args["local_dir"] = local_dir | ||
snapshot_download(repo_id=path_or_url, **hf_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. We also need to dissociate the prescriptor interface from the way the models are persisted and loaded. For another PR.
|
||
return eluc_df, change_df | ||
|
||
# TODO: Move this to its own predictor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, compute_percent_changed
could be a Predictor
. Means the constructor of PrescriptorManager
should probably take a list of Predictor objects instead of a single one. And to compute the metrics we loop over the predictors. For another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored structure of ELUCData to be loaded from classmethods rather than being 2 different classes with 2 different loading functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved ELUCEncoder
to its own file. Made constructor load fields. Classmethods to load from pandas or json.
Refactored the Prescriptors to be more user-oriented instead of being a convenient wrapper around our ESP-like training. Reran evolution and used the new prescriptors in the app and experiments. Removed references to ESP.
New prescriptor hierarchy:
Prescriptor: Abstract class with abstract predict method - needs to be moved to SDK
LandUsePrescriptor: Implementation of Prescriptor specific to our project. Wraps Candidate from evolution
Candidate: PyTorch NN trained with NSGA-II wrapped by LandUsePrescriptor
PrescriptorManager: Takes in many Prescriptors and a single Predictor and allows us to compare the Prescriptors