Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Prescriptors and project to handle new prescription #89

Merged
merged 19 commits into from
Jun 7, 2024

Conversation

danyoungday
Copy link
Collaborator

@danyoungday danyoungday commented Jun 4, 2024

Refactored the Prescriptors to be more user-oriented instead of being a convenient wrapper around our ESP-like training. Reran evolution and used the new prescriptors in the app and experiments. Removed references to ESP.

New prescriptor hierarchy:

Prescriptor: Abstract class with abstract predict method - needs to be moved to SDK

LandUsePrescriptor: Implementation of Prescriptor specific to our project. Wraps Candidate from evolution

Candidate: PyTorch NN trained with NSGA-II wrapped by LandUsePrescriptor

PrescriptorManager: Takes in many Prescriptors and a single Predictor and allows us to compare the Prescriptors

@danyoungday danyoungday requested a review from ofrancon June 4, 2024 17:21
@danyoungday danyoungday self-assigned this Jun 4, 2024
@@ -22,6 +22,9 @@ RUN pip install --no-cache-dir --upgrade pip && \
# Copy source files over
COPY . .

# Python setup script - downloads data and processes it
RUN python -m app.process_data
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change our dockerfile to download the data from HuggingFace before we start the app so that we don't have to push the csv to the git repo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How fast (or slow) is it?

Copy link
Collaborator Author

@danyoungday danyoungday Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes about a minute. We could also upload the preprocessed app dataset to huggingface which would remove most of this time.

}
prescriptor = TorchPrescriptor(None, encoder, None, 1, candidate_params)
# Load prescriptors
prescriptor_manager = utils.load_prescriptors()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use a single PrescriptorManager object now that can hold all the individual LandUsePrescriptors. This allows us to in the future add Heuristics too since they implement Prescriptor

@@ -544,7 +533,7 @@ def compute_land_change(sliders, year, lat, lon, locked):
warnings.append(html.P("WARNING: Negative values detected. Please lower the value of a locked slider."))

# Compute total change
change = prescriptor.compute_percent_changed(context_actions_df)
change = prescriptor_manager.compute_percent_changed(context_actions_df)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute_percent_changed is now part of PrescriptorManager

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now we hard code which prescriptors to use and what their places on the pareto are

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine for now.
Just wondering: could that be a file we push to HF and that the app downloads when it starts? That way editing this file and restarting the app would be enough to update the presriptors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this can go in the special repository that handles the app on the huggingface space.


prescriptor_manager = PrescriptorManager(prescriptors, None)

return prescriptor_manager
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads the hard-coded pareto and downloads the appropriate models from HuggingFace

Copy link
Collaborator Author

@danyoungday danyoungday Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reran experiments with new prescriptors and removed references to ESP

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New Prescriptor logic. Handles abstract prescribe method as well as save and load. Additionally has a special torch_prescribe method to be used during evolution so that we can avoid the overhead of converting the entire evaluation dataset from pandas to PyTorch.

eluc_df, change_df = self.prescriptor.predict_metrics(context_actions_df)
prescriptor = LandUsePrescriptor(candidate, self.encoder, self.batch_size)
context_actions_df = prescriptor.torch_prescribe(self.context_df, self.encoded_context_dl)
eluc_df, change_df = prescriptor_manager.predict_metrics(context_actions_df)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where we use the new prescriptor logic during evolution. We create a LandUsePrescriptor wrapped around whichever Candidate we are evaluating. We also create a dummy PrescriptorManager that wraps our Predictor. Then we can call torch_prescribe on the evaluation dataset (which is in the PyTorch format so we don't have to convert) and then predict our metrics using the PrescriptorManager.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment would be useful in the code itself

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


if not Path(local_dir).exists() or not Path(local_dir).is_dir():
hf_args["local_dir"] = local_dir
snapshot_download(repo_id=path_or_url, **hf_args)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be rewritten according to our Persistor logic. Currently it's hard copy/pasted from Predictor

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. We also need to dissociate the prescriptor interface from the way the models are persisted and loaded. For another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wraps a dict of Prescriptor objects and a single Predictor so we can compare them and use them in the app.

Copy link
Member

@ofrancon ofrancon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, good progress!

@@ -22,6 +22,9 @@ RUN pip install --no-cache-dir --upgrade pip && \
# Copy source files over
COPY . .

# Python setup script - downloads data and processes it
RUN python -m app.process_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How fast (or slow) is it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine for now.
Just wondering: could that be a file we push to HF and that the app downloads when it starts? That way editing this file and restarting the app would be enough to update the presriptors

eluc_df, change_df = self.prescriptor.predict_metrics(context_actions_df)
prescriptor = LandUsePrescriptor(candidate, self.encoder, self.batch_size)
context_actions_df = prescriptor.torch_prescribe(self.context_df, self.encoded_context_dl)
eluc_df, change_df = prescriptor_manager.predict_metrics(context_actions_df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment would be useful in the code itself


if not Path(local_dir).exists() or not Path(local_dir).is_dir():
hf_args["local_dir"] = local_dir
snapshot_download(repo_id=path_or_url, **hf_args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. We also need to dissociate the prescriptor interface from the way the models are persisted and loaded. For another PR.


return eluc_df, change_df

# TODO: Move this to its own predictor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, compute_percent_changed could be a Predictor. Means the constructor of PrescriptorManager should probably take a list of Predictor objects instead of a single one. And to compute the metrics we loop over the predictors. For another PR.

Base automatically changed from refactor-app to main June 6, 2024 22:45
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored structure of ELUCData to be loaded from classmethods rather than being 2 different classes with 2 different loading functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved ELUCEncoder to its own file. Made constructor load fields. Classmethods to load from pandas or json.

@danyoungday danyoungday merged commit 4aa17b2 into main Jun 7, 2024
1 check passed
@danyoungday danyoungday deleted the hf-prescriptors branch June 7, 2024 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants