
Predictor Contribution #96

Merged: 6 commits merged from contribution into main on Jul 23, 2024
Conversation

danyoungday (Collaborator) commented:

Created an evaluator for the predictors which loads them via a config file. Users can now create custom predictors and add them to the config to compare them against our models. A README on how to do this can be found in the custom predictors folder.

@danyoungday danyoungday requested a review from ofrancon July 22, 2024 17:21
@danyoungday danyoungday self-assigned this Jul 22, 2024
@@ -32,7 +32,7 @@
 "from persistence.serializers.sklearn_serializer import SKLearnSerializer\n",
 "from predictors.predictor import Predictor\n",
 "from predictors.neural_network.neural_net_predictor import NeuralNetPredictor\n",
-"from predictors.sklearn.sklearn_predictor import LinearRegressionPredictor, RandomForestPredictor"
+"from predictors.sklearn_predictor.sklearn_predictor import LinearRegressionPredictor, RandomForestPredictor"
danyoungday (Collaborator, Author) commented:

Had to rename this folder because it was shadowing the real sklearn library. It took me an hour to debug this!
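
For context, a minimal illustration of the shadowing problem (illustrative only; assumes a local package named sklearn sits on the import path, as before the rename):

```python
# If a local directory named "sklearn" is on the import path ahead of
# site-packages (e.g. when running from the package's parent directory),
# "import sklearn" silently resolves to the local package, not scikit-learn.
import sklearn

# Checking the module's file reveals which package actually won:
print(sklearn.__file__)  # local folder rather than .../site-packages/sklearn/...
```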

danyoungday (Collaborator, Author) commented:


Instructions on how to create a custom predictor. We link to relevant files to read over to get an idea of how to do it.


def predict(self, context_actions_df: pd.DataFrame) -> pd.DataFrame:
dummy_eluc = list(range(len(context_actions_df)))
return pd.DataFrame({"ELUC": dummy_eluc}, index=context_actions_df.index)
danyoungday (Collaborator, Author) commented:


Dummy predictor just to show people how to create one. It just returns a sequence of dummy values.

Dummy load function that just returns a new instance of the class.
"""
print("Loading model from", path)
return cls()
danyoungday (Collaborator, Author) commented:


The dummy predictor implements the load function itself rather than using a separate serializer, to show that this is possible. All you need is a load and a predict function, but the serializer makes things nicer in our official predictors.
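
For readers following along, a minimal end-to-end sketch of such a predictor (class name and body are hypothetical; the `Predictor` interface and the load-returns-an-instance pattern come from this PR):

```python
from pathlib import Path

import pandas as pd

from predictors.predictor import Predictor


class ConstantPredictor(Predictor):
    """Hypothetical example: always predicts zero ELUC."""

    def predict(self, context_actions_df: pd.DataFrame) -> pd.DataFrame:
        # One outcome per input row, aligned on the input's index.
        return pd.DataFrame({"ELUC": 0.0}, index=context_actions_df.index)

    @classmethod
    def load(cls, path: Path) -> "ConstantPredictor":
        # Nothing to deserialize for a constant model; a real predictor
        # would read its weights from `path` here.
        print("Loading model from", path)
        return cls()
```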

"filepath": "predictors/custom/template/model.pt"
},
{
"type": "hf",
danyoungday (Collaborator, Author) commented:


We can load either locally or from Hugging Face via the config.
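
To make this concrete, a hypothetical config (the field names "type", "name", "classpath", "url", and "filepath" all appear in this PR's scoring code; the values are illustrative):

```python
# Hypothetical config contents; in practice this would live in a JSON file.
config = {
    "models": [
        {
            "type": "local",              # load from disk via the class's load()
            "name": "TemplatePredictor",  # class to pull out of the module
            "classpath": "predictors/custom/template/template_predictor.py",
            "filepath": "predictors/custom/template/model.pt",
        },
        {
            "type": "hf",                 # download from Hugging Face first
            "name": "NeuralNetPredictor",
            "classpath": "predictors/neural_network/neural_net_predictor.py",
            "url": "some-user/some-eluc-model",          # illustrative repo id
            "filepath": "predictors/trained_models/nn",  # local download dir
        },
    ]
}
```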

spec = importlib.util.spec_from_file_location(model["name"], model["classpath"])
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
model_instance = getattr(module, model["name"])
danyoungday (Collaborator, Author) commented:


The dynamic load function uses importlib to load arbitrary classes. Did not realize Python could do this!
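
A self-contained sketch of that importlib pattern (the function name and example path are placeholders):

```python
import importlib.util


def load_class_from_file(class_name: str, file_path: str) -> type:
    """Import the module at `file_path` and return the class named `class_name`."""
    spec = importlib.util.spec_from_file_location(class_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Could not create an import spec for {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # this actually executes the module's code
    return getattr(module, class_name)


# e.g.:
# cls = load_class_from_file("TemplatePredictor",
#                            "predictors/custom/template/template_predictor.py")
```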

persistor = HuggingFacePersistor(model_instance())
predictor = persistor.from_pretrained(model["url"], local_dir=model["filepath"])
elif model["type"] == "local":
predictor = model_instance().load(Path(model["filepath"]))
danyoungday (Collaborator, Author) commented:


We can load from Hugging Face or locally.

raise ValueError(f"Columns {not_seen} not found in input dataframe.")

seen_outcomes = [col for col in self.outcomes if col in context_actions_df.columns]
return context_actions_df.drop(columns=seen_outcomes).copy()
danyoungday (Collaborator, Author) commented:


We validate that the context and action columns are present in the input, then remove any outcome columns from it.
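
A standalone sketch of that input check, assuming the Validator holds lists of expected context, action, and outcome columns:

```python
import pandas as pd


def validate_input(context_actions_df: pd.DataFrame, context: list[str],
                   actions: list[str], outcomes: list[str]) -> pd.DataFrame:
    """Sketch: require all context/action columns, strip any outcome columns."""
    not_seen = (set(context) | set(actions)) - set(context_actions_df.columns)
    if not_seen:
        raise ValueError(f"Columns {not_seen} not found in input dataframe.")
    # Drop outcome columns so the predictor never sees the answers.
    seen_outcomes = [col for col in outcomes if col in context_actions_df.columns]
    return context_actions_df.drop(columns=seen_outcomes).copy()
```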

if not set(self.outcomes) == set(outcomes_df.columns):
print(self.outcomes, outcomes_df.columns)
not_seen = set(self.outcomes) - set(outcomes_df.columns)
raise ValueError(f"Outcomes {not_seen} not found in output dataframe.")
danyoungday (Collaborator, Author) commented:


We make sure the indices in the output match the input so we can compare, and that the column names match up.
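
And the matching output check, sketched the same way (the index comparison reflects the comment above; the exact implementation may differ):

```python
import pandas as pd


def validate_output(context_actions_df: pd.DataFrame, outcomes_df: pd.DataFrame,
                    outcomes: list[str]) -> pd.DataFrame:
    """Sketch: output column names and row indices must line up with the input."""
    if set(outcomes) != set(outcomes_df.columns):
        not_seen = set(outcomes) - set(outcomes_df.columns)
        raise ValueError(f"Outcomes {not_seen} not found in output dataframe.")
    # Predictions must align row-for-row with the inputs before scoring.
    if not outcomes_df.index.equals(context_actions_df.index):
        raise ValueError("Output index does not match input index.")
    return outcomes_df
```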

ofrancon (Member) left a comment:


Looks good to me. The README.md in custom helps a lot. When I tried it in the GitHub branch the links were broken though; please fix them and I'll approve.


## Create a Custom Predictor

An example custom predictor can be found in the [template](predictors/custom/template) folder. In order to create a custom predictor, 2 steps must be completed.
ofrancon (Member) commented:


The link seems wrong. We're already in use_cases/eluc/predictors/custom. The link should be (template)


An example custom predictor can be found in the [template](predictors/custom/template) folder. In order to create a custom predictor, 2 steps must be completed.

1. You need to implement the `Predictor` interface. This is defined in [predictor.py](predictors/predictor.py). It is a simple abstract class that requires a `predict` method that takes in a dataframe of context and actions and returns a dataframe of outcomes.
ofrancon (Member) commented:


The link seems wrong. We're already in use_cases/eluc/predictors/custom. The link should probably be (../predictor.py)


1. You need to implement the `Predictor` interface. This is defined in [predictor.py](predictors/predictor.py). It is a simple abstract class that requires a `predict` method that takes in a dataframe of context and actions and returns a dataframe of outcomes.

2. You need, either in the same class or a specific serializer class, to implement a `load` method that takes in a path to a model on disk and returns an instance of the `Predictor`. (See [serializer.py](persistence/persistors/serializers/serializer.py) for the interface for serialization and [neural_network_serializer.py](persistence/persistors/serializers/neural_network_serializer.py) for an example of how to implement serialization.)
ofrancon (Member) commented:


Check these other links too

danyoungday (Collaborator, Author) commented:


Fixed all links to be relative rather than absolute

"""
Validates input and output dataframes for predictor evaluation.
Context, actions, outcomes do not necessarily have to match the project's CAO_MAPPING. For example, if we are
just evaluating ELUC we can just pass the single column as outcomes.
ofrancon (Member) commented:


But we still need to have at least the context and actions in the input of the predictor, right?

danyoungday (Collaborator, Author) commented:

> But we still need to have at least the context and actions in the input of the predictor, right?

Currently we leave it up to the Validator which context/action columns to require in the inputs. When we create the Validator in the scoring script we pass in our full context and actions, and just ELUC as the outcome.
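
In other words, something like this (a sketch: the Validator constructor signature and the context/action column names are assumptions, only "ELUC" comes from the PR):

```python
from predictors.scoring.validator import Validator

# Hypothetical column lists standing in for the project's CAO_MAPPING.
CONTEXT = ["lat", "lon", "time"]             # placeholder context columns
ACTIONS = ["crop_change", "pasture_change"]  # placeholder action columns

# Require the full context and actions in the input, but validate only
# ELUC in the output, even though the project tracks more outcomes.
validator = Validator(context=CONTEXT, actions=ACTIONS, outcomes=["ELUC"])
```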

from predictors.predictor import Predictor
from predictors.scoring.validator import Validator

class PredictorScorer:
danyoungday (Collaborator, Author) commented:


Changed the name to Scorer to avoid confusion with evaluation during evolution.
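
For orientation, a hedged sketch of how the pieces above could fit together in the scoring script (the driver loop, data variables, and metric are all illustrative, not this PR's exact API):

```python
import json
from pathlib import Path

# Load the predictor config; the path is illustrative.
with open("predictors/scoring/config.json", encoding="utf-8") as f:
    config = json.load(f)

for model in config["models"]:
    # Dynamic import, as in the importlib sketch above.
    model_cls = load_class_from_file(model["name"], model["classpath"])
    predictor = model_cls().load(Path(model["filepath"]))
    # `inputs` and `targets` are assumed validated test dataframes.
    preds = predictor.predict(inputs)
    mae = (preds["ELUC"] - targets["ELUC"]).abs().mean()  # illustrative metric
    print(f"{model['name']}: MAE={mae:.4f}")
```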

ofrancon (Member) left a comment:


lgtm

@danyoungday danyoungday merged commit baa5c42 into main Jul 23, 2024
1 check passed
@danyoungday danyoungday deleted the contribution branch July 23, 2024 21:12