Predictor Contribution #96
Conversation
…to evaluate predictors
```diff
@@ -32,7 +32,7 @@
 "from persistence.serializers.sklearn_serializer import SKLearnSerializer\n",
 "from predictors.predictor import Predictor\n",
 "from predictors.neural_network.neural_net_predictor import NeuralNetPredictor\n",
-"from predictors.sklearn.sklearn_predictor import LinearRegressionPredictor, RandomForestPredictor"
+"from predictors.sklearn_predictor.sklearn_predictor import LinearRegressionPredictor, RandomForestPredictor"
```
Had to rename this folder because it was getting confused with the real sklearn library. It took me an hour to debug this!!
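The clash can be reproduced in isolation: a local package named `sklearn` earlier on `sys.path` shadows the installed scikit-learn library on import. A minimal sketch (the temp directory and `VERSION` attribute are illustrative, not from this repo):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a local package that reuses an installed library's name.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "sklearn"
pkg.mkdir()
(pkg / "__init__.py").write_text("VERSION = 'local shadow'\n")

# Entries earlier on sys.path win, so the local folder gets imported
# instead of the real scikit-learn package.
sys.path.insert(0, str(tmp))
sys.modules.pop("sklearn", None)  # drop any cached import
shadow = importlib.import_module("sklearn")
print(shadow.VERSION)  # the local package, not scikit-learn
```

Renaming the folder (here, to `sklearn_predictor`) sidesteps the ambiguity entirely.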
Instructions on how to create a custom predictor. We link to relevant files to read over to get an idea of how to do it.
```python
def predict(self, context_actions_df: pd.DataFrame) -> pd.DataFrame:
    dummy_eluc = list(range(len(context_actions_df)))
    return pd.DataFrame({"ELUC": dummy_eluc}, index=context_actions_df.index)
```
Dummy predictor just to show people how to create one. It just returns sequential dummy numbers.
```python
Dummy load function that just returns a new instance of the class.
"""
print("Loading model from", path)
return cls()
```
Dummy predictor implements the load function itself, rather than having its own serializer, to show that this is possible. All you need is a load and a predict function, but the serializer makes things nicer in our official predictors.
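Putting the two pieces above together, a minimal custom predictor only needs `predict` and `load`. This sketch mirrors the dummy predictor in the diff (the class name `DummyPredictor` is assumed here, not taken from the repo):

```python
from pathlib import Path

import pandas as pd


class DummyPredictor:
    """Toy predictor: implements predict() and load() with no real model."""

    def predict(self, context_actions_df: pd.DataFrame) -> pd.DataFrame:
        # One dummy ELUC value per input row, preserving the input's index
        # so outputs can be aligned back against the inputs.
        dummy_eluc = list(range(len(context_actions_df)))
        return pd.DataFrame({"ELUC": dummy_eluc}, index=context_actions_df.index)

    @classmethod
    def load(cls, path: Path) -> "DummyPredictor":
        # No state on disk for the dummy: just return a fresh instance.
        print("Loading model from", path)
        return cls()
```

A real predictor would read weights from `path` in `load` and use them in `predict`; the interface stays the same.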
```python
    "filepath": "predictors/custom/template/model.pt"
},
{
    "type": "hf",
```
We can load either locally or from Hugging Face, depending on the config.
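Assuming the config follows the pattern visible in the diff, a full entry list might look like this. The keys are inferred from the snippets in this PR, and the names, paths, and URL are illustrative placeholders, not the project's actual values:

```json
[
    {
        "type": "local",
        "name": "TemplatePredictor",
        "classpath": "predictors/custom/template/template_predictor.py",
        "filepath": "predictors/custom/template/model.pt"
    },
    {
        "type": "hf",
        "name": "NeuralNetPredictor",
        "classpath": "predictors/neural_network/neural_net_predictor.py",
        "url": "example-org/example-model",
        "filepath": "predictors/neural_network/model.pt"
    }
]
```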
```python
spec = importlib.util.spec_from_file_location(model["name"], model["classpath"])
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
model_instance = getattr(module, model["name"])
```
Dynamic load function uses importlib to load arbitrary classes. Did not realize Python could do this!
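The importlib pattern can be exercised end-to-end with a throwaway module file. The file path and `MyPredictor` class below are hypothetical stand-ins for a user's `classpath`/`name` config entries:

```python
import importlib.util
import tempfile
from pathlib import Path

# Write a tiny module to disk to stand in for a user's custom predictor file.
module_path = Path(tempfile.mkdtemp()) / "my_predictor.py"
module_path.write_text(
    "class MyPredictor:\n"
    "    def predict(self):\n"
    "        return 'hello'\n"
)

# Load the file as a module and pull the class out by name, mirroring how
# the PR's loader uses model['name'] and model['classpath'].
spec = importlib.util.spec_from_file_location("MyPredictor", str(module_path))
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
model_class = getattr(module, "MyPredictor")

instance = model_class()
print(instance.predict())  # -> hello
```

Because the class is looked up by the string in the config, users never have to touch the scoring script to register a new predictor.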
```python
    persistor = HuggingFacePersistor(model_instance())
    predictor = persistor.from_pretrained(model["url"], local_dir=model["filepath"])
elif model["type"] == "local":
    predictor = model_instance().load(Path(model["filepath"]))
```
We can load from Hugging Face or locally.
```python
    raise ValueError(f"Columns {not_seen} not found in input dataframe.")

seen_outcomes = [col for col in self.outcomes if col in context_actions_df.columns]
return context_actions_df.drop(columns=seen_outcomes).copy()
```
We validate that the context and action columns are in the input, then remove outcomes from the input.
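The column checks boil down to simple set logic. This standalone function is a simplified sketch over plain column-name lists, not the PR's actual Validator (its name and signature are assumptions made for illustration):

```python
def validate_input_columns(input_cols, context, actions, outcomes):
    """Check context/action columns are present; return columns minus outcomes."""
    required = set(context) | set(actions)
    not_seen = required - set(input_cols)
    if not_seen:
        raise ValueError(f"Columns {not_seen} not found in input dataframe.")
    # Outcomes are allowed in the input but must not be fed to the predictor.
    return [col for col in input_cols if col not in set(outcomes)]
```

Called with a valid input, it strips the outcome columns; with a missing context or action column, it raises before the predictor ever runs.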
```python
if not set(self.outcomes) == set(outcomes_df.columns):
    print(self.outcomes, outcomes_df.columns)
    not_seen = set(self.outcomes) - set(outcomes_df.columns)
    raise ValueError(f"Outcomes {not_seen} not found in output dataframe.")
```
We make sure the indices in the output are the same so we can compare, and that the column names match up.
Looks good to me. The README.md in custom helps a lot. When I tried it in the GitHub branch the links were broken though; please fix them and I'll approve.
## Create a Custom Predictor

An example custom predictor can be found in the [template](predictors/custom/template) folder. In order to create a custom predictor, 2 steps must be completed.
The link seems wrong. We're already in use_cases/eluc/predictors/custom, so the link should be (template).
An example custom predictor can be found in the [template](predictors/custom/template) folder. In order to create a custom predictor, 2 steps must be completed.

1. You need to implement the `Predictor` interface. This is defined in [predictor.py](predictors/predictor.py). It is a simple abstract class that requires a `predict` method that takes in a dataframe of context and actions and returns a dataframe of outcomes.
The link seems wrong. We're already in use_cases/eluc/predictors/custom, so the link should probably be (../predictor.py).
1. You need to implement the `Predictor` interface. This is defined in [predictor.py](predictors/predictor.py). It is a simple abstract class that requires a `predict` method that takes in a dataframe of context and actions and returns a dataframe of outcomes.

2. You need, either in the same class or a specific serializer class, to implement a `load` method that takes in a path to a model on disk and returns an instance of the `Predictor`. (See [serializer.py](persistence/persistors/serializers/serializer.py) for the interface for serialization and [neural_network_serializer.py](persistence/persistors/serializers/neural_network_serializer.py) for an example of how to implement serialization.)
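The serializer split described in step 2 can be sketched as an abstract save/load pair. This is a guess at the shape of the interface based on the file names mentioned above, not the repo's actual code; the `InMemorySerializer` is a deliberately trivial stand-in:

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Serializer(ABC):
    """Hypothetical sketch of the serializer interface referenced above."""

    @abstractmethod
    def save(self, model, path: Path):
        """Write the model to disk at the given path."""

    @abstractmethod
    def load(self, path: Path):
        """Read a model from disk and return a Predictor instance."""


class InMemorySerializer(Serializer):
    """Trivial implementation for illustration: keeps models in a dict keyed by path."""

    def __init__(self):
        self._store = {}

    def save(self, model, path: Path):
        self._store[str(path)] = model

    def load(self, path: Path):
        return self._store[str(path)]
```

Keeping serialization in its own class means the predictor stays a pure `predict` implementation, while load/save details (torch, joblib, etc.) live in one place per model family.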
Check these other links too
Fixed all links to be relative rather than absolute
```python
"""
Validates input and output dataframes for predictor evaluation.
Context, actions, outcomes do not necessarily have to match the project's CAO_MAPPING. For example, if we are
just evaluating ELUC we can just pass the single column as outcomes.
```
But we still need to have at least the context and actions in the input of the predictor, right?
Have you looked at https://github.com/cognizant-ai-labs/covid-xprize/blob/master/covid_xprize/validation/predictor_validation.py for inspiration?
> But we still need to have at least the context and actions in the input of the predictor, right?

Currently we leave it up to the Validator which context/actions to check for in the inputs. When we create the Validator in the scoring script we pass in our correct context and actions, and just ELUC as the outcome.
```python
from predictors.predictor import Predictor
from predictors.scoring.validator import Validator


class PredictorScorer:
```
Changed the name to Scorer to avoid confusion with Evaluation during evolution.
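A scorer along these lines just runs every loaded predictor on the same data and reports a metric per predictor. This is a simplified sketch over plain lists with MAE as the metric; the class shape and the callable-predictor convention are assumptions, not the PR's actual implementation:

```python
class PredictorScorer:
    """Hypothetical sketch: score several predictors against the same data."""

    def __init__(self, predictors):
        # predictors: mapping of name -> callable(inputs) -> list of predictions
        self.predictors = predictors

    @staticmethod
    def mae(y_true, y_pred):
        """Mean absolute error between true and predicted values."""
        return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

    def score(self, inputs, y_true):
        # Every predictor sees identical inputs, so the resulting MAEs
        # are directly comparable; lower is better.
        return {name: self.mae(y_true, pred(inputs))
                for name, pred in self.predictors.items()}
```

A perfect predictor scores 0; a constant-zero baseline scores the mean absolute magnitude of the targets, which gives custom predictors an easy sanity check.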
lgtm
Created an evaluator for the predictors which loads them via a config file. Users can now create custom predictors and add them to the config to compare them against our models. A README on how to do this is now in the custom predictors folder.