Predictor Contribution #96
Conversation
…to evaluate predictors
```diff
@@ -32,7 +32,7 @@
 "from persistence.serializers.sklearn_serializer import SKLearnSerializer\n",
 "from predictors.predictor import Predictor\n",
 "from predictors.neural_network.neural_net_predictor import NeuralNetPredictor\n",
-"from predictors.sklearn.sklearn_predictor import LinearRegressionPredictor, RandomForestPredictor"
+"from predictors.sklearn_predictor.sklearn_predictor import LinearRegressionPredictor, RandomForestPredictor"
```
Had to rename this folder because it was getting confused with the real sklearn library. It took me an hour to debug this!!
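The clash can be reproduced in isolation: a local package named `sklearn` earlier on `sys.path` shadows the installed scikit-learn library on import. A minimal sketch (the temp directory and `VERSION` attribute are illustrative, not from this repo):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a local package that reuses an installed library's name.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "sklearn"
pkg.mkdir()
(pkg / "__init__.py").write_text("VERSION = 'local shadow'\n")

# Entries earlier on sys.path win, so the local folder gets imported
# instead of the real scikit-learn package.
sys.path.insert(0, str(tmp))
sys.modules.pop("sklearn", None)  # drop any cached import
shadow = importlib.import_module("sklearn")
print(shadow.VERSION)  # the local package, not scikit-learn
```

Renaming the folder (here, to `sklearn_predictor`) sidesteps the ambiguity entirely.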
Instructions on how to create a custom predictor. We link to relevant files to read over to get an idea of how to do it.
```python
def predict(self, context_actions_df: pd.DataFrame) -> pd.DataFrame:
    dummy_eluc = list(range(len(context_actions_df)))
    return pd.DataFrame({"ELUC": dummy_eluc}, index=context_actions_df.index)
```
Dummy predictor just to show people how to create one. It just returns sequential dummy numbers.
```python
Dummy load function that just returns a new instance of the class.
"""
print("Loading model from", path)
return cls()
```
Dummy predictor implements the load function itself, rather than having its own serializer, to show that this is possible. All you need is a load and a predict function, but the serializer makes things nicer in our official predictors.
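Putting the two pieces above together, a minimal custom predictor only needs `predict` and `load`. This sketch mirrors the dummy predictor in the diff (the class name `DummyPredictor` is assumed here, not taken from the repo):

```python
from pathlib import Path

import pandas as pd


class DummyPredictor:
    """Toy predictor: implements predict() and load() with no real model."""

    def predict(self, context_actions_df: pd.DataFrame) -> pd.DataFrame:
        # One dummy ELUC value per input row, preserving the input's index
        # so outputs can be aligned back against the inputs.
        dummy_eluc = list(range(len(context_actions_df)))
        return pd.DataFrame({"ELUC": dummy_eluc}, index=context_actions_df.index)

    @classmethod
    def load(cls, path: Path) -> "DummyPredictor":
        # No state on disk for the dummy: just return a fresh instance.
        print("Loading model from", path)
        return cls()
```

A real predictor would read weights from `path` in `load` and use them in `predict`; the interface stays the same.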
```python
    "filepath": "predictors/custom/template/model.pt"
},
{
    "type": "hf",
```
We can load either locally or from Hugging Face, depending on the config.
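Assuming the config follows the pattern visible in the diff, a full entry list might look like this. The keys are inferred from the snippets in this PR, and the names, paths, and URL are illustrative placeholders, not the project's actual values:

```json
[
    {
        "type": "local",
        "name": "TemplatePredictor",
        "classpath": "predictors/custom/template/template_predictor.py",
        "filepath": "predictors/custom/template/model.pt"
    },
    {
        "type": "hf",
        "name": "NeuralNetPredictor",
        "classpath": "predictors/neural_network/neural_net_predictor.py",
        "url": "example-org/example-model",
        "filepath": "predictors/neural_network/model.pt"
    }
]
```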
```python
spec = importlib.util.spec_from_file_location(model["name"], model["classpath"])
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
model_instance = getattr(module, model["name"])
```
Dynamic load function uses importlib to load arbitrary classes. Did not realize Python could do this!
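The importlib pattern can be exercised end-to-end with a throwaway module file. The file path and `MyPredictor` class below are hypothetical stand-ins for a user's `classpath`/`name` config entries:

```python
import importlib.util
import tempfile
from pathlib import Path

# Write a tiny module to disk to stand in for a user's custom predictor file.
module_path = Path(tempfile.mkdtemp()) / "my_predictor.py"
module_path.write_text(
    "class MyPredictor:\n"
    "    def predict(self):\n"
    "        return 'hello'\n"
)

# Load the file as a module and pull the class out by name, mirroring how
# the PR's loader uses model['name'] and model['classpath'].
spec = importlib.util.spec_from_file_location("MyPredictor", str(module_path))
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
model_class = getattr(module, "MyPredictor")

instance = model_class()
print(instance.predict())  # -> hello
```

Because the class is looked up by the string in the config, users never have to touch the scoring script to register a new predictor.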
```python
    persistor = HuggingFacePersistor(model_instance())
    predictor = persistor.from_pretrained(model["url"], local_dir=model["filepath"])
elif model["type"] == "local":
    predictor = model_instance().load(Path(model["filepath"]))
```
We can load from Hugging Face or locally.
```python
    raise ValueError(f"Columns {not_seen} not found in input dataframe.")

seen_outcomes = [col for col in self.outcomes if col in context_actions_df.columns]
return context_actions_df.drop(columns=seen_outcomes).copy()
```
We validate that the context and action columns are in the input, then remove outcomes from the input.
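The column checks boil down to simple set logic. This standalone function is a simplified sketch over plain column-name lists, not the PR's actual Validator (its name and signature are assumptions made for illustration):

```python
def validate_input_columns(input_cols, context, actions, outcomes):
    """Check context/action columns are present; return columns minus outcomes."""
    required = set(context) | set(actions)
    not_seen = required - set(input_cols)
    if not_seen:
        raise ValueError(f"Columns {not_seen} not found in input dataframe.")
    # Outcomes are allowed in the input but must not be fed to the predictor.
    return [col for col in input_cols if col not in set(outcomes)]
```

Called with a valid input, it strips the outcome columns; with a missing context or action column, it raises before the predictor ever runs.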
```python
if not set(self.outcomes) == set(outcomes_df.columns):
    print(self.outcomes, outcomes_df.columns)
    not_seen = set(self.outcomes) - set(outcomes_df.columns)
    raise ValueError(f"Outcomes {not_seen} not found in output dataframe.")
```
We make sure the indices in the output are the same so we can compare, and that the column names match up.
Looks good to me. The README.md in custom helps a lot. When I tried it in the GitHub branch the links were broken though; please fix them and I'll approve.
## Create a Custom Predictor

An example custom predictor can be found in the [template](predictors/custom/template) folder. In order to create a custom predictor, 2 steps must be completed.
The link seems wrong. We're already in use_cases/eluc/predictors/custom, so the link should be (template).
An example custom predictor can be found in the [template](predictors/custom/template) folder. In order to create a custom predictor, 2 steps must be completed.

1. You need to implement the `Predictor` interface. This is defined in [predictor.py](predictors/predictor.py). It is a simple abstract class that requires a `predict` method that takes in a dataframe of context and actions and returns a dataframe of outcomes.
The link seems wrong. We're already in use_cases/eluc/predictors/custom, so the link should probably be (../predictor.py).
1. You need to implement the `Predictor` interface. This is defined in [predictor.py](predictors/predictor.py). It is a simple abstract class that requires a `predict` method that takes in a dataframe of context and actions and returns a dataframe of outcomes.

2. You need, either in the same class or a specific serializer class, to implement a `load` method that takes in a path to a model on disk and returns an instance of the `Predictor`. (See [serializer.py](persistence/persistors/serializers/serializer.py) for the interface for serialization and [neural_network_serializer.py](persistence/persistors/serializers/neural_network_serializer.py) for an example of how to implement serialization.)
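The serializer split described in step 2 can be sketched as an abstract save/load pair. This is a guess at the shape of the interface based on the file names mentioned above, not the repo's actual code; the `InMemorySerializer` is a deliberately trivial stand-in:

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Serializer(ABC):
    """Hypothetical sketch of the serializer interface referenced above."""

    @abstractmethod
    def save(self, model, path: Path):
        """Write the model to disk at the given path."""

    @abstractmethod
    def load(self, path: Path):
        """Read a model from disk and return a Predictor instance."""


class InMemorySerializer(Serializer):
    """Trivial implementation for illustration: keeps models in a dict keyed by path."""

    def __init__(self):
        self._store = {}

    def save(self, model, path: Path):
        self._store[str(path)] = model

    def load(self, path: Path):
        return self._store[str(path)]
```

Keeping serialization in its own class means the predictor stays a pure `predict` implementation, while load/save details (torch, joblib, etc.) live in one place per model family.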
Check these other links too
Fixed all links to be relative rather than absolute
```python
"""
Validates input and output dataframes for predictor evaluation.
Context, actions, outcomes do not necessarily have to match the project's CAO_MAPPING. For example, if we are
just evaluating ELUC we can just pass the single column as outcomes.
```
But we still need to have at least the context and actions in the input of the predictor, right?
Have you looked at https://github.com/cognizant-ai-labs/covid-xprize/blob/master/covid_xprize/validation/predictor_validation.py for inspiration?
> But we still need to have at least the context and actions in the input of the predictor, right?

Currently we leave it up to the Validator which context/actions to check for in the inputs. When we create the Validator in the scoring script we pass in our correct context and actions, and just ELUC as the outcome.
```python
from predictors.predictor import Predictor
from predictors.scoring.validator import Validator


class PredictorScorer:
```
Changed the name to Scorer to avoid confusion with Evaluation during evolution.
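A scorer along these lines just runs every loaded predictor on the same data and reports a metric per predictor. This is a simplified sketch over plain lists with MAE as the metric; the class shape and the callable-predictor convention are assumptions, not the PR's actual implementation:

```python
class PredictorScorer:
    """Hypothetical sketch: score several predictors against the same data."""

    def __init__(self, predictors):
        # predictors: mapping of name -> callable(inputs) -> list of predictions
        self.predictors = predictors

    @staticmethod
    def mae(y_true, y_pred):
        """Mean absolute error between true and predicted values."""
        return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

    def score(self, inputs, y_true):
        # Every predictor sees identical inputs, so the resulting MAEs
        # are directly comparable; lower is better.
        return {name: self.mae(y_true, pred(inputs))
                for name, pred in self.predictors.items()}
```

A perfect predictor scores 0; a constant-zero baseline scores the mean absolute magnitude of the targets, which gives custom predictors an easy sanity check.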
lgtm
Created an evaluator for the predictors which loads them via a config file. Users can now create custom predictors and add them to the config to compare them against our models. A README on how to do this is now in the custom predictors folder.