diff --git a/examples/building_datasets.ipynb b/examples/building_datasets.ipynb index 060149f..6077c0c 100644 --- a/examples/building_datasets.ipynb +++ b/examples/building_datasets.ipynb @@ -6,12 +6,12 @@ "source": [ "# Building Datasets\n", "\n", - "In most of our examples, we use dataset_loader to avoid boilerplate code when training fair classifiers. \n", + "In most of our examples, we use `dataset_loader` to avoid boilerplate code when training fair classifiers. \n", "This notebook sets out how to create similar code for new datasets.\n", "\n", - "For evaluating and fitting fair classifiers we require access to the group each datapoint is assigned to and the target (i.e. ground-truth) label the classifier is trying to predict. \n", + "For evaluating and fitting fair classifiers we require access to the group each datapoint is assigned to and the target (i.e., ground-truth) label the classifier is trying to predict. \n", "\n", - "For sklearn, classifiers assume that they only recieve the data used to predict, and as the target labels should never be passed with the rest of the data, and the groups should only be passed if the classifier uses them directly (i.e. if we are not using infered attributes).\n", + "For sklearn, classifiers assume that they only receive the data used to predict; that target labels should never be passed with the rest of the data; and that the groups should only be passed if the classifier uses them directly (i.e., if we are not using inferred attributes).\n", "\n", "\n", "\n", @@ -19,13 +19,13 @@ "1. Fair Classifiers using autogluon.\n", " Create a dataframe or tabular dataset containing all data used for classification, target labels, and groups.\n", " Autogluon takes pandas dataframes or their own internal tabular dataset type and only uses the columns the model was trained on to classify the data.\n", - " When using infered attributes you should ensure that neither the classifier predicting groups nor the classifier predicting target labels has access to the groups or target labels at training time. This is taken care for you automatically by using `oxonfair.inferred_attribute_builder`.\n", + " When using inferred attributes you should ensure that neither the classifier predicting groups nor the classifier predicting target labels has access to the groups or target labels at training time. This is taken care of for you automatically by `oxonfair.inferred_attribute_builder`.\n", "2. Fair Classifiers using Sklearn with known attributes.\n", - " Create a dataset by calling `oxonfair.build_data_dict` with two arguments - the target labels `y` and the data `X` used by the classifier. \n", + " Create a dataset by calling `oxonfair.DataDict` with two arguments -- the target labels `y` and the data `X` used by the classifier. \n", "3. Fair Classifiers using Sklearn with inferred attributes. \n", - " Create a dataset by calling `oxonfair.build_data_dict` with three arguments - the target labels `y`, the data `X` used by the classifier, and the groups. \n", + " Create a dataset by calling `oxonfair.DataDict` with three arguments -- the target labels `y`, the data `X` used by the classifier, and the groups. \n", "4. Fair Classifiers using Deep networks.\n", - " Create a classifier by calling `oxonfair.DeepFairPredictor` with three arguments - the target labels, the predictions made by the classifier, and the groups. See [this notebook for examples](quickstart_DeepFairPredictor_computer_vision.ipynb)."
+ " Create a classifier by calling `oxonfair.DeepFairPredictor` with three arguments -- the target labels, the predictions made by the classifier, and the groups. See [this notebook for examples](quickstart_DeepFairPredictor_computer_vision.ipynb)." ] }, { @@ -757,7 +757,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Autogluon with inferered attributes " + "# Autogluon with inferred attributes " ] }, { @@ -2014,7 +2014,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-06-17T19:19:28.649192Z", @@ -2023,455 +2023,7 @@ "shell.execute_reply": "2024-06-17T19:19:29.659894Z" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
XGBClassifier(base_score=None, booster=None, callbacks=None,\n", - " colsample_bylevel=None, colsample_bynode=None,\n", - " colsample_bytree=None, device=None, early_stopping_rounds=None,\n", - " enable_categorical=False, eval_metric=None, feature_types=None,\n", - " gamma=None, grow_policy=None, importance_type=None,\n", - " interaction_constraints=None, learning_rate=None, max_bin=None,\n", - " max_cat_threshold=None, max_cat_to_onehot=None,\n", - " max_delta_step=None, max_depth=None, max_leaves=None,\n", - " min_child_weight=None, missing=nan, monotone_constraints=None,\n", - " multi_strategy=None, n_estimators=None, n_jobs=None,\n", - " num_parallel_tree=None, random_state=None, ...)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
XGBClassifier(base_score=None, booster=None, callbacks=None,\n", - " colsample_bylevel=None, colsample_bynode=None,\n", - " colsample_bytree=None, device=None, early_stopping_rounds=None,\n", - " enable_categorical=False, eval_metric=None, feature_types=None,\n", - " gamma=None, grow_policy=None, importance_type=None,\n", - " interaction_constraints=None, learning_rate=None, max_bin=None,\n", - " max_cat_threshold=None, max_cat_to_onehot=None,\n", - " max_delta_step=None, max_depth=None, max_leaves=None,\n", - " min_child_weight=None, missing=nan, monotone_constraints=None,\n", - " multi_strategy=None, n_estimators=None, n_jobs=None,\n", - " num_parallel_tree=None, random_state=None, ...)
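For readers following the four approaches listed in the notebook introduction above, here is a minimal sketch of approaches 2 and 3 (sklearn with known and with inferred attributes). The synthetic data, the column names, and the `RandomForestClassifier` are illustrative assumptions; only the `oxonfair.DataDict` argument order (target labels, then data, then optionally groups) is taken from the text of the diff.

```python
# Minimal sketch (assumed setup): build the data dictionaries OxonFair consumes
# for an sklearn classifier, following approaches 2 and 3 above.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

import oxonfair

# Synthetic stand-in data; in practice X, y, and groups would come from a real
# dataset, and the dictionaries would be built on held-out validation data.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f"f{i}" for i in range(5)])
y = pd.Series(rng.integers(0, 2, size=200), name="label")
groups = pd.Series(rng.integers(0, 2, size=200), name="group")

# The classifier sees only X as features; the groups are never passed to it.
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Approach 2 (known attributes): two arguments -- target labels, then data.
val_data = oxonfair.DataDict(y, X)

# Approach 3 (inferred attributes): three arguments -- target labels, data, groups.
val_data_with_groups = oxonfair.DataDict(y, X, groups)
```

For approach 4 (deep networks), `oxonfair.DeepFairPredictor` is called with the target labels, the classifier's predictions, and the groups, as noted above; the linked quickstart_DeepFairPredictor_computer_vision.ipynb notebook walks through a full example.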