diff --git a/README.md b/README.md
index 724742e..dca83ef 100644
--- a/README.md
+++ b/README.md
@@ -63,7 +63,7 @@ More demo notebooks are present in the [examples folder](./examples/README.md).
     from oxonfair import group_metrics as gm
     import xgboost

-    # Download and partition the adult dataset into training and test datta
+    # Download and partition the adult dataset into training and test data
     train_data, _, test_data = dataset_loader.adult('sex', train_proportion=0.7, test_proportion=0.3)
     # Train an XGBoost classifier on the training set
     predictor = xgboost.XGBClassifier().fit(X=train_data['data'], y=train_data['target'])
@@ -109,9 +109,9 @@ The full set of constraints and objectives can be seen in the list of measures i
 Fundamentally, most existing fairness methods are not appropriate for use with complex classifiers on high-dimensional data. These classifiers are prone to overfitting on the training data, which means that trying to balance error rates (e.g., when using equal opportunity) on the training data, is unlikely to transfer well to new unseen data. This is a particular problem when using computer vision (see [Zietlow et al.](https://arxiv.org/abs/2203.04913)), but can also occur with tabular data. Moreover, iteratively retraining complex models (a common requirement of many methods for enforcing fairness) is punitively slow when training the model once might take days, or even weeks, if you are trying to maximize performance.

-At the same time, postprocessing methods which allow you to train once, and then improve fairness on held-out validation data generally requires the protected attributes to be available at test time, which is often infeasible, particularly with computer vision.
+At the same time, postprocessing methods which allow you to train once, and then improve fairness on held-out validation data generally require the protected attributes to be available at test time, which is often infeasible, particularly with computer vision.

-OxonFair is build from the ground up to avoid these issues. It is a postprocessing approach, explicitly designed to use inferred attributes where protected attributes are not available to enforce fairness. Fairness can be enforced both on validation, or on the train set, when you are short of data and overfitting is not a concern. When enforcing fairness in deep networks or using provided attributes, a classifier is only trained once, for non network-based approaches, e.g., scikit-learn or xgboost, with inferred attributes we require the training of two classifier (one to predict the original task, and a second to estimate groups membership).
+OxonFair is built from the ground up to avoid these issues. It is a postprocessing approach, explicitly designed to use inferred attributes where protected attributes are not available to enforce fairness. Fairness can be enforced on validation data or, when you are short of data and overfitting is not a concern, on the training set. When enforcing fairness in deep networks or using provided attributes, a classifier is only trained once; for non-network-based approaches (e.g., scikit-learn or xgboost) with inferred attributes, we require training two classifiers (one to predict the original task, and a second to estimate group membership).
 That said, we make several additional design decisions which we believe make for a better experience for data scientists:
@@ -143,7 +143,7 @@ See this [paper](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4331652) fo
 An Ipython notebook generating many of the figures in the paper can be found here: [Levelling up notebook](examples/levelling_up.ipynb)

-Many fairness measures can be understood as identifying a harm, and then equalizing this harm across the population as a whole. For example, the use demographic parity of identifies people as being harmed by a low selection rate, which is then set to be the same for all groups, while equal opportunity identifies people as being harmed by low recall, and balances this harm across all groups. However, these fairness formulations often explicitly
+Many fairness measures can be understood as identifying a harm, and then equalizing this harm across the population as a whole. For example, the use of demographic parity identifies people as being harmed by a low selection rate, which is then set to be the same for all groups. Similarly, equal opportunity identifies people as being harmed by low recall, and balances this harm across all groups. However, these fairness formulations often implicitly
 *level down* and achieve fairness, by decreasing recall for some of the groups.

 As an alternative to equalizing the harm across the population, we allow data scientists to specify minimum rates of e.g., recall, precision, or selection rate for every group, with one line of code. E.g.
@@ -161,7 +161,9 @@ We provide support for the utility based approach set out in [Fairness On The Gr
 Utility functions can be defined in one line.

-For example, consider a situation where an ML system identifies potential problems that require intervening. Every intervention may have a cost of 1, regardless of if it was needed, but a missed intervention that was needed has a cost of 5. Finally, not making an intervention when one was unneeded has a cost of 0. This can be written as:
+For example, consider a situation where an ML system identifies potential problems that require intervention.
+
+Every intervention may have a cost of 1, regardless of whether it was needed, but a missed intervention that was needed has a cost of 5. Finally, not making an intervention when one was unneeded has a cost of 0. This can be written as:

     my_utility=gm.Utility([1, 1, 5, 0], 'Testing Costs')

@@ -207,7 +209,7 @@ If, for example, you wish to optimize F1 without any additional constraints, you

     fpredictor.fit(gm.f1)

-Note that the default behavior (we should minimize the demographic parity violation, but maximize F1) is inferred from standard usage but can be overwritten by setting the optional parameters `greater_is_better_obj` and `greater_is_better_const` to `True` or `False`.
+The default behavior (we should minimize the demographic parity violation, but maximize F1) is inferred from standard usage but can be overridden by setting the optional parameters `greater_is_better_obj` and `greater_is_better_const` to `True` or `False`.

 Where constraints cannot be satisfied, for example, if we require that the F1 score must be above 1.1, `fit` returns the solution closest to satisfying it.
@@ -231,7 +233,7 @@ By default, this method reports the standard fairness metrics of SageMaker Clari

     fpredictor.evaluate_groups(data (optional), groups (optional), dictionary_of_methods (optional), return_original=False, verbose=False)

-By default, this method reports, per group, the standard binary evaluation criteria of autogluon for both the updated predictor only, over the data used by fit. The behavior can be altered by providing either alternate data or a new dictionary of methods. Where groups is not provided, it will use the same groups as passed to `fit`, but this can be altered. If you wish to also see the per group performance of the original classifier, use `return_original=True` to receive a dict containing the per_group performance of the original and updated classifier. If verbose is set to true, the table contains the long names of methods, otherwise it reports the dictionary keys.
+By default, this method reports, per group, the standard binary evaluation criteria of autogluon for the updated predictor only, over the data used by fit. The behavior can be altered by providing either alternate data or a new dictionary of methods. Where groups is not provided, it will use the same groups as passed to `fit`, but this can be altered. If you wish to also see the per group performance of the original classifier, use `return_original=True` to receive a dict containing the per group performance of both the original and updated classifiers. If `verbose` is set to true, the table contains the long names of methods; otherwise it reports the dictionary keys.

 ### Fairness using Inferred Attributes
@@ -244,7 +246,7 @@ Then `fpredictor` can be used in same way described above.
 Note that the labeled attribute is used to evaluate fairness, and the use of the inferred attributes are tuned to optimize fairness with respect to the labeled attributes. This means that even if the inferred attributes are not that accurate, they can be still used to enforce fairness, albeit with a drop in performance.

-To make it easier to use inferred attributes, we provide a helper function:
+To make it easier to use inferred attributes, we provide a helper function for autogluon:

     predictor, attribute_pred = fair.inferred_attribute_builder(train_data, 'class', 'sex')
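The hunks above call `fpredictor.fit(...)` and allude to the one-line "levelling up" constraint, but this excerpt never shows how `fpredictor` is constructed or how a per-group minimum is requested. Below is a minimal sketch, assuming that `FairPredictor` can be imported from `oxonfair`, that its constructor accepts the trained classifier plus the data dict returned by `dataset_loader`, and that per-group minimums are written as `gm.recall.min` — none of these signatures appear in this diff, so treat them as illustrative rather than definitive.

    from oxonfair import FairPredictor, dataset_loader  # import path assumed
    from oxonfair import group_metrics as gm
    import xgboost

    # Train once, as in the quick-start hunk above
    train_data, _, test_data = dataset_loader.adult('sex', train_proportion=0.7, test_proportion=0.3)
    predictor = xgboost.XGBClassifier().fit(X=train_data['data'], y=train_data['target'])

    # Wrap the trained classifier; the exact constructor arguments are assumed.
    # Here the held-out split is reused for tuning; in practice a separate validation split is preferable.
    fpredictor = FairPredictor(predictor, test_data)

    # "Levelling up": require a minimum recall of 0.5 for every group while maximizing accuracy
    # (gm.recall.min as the per-group minimum is an assumption based on the levelling-up discussion)
    fpredictor.fit(gm.accuracy, gm.recall.min, 0.5)

    # Per-group report, as described in the evaluate_groups hunk
    fpredictor.evaluate_groups(test_data)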
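Similarly, the utility hunk defines `my_utility` but this excerpt does not show it being used. Continuing the sketch above, one plausible usage, assuming a `Utility` object can be passed to `fit` in place of a group metric as the objective and that `gm.demographic_parity` names the constraint (assumptions not confirmed by this diff):

    # Costs from the text: intervening costs 1 whether or not it was needed,
    # a missed needed intervention costs 5, and correctly doing nothing costs 0.
    my_utility = gm.Utility([1, 1, 5, 0], 'Testing Costs')

    # Use the utility as the objective while constraining the demographic parity violation;
    # the optimization direction can be set via greater_is_better_obj, per the hunk above.
    fpredictor.fit(my_utility, gm.demographic_parity, 0.02)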