
Commit

spelling
ChrisMRuss committed Jun 19, 2024
1 parent 6312122 commit 98b1896
Showing 13 changed files with 3,495 additions and 3,411 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -4,7 +4,7 @@ OxonFair is an expressive toolkit designed to enforce a wide-range of fairness d
The toolkit is designed to overcome a range of shortcomings in existing fairness toolkits for high-capacity models that overfit to the training data.
It is designed and works for computer vision and NLP problems alongside tabular data.

For low-capacity models (e.g. linear regression over a small number of variables, and decision-trees of limited depth), we recommend [fairlearn](https://github.com/fairlearn/fairlearn).
For low-capacity models (e.g., linear regression over a small number of variables, and decision-trees of limited depth), we recommend [fairlearn](https://github.com/fairlearn/fairlearn).

We support a range of complex classifiers including [pytorch](https://pytorch.org/), [scikit learn](https://scikit-learn.org/stable/), and ensembles provided by [autogluon](https://auto.gluon.ai/stable/index.html).

@@ -44,14 +44,14 @@ By default, this will only install the necessary dependencies sklearn; pandas; a

### Full install for running the test suite

Download the source of oxonfair and in the source directory run:
Download the source of OxonFair and in the source directory run:
pip install -e .\[tests\]

Now run the [Example Notebook](examples/quickstart_autogluon.ipynb) or try some of the examples below.

For scikit/XGBoost, see [sklearn.md](./sklearn.md) and the [Example Notebook](examples/quickstart_xgboost.ipynb)

For pytorch, see a toy example on [adult](./examples/pytorch_minimal_demo.ipynb) and for computer vision, this [Example Notebook](examples/quickstart_DeepFairPredictor_computer_vision.ipynb)
For PyTorch, see a toy example on [adult](./examples/pytorch_minimal_demo.ipynb), and for computer vision, this [Example Notebook](examples/quickstart_DeepFairPredictor_computer_vision.ipynb)

More demo notebooks are present in the [examples folder](./examples/README.md).

@@ -106,11 +106,11 @@ The full set of constraints and objectives can be seen in the list of measures i

### Why Another Fairness Library?

Fundamentally, most existing fairness methods are not appropriate for use with complex classifiers on high-dimensional data. These classifiers are prone to overfitting on the training data, which means that trying to balance error rates (e.g. when using equal opportunity) on the training data, is unlikely to transfer well to new unseen data. This is a particular problem when using computer vision (see [Zietlow et al.](https://arxiv.org/abs/2203.04913)), but can also occur with tabular data. Moreover, iteratively retraining complex models (a common requirement of many methods for enforcing fairness) is punitively slow when training the model once might take days, or even weeks, if you are trying to maximize performance.
Fundamentally, most existing fairness methods are not appropriate for use with complex classifiers on high-dimensional data. These classifiers are prone to overfitting on the training data, which means that trying to balance error rates (e.g., when using equal opportunity) on the training data is unlikely to transfer well to new unseen data. This is a particular problem when using computer vision (see [Zietlow et al.](https://arxiv.org/abs/2203.04913)), but can also occur with tabular data. Moreover, iteratively retraining complex models (a common requirement of many methods for enforcing fairness) is prohibitively slow when training the model once might take days, or even weeks, if you are trying to maximize performance.

At the same time, postprocessing methods that allow you to train once and then improve fairness on held-out validation data generally require the protected attributes to be available at test time, which is often infeasible, particularly with computer vision.

OxonFair is build from the ground up to avoid these issues. It is a postprocessing approach, explicitly designed to use inferred attributes where protected attributes are not available to enforce fairness. Fairness can be enforced both on validation, or on the train set, when you are short of data and overfitting is not a concern. When enforcing fairness in deep networks or using provided attributes, a classifier is only trained once, for non network-based approaches, e.g. scikit-learn or xgboost, with inferred attributes we require the training of two classifier (one to predict the original task, and a second to estimate groups membership).
OxonFair is built from the ground up to avoid these issues. It is a postprocessing approach, explicitly designed to use inferred attributes to enforce fairness where protected attributes are not available. Fairness can be enforced either on validation data or on the training set, when you are short of data and overfitting is not a concern. When enforcing fairness in deep networks or using provided attributes, a classifier is only trained once; for non-network-based approaches, e.g., scikit-learn or XGBoost, with inferred attributes we require the training of two classifiers (one to predict the original task, and a second to estimate group membership).
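As a hedged illustration of the two-classifier setup just described, the sketch below trains one scikit-learn model for the task and a second to infer group membership, so no protected attribute is needed at test time. The synthetic data, column choices, and the final hand-off to a postprocessor are assumptions for illustration, not OxonFair's documented API.

```python
# Minimal sketch (assumed synthetic data): one classifier for the task,
# a second to infer group membership for use at test time.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                         # stand-in features
y = (X[:, 0] + rng.normal(size=2000) > 0).astype(int)   # task label
group = (X[:, 1] > 0).astype(int)                       # protected attribute, known only at train/validation time

X_tr, X_val, y_tr, y_val, g_tr, g_val = train_test_split(
    X, y, group, test_size=0.5, random_state=0)

task_clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)   # predicts the original task
group_clf = RandomForestClassifier(random_state=0).fit(X_tr, g_tr)  # estimates group membership

# A postprocessor such as OxonFair's FairPredictor would combine the task scores
# with the inferred groups on validation data to choose fair per-group thresholds.
task_scores = task_clf.predict_proba(X_val)[:, 1]
inferred_groups = group_clf.predict(X_val)
```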

That said, we make several additional design decisions which we believe make for a better experience for data scientists:

@@ -245,7 +245,7 @@ See this [notebook](./examples/compas_autogluon.ipynb) for details.

### Best Practices

It is common for machine learning algorithms to overfit training data. Therefore, if you want your fairness constraints to carry over to unseen data we recommend that they are enforced on a large validation set, rather than the training set. For low-dimensional datasets, many classifiers, with a careful choice of hyperparameter, are robust to overfitting and fairness constraints enforced on training data can carry over to unseen test data. In fact, given the choice between enforcing fairness constraints on a large training set, vs. using a significantly smaller validation set, reusing the training set may result in better generalization of the desired behavior to unseen data. However, this behavior is not guaranteed, and should always be empirically validated.
It is common for machine learning algorithms to overfit training data. Therefore, if you want your fairness constraints to carry over to unseen data, we recommend that they are enforced on a large validation set, rather than the training set. For low-dimensional datasets, many classifiers, with a careful choice of hyperparameters, are robust to overfitting, and fairness constraints enforced on training data can carry over to unseen test data. In fact, given the choice between enforcing fairness constraints on a large training set vs. using a significantly smaller validation set, reusing the training set may result in better generalization of the desired behavior to unseen data. However, this behavior is not guaranteed, and should always be empirically validated.
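To make this advice concrete, here is a hedged sketch (hand-rolled metric, synthetic data, no OxonFair calls) of holding out a validation split for tuning fairness and a separate test split for the recommended empirical check that the behavior carries over.

```python
# Sketch of the recommended workflow: fit on train, tune fairness on a large
# validation set, then empirically check the constraint on held-out test data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def tpr_gap(y_true, y_pred, group):
    """Equal-opportunity gap: absolute difference in true-positive rates across two groups."""
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in (0, 1)]
    return abs(tprs[0] - tprs[1])

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 8))
y = (X[:, 0] > 0).astype(int)
g = (X[:, 1] > 0).astype(int)

# train / validation / test split
X_tmp, X_test, y_tmp, y_test, g_tmp, g_test = train_test_split(
    X, y, g, test_size=0.25, random_state=1)
X_tr, X_val, y_tr, y_val, g_tr, g_val = train_test_split(
    X_tmp, y_tmp, g_tmp, test_size=0.33, random_state=1)

clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print("validation TPR gap:", tpr_gap(y_val, clf.predict(X_val), g_val))
print("test TPR gap:      ", tpr_gap(y_test, clf.predict(X_test), g_test))
```

If the two gaps diverge substantially, the validation set is likely too small for the enforced behavior to generalize.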

#### Challenges with unbalanced data

10 changes: 5 additions & 5 deletions examples/README.md
@@ -1,13 +1,13 @@
# Tutorial Notebooks

This folder contains a collection of example ipython notebooks illustating different use cases.
This folder contains a collection of example IPython notebooks illustrating different use cases.

1. [Getting started with XGBoost](quickstart_xboost.ipynb)
2. [Getting started with Autogluon](quickstart_autogluon.ipynb)
3. [Getting started with Deep Learning and Computer Vision](quickstart_DeepFairPredictor_computer_vision.ipynb)
4. [Code for training deep models compatible with OxonFair](training_a_two_head_model/two_head_model_demo.py)
5. [Levelling up](levelling_up.ipynb)
6. Comparisions with FairLearn.
a. A comparision using random forests and decision trees on the adult dataset. [Here](adult_fairlearn_comparision.ipynb)
b. A comparision using xgboost on medical data. [Here](high-dim_fairlearn_comparision.ipynb)
c. A comparision of run time using xgboost on multiple groups. [Here](multi_group_fairlearn_comparision.ipynb)
6. Comparisons with FairLearn
a. A comparison using random forests and decision trees on the adult dataset. [Here](adult_fairlearn_comparision.ipynb)
b. A comparison using xgboost on medical data. [Here](high-dim_fairlearn_comparision.ipynb)
c. A comparison of run time using xgboost on multiple groups. [Here](multi_group_fairlearn_comparision.ipynb)
