When building classification systems with demographic fairness considerations, there are two objectives to satisfy:

- Maximizing utility for the specific target task
- Ensuring fairness with respect to a known demographic attribute

This raises two questions:

- What are the optimal trade-offs between utility and fairness?
- How can we numerically quantify these trade-offs from data for a desired prediction task and demographic attribute of interest?

Fairness is commonly formalized through one of three criteria. **Demographic parity** requires the prediction \(\hat{Y}\) to be independent of the sensitive attribute \(S\):

\[ P(\hat{Y} = 1 | S = s) = P(\hat{Y} = 1 | S = s') \quad \forall s, s' \in \mathcal{S} \]

**Equality of opportunity** requires equal true positive rates across groups:

\[ P(\hat{Y} = 1 | Y = 1, S = s) = P(\hat{Y} = 1 | Y = 1, S = s') \quad \forall s, s' \in \mathcal{S} \]

**Equalized odds** requires this to hold for every outcome \(y\):

\[ P(\hat{Y} = 1 | Y = y, S = s) = P(\hat{Y} = 1 | Y = y, S = s') \quad \forall s, s' \in \mathcal{S}, \forall y \in \mathcal{Y} \]
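The three criteria above can be measured empirically as the largest gap in the relevant conditional rate across groups (zero means the criterion holds exactly). A minimal sketch, assuming binary predictions and labels; the function name `fairness_gaps` and the max-gap aggregation are illustrative choices, not necessarily how the paper reports its violation metrics:

```python
import numpy as np

def fairness_gaps(y_hat, y, s):
    """Empirical violations of the three fairness criteria.

    y_hat, y : binary predictions / labels; s : group membership.
    Each gap is the largest difference in the relevant conditional
    positive-prediction rate across groups.
    """
    groups = np.unique(s)

    def max_gap(extra_mask=None):
        # Positive-prediction rate per group, optionally conditioned on y.
        rates = []
        for g in groups:
            mask = (s == g) if extra_mask is None else (s == g) & extra_mask
            rates.append(y_hat[mask].mean())
        return max(rates) - min(rates)

    dpv = max_gap()                              # demographic parity violation
    eood = max_gap(y == 1)                       # equality of opportunity diff.
    eod = max(max_gap(y == 1), max_gap(y == 0))  # equalized odds difference
    return dpv, eood, eod
```

For instance, a predictor whose positive rate differs between groups only when \(Y = 0\) would show zero EOOD but a nonzero EOD and DPV.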

Now we turn to the second goal of this paper: numerically quantifying the trade-offs from data. The figure above gives a high-level overview of U-FaTE, which learns a fair representation for a given trade-off parameter \(\lambda\). U-FaTE comprises a feature extractor and a fair encoder. It receives raw data as input and uses the feature extractor to provide features for the fair encoder. The encoder employs a closed-form solver to find the optimal function mapping these features to a new feature space that minimizes dependence on the sensitive attribute while maximizing dependence on the target attribute. A classifier is then trained on this representation with the standard cross-entropy loss to predict the target \(Y\). This process is repeated for multiple values of \(\lambda\) with \(0 \leq \lambda < 1\) to obtain the full trade-off curves.

The objective function of U-FaTE is defined as follows: \[ \min_{\Theta_{FE}, \Theta_{Enc}} \Big\{ \mathcal{L} = {\color{green}-\ \text{Dep}(f(X; \Theta_{FE}, \Theta_{Enc}), Y)} {\color{red}\ +}\ \frac{\lambda}{(1 - \lambda)} {\color{red}\text{Dep}(f(X; \Theta_{FE}, \Theta_{Enc}), S)}\Big\}, \] where \(\lambda\) is the trade-off parameter, \(f(X; \Theta_{FE}, \Theta_{Enc})\) is the output of the feature extractor and encoder (\(Z\) in the above figure), and \(\text{Dep}(\cdot, \cdot)\) is the dependency measure.
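The objective can be sketched numerically with a stand-in dependence measure. Below, `dep` is a simple linear cross-covariance proxy for \(\text{Dep}(\cdot, \cdot)\); the paper's actual measure is a (kernel) statistical dependence measure, so both `dep` and `ufate_loss` here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dep(z, a):
    """Toy dependence measure: squared Frobenius norm of the
    cross-covariance between representation z and attribute a.
    A linear stand-in for the dependence measure used by U-FaTE."""
    zc = z - z.mean(0)
    ac = a - a.mean(0)
    cov = zc.T @ ac / len(z)
    return float(np.sum(cov ** 2))

def ufate_loss(z, y, s, lam):
    """L = -Dep(z, y) + lam / (1 - lam) * Dep(z, s), for 0 <= lam < 1."""
    return -dep(z, y) + lam / (1.0 - lam) * dep(z, s)
```

At \(\lambda = 0\) the loss reduces to pure utility maximization, and the fairness penalty's weight \(\lambda/(1-\lambda)\) grows without bound as \(\lambda \to 1\).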

The figure above shows an overview of the training process of U-FaTE, which alternates between two phases. In the first phase, features of the training samples are extracted and a closed-form solution for the encoder that minimizes the objective \(\mathcal{L}\) is computed while the parameters of the feature extractor (\(\Theta_{FE}\)) are frozen. In the second phase, the feature extractor is trained via backpropagation to minimize \(\mathcal{L}\) while the encoder is frozen. These two phases are repeated until convergence.
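The first phase can be illustrated with a simplified linear encoder: with the features frozen, pick encoder directions that trade off covariance with the target \(Y\) against covariance with the sensitive attribute \(S\) via an eigen-decomposition. The paper's solver operates in a richer (kernelized) setting, so this linear eigen-solver, including the name `closed_form_encoder` and the choice of `k` output dimensions, is a simplified stand-in:

```python
import numpy as np

def closed_form_encoder(x, y, s, lam, k=2):
    """Phase-1 sketch: with features x frozen, choose a linear encoder
    whose directions maximize cross-covariance with y while penalizing
    cross-covariance with s, weighted by lam / (1 - lam)."""
    xc = x - x.mean(0)
    yc = (np.asarray(y, dtype=float) - np.mean(y)).reshape(len(x), -1)
    sc = (np.asarray(s, dtype=float) - np.mean(s)).reshape(len(x), -1)
    cxy = xc.T @ yc / len(x)              # cross-covariance with target
    cxs = xc.T @ sc / len(x)              # cross-covariance with sensitive attr.
    b = cxy @ cxy.T - lam / (1.0 - lam) * cxs @ cxs.T
    vals, vecs = np.linalg.eigh(b)        # eigenvalues in ascending order
    w = vecs[:, -k:]                      # top-k directions
    return w                              # encode with z = x @ w
```

In phase two, one would unfreeze the feature extractor and backpropagate through \(\mathcal{L}\) with `w` held fixed, then alternate until convergence.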

We designed experiments to answer the following:

- How far are existing fair representation learning methods from the two trade-offs?
- How far are zero-shot representations from the two trade-offs? What is the effect of network architecture and pre-training dataset?
- How far are pre-trained image representations trained in a supervised fashion from the two trade-offs?

To answer the first question, we estimate the LST and DST through U-FaTE and the trade-offs from the other baselines across various settings and datasets.

The first row of the figure above depicts trade-offs on the **CelebA** dataset (\(Y\): high cheekbones, \(S\): age and sex) for EOD, EOOD, and DPV. Similarly, the second row shows trade-offs on the **Folktables** dataset (\(Y\): employment status, \(S\): age).

**Observations:** While certain methods may approach optimal accuracy in specific cases, they often fail to cover
the entire spectrum of fairness values and exhibit stability issues.

To study the fairness of zero-shot predictions of currently available CLIP models, we consider more than 90 pre-trained models from OpenCLIP and evaluate them on
**CelebA** (\(Y\): high cheekbones, \(S\): age and sex) and **FairFace** (\(Y\): sex, \(S\): ethnicity).

From the CelebA results (\(1^{\text{st}}\) row of the figure above), we observe that zero-shot models perform poorly on the CelebA task, likely due to the rarity of the high-cheekbones label in their pre-training data.

On FairFace (\(2^{\text{nd}}\) row), CLIP models achieve high accuracy, likely because the target task (sex prediction) is abundant in their pre-training datasets.

Both results suggest that models trained on CommonPool exhibit higher fairness, while models trained on DataComp show slightly better accuracy.

To examine the fairness of image representations from supervised pre-trained models, we assess over 900 models from
PyTorch Image Models (`timm`). They are evaluated on **CelebA** and **FairFace** using the same target and sensitive labels
as in the previous experiments.

From the CelebA results in the first row, we observe that models pre-trained on ImageNet-22K and OpenAI WIT achieve the best accuracy. However, with more than 11% EOD and EOOD and more than 20% DPV, they exhibit significant bias between the two sexes.

Results on FairFace (\(2^{\text{nd}}\) row) likewise show that models trained on OpenAI WIT and ImageNet-22K are both fairer and more accurate than models trained on other datasets. We also observe that models are generally fairer on FairFace than on CelebA.

```
@inproceedings{dehdashtian2024utility,
  title     = {Utility-Fairness Trade-Offs and How to Find Them},
  author    = {Dehdashtian, Sepehr and Sadeghi, Bashir and Boddeti, Vishnu Naresh},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2024}
}
```