FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs
Sepehr Dehdashtian*, Lan Wang*, Vishnu N. Boddeti
{sepehr, wanglan3, vishnu}@msu.edu
Michigan State University
Bias in CLIP's Zero-Shot Prediction
Several studies have highlighted the bias problem in CLIP's zero-shot predictions.
As an example of this bias, consider the CelebA dataset, where the task is to classify hair color.
As we can see here, when predicting a person's hair color, the CLIP model's performance is imbalanced across genders.
Although the average accuracy looks reasonable, the gap between the worst-group accuracy and the average accuracy is 16%, which is considerable.
Types of Attribute Dependency
We categorize attribute dependencies into two types. In the first, there is a spurious correlation between the target attribute Y and the sensitive attribute S.
For example, hair color in the CelebA dataset is spuriously correlated with gender.
In the second, there is an intrinsic dependency between the two attributes. For example, cheekbone height is intrinsically dependent on gender.
Most prior studies focus on spurious correlations and disregard intrinsic dependencies.
FairerCLIP
Problem Setting
Using FairerCLIP, we want to solve the following problem: given a set of
images and their corresponding text prompts, learn two encoders, one for images and one for text,
that debias the representations generated by the CLIP model with respect to a sensitive attribute S.
Training
This is the training process of FairerCLIP and its objective function.
FairerCLIP takes the image and text features extracted from the CLIP model and maps them to the output representation space.
In each training iteration, a closed-form solution is computed for each encoder, and the
resulting representation is passed to the other encoder. This process repeats until convergence.
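As a rough illustration of this alternating scheme, here is a minimal sketch (solve_encoder is a hypothetical stand-in for the paper's closed-form RKHS solver; it takes one modality's CLIP features, the other modality's current representation, and the labels, and returns the updated encoder and representation):

```python
import numpy as np

def alternate_train(x_img, x_txt, y, s, solve_encoder, n_iter=20, tol=1e-5):
    """Alternating closed-form updates for the two FairerCLIP encoders.

    x_img, x_txt: image/text features extracted from CLIP.
    y, s: (pseudo-)labels for the target and sensitive attributes.
    solve_encoder: hypothetical closed-form solver for one encoder.
    """
    z_img, z_txt = x_img.copy(), x_txt.copy()          # start from raw CLIP features
    for _ in range(n_iter):
        z_prev = z_img
        enc_img, z_img = solve_encoder(x_img, z_txt, y, s)  # update the image encoder
        enc_txt, z_txt = solve_encoder(x_txt, z_img, y, s)  # update the text encoder
        if np.linalg.norm(z_img - z_prev) < tol:            # stop once converged
            break
    return enc_img, enc_txt
```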
The parameters of these two encoders are computed to maximize the dependency between the generated representations Z and the target attribute Y,
while minimizing the dependency between Z and the sensitive attribute S.
In addition, to preserve or improve the accuracy of cosine-similarity-based classification, FairerCLIP maximizes
the dependency between the generated image representations and their corresponding text prompts.
Here, tau_I and tau_T are hyperparameters that control the strength of the debiasing constraints,
and tau_Z controls the strength of the similarity constraint.
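Schematically, the objective described above can be written as follows (our notation, with Dep denoting the dependence measure defined next and Z_I, Z_T the image and text representations; this is a reading of the description above, not necessarily the paper's exact formulation):

```latex
\max_{\Theta_I,\;\Theta_T}\;
\underbrace{\mathrm{Dep}(Z_I, Y) + \mathrm{Dep}(Z_T, Y)}_{\text{predict } Y}
\;-\;
\underbrace{\bigl(\tau_I\,\mathrm{Dep}(Z_I, S) + \tau_T\,\mathrm{Dep}(Z_T, S)\bigr)}_{\text{debiasing}}
\;+\;
\underbrace{\tau_Z\,\mathrm{Dep}(Z_I, Z_T)}_{\text{image-text similarity}}
```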
We use a simplified definition of the Hilbert-Schmidt Independence Criterion (HSIC) as our dependence measure,
which has attractive properties, such as the practical ability to capture all linear and non-linear modes of dependence.
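For reference, here is a minimal NumPy sketch of the standard biased empirical HSIC estimator (the paper uses a simplified variant; the RBF kernel and bandwidth here are illustrative choices):

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Pairwise RBF kernel matrix for the rows of x (shape n x d)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2 (Gretton et al., 2005)."""
    n = x.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    K, L = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# e.g., hsic(z, s.reshape(-1, 1).astype(float)) near zero suggests the
# representation z carries little information about the sensitive attribute s.
```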
The majority of prior works assume access to ground-truth labels for the target attribute Y and the sensitive attribute S.
However, in many real-world scenarios and applications, these labels are not available.
Pseudo-Label Prediction
Therefore, we predict pseudo-labels for Y and S using the CLIP model itself.
These pseudo-labels are passed to FairerCLIP during training, and the pseudo-labels for the target attribute are updated in each training iteration.
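A minimal sketch of this zero-shot pseudo-labeling step with the openai/CLIP package (the prompt templates are illustrative, not the paper's):

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def pseudo_labels(images, class_prompts):
    """Assign each image the class whose prompt embedding is most similar."""
    tokens = clip.tokenize(class_prompts).to(device)
    with torch.no_grad():
        img = model.encode_image(images)               # images: preprocessed batch
        txt = model.encode_text(tokens)
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).argmax(dim=-1)                # cosine-similarity argmax

# e.g. (illustrative prompts):
# y_hat = pseudo_labels(batch, ["a photo of a celebrity with blond hair",
#                               "a photo of a celebrity with dark hair"])
# s_hat = pseudo_labels(batch, ["a photo of a man", "a photo of a woman"])
```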
FairerCLIP
Inference Overview
This is how FairerCLIP is used at inference time: we apply the encoders learned by FairerCLIP to debias the image and text representations generated by CLIP.
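A minimal sketch of the inference step, assuming the learned encoders can be applied as linear maps to precomputed CLIP features (in the paper they are functions in an RKHS):

```python
import numpy as np

def debiased_zero_shot(img_feats, txt_feats, enc_img, enc_txt):
    """Debias CLIP features with the learned encoders, then classify by
    cosine similarity, exactly as in standard CLIP zero-shot prediction."""
    z_img = img_feats @ enc_img                        # debiased image features
    z_txt = txt_feats @ enc_txt                        # debiased class-prompt features
    z_img /= np.linalg.norm(z_img, axis=1, keepdims=True)
    z_txt /= np.linalg.norm(z_txt, axis=1, keepdims=True)
    return (z_img @ z_txt.T).argmax(axis=1)            # predicted class per image
```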
Mitigating Intrinsic Dependency
CelebA (Y: High Cheekbone, S: Gender)
When mitigating the intrinsic dependency on the CelebA dataset, we can see that FairerCLIP
outperforms the other methods, driving the Equal Opportunity Difference (EOD) to nearly zero
while maintaining a relatively high average accuracy for both CLIP ViT and CLIP ResNet.
Mitigating Spurious Correlation
W/O Labels
W/ Labels
In mitigating spurious correlations on the CelebA and Waterbirds datasets, we can see that
FairerCLIP outperforms the baselines, both with and without ground-truth labels, in terms of worst-group accuracy (WG) and Gap, the difference between
the average accuracy and the worst-group accuracy.
Thank you!
Thanks for watching. Please check out the paper's webpage, via the provided QR code, for the code and more details. Thank you.