June 17, 2024
Conference Paper
Title
Inspecting and Measuring Fairness of Unlabeled Image Datasets
Abstract
Bias in training data can lead to algorithmic unfairness in machine learning tasks. Therefore, a general requirement for trustworthy AI is that data should be representative and free of bias. There are several approaches to measure the fairness of a given dataset based on attributes such as gender or race. However, for unstructured data, such measures require the dataset to be labeled with respect to these attributes and cannot be directly applied to unlabeled image datasets. We present an approach using foundation models to analyze the fairness of unlabeled images, exploiting the fact that foundation models implement a semantically consistent mapping from the unstructured image space to the embedding space. In particular, we systematically compare the embedding of a reference dataset known to be "fair" to an unlabeled image dataset. We show that the resulting data structures in the embedding space support a systematic comparative analysis based on both qualitative and quantitative evaluation. We evaluate our approach by analyzing the fairness of the target image dataset CelebA, using the FairFace dataset as reference. Validation against the ground-truth labels of the CelebA dataset demonstrates the applicability of the overall approach in principle. In sum, our work offers a novel perspective on fairness evaluation of images, as it requires no labeling but rather makes use of existing, already labeled reference datasets.
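The comparison described in the abstract can be illustrated with a minimal sketch. The paper does not specify its exact comparison procedure, so the following is only an assumed variant: embeddings would come from a foundation model such as CLIP, but here synthetic vectors stand in for real image embeddings, and a simple nearest-neighbor label transfer from the labeled reference set (e.g., FairFace) to the unlabeled target set (e.g., CelebA) approximates the comparative analysis.

```python
import numpy as np

def transfer_labels(ref_emb, ref_labels, target_emb):
    """Assign each unlabeled target embedding the attribute label of its
    nearest neighbor in the labeled reference set (cosine similarity)."""
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    tgt = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    sim = tgt @ ref.T                      # pairwise cosine similarities
    return ref_labels[sim.argmax(axis=1)]  # nearest-neighbor label per image

def attribute_proportions(labels):
    """Estimated share of each attribute value in a dataset, for
    comparison against the proportions in the 'fair' reference."""
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values.tolist(), (counts / counts.sum()).tolist()))

# Synthetic stand-ins for foundation-model embeddings; a real run would
# embed the reference and target images with the same model instead.
rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(100, 16))           # labeled reference set
ref_labels = np.array(["a"] * 50 + ["b"] * 50)  # hypothetical attribute values
target_emb = rng.normal(size=(40, 16))         # unlabeled target set

pred = transfer_labels(ref_emb, ref_labels, target_emb)
print(attribute_proportions(pred))
```

Comparing the resulting attribute proportions of the target set against those of the balanced reference set then gives a quantitative fairness indication without labeling the target images.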
Project(s)
ZERTIFIZIERTE KI
Funder
Ministerium für Wirtschaft, Innovation, Digitalisierung und Energie des Landes Nordrhein-Westfalen