Identity-preserving Synthetic Data Generation using GANs for Face Recognition Development

Kramer, Julia Selua

2026

Master Thesis

Abstract

The availability of large-scale training datasets remains a key factor in the performance of modern Face Recognition (FR) systems. While synthetic face datasets generated using Generative Adversarial Networks (GANs) provide a promising alternative to authentic data, existing approaches rely on explicit style conditioning, either from a style bank or on predefined attribute sets, to induce intra-class variation. Such reliance on explicit annotations limits scalability and diversity and introduces additional data annotation requirements. This thesis proposes a method for generating synthetic face datasets with high intra-class variation without explicitly conditioning on style attributes. Instead, style diversity is learned implicitly while preserving intra-class compactness. To this end, a GAN-based architecture is extended with complementary Identity and Style Loss functions. The Identity Loss promotes inter-class separability and identity preservation, whereas the Style Loss encourages perceptual diversity among samples of the same identity. Different weighting strategies are investigated to balance these competing objectives. The resulting synthetic datasets are evaluated using dataset-specific metrics as well as their effectiveness for training FR models. Experimental results demonstrate that an appropriate balance between Identity and Style Losses yields a clear separation between genuine and impostor score distributions while maintaining increased intra-class variation. Face recognition models trained on the proposed synthetic datasets achieve verification accuracies of up to 0.89 on LFW, over 0.77 on CFP-FP, and over 0.72 on AgeDB-30. Although these results do not match the performance of models trained on large-scale authentic datasets, they are competitive with several GAN-based synthetic data approaches that rely on explicit attribute conditioning.
Overall, the findings in this thesis reveal that implicit modeling of intra-class variation is a viable and scalable strategy for generating synthetic datasets suitable for downstream tasks such as face recognition development.
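
The trade-off between the two objectives described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not give the loss formulations, so the cosine-based identity term, the pairwise-distance style term, the function names, and the weights `w_id` and `w_style` are all assumptions chosen for illustration.

```python
import math

# Hypothetical illustration only: the thesis abstract does not specify
# how the Identity and Style Losses are defined or weighted.

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identity_loss(embeddings, reference):
    """Assumed identity term: mean (1 - cosine similarity) to a reference
    identity embedding. Lower means samples stay closer to the identity."""
    return sum(1.0 - cosine_similarity(e, reference) for e in embeddings) / len(embeddings)

def style_loss(style_features):
    """Assumed style term: negative mean pairwise Euclidean distance between
    style features of same-identity samples. Lower means more diversity."""
    n = len(style_features)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    mean_dist = sum(math.dist(style_features[i], style_features[j]) for i, j in pairs) / len(pairs)
    return -mean_dist

def total_loss(embeddings, reference, style_features, w_id=1.0, w_style=0.1):
    """Weighted sum of the two competing terms (weights are placeholders)."""
    return w_id * identity_loss(embeddings, reference) + w_style * style_loss(style_features)
```

In this sketch, raising `w_style` relative to `w_id` rewards more varied samples at the risk of identity drift, while lowering it favors compact identity clusters with less variation. This is the kind of balance the weighting strategies in the abstract are said to address.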

Thesis Note

Darmstadt, TU, Master Thesis, 2026

Author(s)

Kramer, Julia Selua

Fraunhofer-Institut für Graphische Datenverarbeitung IGD

Advisor(s)

Damer, Naser

Fraunhofer-Institut für Graphische Datenverarbeitung IGD

Boutros, Fadi