SFace2: Synthetic-Based Face Recognition with w-Space Identity-Driven Sampling

Boutros, Fadi; Huber, Marco; Luu, Anh Thi; Siebke, Patrick; Damer, Naser

doi:10.1109/TBIOM.2024.3371502

2024

Journal Article

Abstract

The use of synthetic data for training neural networks has recently received increased attention, especially in the area of face recognition. This was mainly motivated by the increase of privacy, ethical, and legal concerns of using privacy-sensitive authentic data to train face recognition models. Many authentic datasets such as MS-Celeb-1M or VGGFace2 that have been widely used to train state-of-the-art deep face recognition models are retracted and officially no longer maintained or provided by official sources as they often have been collected without explicit consent. Toward this end, we first propose a synthetic face generation approach, SFace which utilizes a class-conditional generative adversarial network to generate class-labeled synthetic face images. To evaluate the privacy aspect of using such synthetic data in face recognition development, we provide an extensive evaluation of the identity relation between the generated synthetic dataset and the original authentic dataset used to train the generative model. The investigation proved that the associated identity of the authentic dataset to the one with the same class label in the synthetic dataset is hardly possible, strengthening the possibility for privacy-aware face recognition training. We then propose three different learning strategies to train the face recognition model on our privacy-friendly dataset, SFace, and report the results on five authentic benchmarks, demonstrating its high potential. Noticing the relatively low (in comparison to authentic data) identity discrimination in SFace, we started by analysing the w-space of the class-conditional generator, finding identity information that is highly correlated to that in the embedding space. Based on this finding, we proposed an approach that performs the sampling in the w-space driven to generate data with higher identity discrimination, the SFace2. Our experiments showed the disentanglement of the latent w-space and the benefit of training face recognition models on the more identity-discriminated synthetic dataset SFace2.