2025
Journal Article
Title

FRoundation: Are Foundation Models Ready for Face Recognition?

Abstract
Foundation models are predominantly trained in an unsupervised or self-supervised manner on highly diverse, large-scale datasets, making them broadly applicable to various downstream tasks. In this work, we investigate for the first time whether such models are suitable for the specific domain of face recognition (FR). We further propose and demonstrate the adaptation of these models for FR across different levels of data availability, including synthetic data. Extensive experiments are conducted on multiple foundation models and datasets of varying scales for training and fine-tuning, with evaluation on a wide range of benchmarks. Our results indicate that, despite their versatility, pre-trained foundation models tend to underperform in FR compared to similar architectures trained specifically for this task. However, fine-tuning foundation models yields promising results, often surpassing models trained from scratch, particularly when training data is limited. For example, after fine-tuning on only 1K identities, DINOv2 ViT-S achieved an average verification accuracy of 87.10% on the LFW, CALFW, CPLFW, CFP-FP, and AgeDB30 benchmarks, compared to 64.70% for the same model without fine-tuning, while training the same ViT-S architecture from scratch on 1K identities reached only 69.96%. With access to larger-scale FR training datasets, these accuracies reach 96.03% and 95.59% for the DINOv2 and CLIP ViT-L models, respectively. Compared to ViT-based architectures trained from scratch for FR, fine-tuning the same foundation-model architectures achieves similar performance at lower training computational cost and without relying on the assumption of extensive data availability. We further demonstrate the use of synthetic face data, showing improved performance over both pre-trained foundation models and ViT models trained from scratch.
Additionally, we examine demographic biases, noting slightly higher biases in certain settings when using foundation models compared to models trained from scratch. We release our code and pre-trained model weights at github.com/TaharChettaoui/FRoundation.
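The abstract does not name the fine-tuning objective, but FR training and fine-tuning is commonly driven by a margin-based softmax loss such as CosFace or ArcFace applied to normalized embeddings. A minimal NumPy sketch of a CosFace-style loss, purely illustrative (the function name and hyperparameter values are assumptions, not taken from the paper):

```python
import numpy as np

def cosface_loss(embeddings, class_weights, labels, s=64.0, m=0.35):
    """CosFace-style margin softmax: subtract a margin m from the target-class
    cosine similarity, scale by s, then apply cross-entropy."""
    # L2-normalize embeddings and per-identity classifier weights
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = e @ w.T                              # (batch, identities) cosines
    rows = np.arange(len(labels))
    logits = s * cos
    logits[rows, labels] = s * (cos[rows, labels] - m)  # penalize target class
    # numerically stable log-softmax cross-entropy
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))    # stand-in for backbone embeddings
w = rng.normal(size=(4, 16))      # one weight vector per training identity
y = rng.integers(0, 4, size=8)
loss = cosface_loss(emb, w, y)    # margin makes the target class harder
```

In a fine-tuning setup along the lines the abstract describes, `embeddings` would come from the pre-trained backbone (e.g. DINOv2 ViT-S) and `class_weights` would be a learned classification head with one row per training identity.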
Author(s)
Chettaoui, Tahar
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Damer, Naser  
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Boutros, Fadi
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Journal
Image and Vision Computing  
Project(s)
Next Generation Biometric Systems  
Funder
Bundesministerium für Bildung und Forschung -BMBF-  
Hessisches Ministerium für Wissenschaft und Kunst -HMWK-  
Open Access
DOI
10.1016/j.imavis.2025.105453
10.24406/h-484289
File(s)
1-s2.0-S0262885625000411-main.pdf (1.39 MB)
Rights
CC BY-NC 4.0: Creative Commons Attribution-NonCommercial
Language
English
Keyword(s)
  • Branche: Information Technology
  • Research Line: Computer vision (CV)
  • Research Line: Human computer interaction (HCI)
  • Research Line: Machine learning (ML)
  • LTA: Interactive decision-making support and assistance systems
  • LTA: Machine intelligence, algorithms, and data structures (incl. semantics)
  • Face recognition
  • Biometrics
  • Machine learning
  • Deep learning
  • ATHENE