2026
Paper (Preprint, Research Paper, Review Paper, White Paper, etc.)
Title
ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers
Abstract
Face Image Quality Assessment (FIQA) is essential for reliable face recognition systems. Current approaches primarily exploit only final-layer representations, while training-free methods require multiple forward passes or backpropagation. We propose ViTNT-FIQA, a training-free approach that measures the stability of patch embedding evolution across intermediate Vision Transformer (ViT) blocks. We demonstrate that high-quality face images exhibit stable feature refinement trajectories across blocks, while degraded images show erratic transformations. Our method computes Euclidean distances between L2-normalized patch embeddings from consecutive transformer blocks and aggregates them into image-level quality scores. We empirically validate this correlation on a quality-labeled synthetic dataset with controlled degradation levels. Unlike existing training-free approaches, ViTNT-FIQA requires only a single forward pass without backpropagation or architectural modifications. Through extensive evaluation on eight benchmarks (LFW, AgeDB-30, CFP-FP, CALFW, Adience, CPLFW, XQLFW, IJB-C), we show that ViTNT-FIQA achieves competitive performance with state-of-the-art methods while maintaining computational efficiency and immediate applicability to any pre-trained ViT-based face recognition model.
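The scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the per-patch distances are aggregated or how the sign of the score is chosen, so the mean over all patches and block transitions, and the negation that makes higher scores mean higher quality, are assumptions made here. The function names `l2_normalize` and `stability_score` are hypothetical, and the random vectors stand in for patch embeddings that would normally be read from the intermediate blocks of a pre-trained ViT in a single forward pass.

```python
import math
import random


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against a zero vector)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def stability_score(blocks):
    """Hypothetical image-level score from per-block patch embeddings.

    blocks[b][p] is the embedding (list of floats) of patch p after block b.
    Assumption: distances are averaged and negated, so a more stable
    trajectory (smaller distances) gives a higher score.
    """
    if len(blocks) < 2:
        raise ValueError("need embeddings from at least two blocks")
    normed = [[l2_normalize(patch) for patch in block] for block in blocks]
    dists = []
    # Compare each patch with the same patch in the next block.
    for prev, curr in zip(normed, normed[1:]):
        for a, b in zip(prev, curr):
            dists.append(math.dist(a, b))  # Euclidean distance
    return -sum(dists) / len(dists)


# Synthetic stand-ins for ViT patch embeddings (4 blocks, 16 patches, dim 8).
rng = random.Random(0)


def rand_block(n_patches=16, dim=8):
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_patches)]


base = rand_block()
# "Stable" trajectory: each block is a small perturbation of the same embeddings.
stable = [[[v + 0.01 * rng.gauss(0, 1) for v in patch] for patch in base]
          for _ in range(4)]
# "Erratic" trajectory: each block is unrelated to the previous one.
erratic = [rand_block() for _ in range(4)]

stable_score = stability_score(stable)
erratic_score = stability_score(erratic)
```

Under these assumptions, `stable_score` is higher than `erratic_score`, mirroring the abstract's claim that stable refinement trajectories indicate higher quality. Because the embeddings are L2-normalized before comparison, the score depends only on changes in direction between blocks, not on changes in embedding magnitude.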
Author(s)
Open Access
File(s)
Rights
Use according to copyright law
Language
English
Keyword(s)
Sector: Infrastructure and Public Services
Research Line: Computer vision (CV)
Research Line: Human computer interaction (HCI)
Research Line: Machine learning (ML)
LTA: Interactive decision-making support and assistance systems
LTA: Machine intelligence, algorithms, and data structures (incl. semantics)
LTA: Generation, capture, processing, and output of images and 3D models
Biometrics
Face Recognition
Machine learning
Deep learning
ATHENE