2026
Bachelor Thesis
Title
Exploring CrossViT for Robust Face Recognition in Low-Resolution Scenarios
Abstract
Recent progress in face recognition (FR) has been driven largely by deep convolutional neural networks and, more recently, Vision Transformers (ViTs). While ViTs provide strong global feature modeling and have achieved state-of-the-art performance on high-resolution (HR) datasets, their ability to handle low-resolution (LR) images is still limited. LR face images appear frequently in real-world settings, such as surveillance, mobile capture, ... Standard ViTs, which operate on fixed-size patches, often struggle in these settings. To address this limitation, this work investigates the CrossViT architecture, an efficient multi-scale ViT originally proposed for generic image classification, and evaluates its suitability for robust LR FR. CrossViT processes images in parallel branches with different patch sizes and fuses the branches through cross-attention, enabling the model to analyze global structure and local detail simultaneously. In this thesis, we systematically compare CrossViT variants against standard ViT baselines using large-scale FR training data and multiple evaluation benchmarks, including challenging LR datasets such as TinyFace. We further analyze performance trade-offs, discuss limitations, and outline potential extensions for future research.
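The dual-branch idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the image size, patch sizes, embedding width, and the function names `num_patches` and `cross_attend` are hypothetical, the tokens are random stand-ins for patch embeddings, and the learned query/key/value projections of a real cross-attention layer are omitted. It only shows that two patch sizes yield token sequences of different lengths, and that the class token of one branch can act as the single query over the tokens of the other branch.

```python
import math
import random


def num_patches(image_size, patch_size):
    """Count non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(cls_token, other_tokens):
    """Single-head cross-attention with identity projections (a simplification).

    The class token of one branch is the only query. Keys and values are that
    class token followed by the patch tokens of the other branch.
    """
    dim = len(cls_token)
    keys = [cls_token] + other_tokens
    weights = softmax([dot(cls_token, k) / math.sqrt(dim) for k in keys])
    attended = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]
    # Residual connection: add the attended summary back onto the class token.
    fused = [c + a for c, a in zip(cls_token, attended)]
    return fused, weights


# Hypothetical settings, chosen only for illustration.
IMAGE_SIZE = 112
SMALL_PATCH, LARGE_PATCH = 8, 16
DIM = 4

rng = random.Random(0)
n_small = num_patches(IMAGE_SIZE, SMALL_PATCH)  # fine-grained branch
n_large = num_patches(IMAGE_SIZE, LARGE_PATCH)  # coarse branch

# Random stand-ins for the fine branch's patch embeddings and the coarse
# branch's class token; both share the same width here for simplicity.
small_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(n_small)]
large_cls = [rng.gauss(0, 1) for _ in range(DIM)]

fused_cls, attn = cross_attend(large_cls, small_tokens)
print(n_small, n_large, len(attn))
```

Because only the class token issues a query, the cost of this fusion step grows linearly with the number of tokens in the other branch rather than quadratically, which is the usual motivation for fusing branches this way.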
Thesis Note
Darmstadt, TU, Bachelor Thesis, 2026
Author(s)