2025
Conference Paper
Title
Advancing reliability in self-supervised transformer models through hierarchical mask attention heads
Abstract
Self-supervised learning (SSL) has proven to be a powerful technique across various domains, including computer vision, natural language processing, and, more recently, medical image analysis. In critical applications such as medical diagnosis and clinical decision-making, understanding a model's predictive accuracy and confidence is essential for building trustworthy and reliable machine learning systems. However, despite the rapid advancements in SSL, few studies have focused on assessing or enhancing the reliability of these models. To address this gap, we build on Plex's definition of reliability, which emphasizes robust generalization to new tasks, adaptability to new datasets, and accurate representation of uncertainty. We propose a simple yet effective technique to improve the reliability of SSL models by introducing randomness into self-supervised transformers while maintaining their accuracy. Our approach involves training a hierarchical mask on the multi-headed attention mechanism, a key component of transformer models, and implementing a masking scheduler to adjust the masking portion dynamically during training. Through extensive experiments on diverse tasks, including in-distribution generalization, out-of-distribution generalization, semi-supervised learning, and transfer learning, we demonstrate that our method enhances prediction reliability. We validate our approach on chest X-ray images and retinal color fundus photographs from datasets including CheXpert, ChestX-ray14, EyePACS, and APTOS, achieving improved calibration and accuracy compared to baseline models. Our method performs on par with ensemble techniques, offering a scalable and effective solution for building more robust and trustworthy SSL models in medical and clinical applications.
Author(s)