Advancing reliability in self-supervised transformer models through hierarchical mask attention heads

Baur, Simon; Vahidi, Amirhossein; Wang, Mengyu; Zebardast, Nazlee; Elze, Tobias; Bischl, Bernd; Rezaei, Mina; Eslami, Mohammad

doi:10.1117/12.3047444

2025

Conference Paper

Abstract

Self-supervised learning (SSL) has proven to be a powerful technique across various domains, including computer vision, natural language processing, and, more recently, medical image analysis. In critical applications such as medical diagnosis and clinical decision-making, understanding a model's predictive accuracy and confidence is essential for building trustworthy and reliable machine learning systems. However, despite the rapid advancements in SSL, few studies have focused on assessing or enhancing the reliability of these models. To address this gap, we build on Plex's definition of reliability, which emphasizes robust generalization to new tasks, adaptability to new datasets, and accurate representation of uncertainty. We propose a simple yet effective technique to improve the reliability of SSL models by introducing randomness into self-supervised transformers while maintaining their accuracy. Our approach involves training a hierarchical mask on the multi-headed attention mechanism, a key component of transformer models, and implementing a masking scheduler to adjust the masking portion dynamically during training. Through extensive experiments on diverse tasks, including in-distribution generalization, out-of-distribution generalization, semi-supervised learning, and transfer learning, we demonstrate that our method enhances prediction reliability. Using datasets such as CheXpert, ChestX-ray14, EyePACS, and APTOS, we validate our approach on chest X-ray images and retinal color fundus photographs, achieving improved calibration and accuracy compared to baseline models. Our method performs on par with ensemble techniques, offering a scalable and effective solution for building more robust and trustworthy SSL models in medical and clinical applications.
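The abstract does not say how the hierarchical mask is structured or trained, nor which schedule the masking scheduler follows. As a rough illustration of the general idea only, the sketch below randomly masks whole attention heads, with the masked fraction set by a linear schedule over training steps. The function names (`mask_ratio`, `sample_head_mask`, `apply_head_mask`), the linear schedule, the start and end ratios, and the purely random (not learned, not hierarchical) mask are all assumptions made for this example and should not be read as the authors' method.

```python
import random


def mask_ratio(step, total_steps, start=0.0, end=0.5):
    """Hypothetical linear scheduler: fraction of heads to mask at `step`."""
    if total_steps <= 1:
        return end
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * progress


def sample_head_mask(num_heads, ratio, rng):
    """Return a 0/1 list with round(ratio * num_heads) heads masked (0).

    At least one head is always kept so the layer output is never all zeros.
    """
    num_masked = min(round(ratio * num_heads), num_heads - 1)
    masked = set(rng.sample(range(num_heads), num_masked))
    return [0 if h in masked else 1 for h in range(num_heads)]


def apply_head_mask(head_outputs, mask):
    """Zero the output vector of every masked head (assumed masking rule)."""
    return [[x * keep for x in head] for head, keep in zip(head_outputs, mask)]


# Illustrative run: the masked fraction grows from 0% to 50% of 8 heads.
rng = random.Random(0)
num_heads, total_steps = 8, 5
for step in range(total_steps):
    ratio = mask_ratio(step, total_steps)
    mask = sample_head_mask(num_heads, ratio, rng)
    print(step, f"{ratio:.3f}", mask)
```

In a real transformer the per-head outputs would be tensors and the mask would be applied inside the attention layer before the heads are concatenated; plain lists are used here only to keep the example dependency-free.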

Author(s)

Baur, Simon

Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI

Vahidi, Amirhossein

Wellcome Sanger Institute

Wang, Mengyu

Harvard Medical School

Zebardast, Nazlee

Harvard Medical School

Elze, Tobias

Harvard Medical School

Bischl, Bernd

Ludwig-Maximilians-Universität München

Rezaei, Mina

Ludwig-Maximilians-Universität München

Eslami, Mohammad

Harvard Medical School

Mainwork

Medical Imaging 2025. Image Processing

Conference

Conference "Medical Imaging - Image Processing" 2025
