2024
Conference Paper
Title
Fusing Speech and Language Models for Dementia Detection
Abstract
Accurate detection of dementia is crucial for timely intervention and care, and multimodal data holds significant potential for improving diagnostic accuracy. In this study, we explore deep learning approaches for dementia classification using the Pitt corpus, which includes brief participant descriptions of a cookie theft scene. We analyze 242 control and 307 dementia audio clips to investigate various representation learning techniques. Our best-performing approach fuses audio spectrograms with language-model components, including Whisper transcriptions and transformer-based feature extraction. We evaluate these models and find that the multimodal approach, with an F1-score of 86.42%, outperforms the single-modality approaches by a considerable margin. Our findings underscore the promise of multimodal deep learning for improving the reliability of dementia detection from audio, possibly paving the way for more robust and accessible diagnostic tools.
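The abstract names two technical ingredients: fusing audio-derived and text-derived representations, and scoring classifiers by F1. The following is a minimal sketch of both, not the paper's implementation. The abstract does not say how the modalities are combined, so concatenation is assumed here purely for illustration, and the function names `fuse_features` and `f1_score` as well as all input values are hypothetical.

```python
from typing import List, Sequence


def fuse_features(audio_vec: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Feature-level fusion by concatenation.

    Assumption: the abstract does not state the fusion mechanism;
    concatenation is only one common choice.
    """
    return list(audio_vec) + list(text_vec)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy vectors standing in for a spectrogram embedding and a transcript embedding.
    fused = fuse_features([0.1, 0.2], [0.3])
    print(len(fused))

    # Toy labels: 1 = dementia, 0 = control (made-up values, not results from the paper).
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because F1 combines precision and recall for the positive class, it is less sensitive than plain accuracy to the moderate class imbalance in a dataset such as the 242 control and 307 dementia clips described above.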
Author(s)
Conference
Rights
Under Copyright
Language
English