2024
Conference Paper
Title
A Multimodal, Multilabel Approach to Recognize Emotions in Oral History Interviews
Abstract
In this paper, we present a multilabel approach to multimodal emotion recognition in the challenging and previously unexplored use case of oral history interviews. Oral history is a research method in which the past is studied through recorded video interviews that reflect the narrator's personal experiences. Analyzing the emotional content of these interviews helps researchers understand the trauma and historical memories of individuals. The emotions present in these interviews are subtle, natural, and complex. Because self-reported labels are unavailable, observer-reported emotion labels must be used. However, the complexity of the emotions and the diversity in how they are perceived make it difficult to describe them with single categories. Furthermore, unimodal analysis that relies on only one of facial expressions, vocal cues, or spoken words is inadequate given the multimodal nature of the narration. To address these challenges, this paper proposes a multilabel, multimodal approach to emotion recognition on the novel HdG dataset, which consists of German oral history interviews. The proposed approach uses the state-of-the-art Multimodal Transformer to fuse audio, textual, and visual features and extends it to perform multilabel classification over the six Ekman emotion classes. Our approach achieves a mean AUC of 0.74 and a mean balanced accuracy (balAcc) of 0.70, significantly outperforming previous unimodal multiclass methods and setting a benchmark for future multimodal emotion recognition research in this domain.
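To make the reported metrics concrete, the following is a minimal sketch of how a mean AUC and a mean balanced accuracy could be computed in a multilabel setting, where each emotion class is treated as its own binary problem and the per-class scores are averaged. The abstract does not state the exact evaluation protocol, so the unweighted averaging over classes, the 0.5 decision threshold, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for one label: the fraction of (positive, negative) pairs
    in which the positive sample gets the higher score (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true positive rate and the true negative rate for one label."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("balanced accuracy needs both classes to be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


def mean_multilabel_scores(
    y_true: List[Sequence[int]],
    y_score: List[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off; not specified in the abstract
) -> Tuple[float, float]:
    """Unweighted mean AUC and mean balanced accuracy over all label columns.

    Rows are samples; columns are labels (for example, the six emotion classes).
    """
    n_labels = len(y_true[0])
    aucs, baccs = [], []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true]
        col_score = [row[j] for row in y_score]
        col_pred = [1 if s >= threshold else 0 for s in col_score]
        aucs.append(binary_auc(col_true, col_score))
        baccs.append(balanced_accuracy(col_true, col_pred))
    return sum(aucs) / n_labels, sum(baccs) / n_labels
```

Because a sample may carry several emotion labels at once, each column is scored independently. Balanced accuracy is useful here because emotion labels in natural speech tend to be imbalanced, and it weights the positive and negative classes equally regardless of their frequency.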
Author(s)