2025
Journal Article
Title
AudioProtoPNet: An interpretable deep learning model for bird sound classification
Abstract
Deep learning models have significantly advanced acoustic bird monitoring by recognizing numerous bird species from their vocalizations. However, traditional deep learning models are black boxes that provide no insight into their underlying computations, which limits their usefulness to ornithologists and machine learning engineers. Explainable models could facilitate debugging, knowledge discovery, trust, and interdisciplinary collaboration. We introduce AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification. It is inherently interpretable: a ConvNeXt backbone extracts embeddings, and a prototype learning classifier is trained on these embeddings. The classifier learns prototypical patterns of each bird species’ vocalizations from spectrograms of instances in the training data. During inference, recordings are classified by comparing them to the learned prototypes in the embedding space, which explains the model’s decisions and reveals the most informative embeddings of each bird species. The model was trained on the BirdSet training dataset, which covers 9,734 bird species and more than 6,800 h of recordings. Its performance was evaluated on the seven BirdSet test datasets, which cover different geographical regions. AudioProtoPNet outperformed Perch, a state-of-the-art bird sound classification model that is itself superior to the more popular BirdNET. It achieved an average AUROC of 0.90 and a cmAP of 0.42, relative improvements of 7.1% and 16.7% over Perch, respectively. These results demonstrate that even for the challenging task of multi-label bird sound classification, it is possible to develop powerful yet interpretable deep learning models that provide valuable insights for professionals in ornithology and machine learning.
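The inference step summarized above, where a recording is classified by comparing its embeddings to learned prototypes, can be illustrated with a minimal NumPy sketch. This is not the AudioProtoPNet implementation. The function names, the use of cosine similarity, the rule that a class is scored by its best-matching prototype, and the `scale`, `bias`, and `threshold` values are all hypothetical choices. The patch embeddings stand in for backbone outputs on spectrogram regions.

```python
import numpy as np

# Hypothetical sketch of prototype-based, multi-label scoring (not the authors' code).

def prototype_similarities(patches, prototypes, eps=1e-8):
    """Best cosine similarity of each prototype to any patch embedding.

    patches:    (num_patches, dim) embeddings of spectrogram regions (assumed backbone output)
    prototypes: (num_prototypes, dim) learned prototype vectors
    Returns (max similarity per prototype, index of the best-matching patch per prototype).
    """
    # Normalize rows so that a dot product equals cosine similarity.
    z = patches / np.maximum(np.linalg.norm(patches, axis=1, keepdims=True), eps)
    p = prototypes / np.maximum(np.linalg.norm(prototypes, axis=1, keepdims=True), eps)
    sims = z @ p.T  # shape (num_patches, num_prototypes)
    return sims.max(axis=0), sims.argmax(axis=0)


def predict(patches, prototypes, prototype_class, num_classes,
            scale=10.0, bias=-5.0, threshold=0.5):
    """Multi-label prediction with one independent sigmoid per class (hypothetical rule)."""
    sims, best_patch = prototype_similarities(patches, prototypes)
    # Class evidence = similarity of that class's best-matching prototype.
    class_sim = np.full(num_classes, -1.0)
    for j, c in enumerate(prototype_class):
        class_sim[c] = max(class_sim[c], sims[j])
    logits = scale * class_sim + bias
    probs = 1.0 / (1.0 + np.exp(-logits))
    # best_patch says which spectrogram region each prototype matched ("this looks like that").
    return probs, probs >= threshold, best_patch


# Toy example: 2-D embeddings, one prototype per species.
prototypes = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
prototype_class = np.array([0, 1, 2])
patches = np.array([[2.0, 0.0], [0.0, 3.0]])  # two spectrogram patches
probs, labels, best_patch = predict(patches, prototypes, prototype_class, num_classes=3)
print(labels.tolist())      # [True, True, False]
print(best_patch.tolist())  # [0, 1, 1]
```

Using an independent sigmoid per class instead of a softmax lets several species be detected in the same recording, which is what the multi-label setting requires. The returned patch indices identify the region most similar to each prototype, which is the kind of explanation a prototype classifier can offer.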
Author(s)
Heinrich, René Patrick Gerald