Local Key Estimation in Music Recordings: A Case Study Across Songs, Versions, and Annotators
While global key and chord estimation for both popular and classical music recordings have received considerable attention, little research has been devoted to estimating the local key for classical music. This may be due in part to the inherent ambiguity and subjectivity of local keys, which make annotating them a challenging task. In this article, we approach local key estimation with a cross-version dataset comprising nine performances (versions) of Schubert's song cycle Winterreise, annotated by three different music theory experts. We consider two baseline methods that are representative of common types of signal processing algorithms: an HMM-based system and a CNN-based approach. For both models, we employ a similar training procedure, including the optimization of hyperparameters on a validation split. We systematically evaluate the model predictions and provide musical explanations for key confusions. As a central contribution, we explore how different training-test splits affect the models' efficacy. Splitting along the song axis, we find that both methods perform similarly well. Splitting along the version axis, we obtain substantially higher accuracies, especially for the CNN, which seems to effectively learn the harmonic progressions of the songs ("cover song effect") and successfully generalizes to unseen versions. We further discuss the results for several songs in detail and assess our results from the perspective of multiple annotators. This cross-annotator study reveals that a substantial part of the systems' errors coincides with annotator disagreement, and that an even larger part can be traced back to musically explainable relationships among different keys.
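The two splitting strategies can be illustrated with a minimal sketch. It assumes that each recording is identified by a (song, version) index pair, with 24 songs (the length of the Winterreise cycle) and the nine versions mentioned above. The function name `split_along_axis` and the held-out indices are hypothetical, and the validation split is omitted for brevity; this is not the partitioning used in the study.

```python
from itertools import product

N_SONGS = 24     # Winterreise comprises 24 songs
N_VERSIONS = 9   # nine performances, as stated in the abstract


def split_along_axis(axis, test_ids):
    """Partition all (song, version) recordings into train and test lists.

    axis     -- "song" or "version": the axis along which data is held out
    test_ids -- indices along that axis reserved for testing (hypothetical)
    """
    if axis not in ("song", "version"):
        raise ValueError("axis must be 'song' or 'version'")
    test_ids = set(test_ids)
    train, test = [], []
    for song, version in product(range(N_SONGS), range(N_VERSIONS)):
        key = song if axis == "song" else version
        (test if key in test_ids else train).append((song, version))
    return train, test


# Song split: the test songs are never seen during training.
song_train, song_test = split_along_axis("song", test_ids=range(20, 24))

# Version split: every song occurs in training, only the performances differ.
version_train, version_test = split_along_axis("version", test_ids=[7, 8])
```

In the version split, every test song also occurs in the training set in other performances, which is the condition under which a model can memorize harmonic progressions of the songs. In the song split, the test songs are entirely unseen.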