Towards evaluating multiple predominant melody annotations in jazz recordings

Balke, Stefan; Driedger, Jonathan; Abeßer, Jakob; Dittmar, Christian; Müller, Meinard

2016

Conference Paper

Abstract

Melody estimation algorithms are typically evaluated by separately assessing the tasks of voice activity detection and fundamental frequency estimation. For both subtasks, computed results are typically compared to a single human reference annotation. This is problematic since different human experts may differ in how they specify a predominant melody, thus leading to a pool of equally valid reference annotations. In this paper, we address the problem of evaluating melody extraction algorithms within a jazz music scenario. Using four human and two automatically computed annotations, we discuss the limitations of standard evaluation measures and introduce an adaptation of Fleiss' kappa that can better account for multiple reference annotations. Our experiments not only highlight the behavior of the different evaluation measures, but also give deeper insights into the melody extraction task.
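For orientation, the following is a minimal sketch of the standard (unadapted) Fleiss' kappa applied to frame-wise voice activity labels from several annotators. It is not the adaptation introduced in the paper, which the abstract does not specify. The function name `fleiss_kappa`, the binary label encoding (1 = melody active, 0 = inactive), and the toy annotations are assumptions made for illustration only.

```python
from collections import Counter


def fleiss_kappa(annotations):
    """Standard Fleiss' kappa for several annotators labelling the same frames.

    annotations: list of equally long label sequences, one per annotator
    (assumed encoding: 1 = melody active, 0 = inactive per time frame).
    """
    n_raters = len(annotations)
    if n_raters < 2:
        raise ValueError("need at least two annotators")
    n_frames = len(annotations[0])
    if n_frames == 0:
        raise ValueError("need at least one frame")
    if any(len(a) != n_frames for a in annotations):
        raise ValueError("all annotations must cover the same frames")

    category_totals = Counter()  # how often each label was assigned overall
    agreement_sum = 0.0          # sum of per-frame agreement values P_i
    pairs = n_raters * (n_raters - 1)
    for frame_labels in zip(*annotations):
        counts = Counter(frame_labels)
        category_totals.update(counts)
        # P_i: fraction of annotator pairs that agree on this frame
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / pairs

    observed = agreement_sum / n_frames
    total = n_raters * n_frames
    # Chance agreement from the overall label proportions
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        # Every annotator used one single label everywhere: kappa is undefined
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: four annotators, six frames of activity labels
toy_annotations = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
]
toy_kappa = fleiss_kappa(toy_annotations)
```

A value of 1 indicates perfect agreement among the annotators, a value near 0 indicates agreement at chance level, and negative values indicate less agreement than expected by chance.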

Author(s)

Balke, Stefan

Driedger, Jonathan

Abeßer, Jakob

Dittmar, Christian

Müller, Meinard

Mainwork

17th International Society for Music Information Retrieval Conference, ISMIR 2016. Proceedings

Conference

International Society for Music Information Retrieval (ISMIR Conference) <17, 2016, New York/NY>
