Predominant Jazz Instrument Recognition: Empirical Studies on Neural Network Architectures
Musicological studies on jazz performance analysis commonly require a manual selection and transcription of improvised solo parts, both of which can be time-consuming. Algorithms for automatic content analysis can accelerate these processes and thereby allow such studies to scale to larger corpora of jazz recordings. In this study, we aim to detect the presence of predominant musical instruments in jazz ensemble recordings. This information can guide structural analysis in order to detect improvised solo parts. As the main contribution, we perform a comparative study on automatic instrument recognition (AIR) of predominant instruments in jazz ensembles, using a taxonomy of 11 common instruments including singing voice. We compare the performance of three state-of-the-art convolutional neural networks (CNNs), including a recurrent variant and one with an attention mechanism. Our main finding is that while all networks perform comparably, the attention-based model learns the most compact feature representation, being orders of magnitude smaller than the other models.
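To illustrate the kind of attention mechanism alluded to above, the following is a minimal sketch of softmax attention pooling over time frames, as commonly used to summarize a sequence of per-frame embeddings into a single clip-level vector for instrument classification. This is an illustrative assumption, not the paper's exact architecture; the function and variable names (`attention_pool`, `frames`, `w`) are hypothetical.

```python
import numpy as np

def attention_pool(frames, w):
    """Summarize per-frame embeddings into one clip-level vector.

    frames: (T, D) array of T per-frame feature vectors of dimension D
    w:      (D,) learned attention query vector (assumed given here)
    """
    scores = frames @ w                  # (T,) relevance score per frame
    scores = scores - scores.max()       # shift for numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum()          # softmax attention weights, sum to 1
    return alpha @ frames                # (D,) attention-weighted average

# Toy usage with random "embeddings" standing in for CNN frame features
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 16))      # 100 time frames, 16-dim features
w = rng.normal(size=16)
clip_embedding = attention_pool(frames, w)
```

A classifier head (e.g. a linear layer over `clip_embedding`) would then predict the predominant instrument; the attention weights themselves indicate which frames the model deems relevant, which keeps the pooled representation compact.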