Authors: Taenzer, Michael; Mimilakis, Stylianos I.; Abeßer, Jakob
Title: Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks
Type: journal article
Year: 2021
DOI: 10.3390/electronics10070851
URL: https://publica.fraunhofer.de/handle/publica/266839
Language: en
Keywords: convolutional neural networks; multi-pitch estimation; music information retrieval; polyphony estimation; automatic music analysis

Abstract: In this work, we propose incorporating information about the local degree of polyphony into multi-pitch estimation (MPE) for piano music recordings. To that aim, we propose a method for local polyphony estimation (LPE), based on convolutional neural networks (CNNs) trained in a supervised fashion to explicitly predict the degree of polyphony. We investigate two feature representations as inputs to our method: the Constant-Q Transform (CQT) and its recent extension, the Folded-CQT (F-CQT). To evaluate the performance of our method, we conduct a series of experiments on real and synthetic piano recordings based on the MIDI Aligned Piano Sounds (MAPS) and Saarland Music Data (SMD) datasets. We compare our approaches with a state-of-the-art piano transcription method by informing that method with the LPE knowledge in a postprocessing stage. The experimental results suggest that using explicit LPE information can refine MPE predictions. Furthermore, they show that, on average, the CQT representation is preferred over the F-CQT for LPE.
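
The abstract does not give implementation details. The following PyTorch sketch only illustrates the described pipeline under stated assumptions: a small CNN classifies the degree of polyphony from a CQT patch, and its output then prunes the frame-wise MPE activations in a postprocessing step. All layer sizes, the input patch shape, the maximum polyphony, and the names PolyphonyCNN and refine_mpe_with_lpe are hypothetical and not taken from the paper.

# Minimal sketch, not the authors' exact architecture or postprocessing rule.
import torch
import torch.nn as nn


class PolyphonyCNN(nn.Module):
    """Local polyphony estimator over CQT patches (hypothetical layout)."""

    def __init__(self, n_bins: int = 144, n_frames: int = 9, max_polyphony: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 32 * (n_bins // 4) * (n_frames // 4)
        # Classes 0..max_polyphony: number of simultaneously sounding pitches.
        self.classifier = nn.Linear(feat_dim, max_polyphony + 1)

    def forward(self, cqt_patch: torch.Tensor) -> torch.Tensor:
        # cqt_patch: (batch, 1, n_bins, n_frames)
        h = self.features(cqt_patch)
        return self.classifier(h.flatten(start_dim=1))


def refine_mpe_with_lpe(pitch_activations: torch.Tensor,
                        polyphony: torch.Tensor) -> torch.Tensor:
    """Keep only the top-k pitch activations per frame, with k the LPE estimate.

    pitch_activations: (n_frames, 88) frame-wise MPE posteriors.
    polyphony:         (n_frames,) integer polyphony estimates from the LPE model.
    """
    refined = torch.zeros_like(pitch_activations)
    for t, k in enumerate(polyphony.tolist()):
        if k == 0:
            continue  # predicted silence: discard all activations in this frame
        topk = torch.topk(pitch_activations[t], k=min(k, pitch_activations.shape[1]))
        refined[t, topk.indices] = pitch_activations[t, topk.indices]
    return refined

A top-k pruning rule is just one plausible way to inject LPE knowledge into an existing transcription system's output; the paper's actual postprocessing stage may differ.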