Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks
In this work, we propose considering the information from a polyphony for multi-pitch estimation (MPE) in piano music recordings. To that aim, we propose a method for local polyphony estimation (LPE), which is based on convolutional neural networks (CNNs) trained in a supervised fashion to explicitly predict the degree of polyphony. We investigate two feature representations as inputs to our method, in particular, the Constant-Q Transform (CQT) and its recent extension Folded-CQT (F-CQT). To evaluate the performance of our method, we conduct a series of experiments on real and synthetic piano recordings based on the MIDI Aligned Piano Sounds (MAPS) and the Saarland Music Data (SMD) datasets. We compare our approaches with a state-of-the art piano transcription method by informing said method with the LPE knowledge in a postprocessing stage. The experimental results suggest that using explicit LPE information can refine MPE predictions. Furthermore, it is shown that, on average, the CQT representation is preferred over F-CQT for LPE.