2023
Conference Paper
Title
Introducing DiMCAT for processing and analyzing notated music on a very large scale
Abstract
As corpora of digital musical scores continue to grow, the need becomes increasingly pressing for research tools that manipulate such data efficiently, offer an intuitive interface, and support a diversity of file formats. In response, this paper introduces the Digital Musicology Corpus Analysis Toolkit (DiMCAT), a Python library for processing large corpora of digitally encoded musical scores. Aimed equally at music-analytical corpus studies, MIR, and machine-learning research, DiMCAT performs common data transformations and analyses using dataframes. Dataframes reduce the inherent complexity of atomic score contents (e.g., notes), larger score entities (e.g., measures), and abstractions (e.g., chord symbols) to easily manipulable computational structures whose vectorized operations scale to large quantities of musical material. The design of DiMCAT's API prioritizes computational speed and ease of use, aiming to serve machine-learning practitioners and musicologists alike.
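To make the dataframe idea in the abstract concrete, here is a minimal sketch in plain pandas. It does not use DiMCAT's own API, which this record does not describe. The column names (`piece`, `measure`, `midi`, `duration_qb`) and the toy values are assumptions chosen purely for illustration.

```python
import pandas as pd

# Hypothetical note list: one row per note. Column names and values are
# illustrative assumptions, not DiMCAT's actual schema.
notes = pd.DataFrame(
    {
        "piece": ["A", "A", "A", "B", "B"],
        "measure": [1, 1, 2, 1, 1],
        "midi": [60, 64, 67, 62, 65],
        "duration_qb": [1.0, 1.0, 2.0, 0.5, 1.5],  # duration in quarter beats
    }
)

# Vectorized column arithmetic: compute the pitch class of every note at once,
# without an explicit Python loop.
notes["pitch_class"] = notes["midi"] % 12

# Grouped aggregation: total duration per pitch class within each piece,
# reshaped so that pieces are rows and pitch classes are columns.
pc_durations = (
    notes.groupby(["piece", "pitch_class"])["duration_qb"]
    .sum()
    .unstack(fill_value=0.0)
)
print(pc_durations)
```

The same two steps (a column-wise transformation followed by a grouped aggregation) apply unchanged whether the table holds five notes or several million, which is the scaling property the abstract attributes to vectorized operations.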
Author(s)
Anton Bruckner Private University, EPFL - École Polytechnique Fédérale de Lausanne
Keyword(s)