Audio Augmentations for Semi-Supervised Learning with FixMatch

License: CC BY 4.0
Authors: Grollmisch, Sascha; Cano, Estefania; Abeßer, Jakob
Year: 2022
Record date: 2025-06-16
DOI: https://doi.org/10.24406/publica-4779
Handle: https://publica.fraunhofer.de/handle/publica/488676
Document type: conference paper not in proceedings
Language: en
Keywords: Automatic Music Analysis; Semi-Supervised Learning; Audio Classification

Abstract: FixMatch, a semi-supervised learning method proposed for image classification, incorporates unlabeled data into the training procedure by predicting labels for differently augmented versions of the unlabeled data. In our previous work, we adapted FixMatch to audio classification by applying image augmentations to spectral representations of the audio signal. While this approach matched the performance of the supervised baseline with only a fraction of the training data, the performance of audio-specific augmentation techniques and their effect on the FixMatch approach were not evaluated. In this work, we replace all image-based augmentation techniques with audio-specific ones and keep the feature extraction unchanged. The audio-specific approach improved upon the supervised baseline, which confirms the effectiveness of the FixMatch approach for semi-supervised learning even with a completely different set of augmentations. However, the image-based approach outperformed the audio-based approach on the three audio classification tasks evaluated.
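
The abstract's description of FixMatch (predicting labels for differently augmented versions of the same unlabeled data) can be illustrated with a minimal NumPy sketch of the unlabeled loss term. It follows the general FixMatch formulation rather than the authors' implementation: the split into weakly and strongly augmented views, the 0.95 confidence threshold, and the function names are assumptions not stated in this record.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """Sketch of the FixMatch loss on a batch of unlabeled examples.

    weak_logits:   model outputs for weakly augmented inputs, shape (n, classes)
    strong_logits: model outputs for strongly augmented versions of the same inputs
    threshold:     assumed confidence cutoff for accepting a pseudo-label
    """
    weak_probs = softmax(np.asarray(weak_logits, dtype=float))
    pseudo_labels = weak_probs.argmax(axis=1)        # hard pseudo-labels
    mask = weak_probs.max(axis=1) >= threshold       # keep confident predictions only

    strong_probs = softmax(np.asarray(strong_logits, dtype=float))
    rows = np.arange(len(pseudo_labels))
    # Cross-entropy of the strong-view prediction against the pseudo-label.
    cross_entropy = -np.log(strong_probs[rows, pseudo_labels] + 1e-12)

    # Average over the whole batch; rejected examples contribute zero.
    return float((cross_entropy * mask).mean())
```

In a full training loop this term would be added, with a weighting factor, to the ordinary supervised loss on the labeled batch. Swapping image-based for audio-specific augmentations, as the abstract describes, would change only how the two views are produced, not this loss.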