Anderes (Other)
Browsing Anderes by Department "Fraunhofer-Institut für Digitale Medientechnologie IDMT"
Publication: Best Practice Manual for Digital Audio Authenticity Analysis (2022)
Bartle, Anna; Boss, Dagmar; Boyarov, Alexander G.; Grigoras, Catalin; Michałek, Marcin; Nyberg, Dan
Publication: Generative AI and Disinformation (2024)
Bontcheva, Kalina; Papadopoulous, Symeon; Tsalakanidou, Filareti; Gallotti, Riccardo; Dutkiewicz, Lidia; Krack, Noémie; Nucci, Francesco Severio; Spangenberg, Jochen; Srba, Ivan; Verdoliva, Luisa
The goal of this white paper is to deepen understanding of the disinformation-generation capabilities of state-of-the-art AI, as well as the use of AI in the development of new disinformation detection technologies, along with the associated ethical and legal challenges. We conclude by revisiting the challenges and opportunities brought by generative AI in the context of disinformation production, spread, detection, and debunking.
Publication: Revisiting Representation Learning for Singing Voice Separation with Sinkhorn Distances (2020)
Drossos, Konstantinos
In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on a denoising auto-encoder model that uses a simple sinusoidal model as its decoding function to reconstruct the singing voice. To demonstrate the benefits of our method, we employ the obtained representations in the task of informed singing voice separation via binary masking, and measure the obtained separation quality by means of the scale-invariant signal-to-distortion ratio. Our findings suggest that our method is capable of learning meaningful representations for singing voice separation, while preserving conveniences of the short-time Fourier transform, such as non-negativity, smoothness, and reconstruction subject to time-frequency masking, that are desired in audio and music source separation.
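The abstract mentions two ingredients that can be made concrete in a few lines: informed binary time-frequency masking and the scale-invariant signal-to-distortion ratio (SI-SDR) used to score the separation. The sketch below is illustrative only; the function names, array shapes, and epsilon value are assumptions, not the authors' code.

```python
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    # Project the estimate onto the reference to factor out any global scaling.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))

def informed_binary_mask(mix_mag: np.ndarray, voice_mag: np.ndarray,
                         accomp_mag: np.ndarray) -> np.ndarray:
    """Informed binary masking: keep only the time-frequency bins in which the
    voice magnitude dominates the accompaniment magnitude."""
    mask = (voice_mag > accomp_mag).astype(mix_mag.dtype)
    return mask * mix_mag
```

The same masking and scoring steps apply whether the time-frequency representation is the short-time Fourier transform or a learned one, which is the point of comparison the abstract draws.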
Publication: SignalTrain: Profiling Audio Compressors with Deep Neural Networks (2019)
Hawley, Scott H.; Colburn, Benjamin
In this work we present a data-driven approach for predicting the behavior of (i.e., profiling) a given non-linear audio signal processing effect (henceforth "audio effect"). Our objective is to learn a mapping function from the unprocessed audio to the audio processed by the effect to be profiled, using time-domain samples. To that aim, we employ a deep auto-encoder model that is conditioned on both time-domain samples and the control parameters of the target audio effect. As a test-case study, we focus on the offline profiling of two dynamic range compression audio effects, one software-based and the other analog. Compressors were chosen because they are a widely used and important set of effects, and because their parameterized, nonlinear, time-dependent nature makes them a challenging problem for a system aiming to profile "general" audio effects. Results from our experimental procedure show that the primary functional and auditory characteristics of the compressors can be captured; however, there is still sufficient audible noise to merit further investigation before such methods are applied to real-world audio processing workflows.
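The key idea of the abstract, an auto-encoder conditioned on the effect's control parameters, can be sketched as follows. This is not the authors' SignalTrain architecture; the layer sizes, frame length, and class name are assumptions chosen only to illustrate conditioning a decoder on knob settings such as threshold and ratio.

```python
import torch
import torch.nn as nn

class ConditionedEffectProfiler(nn.Module):
    """Illustrative encoder-decoder that maps dry audio frames to the frames
    produced by the profiled effect, conditioned on its control parameters."""
    def __init__(self, n_params: int, frame: int = 1024, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(frame, hidden), nn.ReLU())
        # The effect's control parameters are concatenated with the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(hidden + n_params, hidden),
            nn.ReLU(),
            nn.Linear(hidden, frame),
        )

    def forward(self, dry_frames: torch.Tensor, params: torch.Tensor) -> torch.Tensor:
        z = self.encoder(dry_frames)            # (batch, hidden)
        z = torch.cat([z, params], dim=-1)      # condition on knob settings
        return self.decoder(z)                  # predicted processed frames
```

Training such a model would minimize a reconstruction loss (e.g., mean squared error) between the predicted frames and the audio actually produced by the compressor at the same parameter settings.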
Publication: Sound Event Detection with Depthwise Separable and Dilated Convolutions (2020)
Drossos, Konstantinos; Gharib, Shayan; Li, Yanxiong; Virtanen, Tuomas
State-of-the-art sound event detection (SED) methods usually employ a series of convolutional neural networks (CNNs) to extract useful features from the input audio signal, followed by recurrent neural networks (RNNs) to model longer temporal context in the extracted features. The number of channels of the CNNs and the size of the weight matrices of the RNNs have a direct effect on the total number of parameters of the SED method, which typically amounts to a couple of million. Additionally, the usually long sequences used as input to an SED method, together with the use of an RNN, introduce complications such as increased training time, difficulties in gradient flow, and impeded parallelization of the SED method. To tackle all these problems, we propose replacing the CNNs with depthwise separable convolutions and the RNNs with dilated convolutions. We compare the proposed method to a baseline convolutional neural network on an SED task, and achieve a reduction of the number of parameters by 85% and of the average training time per epoch by 78%, together with an increase of the average frame-wise F1 score by 4.6% and a reduction of the average error rate by 3.8%.
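The two substitutions the abstract names can be illustrated with small building blocks: a depthwise separable convolution in place of a standard convolution, and a dilated 1-D convolution in place of recurrence for temporal context. The kernel sizes and helper names below are assumptions for illustration, not the paper's exact configuration.

```python
import torch.nn as nn

def depthwise_separable(in_ch: int, out_ch: int, k: int = 3) -> nn.Sequential:
    """A depthwise convolution followed by a 1x1 pointwise convolution,
    a lower-parameter substitute for a standard k x k convolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),  # depthwise
        nn.Conv2d(in_ch, out_ch, 1),                                # pointwise
    )

def dilated_temporal_block(ch: int, dilation: int, k: int = 3) -> nn.Conv1d:
    """A dilated 1-D convolution that widens the receptive field over time
    without recurrence, standing in for the RNN's temporal modelling."""
    return nn.Conv1d(ch, ch, k, padding=dilation * (k // 2), dilation=dilation)
```

A standard convolution uses roughly in_ch * out_ch * k * k weights, while the separable version uses about in_ch * k * k + in_ch * out_ch, which is the source of the parameter savings the abstract reports.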