Audio-visual fingerprinting and cross-modal aggregation: Components and applications

Dunker, Peter; Gruhne, Matthias

2008

Conference Paper

Abstract

In recent years, the amount of digital media has grown rapidly thanks to efficient media encoding algorithms. As a result, large numbers of audio and video files are stored on users' hard disks and on popular video community platforms. Because suitable metadata standards are lacking or are not followed, descriptions of this data are often missing or misleading. Audio and visual identification algorithms have therefore been developed that identify videos or pieces of music and provide a suitable metadata description or copyright information based on a content database. Integrating both sources of information, the visual and the audio part of a video, for simultaneous identification is called cross-modal processing. This paper describes the principal structure of an audio and a visual identification system and discusses different state-of-the-art algorithms. Furthermore, a cross-modal system is presented, with particular attention to cross-modal aggregation. Finally, current use cases for audio, visual and cross-modal search and retrieval are described.

Host publication

IEEE International Symposium on Consumer Electronics, ISCE 2008

Conference

International Symposium on Consumer Electronics (ISCE) 2008
