2008
Conference Paper
Title
Audio-visual fingerprinting and cross-modal aggregation: Components and applications
Abstract
In recent years, digital media has spread widely thanks to efficient media encoding algorithms. As a result, a large number of audio and video files are stored on users' hard disks and on popular video community platforms. Because suitable metadata standards are lacking or are not followed, descriptions of these data are often missing or misleading. Audio and visual identification algorithms have therefore been developed. They identify videos or pieces of music and provide a suitable metadata description or copyright information based on a content database. Integrating both sources of information, the visual and the audio part of a video, for simultaneous identification is called cross-modal processing. This paper outlines the principal structure of an audio and a visual identification system and discusses different state-of-the-art algorithms. It then presents a cross-modal system, with particular attention to cross-modal aggregation. Finally, it describes current use cases for audio, visual, and cross-modal search and retrieval.