Here you will find scientific publications from the Fraunhofer Institutes.

Audio-visual fingerprinting and cross-modal aggregation: Components and applications

Dunker, Peter; Gruhne, Matthias


Institute of Electrical and Electronics Engineers -IEEE-:
IEEE International Symposium on Consumer Electronics, ISCE 2008 : Vilamoura, Portugal, 14 - 16 April 2008
Piscataway/NJ: IEEE, 2008
ISBN: 978-1-4244-2422-1
4 pp.
International Symposium on Consumer Electronics (ISCE) <12, 2008, Vilamoura>
Conference Paper
Fraunhofer IDMT
audio coding; meta-data; video coding; video retrieval; cross-modal analysis; audio-visual analysis

In recent years, the amount of digital media has grown considerably thanks to efficient media encoding algorithms. As a result, large numbers of audio and video files are stored on users' hard disks and on popular video community platforms. Because suitable metadata standards are lacking or are not followed, descriptions of these data are often missing or misleading. Audio and visual identification algorithms have therefore been developed that identify videos or pieces of music and provide a suitable metadata description or copyright information based on a content database. Integrating both sources of information, the visual and the audio part of a video, for simultaneous identification is called cross-modal processing. This paper outlines the principal structure of an audio and a visual identification system and discusses different state-of-the-art algorithms. Furthermore, a cross-modal system is presented, with particular attention to cross-modal aggregation. Finally, current use cases for audio, visual and cross-modal search and retrieval are described.