Here you will find scientific publications from the Fraunhofer Institutes.

Adaptive perceptual audio hashing

Rosenbaum, Clemens
Schmucker, Martin

Darmstadt, 2007, 52 pp.
Darmstadt, TU, Bachelor Thesis, 2007
Fraunhofer IGD
audio; fingerprinting; speech processing; perceptual hashing

This work deals with a perceptual hashing algorithm for audio streams. Perceptual hashing, or fingerprinting, is a technique that extracts the perceptually most relevant data from a piece of media. The process results in a hash that, on the one hand, enables easy recognition of the same data later on and, on the other hand, provides similarity measures between different pieces of media. Probably the best-known application of audio hashing is the recognition of music played on the radio. Imagine you are sitting in your car, listening to the radio. You hear a song you do not know, but you would like to know its name and its writer. So you call a phone number and hold your cell phone to the radio speaker, so that the other end of the call can listen to the music. Ten seconds later you end the call, and another ten seconds later you receive an SMS containing all relevant information on that song. The process that enables this automated recognition is the perceptual hashing mentioned above. The ten seconds of audio are hashed and checked against a database. If a match is found, the computer tells you the song's information. This thesis deals with a problem most modern audio perceptual hashing functions have: they are optimized for one single type of audio data only, music in most cases. Hence, when these algorithms have to deal with a different type of audio data, they mostly fail. So a new algorithm was developed, meant to deal with any kind of audio. First, a general perceptual hashing algorithm for audio data was designed. It uses a general streaming approach but then extracts fingerprints by calculating statistically independent frequencies from very small fragments of audio, using independent component analysis (ICA), an algorithm from the family of multivariate analysis methods. These frequencies are reduced to integer vectors, which are used as fingerprints or hashes.
Second, a way to maximize the information obtained from any kind of audio data was developed, intended to enable the algorithm to deal with speech in particular. Different approaches were tried. In the end, the two parts were combined into one algorithm, which is, up to a certain point, able to extract perceptually unique information from most audio, provided that a sample of the data to be hashed is given to the algorithm beforehand.
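
The abstract describes fingerprints as integer vectors that are checked against a database and that also yield a similarity measure. The following Python sketch illustrates only that last stage. It is not the thesis's implementation: the uniform quantizer, the L1 distance, the threshold `max_distance`, and the names `quantize`, `distance`, and `best_match` are all assumptions made for illustration, and the ICA feature extraction is replaced here by hand-written feature values.

```python
from typing import Dict, Optional, Sequence, Tuple

Fingerprint = Tuple[int, ...]


def quantize(features: Sequence[float], levels: int = 8) -> Fingerprint:
    """Reduce real-valued features to an integer vector.

    Assumption: features are roughly in [-1, 1]; values outside are clipped.
    Each value is mapped to an integer in [0, levels - 1].
    """
    out = []
    for x in features:
        clipped = max(-1.0, min(1.0, x))
        out.append(min(levels - 1, int((clipped + 1.0) / 2.0 * levels)))
    return tuple(out)


def distance(a: Fingerprint, b: Fingerprint) -> int:
    """L1 distance between two fingerprints (an assumed similarity measure)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    max_distance: int,
) -> Optional[Tuple[str, int]]:
    """Return (title, distance) of the closest entry, or None if nothing
    lies within max_distance of the query."""
    best: Optional[Tuple[str, int]] = None
    for title, fingerprint in database.items():
        d = distance(query, fingerprint)
        if d <= max_distance and (best is None or d < best[1]):
            best = (title, d)
    return best


# Hypothetical feature values standing in for the output of the ICA step.
database = {
    "song_a": quantize([0.9, -0.2, 0.4, 0.0]),
    "speech_b": quantize([-0.7, 0.6, -0.1, 0.3]),
}

# A slightly distorted recording of "song_a", e.g. captured over a phone line.
query = quantize([0.8, -0.25, 0.5, 0.05])
print(best_match(query, database, max_distance=3))
```

Because the match is decided by a distance threshold rather than by exact equality, a slightly distorted recording can still be recognized, while unrelated audio yields no match. Choosing the number of quantization levels and the threshold is a trade-off between robustness and false matches.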