Publica
Here you will find scientific publications from the Fraunhofer Institutes.
Efficient learning for hashing proportional data
 Zaki, M.J.; Institute of Electrical and Electronics Engineers (IEEE); IEEE Computer Society: IEEE 12th International Conference on Data Mining, ICDM 2012. Proceedings. Pt. 2: Brussels, Belgium, 10-13 December 2012. Piscataway, NJ: IEEE, 2012. ISBN: 9781467346498 ISBN: 9780769549057 pp. 735-744
 International Conference on Data Mining (ICDM) <12, 2012, Brussels> 

 English 
 Conference paper 
 Fraunhofer IAIS 
Abstract
Spectral hashing (SH) seeks compact binary codes of data points so that Hamming distances between codes correlate with data similarity. Quickly learning such codes typically boils down to principal component analysis (PCA). However, this is only justified for normally distributed data. For proportional data (normalized histograms), this is not the case. Due to the sum-to-unity constraint, features that are as independent as possible will not all be uncorrelated. In this paper, we show that a linear-time transformation efficiently copes with sum-to-unity constraints: first, we select a small number K of diverse data points by maximizing the volume of the simplex spanned by these prototypes; second, we represent each data point by means of its cosine similarities to the K selected prototypes. This maximum volume hashing is sensible since each dimension in the transformed space is likely to follow a von Mises (vM) distribution, and, in very high dimensions, the vM distribution closely resembles a Gaussian distribution. This justifies employing PCA on the transformed data. Our extensive experiments validate this: maximum volume hashing outperforms spectral hashing and other state-of-the-art techniques.
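The two-step transformation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and all function names here are hypothetical. It makes two assumptions: prototypes are chosen by a simple greedy heuristic, where each new prototype is the point farthest from the affine hull of those already chosen (which greedily enlarges the simplex volume), and bits are obtained by thresholding PCA projections at zero, a simplification rather than the spectral hashing bit assignment.

```python
import numpy as np

def select_prototypes(X, k, eps=1e-12):
    """Greedy stand-in for simplex volume maximization (an assumption)."""
    # Start with the point farthest from the data mean.
    first = int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [first]
    R = X - X[first]  # residuals relative to the first vertex
    for _ in range(k - 1):
        heights = np.linalg.norm(R, axis=1)  # distance to current affine hull
        nxt = int(np.argmax(heights))
        if heights[nxt] < eps:  # data is degenerate, no further volume to gain
            break
        chosen.append(nxt)
        d = R[nxt] / heights[nxt]
        R = R - np.outer(R @ d, d)  # project the new direction out
    return chosen

def cosine_features(X, prototypes):
    """Represent each row of X by its cosine similarity to each prototype."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Pn = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return Xn @ Pn.T

def pca_sign_codes(F, n_bits):
    """Project centered features onto the top principal axes, threshold at 0."""
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return (Fc @ Vt[:n_bits].T > 0).astype(np.uint8)

# Synthetic proportional data: 200 normalized histograms with 20 bins.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(20), size=200)

idx = select_prototypes(X, k=8)
F = cosine_features(X, X[idx])
codes = pca_sign_codes(F, n_bits=6)
```

Each row of `codes` is a 6-bit binary code, and Hamming distances between rows can then be compared with similarities in the original histogram space. The number of bits cannot exceed the number of prototypes K in this sketch, because the transformed data has only K dimensions.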