
Efficient learning for hashing proportional data

Xu, Z.; Kersting, K.; Bauckhage, C.


Zaki, M.J.; Institute of Electrical and Electronics Engineers (IEEE); IEEE Computer Society:
IEEE 12th International Conference on Data Mining, ICDM 2012. Proceedings. Pt. 2: Brussels, Belgium, 10-13 December 2012
Piscataway, NJ: IEEE, 2012
ISBN: 978-1-4673-4649-8
ISBN: 978-0-7695-4905-7
International Conference on Data Mining (ICDM) <12, 2012, Brussels>
Fraunhofer IAIS

Spectral hashing (SH) seeks compact binary codes of data points such that Hamming distances between codes correlate with data similarity. Learning such codes quickly typically boils down to principal component analysis (PCA). However, this is only justified for normally distributed data, which proportional data (normalized histograms) are not: due to the sum-to-unity constraint, features that are as independent as possible will not all be uncorrelated. In this paper, we show that a linear-time transformation copes efficiently with sum-to-unity constraints. First, we select a small number K of diverse data points by maximizing the volume of the simplex spanned by these prototypes. Second, we represent each data point by its cosine similarities to the K selected prototypes. This maximum volume hashing is sensible because each dimension of the transformed space is likely to follow a von Mises (vM) distribution, and in very high dimensions the vM distribution closely resembles a Gaussian. This justifies employing PCA on the transformed data. Our extensive experiments confirm this: maximum volume hashing outperforms spectral hashing and other state-of-the-art techniques.
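The following minimal NumPy sketch shows how the steps described in the abstract fit together: prototype selection, the cosine-similarity transform, PCA, and binary codes compared by Hamming distance. It rests on several assumptions that do not come from the abstract:

- The prototype search is a greedy heuristic that repeatedly adds the point farthest from the affine hull of the prototypes chosen so far. This is not necessarily the authors' exact procedure.
- Bits are obtained by thresholding the centred PCA projections at zero. This is a simplification; spectral hashing itself thresholds analytic eigenfunctions along the PCA directions.
- All function names (`select_prototypes`, `cosine_features`, `fit_hash`, `encode`, `hamming`) are hypothetical.

```python
import numpy as np


def select_prototypes(X, K):
    """Greedily pick K row indices of X that span a large-volume simplex.

    Assumed heuristic (hypothetical, not necessarily the paper's procedure):
    adding a vertex scales the simplex volume by that vertex's distance to
    the affine hull of the current vertices, so each step takes the point
    farthest from that hull. Cost is O(N * D * K), i.e. linear in N.
    """
    if K < 1:
        raise ValueError("K must be at least 1")
    # Seed with the point farthest from the data mean.
    first = int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    chosen = [first]
    R = X - X[first]  # every point expressed relative to the seed vertex
    for _ in range(K - 1):
        dist = np.linalg.norm(R, axis=1)  # distance to the current affine hull
        nxt = int(np.argmax(dist))
        if dist[nxt] < 1e-12:
            raise ValueError("data contain fewer than K affinely independent points")
        chosen.append(nxt)
        b = R[nxt] / dist[nxt]       # new orthonormal direction of the hull
        R = R - np.outer(R @ b, b)   # project that direction out of all residuals
    return chosen


def cosine_features(X, P):
    """Represent each row of X by its cosine similarities to the prototypes P."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    return Xn @ Pn.T  # shape (N, K)


def fit_hash(X, K, n_bits):
    """Fit K prototypes plus a PCA projection of the cosine features."""
    if n_bits > K:
        raise ValueError("n_bits cannot exceed K")
    P = X[select_prototypes(X, K)]
    Z = cosine_features(X, P)
    mean = Z.mean(axis=0)
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
    W = eigvecs[:, ::-1][:, :n_bits]   # keep the directions of largest variance
    return P, mean, W


def encode(X, P, mean, W):
    """Binary codes: sign of the centred PCA projections (simplified thresholding)."""
    Z = cosine_features(X, P)
    return ((Z - mean) @ W > 0).astype(np.uint8)


def hamming(a, b):
    """Number of differing bits between two codes."""
    return int(np.count_nonzero(a != b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.dirichlet(np.ones(50), size=1000)  # 1000 synthetic histograms, 50 bins
    P, mean, W = fit_hash(X, K=16, n_bits=8)
    codes = encode(X, P, mean, W)
    print(codes.shape)  # (1000, 8)
    print(hamming(codes[0], codes[1]))
```

The stages are kept separate for a reason. A different volume-maximization routine would only replace `select_prototypes`, and proper spectral-hashing thresholds would only replace `encode`. The cosine transform in between is the step the abstract motivates, since it turns sum-to-unity data into features on which PCA is reasonable.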