Support Estimation in Frequent Itemset Mining by Locality Sensitive Hashing

License: CC BY 4.0
Authors: Pick, Annika; Horvath, Tamas; Wrobel, Stefan
Dates: 2022-03-14; 26.9.2019; 2019
URL: https://publica.fraunhofer.de/handle/publica/405299
DOI: 10.24406/publica-fhg-405299
Scopus ID: 2-s2.0-85073192850
Language: en
005006629
Type: conference paper

Abstract: The main computational effort in generating all frequent itemsets in a transactional database is in the step of deciding whether an itemset is frequent, or not. We present a method for estimating itemset supports with two-sided error. In a preprocessing step our algorithm first partitions the database into groups of similar transactions by using locality sensitive hashing and calculates a summary for each of these groups. The support of a query itemset is then estimated by means of these summaries. Our preliminary empirical results indicate that the proposed method results in a speed-up of up to a factor of 50 on large datasets. The F-measure of the output patterns varies between 0.83 and 0.99.
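
The three steps named in the abstract (partition by locality sensitive hashing, summarize each group, estimate support from the summaries) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not say which hash family or which summary the paper uses. The sketch assumes MinHash signatures as the hash, per-group item counts as the summary, and an independence assumption inside each group for the estimate. All function names and parameters are hypothetical, items are assumed to be integers, and transactions are assumed to be non-empty.

```python
import random
from collections import Counter, defaultdict

PRIME = 2_147_483_647  # modulus for the illustrative hash functions


def make_hashers(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(transaction, hashers):
    """MinHash signature: the smallest hash value of any item, per function."""
    return tuple(min((a * item + b) % PRIME for item in transaction)
                 for a, b in hashers)


def build_summaries(transactions, num_hashes=2, seed=0):
    """Preprocessing: bucket transactions by signature, then summarize.

    Assumed summary format: (group size, item -> count within the group).
    """
    hashers = make_hashers(num_hashes, seed)
    buckets = defaultdict(list)
    for t in transactions:
        buckets[minhash_signature(t, hashers)].append(t)
    summaries = []
    for group in buckets.values():
        counts = Counter(item for t in group for item in t)
        summaries.append((len(group), counts))
    return summaries


def estimate_support(itemset, summaries):
    """Estimate support from summaries only, without scanning transactions.

    Assumption: items occur independently within a group, so a group of
    size n contributes n * prod(count[i] / n) to the estimate.
    """
    total = 0.0
    for size, counts in summaries:
        estimate = float(size)
        for item in itemset:
            estimate *= counts.get(item, 0) / size
        total += estimate
    return total


def exact_support(itemset, transactions):
    """Reference value: number of transactions containing every item."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))


if __name__ == "__main__":
    db = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {4, 5}, {4, 5, 6}, {4, 5}]
    summaries = build_summaries(db)
    print("estimated:", estimate_support([1, 2], summaries))
    print("exact:    ", exact_support([1, 2], db))
```

Under these assumptions the estimate for a single item is always exact, while the estimate for a larger itemset can land above or below the true support, which matches the two-sided error mentioned in the abstract. The more homogeneous each group is, the closer the product of item frequencies tracks the true count within that group.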