Mining data streams with dynamic confidence intervals

Trabold, D.; Horvath, T.


Madria, S.:
Big data analytics and knowledge discovery : 18th international conference, DaWaK 2016, Porto, Portugal, September 6-8, 2016. Proceedings
Cham: Springer, 2016 (Lecture Notes in Computer Science 9829)
ISBN: 978-3-319-43945-7 (print)
ISBN: 978-3-319-43946-4 (electronic)
International Conference on Data Warehousing and Knowledge Discovery (DaWaK) <18 2016, Porto>
Conference Paper
Fraunhofer IAIS

We consider data streams of transactions that are generated independently with some non-stationary distribution and regard an itemset as interesting if its average success probability in the data stream reaches a user-specified threshold. We propose an algorithm approximating the family of all interesting itemsets in a data stream. Using Chernoff bounds, our algorithm dynamically adjusts the confidence intervals of the candidate itemsets' probabilities. Though the proposed method assumes the itemsets to be independent Poisson trials, our extensive empirical evaluations on synthetic and real-world benchmark datasets clearly demonstrate that it can also be applied to frequent itemset mining from data streams. In that setting, moreover, the transactions are not necessarily independent.
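The idea of a confidence interval that tightens as more transactions arrive can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes the additive Chernoff-Hoeffding bound P(|X/n - p| >= eps) <= 2 exp(-2 n eps^2) for n independent Poisson trials with average success probability p, and the names `interval_half_width`, `classify`, `threshold`, and `delta` are hypothetical, chosen only for illustration.

```python
import math


def interval_half_width(n: int, delta: float) -> float:
    """Half-width eps with P(|X/n - p| >= eps) <= delta.

    Assumption: additive Chernoff-Hoeffding bound for n independent
    Poisson trials; the paper may use a different Chernoff variant.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def classify(count: int, n: int, threshold: float, delta: float) -> str:
    """Compare an itemset's confidence interval with the threshold.

    count: transactions containing the itemset among the n seen so far.
    The three labels are hypothetical, not taken from the paper.
    """
    eps = interval_half_width(n, delta)
    freq = count / n
    if freq - eps >= threshold:
        return "interesting"      # whole interval at or above threshold
    if freq + eps < threshold:
        return "not interesting"  # whole interval below threshold
    return "undecided"            # interval still straddles threshold


if __name__ == "__main__":
    # The interval narrows as the stream grows.
    for n in (100, 1_000, 10_000):
        print(n, round(interval_half_width(n, delta=0.05), 4))
    print(classify(count=60, n=200, threshold=0.1, delta=0.05))
```

Because the half-width shrinks proportionally to 1/sqrt(n), an itemset that is undecided early in the stream can be resolved later without changing the error probability `delta`.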