Mining data streams with dynamic confidence intervals
We consider data streams of transactions that are generated independently with some non-stationary distribution and regard an itemset to be interesting if its average success probability in the data stream reaches a user specified threshold. We propose an algorithm approximating the family of all interesting itemsets in a data stream. Using Chernoff bounds, our algorithm dynamically adjusts the confidence intervals of the candidate itemsetsâ probabilities. Though the method proposed assumes the itemsets to be independent Poisson trials, our extensive empirical evaluations on synthetic and real-world benchmark datasets clearly demonstrate that it can be applied also to frequent itemset mining from data streams. In addition, the transactions are not necessarily independent.