Optimistic estimate pruning strategies for fast exhaustive subgroup discovery
Subgroup discovery is the task of finding subgroups of a population which exhibit both distributional unusualness and high generality at the same time. Since the corresponding evaluation functions are not monotonic, the standard pruning techniques from monotonic problems such as frequent set discovery cannot be used. In this paper, we show that optimistic estimate pruning, previously considered only in a very simple and heuristic way, can be developed into a sound and highly effective pruning approach for subgroup discovery. We present and prove new optimistic estimates for several commonly used subgroup quality functions, describe a subgroup discovery algorithm with novel exploration strategies based on optimistic estimates, and show that this algorithm significantly outperforms previous algorithms by a wide margin of an order of magnitude or more.