Hier finden Sie wissenschaftliche Publikationen aus den Fraunhofer-Instituten.

Tight optimistic estimates for fast subgroup discovery

: Grosskreutz, H.; Rüping, S.; Wrobel, S.

Postprint urn:nbn:de:0011-n-798043 (241 KByte PDF)
MD5 Fingerprint: 4f48116c596060e18a7dddd6bf1056d6
The original publication is available at
Created on: 9.9.2010

Daelemans, W.:
Machine learning and knowledge discovery in databases. European conference, ECML PKDD 2008. Vol.1 : Antwerp, Belgium, September 15 - 19, 2008; Proceedings
Berlin: Springer, 2008 (Lecture Notes in Artificial Intelligence 5211)
ISBN: 978-3-540-87479-9
ISBN: 978-3-540-87478-2
ISBN: 3-540-87478-X
European Conference on Machine Learning (ECML) <19, 2008, Antwerp>
European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) <12, 2008, Antwerp>
Conference Paper, Electronic Publication
Fraunhofer IAIS ()

Subgroup discovery is the task of finding subgroups of a population which exhibit both distributional unusualness and high generality. Due to the non monotonicity of the corresponding evaluation functions, standard pruning techniques cannot be used for subgroup discovery, requiring the use of optimistic estimate techniques instead. So far, however, optimistic estimate pruning has only been considered for the extremely simple case of a binary target attribute and up to now no attempt was made to move beyond suboptimal heuristic optimistic estimates. In this paper, we show that optimistic estimate pruning can be developed into a sound and highly effective pruning approach for subgroup discovery. Based on a precise definition of optimality we show that previous estimates have been tight only in special cases. Thereafter, we present tight optimistic estimates for the most popular binary and multi-class quality functions, and present a family of increasingly efficient approximations to these optimal functions. As we show in empirical experiments, the use of our newly proposed optimistic estimates can lead to a speed up of an order of magnitude compared to previous approaches.