An enhanced relevance criterion for more concise supervised pattern discovery
Supervised local pattern discovery aims to find subsets of a database with a high statistical unusualness in the distribution of a target attribute. Local pattern discovery is often used to generate a human-understandable representation of the most interesting dependencies in a data set. Hence, the more crisp and concise the output is, the better. Unfortunately, standard algorithm often produce very large and redundant outputs. In this paper, we introduce delta-relevance, a definition of a more strict criterion of relevance. It will allow us to significantly reduce the output space, while being able to guarantee that every local pattern has a delta-relevant representative which is almost as good in a clearly defined sense. We show empirically that delta-relevance leads to a considerable reduction of the amount of returned patterns. We also demonstrate that in a top-k setting, the removal of not delta-relevant patterns improves the quality of the result set.