Secure Top-k subgroup discovery

Grosskreutz, Henrik; Lemmen, B.; Rüping, Stefan

doi:10.1007/978-3-642-19896-0_4

2011

Conference Paper

Abstract

Supervised descriptive rule discovery techniques such as subgroup discovery are popular in applications like fraud detection and clinical studies. Compared with other descriptive techniques, such as classical support/confidence association rules, subgroup discovery has the advantage that it returns only the top-k patterns and uses a quality function that avoids patterns uncorrelated with the target. If these techniques are to be applied in privacy-sensitive scenarios involving distributed data, precise guarantees are needed regarding the amount of information leaked during the execution of the data mining. Unfortunately, adapting secure multi-party protocols for classical support/confidence association rule mining to the task of subgroup discovery is impossible for fundamental reasons. The obstacles are the different quality function and the restriction to a fixed number of patterns, i.e., exactly the features that make subgroup discovery desirable. In this paper, we present a new protocol that allows distributed subgroup discovery while avoiding the disclosure of the individual databases. We analyze the properties of the protocol, describe a prototypical implementation, and present experiments that demonstrate the feasibility of the approach.

Author(s)

Grosskreutz, Henrik

Lemmen, B.

Rüping, Stefan

Mainwork

Privacy and security issues in data mining and machine learning. International ECML/PKDD workshop, PSDML 2010

Conference

Workshop on Privacy and Security Issues in Data Mining and Machine Learning (PSDML) 2010
