Secure distributed subgroup discovery in horizontally partitioned data
Supervised descriptive rule discovery techniques such as subgroup discovery are popular in applications such as fraud detection and clinical studies. Compared with other descriptive techniques, such as classical support/confidence association rule mining, subgroup discovery has two advantages: it returns only the top-k patterns, and it uses a quality function that avoids patterns uncorrelated with the target. If these techniques are to be applied in privacy-sensitive scenarios involving distributed data, precise guarantees are needed regarding the amount of information leaked during the execution of the data mining task. Unfortunately, secure multi-party protocols for classical support/confidence association rule mining cannot be adapted to subgroup discovery, for fundamental reasons: the obstacles are the different quality function and the restriction to a fixed number of patterns, i.e., exactly the desired features of subgroup discovery. In this paper, we present new protocols that allow distributed subgroup discovery while avoiding the disclosure of the individual databases. We analyze the properties of the protocols, describe a prototypical implementation, and present experiments that demonstrate the feasibility of the approach.