Bloom Encodings in DGA Detection: Improving Machine Learning Privacy by Building on Privacy-Preserving Record Linkage

Nitz, Lasse; Mandal, Avikarsha

doi:10.3897/jucs.134762

September 14, 2024

Journal Article

Abstract

The use of machine learning has been shown to benefit a wide range of applications, especially classification tasks. In particular, the detection of algorithmically generated domains to identify corrupted machines has proven to be a mature use case with good classification performance. The use of privacy- and security-sensitive data, however, raises concerns in scenarios that require interaction with external parties. As one such scenario, we consider the training of domain generation algorithm (DGA) detection classifiers in a Machine-Learning-as-a-Service (MLaaS) setting. We evaluate the use of a Bloom encoding approach from the area of privacy-preserving record linkage to prevent the MLaaS provider from learning the exact classification task as well as the data samples transmitted for training and classification. We investigate the threat posed by pattern mining attacks by performing a privacy analysis for two versions of these encodings (basic and randomized). We further identify sets of parameter values that we find to provide an adequate level of protection against these attacks. We see potential for this approach in machine learning use cases dealing with sensitive data or tasks, especially MLaaS scenarios dealing with short data samples that lack a clear structure.
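
To make the idea of a Bloom encoding concrete, the following is a minimal sketch of how a domain name might be turned into a fixed-length bit vector before it is sent to an external party. It is not the authors' implementation. The function names (`qgrams`, `bloom_encode`, `dice`), the use of character bigrams, keyed HMAC-SHA256 hashing, the vector length `m = 64`, and the `k = 2` hash functions are all illustrative assumptions rather than the parameters evaluated in the article. Only a basic encoding is shown; the randomized version mentioned above is not sketched here.

```python
import hashlib
import hmac


def qgrams(text: str, q: int = 2) -> list[str]:
    """Return the overlapping character q-grams of text (empty if too short)."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def bloom_encode(sample: str, m: int = 64, k: int = 2,
                 key: bytes = b"demo-key", q: int = 2) -> list[int]:
    """Encode a string as an m-bit Bloom filter over its q-grams.

    All defaults are hypothetical; the secret key is assumed to be known
    only to the data owner, not to the service provider.
    """
    bits = [0] * m
    for gram in qgrams(sample.lower(), q):
        for i in range(k):
            # Derive the i-th keyed hash of this q-gram and map it to a position.
            digest = hmac.new(key, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def dice(a: list[int], b: list[int]) -> float:
    """Dice coefficient of two bit vectors, a similarity measure common in record linkage."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 1.0


if __name__ == "__main__":
    benign = bloom_encode("example.com")
    similar = bloom_encode("examples.com")
    print(sum(benign), "bits set;", "similarity:", round(dice(benign, similar), 2))
```

In such a setup, a classifier would be trained on the bit vectors instead of the plaintext domains. Strings that share many bigrams set many of the same bits, so some similarity survives the encoding, and that same property is what pattern mining attacks try to exploit.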