September 14, 2024
Journal Article
Title
Bloom Encodings in DGA Detection: Improving Machine Learning Privacy by Building on Privacy-Preserving Record Linkage
Abstract
Machine learning has been shown to benefit a wide range of applications, especially classification tasks. In particular, the detection of algorithmically generated domains to identify corrupted machines has proven to be a mature use case with good classification performance. The use of privacy- and security-sensitive data, however, raises concerns in scenarios that require interaction with external parties. As one such scenario, we consider the training of domain generation algorithm (DGA) detection classifiers in a Machine-Learning-as-a-Service (MLaaS) setting. We evaluate a Bloom encoding approach from the area of privacy-preserving record linkage that is meant to prevent the MLaaS provider from learning the exact classification task as well as the data samples transmitted for training and classification. We investigate the threat posed by pattern mining attacks by performing a privacy analysis of two versions of these encodings (basic and randomized). We further identify sets of parameter values that we find to provide an adequate level of protection against these attacks. We see potential for this approach in machine learning use cases that deal with sensitive data or tasks, especially MLaaS scenarios involving short data samples that lack a clear structure.
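To illustrate the general idea of a basic Bloom encoding as commonly used in privacy-preserving record linkage, the following is a minimal sketch and not the authors' implementation. It splits a domain name into character q-grams and maps each q-gram to several bit positions using a double-hashing scheme. The function names `qgrams` and `bloom_encode`, the choice of SHA-1 and SHA-256, and the parameter values (`m=64`, `k=2`, `q=2`) are assumptions made for illustration; the abstract does not state the parameters or hash functions used in the paper, and the randomized variant is not shown.

```python
import hashlib


def qgrams(text, q=2):
    """Split a string into overlapping q-grams (bigrams by default)."""
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def bloom_encode(domain, m=64, k=2, q=2):
    """Encode a domain name as an m-bit vector (list of 0/1 values).

    Illustrative parameters only: m is the vector length, k the number
    of bit positions set per q-gram, q the q-gram length.
    """
    bits = [0] * m
    for gram in qgrams(domain.lower(), q):
        data = gram.encode("utf-8")
        # Double hashing: position_i = (h1 + i * h2) mod m
        h1 = int.from_bytes(hashlib.sha1(data).digest(), "big")
        h2 = int.from_bytes(hashlib.sha256(data).digest(), "big")
        for i in range(k):
            bits[(h1 + i * h2) % m] = 1
    return bits


# Hypothetical usage: the bit vector, not the plaintext domain,
# would be sent to the external service.
encoded = bloom_encode("example.com")
```

In such a setup, a classifier would be trained on vectors like `encoded` instead of the original strings. Whether a given choice of `m`, `k`, and `q` resists pattern mining attacks is exactly the kind of question the privacy analysis described in the abstract addresses.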
Open Access
Rights
CC BY-ND 4.0: Creative Commons Attribution-NoDerivatives
Language
English