2024
Conference Paper
Title

Privacy and Utility Evaluation of Synthetic Tabular Data for Machine Learning

Abstract
Synthetic data generation approaches have attracted considerable attention as a potential substitute for classical anonymization methods. However, synthetic data still pose a wide range of privacy risks: for example, a synthetic dataset may contain data points close to real data points, which increases the risk of linkage attacks. While differentially private generative models are generally considered immune to privacy attacks, it is not immediately evident how these models maintain privacy with reasonable utility. In this study, we evaluate the privacy-utility trade-offs in synthetic data generated by the state-of-the-art generative model CTGAN and its differentially private variant DPCTGAN in the mixed tabular data domain. We conduct experiments on widely recognized benchmark datasets to highlight the importance of selecting hyperparameters such that the model converges during training and produces synthetic data with satisfactory utility. Our experiments show that synthetic data generators trained with differential privacy may collapse during the training phase. Adding a smaller amount of noise allows training to converge and could still limit the risk of privacy attacks such as membership inference and linkage.
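The risk of synthetic points lying close to real points is commonly quantified with the Distance to Closest Record (listed among the keywords of this record). The following is a minimal sketch of that idea, not the authors' implementation: it assumes all features are already numeric and comparably scaled, uses Euclidean distance, and the function name `distance_to_closest_record` and the toy data are hypothetical. Mixed tabular data would first need an encoding for categorical columns, which this record does not specify.

```python
import math

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, return the Euclidean distance to its nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]

# Toy data (hypothetical): rows are records, columns are numeric features.
real = [(0.0, 0.0), (1.0, 1.0), (4.0, 5.0)]
synthetic = [(0.0, 0.0), (1.0, 2.0), (10.0, 10.0)]

dcr = distance_to_closest_record(synthetic, real)

# A distance of zero means the synthetic row exactly copies a real row,
# which is the kind of record a linkage attack could exploit.
exact_copies = [i for i, d in enumerate(dcr) if d == 0.0]
```

Small distances flag synthetic records that sit near real ones; how small is too small depends on the dataset and the chosen scaling, so any threshold is a judgment call.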
Author(s)
Hermsen, Felix  
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Mandal, Avikarsha  
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Mainwork
Privacy and Identity Management. Sharing in a Digital World. 18th IFIP WG 9.2, 9.6/11.7, 11.6 International Summer School, Privacy and Identity 2023  
Project(s)
Digital Technologies ActiNg as a Gatekeeper to information and data flOws  
Funder
European Commission  
Conference
International Summer School, Privacy and Identity 2023  
DOI
10.1007/978-3-031-57978-3_17
10.24406/publica-4638
File(s)
synthetic_for_IFIP.pdf (1.05 MB)
Rights
Under Copyright
Language
English
Keyword(s)
  • Synthetic Data
  • Membership Inference Attack
  • Distance to Closest Record
  • Generative Adversarial Networks
  • Differential Privacy
