On the fidelity versus privacy and utility trade-off of synthetic patient data

Adams, Tim; Birkenbihl, Colin; Otte, Karen; Ng, Hwei Geok; Rieling, Jonas Adrian; Näher, Anatol-Fiete; Sax, Ulrich; Prasser, Fabian; Fröhlich, Holger

doi:10.1016/j.isci.2025.112382

May 16, 2025

Journal Article

Abstract

The use of synthetic data is a widely discussed and promising solution for privacy-preserving medical research. However, synthetic data may not always rule out the risk of re-identifying characteristics of real patients, and they can vary greatly in fidelity and utility. We systematically evaluate the trade-offs between privacy, fidelity, and utility across five synthetic data models and three patient-level datasets. We evaluate fidelity based on statistical similarity to the real data, utility on three machine learning use cases, and privacy via membership inference, singling out, and attribute inference risks. Synthetic data without differential privacy (DP) maintained fidelity and utility without evident privacy breaches, whereas DP-enforced models significantly disrupted correlation structures. K-anonymity-based data sanitization of demographic features, while preserving fidelity, introduced notable privacy risks. Our findings emphasize the need to advance methods that effectively balance privacy, fidelity, and utility in synthetic patient data generation.
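As a rough illustration of the k-anonymity criterion mentioned above, the sketch below checks it on a small table. It is not the authors' sanitization pipeline or tooling. A table is k-anonymous with respect to a set of quasi-identifiers (here, demographic columns) if every combination of their values is shared by at least k records. The column names, the generalized age bands and ZIP prefixes, and the example records are all hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Mapping, Sequence


def k_anonymity(records: Sequence[Mapping[str, object]],
                quasi_identifiers: Sequence[str]) -> int:
    """Return the largest k for which the records are k-anonymous.

    Records are grouped into equivalence classes by their quasi-identifier
    values; the table's k is the size of the smallest class.
    """
    if not records:
        raise ValueError("need at least one record")
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())


def is_k_anonymous(records: Sequence[Mapping[str, object]],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    return k_anonymity(records, quasi_identifiers) >= k


# Hypothetical, already-generalized patient records (age bands, ZIP prefixes).
example_records = [
    {"age": "40-49", "sex": "F", "zip": "370**", "dx": "diabetes"},
    {"age": "40-49", "sex": "F", "zip": "370**", "dx": "asthma"},
    {"age": "50-59", "sex": "M", "zip": "371**", "dx": "copd"},
    {"age": "50-59", "sex": "M", "zip": "371**", "dx": "diabetes"},
    {"age": "50-59", "sex": "M", "zip": "371**", "dx": "asthma"},
]
quasi_ids = ["age", "sex", "zip"]

if __name__ == "__main__":
    # Equivalence classes have sizes 2 and 3, so the table is 2-anonymous.
    print(k_anonymity(example_records, quasi_ids))
```

Running the script prints `2`. Note that this criterion constrains only how often each demographic combination appears. It says nothing about how the remaining attributes are distributed within each class, so it is a different property from the attribute inference risk evaluated in the study.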