Enabling Federated Learning: Generating Synthetic Clients through Time Series Data Augmentation

Heinrich, Ferdinand; Egger, Tim; Ghaeni, Hadi; Kormann, Benjamin; Wenninger, Franz

doi:10.1109/SSI65953.2025.11107210

2025

Conference Paper

Abstract

When developing a federated machine learning concept, there is often insufficient data available to test its feasibility and scalability. This work focuses on time series data, which is common in distributed predictive maintenance scenarios. Data engineers typically have only limited self-recorded datasets from laboratory test rigs or a small number of test systems. During inference, however, the installed sensors of each distributed client may differ from those of other clients, and their data may not follow the same distribution as the laboratory data. This non-IID property is inherent in many federated learning problems. This work addresses the research question of how to test whether a federated learning concept still converges when only limited datasets are available. To this end, we have developed open-source software that augments the given data to generate synthetic time series datasets, making it possible to simulate federated learning with many clients. The software emulates typical sensor data challenges such as offset, noise, drift, time shifts, and missing data. This gives researchers a tool to quickly test the limits of their federated learning approach, helping them improve the robustness and scalability of their training. The source code is publicly available at: https://gitlab.cc-asp.fraunhofer.de/MLS-public/federated-data-augmentation.
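The five sensor effects named in the abstract can be illustrated with a minimal sketch. It is not taken from the published software: the function names augment_client and make_clients, all parameter names and default values, and the choice of a linear drift and a circular time shift are assumptions made only for illustration.

```python
import numpy as np


def augment_client(series, rng, offset_std=0.5, noise_std=0.1,
                   drift_max=0.5, max_shift=5, missing_rate=0.05):
    """Return one synthetic client's version of a 1-D series.

    Hypothetical helper; parameters and defaults are illustrative assumptions.
    """
    x = np.asarray(series, dtype=float).copy()
    n = x.size

    # Offset: a constant bias drawn once per client.
    x += rng.normal(0.0, offset_std)

    # Drift: a linear ramp from 0 to a random end value (assumed drift model).
    x += np.linspace(0.0, rng.uniform(-drift_max, drift_max), n)

    # Noise: independent Gaussian noise on every sample.
    x += rng.normal(0.0, noise_std, size=n)

    # Time shift: circular shift by a random number of samples
    # (a simplification; wrapped-around samples could also be dropped).
    x = np.roll(x, int(rng.integers(-max_shift, max_shift + 1)))

    # Missing data: replace a random subset of samples with NaN.
    x[rng.random(n) < missing_rate] = np.nan
    return x


def make_clients(series, n_clients, seed=0):
    """Generate n_clients differently perturbed copies of one base series."""
    rng = np.random.default_rng(seed)
    return [augment_client(series, rng) for _ in range(n_clients)]


# A small laboratory-style recording turned into ten non-IID synthetic clients.
base = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))
clients = make_clients(base, n_clients=10)
```

Because each client draws its own offset, drift, shift, and missing-data mask, the resulting datasets differ in distribution from one another and from the base recording, which is the non-IID situation the abstract describes. Fixing the seed keeps an experiment reproducible.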