Data augmentation for time series: Traditional vs generative models on capacitive proximity time series
Supervised, data-driven modelling often requires large quantities and high diversity of labeled training data: the training distribution should be rich enough to support the generalizability of the end-to-end inference model. This is frequently hindered by limited labeled data and an expensive data collection process, especially for human activity recognition tasks, where extensive manual labeling is required. Data augmentation is therefore a widely used regularization method in deep learning; it is commonly applied to image data to increase classification accuracy, but it is less researched for time series. In this paper, we investigate data augmentation on continuous capacitive time series, using exercise recognition as an example. We show that traditional data augmentation can enrich the source distribution and thus improve the generalization of the trained inference model, increasing recognition performance on unseen target data by around 21.4 percentage points compared to an inference model trained without data augmentation. Generative models such as the variational autoencoder (VAE) and conditional variational autoencoder (CVAE) can further reduce the variance on the target data.
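Traditional data augmentation for time series typically relies on label-preserving random transforms such as jittering, magnitude scaling, and time warping. The following is a minimal sketch of such transforms; the specific functions and parameter values are illustrative assumptions, not the exact pipeline used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def jitter(x, sigma=0.03):
    """Add Gaussian noise to every sample of the series."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1):
    """Multiply each channel by a random factor drawn around 1.0."""
    factors = rng.normal(1.0, sigma, size=(1, x.shape[1]))
    return x * factors

def time_warp(x, n_knots=4, sigma=0.2):
    """Distort the time axis with a smooth random curve, then resample."""
    n = x.shape[0]
    knots = np.linspace(0, n - 1, n_knots + 2)
    speeds = rng.normal(1.0, sigma, size=knots.shape)
    warp = np.interp(np.arange(n), knots, speeds).cumsum()
    warp = (warp - warp.min()) / (warp.max() - warp.min()) * (n - 1)
    # Resample each channel at the warped time positions.
    return np.stack(
        [np.interp(warp, np.arange(n), x[:, c]) for c in range(x.shape[1])],
        axis=1,
    )

# Example: a hypothetical 2-channel capacitive series of 128 time steps.
x = rng.standard_normal((128, 2))
augmented = [jitter(x), scale(x), time_warp(x)]
```

Each transform returns a series of the original shape, so every augmented copy can be added to the training set under the label of its source recording.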