2021
Journal Article
Title
Benchmarking of Data Preprocessing Methods for Machine Learning-Applications in Production
Abstract
The application of machine learning (ML) is becoming increasingly common in production. However, many ML projects in production fail due to poor data quality. To increase data quality, the data needs to be preprocessed. Hundreds of methods exist for data preprocessing (DPP), and they are selected manually depending on use-case requirements. As a result, DPP is currently performed in an unstructured manner and accounts for 80 % of the duration of ML projects. We therefore introduce a structured DPP approach in which DPP methods are recommended based on production use-case requirements. The recommendations are derived by benchmarking identified DPP methods according to ML model performance on five data sets. The approach is validated on two new use cases.