2020
Conference Paper
Title
Structured Data Preparation Pipeline for Machine Learning-Applications in Production
Abstract
The application of machine learning (ML) is becoming increasingly common in production. However, many ML-projects fail due to poor data quality. To increase its quality, data needs to be prepared. Because it has to consider diverse requirements, data preparation (DPP) is a challenging task, and it accounts for 80 % of an ML-project's duration. DPP is still performed manually and individually for each project, which makes it essential to structure the preparation in order to achieve high-quality data in a reasonable amount of time. We therefore present a holistic concept for a structured and reusable DPP-pipeline for ML-applications in production. In a first step, requirements for DPP are determined based on project experience and detailed research. Subsequently, the individual steps and methods of DPP are identified and structured. The concept is successfully validated in two production use-cases by preparing data sets and implementing ML-algorithms.
Author(s)