Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Structured Data Preparation Pipeline for Machine Learning-Applications in Production

 
Authors: Frye, Maik; Schmitt, Robert Heinrich


Viharos, Zsolt János ; International Measurement Confederation -IMEKO-, Budapest; European Federation of National Associations of Measurement, Testing and Analytical Laboratories -EUROLAB-:
17th IMEKO TC 10 and EUROLAB Virtual Conference 2020 : Global trends in Testing, Diagnostics & Inspection for 2030, 20 - 22 October 2020, Virtual Conference
Budapest, 2020
ISBN: 978-92-990084-6-1
pp. 241-246
International Measurement Confederation (IMEKO Virtual Conference) <17, 2020, Online>
European Federation of National Associations of Measurement, Testing and Analytical Laboratories (EUROLAB Virtual Conference) <2020, Online>
European Commission EC
H2020; 739592; EPIC
Centre of Excellence in Production Informatics and Control
English
Conference Paper, Electronic Publication
Fraunhofer IPT
artificial intelligence; machine learning; data preparation; data quality

Abstract
The application of machine learning (ML) is becoming increasingly common in production. However, many ML projects fail because of poor data quality. To increase its quality, data needs to be prepared. Because it must satisfy diverse requirements, data preparation (DPP) is a challenging task, and it accounts for 80 % of an ML project's duration. Today, DPP is still performed manually and on a case-by-case basis, which makes it essential to structure the preparation in order to achieve high-quality data in a reasonable amount of time. We therefore present a holistic concept for a structured and reusable DPP pipeline for ML applications in production. In a first step, requirements for DPP are determined based on project experience and detailed research. Subsequently, the individual steps and methods of DPP are identified and structured. The concept is successfully validated in two production use cases by preparing data sets and implementing ML algorithms.

URL: http://publica.fraunhofer.de/documents/N-635182.html