Enhancing Datasets For Artificial Intelligence Through Model-Based Methods

Mayer, Dirk; Wetzker, Ulf

10 February 2022

Blog Post

Title

Enhancing Datasets For Artificial Intelligence Through Model-Based Methods

Title Supplement

Getting enough data to train useful AI models for industrial processes

Blog post on the Semiconductor Engineering website (https://semiengineering.com), 10 February 2022

Abstract

Industrial plants and processes are now digitized and networked, and AI can be used to evaluate the data generated by those facilities to increase productivity and quality. Some of the most popular machine learning applications are based on smartphone user data or sources from the Internet (social networks, Wikipedia, image databases, etc.). For the latter, very large training datasets are used, e.g., about 45 TB of text data for training OpenAI's GPT-3 [1]. Real industrial applications rely on much smaller datasets. This makes it difficult to train high-performance ML models and, consequently, to realize their full potential added value. The datasets are often incomplete, which leads directly to overfitted AI models and a lack of generalization. At the same time, measurement campaigns covering all possible variations are not economically feasible.

Author(s)

Mayer, Dirk

Fraunhofer-Institut für Integrierte Schaltungen IIS

Wetzker, Ulf

Fraunhofer-Institut für Integrierte Schaltungen IIS
