10 February 2022
Blog Post
Title
Enhancing Datasets For Artificial Intelligence Through Model-Based Methods
Subtitle
Getting enough data to train useful AI models for industrial processes
Blog post on the Semiconductor Engineering website (https://semiengineering.com), 10 February 2022
Abstract
Industrial plants and processes are now digitized and networked, and AI can be used to evaluate the data these facilities generate in order to increase productivity and quality. Many of the most popular machine learning applications are built on smartphone user data or on Internet sources (social networks, Wikipedia, image databases, etc.). The latter rely on very large training datasets; for example, about 45 TB of text data was used to train OpenAI's GPT-3 [1]. Real industrial applications, by contrast, must work with much smaller datasets. This makes it difficult to train high-performance ML models and therefore to realize the full value that AI could add. These datasets are also often incomplete, which leads directly to overfitted AI models that generalize poorly. At the same time, measurement campaigns that cover all possible variations are not economically feasible.