
Goal-based evaluation of text mining results in an industrial use case

Drawehn, Jens; Blohm, Matthias; Kintz, Maximilien; Kochanowski, Monika


Marsico, M. de; Institute for Systems and Technologies of Information, Control and Communication -INSTICC-, Setubal:
ICPRAM 2020. Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods : Valletta, Malta, February 22-24, 2020
Setubal: SciTePress, 2020
ISBN: 978-989-758-397-1
International Conference on Pattern Recognition Applications and Methods (ICPRAM) <9, 2020, Valletta>
Conference Paper
Fraunhofer IAO

Artificial intelligence has boosted interest in text mining solutions over the last few years. Especially in non-English-speaking countries, where there may be no clear market leaders, a variety of solutions for different text mining scenarios has become available. Most of them are designed for particular use cases, with strengths in some scenarios and weaknesses in others. In text or page classification, standard measures such as precision, recall (also called sensitivity), or F1-score are prevalent. However, evaluating feature extraction results requires more tailored approaches. On the way to benchmarking feature extraction results from text, we encountered many issues, such as deciding whether a result is correct, partly correct, helpful, or useless. The main contribution of this work is a method for designing a tailored evaluation procedure for an individual text extraction benchmark in one specific use case. In this context, we propose a general way of mapping the common CRISP-DM process to the particularities of text mining projects. Furthermore, we describe possible goals of information extraction, the features to be extracted, suitable evaluation criteria, and a corresponding customized scoring system. This approach is applied in detail to an industrial use case.
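The abstract contrasts standard classification measures with a customized, graded scoring system for extracted features. The Python sketch below illustrates both ideas under stated assumptions: `precision_recall_f1` computes the standard measures from counts of true positives, false positives, and false negatives, while `goal_weighted_score` is a hypothetical goal-based score in which each expected feature carries an importance weight and each extraction result receives a grade such as "correct" or "partly_correct". The grade weights, feature names, and importance values are illustrative assumptions only, not the scoring system defined in the paper.

```python
# Hypothetical grade weights for a single extracted feature.
# These values are illustrative assumptions, not taken from the paper.
GRADE_WEIGHTS = {
    "correct": 1.0,
    "partly_correct": 0.5,
    "helpful": 0.25,
    "useless": 0.0,
}


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard binary measures; recall is also called sensitivity."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def goal_weighted_score(grades: dict[str, str], importance: dict[str, float]) -> float:
    """Hypothetical goal-based score in the range [0, 1].

    importance: expected feature -> weight reflecting how much the feature
                matters for the extraction goal (an assumed input).
    grades:     feature -> grade assigned by a reviewer; expected features
                missing from `grades` were not extracted and earn 0.
    """
    total_importance = sum(importance.values())
    if total_importance == 0:
        return 0.0
    earned = sum(
        weight * GRADE_WEIGHTS[grades.get(feature, "useless")]
        for feature, weight in importance.items()
    )
    return earned / total_importance


if __name__ == "__main__":
    # Hypothetical features for an illustrative document type.
    importance = {"invoice_number": 2.0, "date": 1.0, "amount": 2.0}
    grades = {"invoice_number": "correct", "date": "partly_correct"}
    print(goal_weighted_score(grades, importance))  # 0.5
    print(precision_recall_f1(8, 2, 4))
```

Weighting by importance lets a partly correct but important feature count for more than a fully correct but marginal one, which a flat F1-score cannot express; how grades and weights are chosen would depend on the goals of the specific use case.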