
Publica
Here you will find scientific publications from the Fraunhofer Institutes.
Evaluating the performance of text mining systems on real-world press archives
| Spiliopoulou, M.; Gesellschaft für Klassifikation: From data and information analysis to knowledge engineering: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Magdeburg, March 9-11, 2005. Berlin: Springer, 2006 (Studies in classification, data analysis, and knowledge organization). ISBN: 3-540-31313-3; ISBN: 978-3-540-31313-7. pp. 414-421 |
| Gesellschaft für Klassifikation (Annual Conference) <29, 2005, Magdeburg> |
| English |
| Conference Paper |
| Fraunhofer AIS (IAIS) |
| text mining; classification; Named Entities; user interface |
Abstract
We investigate the performance of text mining systems for annotating press articles in two real-world press archives. Seven commercial systems are tested, each of which recovers the categories of a document as well as named entities and catchphrases. Using cross-validation, we evaluate the precision-recall characteristic. Depending on the depth of the category tree, a precision-recall breakeven of 39-79% is achieved. For one corpus, 45% of the documents can be classified automatically, based on the system's confidence estimates. A usability experiment confirms the formal evaluation results. It turns out that, with respect to some features, human annotators perform worse than the text mining systems. This is a convincing argument for using text mining systems to support the indexing of large document collections.
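The two measures named in the abstract can be illustrated with a minimal Python sketch. It is not the evaluation code used in the paper. It assumes a single category with a ranked list of system scores and binary gold labels, and it takes the breakeven as the precision within the top R documents, where R is the number of truly relevant documents (at that cutoff precision equals recall). The function names `breakeven_point` and `auto_classification_coverage`, and all data values, are hypothetical.

```python
def breakeven_point(scores, labels):
    """Precision-recall breakeven for one category.

    Assumption: the breakeven is read off at rank R, where R is the
    number of relevant documents; there precision and recall coincide.
    """
    n_relevant = sum(labels)
    if n_relevant == 0:
        raise ValueError("breakeven is undefined without relevant documents")
    # Rank documents by descending system score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_relevant])
    return hits / n_relevant


def auto_classification_coverage(confidences, correct, threshold):
    """Share of documents accepted automatically at a confidence threshold,
    and the accuracy among the accepted documents (None if none accepted)."""
    accepted = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(accepted) / len(confidences)
    accuracy = sum(accepted) / len(accepted) if accepted else None
    return coverage, accuracy


# Illustrative data only, not taken from the press archives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]
print(breakeven_point(scores, labels))

confidences = [0.95, 0.40, 0.85, 0.30]
correct = [1, 0, 1, 1]
print(auto_classification_coverage(confidences, correct, threshold=0.8))
```

In a cross-validation setting, these functions would be applied to the held-out fold of each split and the results averaged; raising the threshold trades coverage for accuracy among the automatically classified documents.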