Fraunhofer-Gesellschaft
2006
Conference Paper
Title

Evaluating the performance of text mining systems on real-world press archives

Abstract
We investigate the performance of text mining systems for annotating press articles in two real-world press archives. Seven commercial systems are tested, which recover the categories of a document as well as named entities and catchphrases. Using cross-validation, we evaluate the precision-recall characteristic. Depending on the depth of the category tree, a precision-recall breakeven of 39-79% is achieved. For one corpus, 45% of the documents can be classified automatically, based on the system's confidence estimates. A usability experiment confirms the formal evaluation results. It turns out that, with respect to some features, human annotators exhibit lower performance than the text mining systems. This is a convincing argument for using text mining systems to support the indexing of large document collections.
Author(s)
Paaß, G.
Vries, H. de
Mainwork
From data and information analysis to knowledge engineering  
Conference
Gesellschaft für Klassifikation (Annual Conference) 2005  
DOI
10.1007/3-540-31314-1_50
Language
English
AIS  
Keyword(s)
  • text mining
  • classification
  • Named Entities
  • user interface
