2017
Conference Paper
Title

Visual interactive creation and validation of text clustering workflows to explore document collections

Abstract
Exploring collections of text documents is a complex and cumbersome task. Clustering techniques can help by grouping documents according to their content to generate overviews. However, the underlying clustering workflows, which comprise preprocessing, feature selection, clustering algorithm selection, and parameterization, offer several degrees of freedom. Since no "best" clustering workflow exists, users have to evaluate clustering results based on the data and analysis tasks at hand. We present an interactive system for creating and validating text clustering workflows with the goal of exploring document collections. The system allows users to control every step of the text clustering workflow. First, users are supported in feature selection through feature rankings based on feature selection metrics and through linguistic filtering (e.g., part-of-speech filtering). Second, users can choose between different clustering methods and their parameterizations. Third, clustering results can be explored based on cluster content (documents and relevant feature terms) and cluster quality measures. Fourth, the results of different clusterings can be compared, and frequent document subsets in clusters can be identified. We validate the usefulness of the system with a usage scenario that describes how users can explore document collections in a visual and interactive way.
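The comparison step named in the abstract (comparing the results of different clusterings and finding document subsets that recur in clusters) can be illustrated with a small sketch. This is not the system described in the paper: the function names `clusters_from_labels` and `shared_subsets`, the `min_size` threshold, and the toy document ids are all assumptions made for illustration. The sketch simply intersects every cluster of one clustering with every cluster of another and reports the groups of documents that stay together in both.

```python
from collections import defaultdict


def clusters_from_labels(labels):
    """Group document ids by cluster label: {label: set of doc ids}."""
    clusters = defaultdict(set)
    for doc_id, label in labels.items():
        clusters[label].add(doc_id)
    return dict(clusters)


def shared_subsets(labels_a, labels_b, min_size=2):
    """Return document subsets that share a cluster in both clusterings.

    Hypothetical helper: each result is (label_a, label_b, sorted doc ids),
    ordered from the largest shared subset to the smallest.
    """
    clusters_a = clusters_from_labels(labels_a)
    clusters_b = clusters_from_labels(labels_b)
    subsets = []
    for label_a, docs_a in clusters_a.items():
        for label_b, docs_b in clusters_b.items():
            common = docs_a & docs_b  # documents grouped together both times
            if len(common) >= min_size:
                subsets.append((label_a, label_b, sorted(common)))
    return sorted(subsets, key=lambda s: -len(s[2]))


# Toy example: two clusterings of the same five (made-up) documents.
labels_a = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 1}
labels_b = {"d1": "x", "d2": "x", "d3": "y", "d4": "y", "d5": "y"}

for label_a, label_b, docs in shared_subsets(labels_a, labels_b):
    print(f"cluster {label_a} / cluster {label_b}: {docs}")
```

In this toy data, documents d1 and d2 stay together in both clusterings, as do d4 and d5, while d3 changes its neighbours. Subsets that persist across differently parameterized workflows are one plausible signal of stable structure in a collection, although the paper may define and visualize such subsets differently.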
Author(s)
Ruppert, Tobias
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Staab, Michael
TU Darmstadt
Bannach, Andreas
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Lücke-Tieke, Hendrik  
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Bernard, Jürgen
TU Darmstadt GRIS
Kuijper, Arjan
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Kohlhammer, Jörn
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Mainwork
IS&T International Symposium on Electronic Imaging. Visualization and Data Analysis 2017. Online resource  
Conference
International Symposium on Electronic Imaging (EI) 2017  
DOI
10.2352/ISSN.2470-1173.2017.1.VDA-388
Language
English
Fraunhofer-Institut für Graphische Datenverarbeitung IGD  
Keyword(s)
  • Visual analytics
  • information visualization
  • text mining
  • text analysis
  • clustering
  • Lead Topic: Digitized Work
  • Lead Topic: Smart City
  • Research Line: Human computer interaction (HCI)
