Fraunhofer-Gesellschaft
2005
Conference Paper
Title
Investigating word correlation at different scopes
Title Supplement
A latent-concept approach
Abstract
This paper presents work in progress on clustering methods that identify semantic concepts in a document collection. These methods are based on the observation that semantically related words occur close together. We investigate the size of neighborhood which should be taken into account for this purpose: sentences or documents. We further investigate how local co-occurrence affects the clustering quality by including word bigrams as additional terms. We apply two different latent-concept models, probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), to a corpus of German news stories. The resulting soft clusterings are compared with a given a priori classification of documents using an information-based distance metric. Preliminary results show that this cluster distance was smaller using (1) entire documents (compared to combinations of documents and sentences), as well as (2) combinations of unigrams and bigrams (compared to exclusive use of unigrams or bigrams).
Author(s)
Heinrich, G.
Kindermann, J.
Lauth, C.
Paaß, G.
Monzon, J.S.
Mainwork
Learning and extending lexical ontologies by using machine learning methods  
Conference
Workshop on Learning and Extending Lexical Ontologies by Using Machine Learning Methods 2005  
Language
English
AIS  