Human perception of enriched topic models

Lukasiewicz, W.; Todor, A.; Paschke, A.

doi:10.1007/978-3-319-93931-5_2

2018

Conference Paper

Abstract

Topic modeling algorithms, such as LDA, find topics, i.e. hidden structures, in document corpora in an unsupervised manner. Traditionally, applications of topic modeling over textual data use the bag-of-words model, i.e. they consider only the words in the documents. In our previous work we developed a framework for mining enriched topic models. We proposed a bag-of-features approach, where a document consists not only of words but also of linked named entities and their related information, such as types or categories. In this work we focused on the feature engineering and selection aspects of enriched topic modeling and evaluated the results with two measures that assess how understandable the estimated topics are for humans: model precision and topic log odds. In our 10-model experimental setup, with 7 pure resource-based, 2 hybrid word/resource-based and 1 word-based model, the traditional bag-of-words model was outperformed by 5 of the pure resource-based models on both measures. These results show that incorporating background knowledge into topic models makes them more understandable for humans.
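The abstract names the two evaluation measures but does not define them. The sketch below assumes they follow the definitions of Chang et al. (2009, "Reading Tea Leaves"): model precision is the fraction of human subjects who spot the intruder word among a topic's top words, and topic log odds averages, over subjects, the log ratio between the document's probability for the true intruder topic and its probability for the topic each subject picked (0 is the best score, more negative is worse). The function names `model_precision` and `topic_log_odds` are hypothetical, and the paper's exact aggregation across topics, documents or models may differ.

```python
import math
from typing import Sequence


def model_precision(intruder: str, choices: Sequence[str]) -> float:
    """Word intrusion for one topic (hypothetical helper, Chang et al. style).

    intruder: the word that was deliberately inserted into the topic's top words.
    choices:  the word each human subject selected as the odd one out.
    Returns the fraction of subjects who picked the true intruder (0.0 to 1.0).
    """
    if not choices:
        raise ValueError("need at least one subject response")
    return sum(1 for c in choices if c == intruder) / len(choices)


def topic_log_odds(theta: Sequence[float], intruder_topic: int,
                   choices: Sequence[int]) -> float:
    """Topic intrusion for one document (hypothetical helper, Chang et al. style).

    theta:          the document's estimated topic proportions.
    intruder_topic: index of the low-probability topic shown as the intruder.
    choices:        the topic index each subject selected as the intruder.
    Returns the mean of log theta[intruder] - log theta[chosen]; the maximum is 0,
    reached when every subject chooses the true intruder topic.
    """
    if not choices:
        raise ValueError("need at least one subject response")
    log_true = math.log(theta[intruder_topic])
    return sum(log_true - math.log(theta[j]) for j in choices) / len(choices)


if __name__ == "__main__":
    # Three of four subjects spot the intruder word "car".
    print(model_precision("car", ["car", "car", "bank", "car"]))
    # Two subjects pick the intruder topic 3, one picks topic 2 instead.
    print(topic_log_odds([0.5, 0.3, 0.15, 0.05], 3, [3, 3, 2]))
```

In the first call three of four subjects agree with the intruder, giving 0.75; in the second, the one subject who chose topic 2 pulls the score below 0 by log(0.05 / 0.15) / 3. Averaging these per-topic and per-document scores over a model would then allow comparing, for instance, a word-based model against resource-based ones.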

Author(s)

Lukasiewicz, W.

Todor, A.

Paschke, A.

Mainwork

Business information systems. 21st international conference, BIS 2018. Proceedings

Conference

International Conference on Business Information Systems (BIS) 2018

