Authors: Chen, L.B.; Thiel, U.
Record date: 2022-03-10
Year: 2004
URL: https://publica.fraunhofer.de/handle/publica/344804
DOI: 10.1007/b104284
Abstract: In this paper we present an approach for effective construction of domain-specific thesauri. We assume that the collection is partitioned into document categories. By taking advantage of these pre-defined categories, we are able to conceptualize a new topical language model to weight term topicality more accurately. With the help of information theory, interesting relationships among thesaurus elements are discovered deductively. Based on the "Layer-Seeds" clustering algorithm, topical terms from documents in a certain category are organized according to their relationships in a tree-like hierarchical structure, that is, a thesaurus. Experimental results show that the thesaurus contains satisfactory structures, although it differs to some extent from a manually created thesaurus. A first evaluation of the thesaurus in a query expansion task yields evidence that an increase in recall can be achieved without loss of precision.
Language: en
Classification: 004, 005, 400
Title: Language modeling for effective construction of domain specific thesauri
Type: conference paper