Here you will find scientific publications from the Fraunhofer Institutes.

Impact of model settings on the text-based Rao diversity index

Zielinski, Andrea

Glänzel, Wolfgang (Ed.) ; International Society for Scientometrics and Informetrics -ISSI-:
18th International Conference on Scientometrics & Informetrics, ISSI 2021. Proceedings : Virtual Event, 12-15 July 2021, KU Leuven, Belgium
Leuven: ISSI, 2021
ISBN: 978-90-8032-822-8
International Conference on Scientometrics and Informetrics (ISSI) <18, 2021, Online>
Fraunhofer ISI

Topic models such as Latent Dirichlet Allocation (LDA) have proven to be effective tools for discovering latent topics in text collections in a data-driven way. These topics can be further used to investigate the interdisciplinarity of academic disciplines by means of indicators that reflect the diversity of the scientific output. This study provides a systematic analysis of the model parameters that affect diversity scores computed directly from the output of the LDA model. We present an empirical study on a real data set, in which we quantify the diversity of the research within several departments of Fraunhofer (FH) and the Max Planck Society (MPG) using scientific abstracts indexed in Scopus between 2008 and 2018. Our experiments show that parameter variations, i.e. the choice of the number of topics, the hyper-parameters, and the size and balance of the underlying data used for training the model, have a strong effect on the LDA-based Rao metrics. In particular, we observed sharp fluctuations of the Rao index when varying the number of topics. Because of this instability, the index might not be a useful indicator of interdisciplinarity.
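To make the idea of a text-based Rao index concrete, the following is a minimal sketch of how such a score could be computed from topic-model output. It is not the paper's implementation: the abstract does not state the distance measure or the normalization used, so the function name `rao_diversity`, the cosine-based topic distance, and the convention of summing over all ordered topic pairs are assumptions made here for illustration. The inputs are a vector of topic proportions for one unit (e.g. a department) and a topic-word matrix such as an LDA model would produce.

```python
import numpy as np


def rao_diversity(topic_proportions, topic_word):
    """Rao quadratic entropy: sum over i, j of p_i * p_j * d_ij.

    topic_proportions: length-K weights of the K topics for one unit.
    topic_word: K x V matrix, one row of word weights per topic.
    The distance d_ij = 1 - cosine similarity between topic rows is an
    assumption for this sketch, not taken from the source.
    """
    p = np.asarray(topic_proportions, dtype=float)
    p = p / p.sum()  # normalize so the proportions sum to 1

    tw = np.asarray(topic_word, dtype=float)
    unit_rows = tw / np.linalg.norm(tw, axis=1, keepdims=True)

    # Pairwise topic distances; a topic has distance 0 to itself.
    d = 1.0 - unit_rows @ unit_rows.T
    np.fill_diagonal(d, 0.0)

    return float(p @ d @ p)


# Toy example with three topics over a four-word vocabulary (made-up numbers).
toy_topic_word = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])
focused = rao_diversity([0.9, 0.05, 0.05], toy_topic_word)
spread = rao_diversity([1 / 3, 1 / 3, 1 / 3], toy_topic_word)
```

Under these assumptions, a unit whose output is concentrated in one topic scores lower than one spread evenly over dissimilar topics. Because both the proportions and the pairwise distances depend on the fitted model, the score changes whenever the number of topics changes, which is the kind of sensitivity the study investigates.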