
Identifying ISI-indexed articles by their lexical usage: A text analysis approach

Authors: Moohebat, Mohammadreza; Raj, Ram Gopal; Thorleuchter, Dirk


Journal of the Association for Information Science and Technology : JASIST 66 (2015), No. 3, pp. 501-511
ISSN: 2330-1635
ISSN: 2330-1643
Fraunhofer INT
automatic classification; machine learning; text mining

This research develops an architecture for investigating whether lexical differences exist between articles indexed by the Institute for Scientific Information (ISI) and non-ISI articles and, if such differences are found, for proposing the best available classification method. Based on a collection of ISI- and non-ISI-indexed articles in the areas of business and computer science, three classification models are trained. A sensitivity analysis demonstrates the impact of words in different syntactical forms on the classification decision. The results demonstrate that the lexical domains of ISI and non-ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI-indexed articles in both disciplines with higher precision than the Naïve Bayesian and K-Nearest Neighbors techniques.
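
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the snippets in TRAIN and HELD_OUT are made up, the bag-of-words features, the cosine-based nearest-neighbor vote, the bias-free linear SVM trained by subgradient descent, and all function names and hyperparameters are assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter

# Made-up snippets standing in for article text (1 = ISI-indexed, 0 = non-ISI).
TRAIN = [
    ("we propose a novel framework and evaluate it empirically", 1),
    ("results demonstrate significant improvement over baseline methods", 1),
    ("empirical evaluation of the proposed framework shows robust results", 1),
    ("this paper talks about some ideas on business", 0),
    ("in this paper we discuss some things about computers", 0),
    ("some ideas about business and computers are discussed", 0),
]
HELD_OUT = [
    ("results demonstrate a robust framework", 1),
    ("this paper is about some business ideas", 0),
]

def bag(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def train_nb(data):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    counts, docs = {0: Counter(), 1: Counter()}, Counter()
    for text, label in data:
        counts[label].update(bag(text))
        docs[label] += 1
    return counts, docs, set(counts[0]) | set(counts[1])

def predict_nb(model, text):
    counts, docs, vocab = model
    scores = {}
    for label in (0, 1):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for word, n in bag(text).items():
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing of the word likelihood
                score += n * math.log((counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def cosine(a, b):
    dot = sum(n * b[w] for w, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

def predict_knn(data, text, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = bag(text)
    nearest = sorted(data, key=lambda d: cosine(query, bag(d[0])), reverse=True)[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0

def train_svm(data, epochs=20, lr=0.1, lam=0.01):
    """Linear SVM without a bias term: subgradient descent on the L2-regularized hinge loss."""
    w = {}
    for _ in range(epochs):
        for text, label in data:
            x, t = bag(text), (1 if label == 1 else -1)
            margin = t * sum(w.get(f, 0.0) * n for f, n in x.items())
            for f in w:  # regularization step shrinks every weight
                w[f] *= 1 - lr * lam
            if margin < 1:  # hinge loss is active: move toward the example
                for f, n in x.items():
                    w[f] = w.get(f, 0.0) + lr * t * n
    return w

def predict_svm(w, text):
    return 1 if sum(w.get(f, 0.0) * n for f, n in bag(text).items()) > 0 else 0

def precision(preds, labels):
    """Share of documents predicted as ISI-indexed that really are ISI-indexed."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    return tp / (tp + fp) if tp + fp else 0.0

def evaluate():
    nb, svm = train_nb(TRAIN), train_svm(TRAIN)
    labels = [y for _, y in HELD_OUT]
    predictors = {
        "naive_bayes": lambda t: predict_nb(nb, t),
        "knn": lambda t: predict_knn(TRAIN, t),
        "svm": lambda t: predict_svm(svm, t),
    }
    return {name: precision([f(t) for t, _ in HELD_OUT], labels)
            for name, f in predictors.items()}

if __name__ == "__main__":
    print(evaluate())
```

The toy snippets are trivially separable, so the sketch only shows how precision would be computed per classifier; it says nothing about how the three methods rank on real article collections.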