Identifying ISI-indexed articles by their lexical usage: A text analysis approach

Moohebat, Mohammadreza; Raj, Ram Gopal; Thorleuchter, Dirk

doi:10.1002/asi.23194

2015

Journal Article

Abstract

This research develops an architecture for investigating whether lexical differences exist between articles categorized as Institute for Scientific Information (ISI) indexed and non-ISI indexed and, if such differences are found, for proposing the best available classification method. Based on a collection of ISI- and non-ISI-indexed articles in the areas of business and computer science, three classification models are trained. A sensitivity analysis is applied to demonstrate the impact of words in different syntactical forms on the classification decision. The results demonstrate that the lexical domains of ISI and non-ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI-indexed articles in both disciplines with higher precision than do the Naïve Bayesian and K-Nearest Neighbors techniques.

Author(s)

Moohebat, Mohammadreza

Faculty of Computer Science and Information Technology, Department of Artificial Intelligence, University of Malaya, Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia

Raj, Ram Gopal

Faculty of Computer Science and Information Technology, Department of Artificial Intelligence, University of Malaya, Kuala Lumpur, Federal Territory of Kuala Lumpur, Malaysia

Thorleuchter, Dirk

Fraunhofer-Institut für Naturwissenschaftlich-Technische Trendanalysen INT

Journal

Journal of the Association for Information Science and Technology : JASIST
