Linguistic feature classifying and tracing

Moohebat, Mohammadreza; Raj, Ram Gopal; Thorleuchter, Dirk; Kareem, Sameem Binti Abdul

Malaysian journal of computer science 30 (2017), No.2, pp.77-90
ISSN: 0127-9084
Journal Article, Electronic Publication
Fraunhofer INT
linguistic; Latent Semantic Indexing; text mining; classification

We investigate the identification and analysis of linguistic (lexico-grammatical) features used in articles from a specific year of publication. Linguistic features differ from shallow features because they represent authors’ lexico-grammatical writing styles and do not rely on the bag-of-words model. The current literature analyzes articles using shallow features rather than linguistic features. In contrast to related work, we provide a new methodology that identifies linguistic features occurring non-randomly in a specific year and, based on training data, traces the development of these features over time. Also in contrast to related work, the features are created by applying a semantic structure using latent semantic indexing (LSI). A linguistic feature-based prediction model is built that automatically assigns articles to their years of publication, and test data are used to evaluate it. In a case study, the proposed methodology is applied to articles from the Springer book series 'Communications in Computer and Information Science' published from 2009 to 2013, which the prediction model assigns to their years of publication. The case study results show that the model outperforms the baseline.
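
The abstract does not specify how the LSI features feed the prediction model, so the following Python sketch illustrates only the general idea: a feature-by-document matrix is reduced with a truncated SVD (the core of LSI), and a new article is assigned to the year whose centroid is most similar in the latent space. The toy counts, the function names (`fit_lsi`, `project`, `year_centroids`, `predict_year`), the choice of two latent dimensions, and the nearest-centroid rule with cosine similarity are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def fit_lsi(X, k):
    """Return the top-k left singular vectors of a features x documents matrix."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def project(U_k, X):
    """Map documents (columns of X) into the k-dimensional latent space."""
    return (U_k.T @ X).T  # shape: documents x k

def year_centroids(doc_vecs, years):
    """Average the latent vectors of all training documents per year."""
    centroids = {}
    for year in sorted(set(years)):
        rows = [i for i, y in enumerate(years) if y == year]
        centroids[year] = doc_vecs[rows].mean(axis=0)
    return centroids

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_year(U_k, centroids, x):
    """Assign a feature vector x to the year with the most similar centroid."""
    z = U_k.T @ x
    return max(centroids, key=lambda year: cosine(z, centroids[year]))

# Hypothetical training data: rows are four made-up lexico-grammatical
# features, columns are six articles (three per year). Not data from the paper.
X_train = np.array([
    [3, 2, 3, 0, 0, 1],
    [2, 3, 3, 0, 1, 0],
    [0, 0, 1, 3, 2, 3],
    [0, 1, 0, 2, 3, 3],
], dtype=float)
train_years = [2009, 2009, 2009, 2013, 2013, 2013]

U_k = fit_lsi(X_train, k=2)  # k=2 is an arbitrary choice for the toy data
centroids = year_centroids(project(U_k, X_train), train_years)

# Hypothetical held-out articles, described by the same four features.
test_a = np.array([2.0, 2.0, 0.0, 0.0])
test_b = np.array([0.0, 0.0, 2.0, 2.0])
print(predict_year(U_k, centroids, test_a))
print(predict_year(U_k, centroids, test_b))
```

Projecting new articles with the same `U_k` keeps training and test vectors in one latent space, which is what lets held-out test data evaluate the model; a real study would also need a feature extractor for lexico-grammatical patterns and a baseline for comparison, neither of which is sketched here.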