Here you will find scientific publications from the Fraunhofer Institutes.

Text classification of news articles with support vector machines

Paaß, G.; Kindermann, J.; Leopold, E.

Sirmakessis, S.:
Text mining and its applications : Results of the NEMIS Launch Conference
Berlin: Springer, 2004 (Studies in fuzziness and soft computing 138)
ISBN: 3-540-20238-2
ISBN: 978-3-540-20238-7
NEMIS Launch Conference <1, 2003, Patrai>
International Workshop on Text Mining and its Applications <1, 2003, Patrai>
Conference Paper
Fraunhofer AIS (IAIS)
text mining; press archive; kernel classifier

Support vector machines (SVMs) can classify objects described by an effectively infinite-dimensional feature vector. They can therefore use the counts of the different words in a document, i.e. more than 100,000 words, directly as features for classification. In this paper we describe the results of a large number of experiments with different preprocessing strategies for generating effective input features. N-grams of syllables and phonemes turn out to be especially effective for classification.
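The idea in the abstract, n-gram counts fed directly into a linear SVM, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: character n-grams stand in for the syllable and phoneme n-grams of the paper (which would need a hyphenation or phonetic tool), the trainer is a simple Pegasos-style subgradient method rather than whatever SVM software the experiments used, and the function names (`char_ngrams`, `train_linear_svm`, `predict`) and the toy headlines are invented for illustration.

```python
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, whitespace-normalized text.

    Assumption: character n-grams stand in for the syllable and phoneme
    n-grams of the paper.
    """
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dot(weights, features):
    """Sparse dot product: only n-grams present in the document are visited."""
    return sum(weights.get(gram, 0.0) * count for gram, count in features.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.

    samples: list of sparse count dicts; labels: +1 or -1 per sample.
    """
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            margin = labels[i] * dot(weights, samples[i])
            shrink = 1.0 - eta * lam  # effect of the L2 regularizer
            for gram in weights:
                weights[gram] *= shrink
            if margin < 1.0:  # hinge loss is active for this sample
                for gram, count in samples[i].items():
                    weights[gram] = weights.get(gram, 0.0) + eta * labels[i] * count
    return weights


def predict(weights, features):
    """Return +1 or -1 according to the sign of the linear decision function."""
    return 1 if dot(weights, features) >= 0.0 else -1


if __name__ == "__main__":
    # Invented toy headlines: +1 = business, -1 = sports.
    train = [
        ("stock prices rise on strong earnings", 1),
        ("central bank raises interest rates", 1),
        ("striker scores twice in cup final", -1),
        ("home team wins the championship", -1),
    ]
    features = [char_ngrams(text) for text, _ in train]
    labels = [label for _, label in train]
    model = train_linear_svm(features, labels)
    print(predict(model, char_ngrams("bank cuts interest rates")))
```

Storing each document as a sparse dictionary means the cost of a dot product depends on the n-grams that occur in the document, not on the size of the full vocabulary, which is what keeps very high-dimensional count features tractable.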