Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

HASKER: An efficient algorithm for string kernels. Application to polarity classification in various languages

 
Authors: Popescu, Marius; Grozea, Cristian; Ionescu, Radu Tudor


Procedia Computer Science 112 (2017), pp. 1755-1763
ISSN: 1877-0509
International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES) <21, 2017, Marseille>
English
Journal article, electronic publication
Fraunhofer FOKUS
string kernels; blended spectrum kernel; intersection kernel; kernel methods; similarity-based learning; polarity classification; opinion mining; sentiment analysis; string kernels tool; open-source code

Abstract
String kernels have been used successfully for various NLP tasks, ranging from text categorization by topic to native language identification. In this paper, we present a simple and efficient algorithm for computing various spectrum string kernels. When comparing two strings, we store the p-grams of the first string in a hash table, and then perform a hash table lookup for each p-gram that occurs in the second string. In terms of time, we show that our algorithm can outperform a state-of-the-art tool for computing string similarity. In terms of accuracy, we show that our approach can reach state-of-the-art performance for polarity classification in various languages. Our efficient implementation is provided online for free at http://string-kernels.herokuapp.com.
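
The hash-table idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' HASKER implementation: the function names (pgram_counts, spectrum_kernel, intersection_kernel, blended_kernel) are hypothetical, and the kernel definitions used are the standard textbook ones, namely the product of p-gram counts for the spectrum kernel, the minimum of counts for the intersection kernel, and a sum over a range of p-gram lengths for the blended variant.

```python
from collections import Counter


def pgram_counts(s: str, p: int) -> Counter:
    """Hash table mapping each p-gram of s to its number of occurrences."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s: str, t: str, p: int) -> int:
    """p-spectrum kernel: sum over p-grams of count_in_s * count_in_t."""
    table = pgram_counts(s, p)            # store the p-grams of the first string
    total = 0
    for i in range(len(t) - p + 1):       # look up each p-gram of the second string
        total += table.get(t[i:i + p], 0)
    return total


def intersection_kernel(s: str, t: str, p: int) -> int:
    """Intersection kernel: sum over p-grams of min(count_in_s, count_in_t)."""
    remaining = pgram_counts(s, p)
    total = 0
    for i in range(len(t) - p + 1):
        gram = t[i:i + p]
        if remaining.get(gram, 0) > 0:    # each occurrence in s is matched at most once
            remaining[gram] -= 1
            total += 1
    return total


def blended_kernel(s: str, t: str, p_min: int, p_max: int, base=spectrum_kernel) -> int:
    """Blended kernel (illustrative): sum the base kernel over p in [p_min, p_max]."""
    return sum(base(s, t, p) for p in range(p_min, p_max + 1))


if __name__ == "__main__":
    print(spectrum_kernel("abab", "abab", 2))
    print(intersection_kernel("abab", "abab", 2))
    print(blended_kernel("abab", "abab", 1, 2))
```

In this sketch each comparison builds one hash table and performs one lookup per p-gram of the second string, so the work grows roughly linearly with the combined string length for a fixed p. Python's string slicing and hashing add overhead that an optimized implementation would likely avoid.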

URL: http://publica.fraunhofer.de/dokumente/N-467750.html