HASKER: An efficient algorithm for string kernels. Application to polarity classification in various languages

Authors: Popescu, Marius; Grozea, Cristian; Ionescu, Radu Tudor
Year: 2017
Type: journal article
Language: English
Record date: 2022-03-05
Handle: https://publica.fraunhofer.de/handle/publica/249763
DOI: 10.1016/j.procs.2017.08.207
Keywords: string kernels; blended spectrum kernel; intersection kernel; kernel methods; similarity-based learning; polarity classification; opinion mining; sentiment analysis; string kernels tool; open-source code

Abstract: String kernels have been used successfully for various NLP tasks, ranging from text categorization by topic to native language identification. In this paper, we present a simple and efficient algorithm for computing various spectrum string kernels. When comparing two strings, we store the p-grams of the first string in a hash table, and then we look up in that table each p-gram that occurs in the second string. In terms of running time, we show that our algorithm can outperform a state-of-the-art tool for computing string similarity. In terms of accuracy, we show that our approach can reach state-of-the-art performance for polarity classification in various languages. Our efficient implementation is provided online for free at http://string-kernels.herokuapp.com.
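The hash-table strategy described in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions of the p-spectrum kernel (sum over p-grams v of num_v(s) * num_v(t)), the p-gram intersection kernel (sum over v of min(num_v(s), num_v(t))) and the presence-bits kernel (number of distinct p-grams shared by both strings), and it treats a blended kernel as the sum of one of these over a range of p-gram lengths. All function names are hypothetical; this is an illustration of the idea, not the authors' HASKER implementation.

```python
from collections import Counter


def pgrams(s, p):
    """Yield every contiguous substring (p-gram) of length p in s."""
    return (s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s, t, p):
    # Hash table with the p-gram counts of the first string only.
    table = Counter(pgrams(s, p))
    # Each p-gram occurrence in t contributes num_v(s); a missing key counts as 0,
    # so the sum equals sum_v num_v(s) * num_v(t).
    return sum(table[g] for g in pgrams(t, p))


def intersection_kernel(s, t, p):
    table = Counter(pgrams(s, p))
    total = 0
    for g in pgrams(t, p):
        # Consume one stored occurrence per match, which yields min(num_v(s), num_v(t)).
        if table[g] > 0:
            table[g] -= 1
            total += 1
    return total


def presence_kernel(s, t, p):
    # Only presence matters here, so a set of distinct p-grams is enough.
    seen = set(pgrams(s, p))
    return sum(1 for g in set(pgrams(t, p)) if g in seen)


def blended(kernel, s, t, p_min, p_max):
    """Sum a p-gram kernel over all lengths p_min..p_max (inclusive)."""
    return sum(kernel(s, t, p) for p in range(p_min, p_max + 1))


if __name__ == "__main__":
    print(spectrum_kernel("abab", "bab", 2))                # 3
    print(intersection_kernel("abab", "bab", 2))            # 2
    print(blended(spectrum_kernel, "abab", "bab", 1, 2))    # 9
```

Only the p-grams of the first string are stored, so memory grows with the number of distinct p-grams in that string, while the second string is scanned once per p-gram length with one hash lookup per occurrence.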