Commercial Sentiment Analysis Solutions: A Comparative Study
Empirical insights into high-promising commercial sentiment analysis solutions that go beyond their vendors' claims are rare. Moreover, due to ongoing advances in the field, earlier studies are far from reflecting the current situation due to the constant evolution of the field. The present research aims to evaluate and compare current solutions. Based on tweets on the airline service quality, we test the solutions of six vendors with different market power, such as Amazon, Google, IBM, Microsoft, and Lexalytics, and MeaningCloud, and report their measures of accuracy, precision, recall, (macro) F1, time performance, and service level agreements (SLA). For positive and neutral classifications, none of the solutions showed precision of over 70%. For negative classifications, all of them demonstrate high precision of around 90%, however, only IBM Watson NLU and Google Cloud Natural Language achieve recall of over 70% and thus can be seen as worth considering for application scenarios w here negative text detection is a major concern. Overall, our study shows that an independent, critical experimental analysis of sentiment analysis services can provide interesting insights into their general reliability and particular classification accuracy beyond marketing claims to critically compare solutions based on real-world data and analyze potential weaknesses and margins of error before making an investment.