Machine learning in sentiment-analysis of text information on the example of user attitudes regarding candidates for Ukrainian presidential elections 2019
Keywords: machine learning, sentiment analysis, text mining
Abstract: The main machine learning methods for sentiment analysis of text are described, and their effectiveness is compared. Text pre-processing stages, such as stemming and stop-word removal, and algorithms for converting text to vector form, such as bag-of-words, the TF-IDF vectorizer, and Word2Vec, are considered. The goal of this study was to determine the sentiment of comments under the publications of the Ukrainian presidential candidates V. Zelensky and P. Poroshenko during the 2019 election campaign. Three algorithms were used to determine the sentiment of the text: the naive Bayes classifier, the support vector machine, and the convolutional neural network. Separate models were built for each candidate, and classification quality was compared using the F1 metric. The best-performing model on both data samples was the convolutional neural network.
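As an illustration of the kind of pipeline the abstract outlines, the following is a minimal sketch of stop-word removal, a bag-of-words representation, a multinomial naive Bayes classifier with Laplace smoothing, and an F1 computation. It is not the authors' implementation: the stop-word list, the toy comments, the labels, and all function names are hypothetical, the texts are in English for readability, and stemming, TF-IDF, Word2Vec, the support vector machine, and the convolutional network are left out.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "a", "is", "this", "and", "for", "i"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,!?") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def train_naive_bayes(texts, labels):
    """Collect class frequencies and per-class bag-of-words counts."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen during training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data invented for this sketch; not taken from the study.
train_texts = [
    "great candidate, strong program",
    "good speech and great ideas",
    "terrible promises, bad program",
    "bad speech, awful ideas",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_texts = ["great ideas", "awful promises", "good program"]
test_labels = ["pos", "neg", "pos"]

model = train_naive_bayes(train_texts, train_labels)
predictions = [predict(model, t) for t in test_texts]
score = f1_score(test_labels, predictions, positive="pos")
print(predictions, score)
```

In this sketch one model is trained on one labelled sample. Following the abstract, the same steps would be repeated separately for each candidate's comments, and the resulting F1 values compared across classifiers.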