Sentiment analysis in Russian-language texts, part 3: challenges and prospects





Sentiment analysis has been used successfully for social media, reviews, news, and even textbooks. Based on the key research for the Russian language described in a previous article, here we look at the main challenges faced by researchers, as well as promising directions for the future. Unlike previous works, I focus on applied use cases rather than on the approaches themselves and their classification quality.



NB: The article was written for a scientific journal, so there will be many links to sources.


1. Ongoing challenges



Based on the analysis of research articles, ten common problems were identified. In general, researchers typically face multiple challenges, including access to representative historical data and training data, sentiment annotation, exhaustive description of research limitations, and topic extraction from texts.



1.1. Access to representative historical data in analyzed sources



Historical data, such as publications and reviews, collected through source APIs or aggregation platforms is often used and analyzed in sentiment research. Sometimes API developers provide only partial access to published data. For example, Twitter's core API provides only partial access: under Twitter's policy, only the historical Twitter API gives access to all public posts. As for aggregation platforms, even if they claim to have full access to the data of a particular source, this is impossible to verify. Therefore, there are only two ways to ensure that the data is representative for the study:



  1. Collect the data directly through the source's official API, having first made sure that the API provides access to all published data.
  2. Use research platforms provided by the sources themselves, for example OK Data Science Lab [98].


1.2. Access to training data



Although Russian is one of the most widely spoken languages on the Internet, the number of resources for it is significantly smaller than for English, especially in the field of sentiment analysis. Although many studies have been devoted to sentiment classification of Russian-language texts, only some of their authors have made their datasets publicly available. If none of the available sets can be applied to the research topic, authors mark up training sets manually. After analyzing literary sources and scientific works [142], [173], I identified and described 14 publicly available datasets for sentiment analysis of Russian-language texts (see Table 2). I considered only those sets that can be accessed following the instructions described in the relevant scientific papers or on the official sites. For this reason, the ROMIP sets [174], [175], for example, were not included in the list, because it was not possible to access the data through their official website.



Table 2. Russian language datasets for sentiment analysis.

| Dataset | Description | Annotation | Classes | Access |
|---|---|---|---|---|
| RuReviews [143] | A set of sentiment examples from reviews of products in the "Women's Clothing and Accessories" category of a large Russian online store. | Automatic | 3 | GitHub page |
| RuSentiment [142] | An open set of sentiment examples from posts on the VKontakte social network. | Manual | 5 | Project page |
| Russian Hotel Reviews Dataset [171] | An aspect-based set of sentiment examples from 50,329 Russian-language hotel reviews. | Automatic | 5 | Google Drive |
| RuSentRel [172] | A set of analytical articles from the InoSMI website presenting the author's opinion on the topic covered, with numerous references to participants in the described situations. | Manual | 2 | GitHub page |
| LINIS Crowd [26] | An open set of sentiment examples compiled from social and political articles on various media sites. | Manual | 5 | Project page |
| Twitter Sentiment for 15 European Languages [173] | A set of about 1.6 million manually annotated Twitter posts (provided as IDs) in 15 languages, including Russian. | Manual | 3 | — |
| SemEval-2016 Task 5: Russian [49] | An aspect-based set of restaurant reviews, based on the SentiRuEval-2015 data. | Manual | 3 | — |
| SentiRuEval-2016 [18] | A set of Twitter posts about banks and telecom companies. | Manual | 3 | — |
| SentiRuEval-2015 [17] | A set of restaurant and automobile reviews, and Twitter posts about banks and telecom companies. | Manual | 4 | — |
| RuTweetCorp [141] | A corpus of Russian-language Twitter posts, collected and annotated automatically [144]. | Automatic | 3 | — |
| Kaggle Russian News Dataset | Russian-language news texts. | — | 3 | Kaggle |
| Kaggle Sentiment Analysis Dataset | — | — | 3 | Kaggle |
| Kaggle IS161AIDAY | A set provided by Alem Research. | — | 3 | Kaggle |
| Kaggle Russian_twitter_sentiment | Russian-language Twitter posts. | — | 2 | Kaggle |


1.3. Classification quality of third-party systems



When using third-party analysis systems such as SentiStrength [22], the Medialogy algorithms, or POLYARNIK [107], authors usually do not report the quality of classification on the analyzed texts, which makes it difficult to verify the accuracy of the research results. I assume that the use of third-party solutions also stems from the fact that the researchers did not annotate test sets of texts for calculating classification metrics. However, introducing this step would significantly increase the scientific value of the work. Therefore, I highly recommend that authors manually annotate samples of the target data to measure classification metrics in sentiment analysis.
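For illustration, here is a minimal sketch of such a verification step in Python with scikit-learn: the gold labels come from a manually annotated sample and the predictions from a third-party system; the label values shown are placeholders.

```python
# A minimal sketch of the recommended verification step: compare the labels
# produced by a third-party sentiment system with a manually annotated sample.
# The label values below are placeholders; only the metric computation is shown.
from sklearn.metrics import classification_report, f1_score

# Manually annotated target sample (gold labels) and the third-party system's output.
gold = ["positive", "negative", "neutral", "negative", "positive"]
predicted = ["positive", "neutral", "neutral", "negative", "positive"]

# Per-class precision/recall/F1 plus macro averages for the paper's report.
print(classification_report(gold, predicted, digits=3))
print("Macro F1:", f1_score(gold, predicted, average="macro"))
```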



1.4. Extracting topics from texts



To extract topics, most studies use topic modeling techniques. However, if the share of texts related to the topic of interest is significantly lower than 1%, topic modeling does not cope with topic extraction [54]. Moreover, topic modeling shows low accuracy when analyzing short texts, especially if they contain everyday speech [54]. Therefore, more accurate and less noise-sensitive approaches need to be developed.
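As a point of reference, the sketch below shows the kind of topic-modeling baseline most studies rely on: LDA over bag-of-words counts with scikit-learn. The example posts and parameters are illustrative; on short, colloquial texts the resulting topics are typically noisy, which is exactly the limitation described above.

```python
# A minimal LDA topic-modeling baseline; documents and parameter values are illustrative.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "банк поднял ставку по кредиту",       # example short posts (illustrative)
    "отличный отель на берегу моря",
    "новый смартфон разочаровал камерой",
]

vectorizer = CountVectorizer(max_features=10000)
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Top words per topic; with very short, everyday-speech texts these topics
# are often noisy, illustrating the limitation discussed above.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[-5:][::-1]]
    print(f"topic {i}:", ", ".join(top))
```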



1.5. Sentiment annotation guidelines for manual markup



Since relevant Russian-language training sets on topics of interest are not always available, researchers usually annotate texts by hand. Without a description of the guidelines and other details of the annotation process, it is difficult to validate the markup quality of a dataset. Clear and simple step-by-step instructions are essential for obtaining high-quality annotations from both certified linguists and non-linguist assessors [176]. Some types of texts are especially difficult to annotate for sentiment, for example, texts conveying the emotional state of the speaker, neutral communication of valuable information, sarcasm, ridicule, and others [162].



As an example of sentiment annotation guidelines for the Russian language, future research can use the guidelines developed for the annotation of RuSentiment [142]. If certified linguists are not available for annotation, one can turn to assessors from Yandex.Toloka, a crowdsourcing platform for manual data annotation; it has already been used in several academic studies of Russian-language texts [177]-[180]. It is also highly recommended to publish inter-annotator agreement scores, such as Fleiss' kappa [181] or Krippendorff's alpha [182], as well as other details of the annotation process.
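As a minimal example of reporting agreement, the following sketch computes Fleiss' kappa with statsmodels over a hypothetical texts-by-annotators label matrix; the labels themselves are invented for illustration.

```python
# A minimal sketch of reporting inter-annotator agreement with Fleiss' kappa;
# the label matrix below (texts x annotators) is purely illustrative.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows are annotated texts, columns are annotators, values are class labels
# (0 = negative, 1 = neutral, 2 = positive).
labels = np.array([
    [2, 2, 2],
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 2],
    [0, 0, 0],
])

counts, _ = aggregate_raters(labels)  # texts x categories count table
print("Fleiss' kappa:", fleiss_kappa(counts))
```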



1.6. Comprehensive description of limitations



Most of the papers analyzed provide incomplete lists of limitations. In addition to technical and methodological limitations, it is strongly recommended to describe:



  • The prevalence of the Internet in the country. One of the critical limitations, because certain groups of people will not be covered by the study. According to the results of Omnibus GFK polls in December 2018 [9], the prevalence of the Internet in Russia reached 75.4%, it is used by 90 million Russians aged 16 and over. Internet use by young people (16-29 years old) and middle-aged people (20-54 years old) is close to saturation levels - 99% and 88%, respectively. But despite a significant increase in prevalence, only 36% of people over 55 use the Internet.
  • . , [183]. , . , . , , , , .
  • . , , . , , . , , , , ; ; ; , , , , ; . , . , .
  • Internet freedom in the country. In the Freedom House 2018 rating [184], Russia scored 53 out of 65. Since 2012, blocking of websites by IP address and URL has been applied in the country, and further regulatory restrictions were adopted in 2019. As a result, some opinions may simply not be represented in the analyzed sources.




1.7. Cross-topic sentiment analysis



Since people can express their opinions on a huge number of topics, analyzing all of these opinions can be resource-intensive, because training sets must be annotated for each topic [186]. The absence of annotated text collections for training cross-topic sentiment analysis models leads to lower analysis accuracy. According to [187], there are three important issues in cross-topic analysis. First, an opinion expressed in the context of one topic may have the opposite polarity in the context of another. The second problem concerns the differences between the sentiment vocabularies of different topics, which need to be taken into account in the analysis. Finally, it is reasonable to assign a marker of emotion strength to each token in the sentiment lexicon.



1.8. Detecting sarcasm and irony



Online communication often contains sarcastic and ironic phrases [188] that even humans cannot always recognize, let alone natural language processing algorithms. So far, very little research [189] has been devoted to detecting irony and sarcasm in the Russian language. Therefore, to correctly process the full range of opinions, more approaches for the automatic classification of such complex rhetorical devices need to be developed and applied.



1.9. Detecting bots



Bots have a strong impact on various aspects of social media, especially when they make up a large share of users. They can be used for various malicious tasks related to public opinion, for example, to inflate the popularity of celebrities or to spread false information about politicians [190]. Consequently, bot detection methods need to be developed and applied in sentiment studies.



1.10. Effectiveness of analysis results



There is still considerable disagreement about the effectiveness of measuring public response through automatic analysis of web data. The authors of several studies [191], [192] argue that social media approaches are less accurate than traditional research. Others claim [193] that these approaches outperform traditional methods. Therefore, it is strongly recommended, where possible, to compare the results of a study with the results obtained by other methods.



2. Promising areas of research



After reviewing the literature, I identified eight opportunities for future research.



Overall, future research should carefully examine the approaches to monitoring sentiment presented in this article in order to identify potential synergies between the individual approaches for a more complete analysis of sentiment expressed in different text sources.



2.1. Transfer learning with pre-trained language models



Most work uses rule-based or simple machine learning approaches; only two studies [69], [72] used neural networks. However, recent work has shown that transfer learning from pre-trained language models can effectively solve sentiment classification problems, consistently achieving strong results [43], [194]-[198].



Thus, the use of fine-tuned language models can significantly improve the quality of sentiment analysis, and hence the accuracy of sentiment monitoring results. Initial research was carried out in [199], whose authors trained a shallow-and-wide convolutional neural network on ELMo embeddings [42] and obtained new state-of-the-art classification scores on the RuSentiment dataset [142], surpassing all previous neural network approaches. As a first step in this direction, researchers could train and publish transfer learning baselines for various Russian-language text collections.
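A minimal sketch of what such a baseline could look like, assuming the Hugging Face transformers library and a multilingual BERT checkpoint (this is not the setup of [199]; the model name, labels, and training examples are illustrative):

```python
# A minimal transfer-learning sketch: fine-tune a pre-trained multilingual BERT
# for 3-class sentiment classification. Model name, labels, and texts are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

texts = ["Отличный сервис, всем рекомендую!", "Ужасное обслуживание."]
labels = torch.tensor([2, 0])  # 0 = negative, 1 = neutral, 2 = positive

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few illustrative fine-tuning steps on the toy batch
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After fine-tuning, predict labels for new texts.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer(["Нормально, но могло быть лучше."],
                               return_tensors="pt")).logits
print(logits.argmax(dim=-1))
```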



2.2. Sentiment analysis of multilingual texts



Russia is a multinational country, and therefore multilingual. Therefore, different people and groups of people can express their opinions in different languages. Linguists in Russia count more than 150 languages, starting with Russian, which is spoken by 96.25% of the population, and ending with Negidal, which is spoken by several hundred people in the Amur region. Several studies analyzed texts in multiple languages, allowing authors to cover a wider range of sources and compare expressions of opinion on the same topic in different languages.



To classify sentiment in different languages, some researchers translated all texts into one language and carried out monolingual sentiment analysis (for example, [72]). Others developed multilingual classification models (e.g., [79]). As a development of the latter approach, researchers can use pre-trained language models, for example, Bidirectional Encoder Representations from Transformers [43] and the Multilingual Universal Sentence Encoder [198].
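As a rough sketch of the multilingual route, the example below embeds texts in different languages into a shared vector space with a multilingual sentence encoder and trains a single classifier on top; the model name (from the sentence-transformers package) and the tiny training set are assumptions for illustration.

```python
# A minimal multilingual sketch: one shared embedding space, one classifier.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

train_texts = [
    "Отличный фильм, очень понравился",        # Russian, positive
    "Ужасный сервис, не рекомендую",           # Russian, negative
    "Great movie, I really enjoyed it",         # English, positive
    "Terrible service, would not recommend",    # English, negative
]
train_labels = [1, 0, 1, 0]

clf = LogisticRegression().fit(encoder.encode(train_texts), train_labels)

# Thanks to the shared embedding space, the same classifier can score texts
# in other languages covered by the encoder.
print(clf.predict(encoder.encode(["Das Hotel war wunderbar"])))
```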



2.3. Extracting general-subject topics from texts



In most topic modeling studies, the authors selected only a few topics for extraction and further analysis. However, this approach does not allow relevant topics to be extracted from large text collections, for example, when the share of texts related to the topics of interest is much less than 1% [54]. Moreover, topic modeling demonstrates low accuracy in the analysis of short texts, especially everyday speech [54]. The problem of topic extraction can be reduced not only to topic modeling but also to text classification, provided an extensive training dataset for extracting general-subject topics is available.



Creating such a dataset appears to be a time-consuming and resource-intensive process under the basic approach of annotation by a team of linguists or via crowdsourcing. However, some social media platforms, such as Reddit and Pikabu, let users tag their posts. This means that the users of such social networks take over the annotation process; therefore, with additional verification, these data can potentially be used to build a training set for extracting general-subject topics from messages, as the sketch below illustrates.
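A minimal sketch of that idea: treat user-assigned tags as noisy topic labels and train an ordinary text classifier instead of a topic model; the posts, tags, and the TF-IDF plus linear SVM pipeline are illustrative choices.

```python
# Topic extraction framed as text classification over user-tagged posts.
# The posts and tags are invented; in practice they would come from a platform
# such as Pikabu or Reddit after additional verification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

posts = [
    "Поднял ипотеку, банк снова изменил условия",
    "Врач выписал новый препарат, делюсь опытом",
    "Собрал игровой компьютер за выходные",
]
tags = ["финансы", "здоровье", "технологии"]  # user-assigned tags as labels

topic_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
topic_clf.fit(posts, tags)

print(topic_clf.predict(["Кредитная ставка опять выросла"]))
```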



2.4. Likes and other types of reaction to content as an indirect way of expressing emotions



In most studies, expressions of opinion were assessed only by the content of publications. However, likes and other types of reactions to posts can also be a source of the emotions expressed by readers, so this information can be taken into account when monitoring sentiment. Preliminary work on the relationship between liking a post and the emotions it evokes was carried out in [200]: the researchers studied the role of the publication's content, the relationship with the publication's author, and the user's personality. Based on an online study, the authors argue that posts with positive emotions are usually liked almost automatically, without careful reading. It was also noted that the positivity of publications correlates with relative and literal motives. In addition to the simple Like button, some social media platforms have introduced reaction functionality to allow users to easily show their emotional response to a message. For example, Facebook's set of reactions consists of Like, Love, Wow, Haha, Angry, and Sad.



In their study of emotional stimuli in the reaction behavior of Russian-speaking Facebook users, Smolyarova et al. [201] show that the Love reaction is usually used in a straightforward manner, becoming an alternative to the traditional Like. Conversely, a post that triggers a Wow reaction is likely to be marked with other emotions as well. Reactions such as Love, Haha, and Wow tend to reduce the desire to further interact with posts through comments or the share button [202]. Thus, the relationship between reactions, people's emotions, and the sentiment of a publication is a potentially significant area of research that could later be used in sentiment monitoring.



2.5. Contextual classification of emotions



A user's emotional reaction expressed in a text can strongly depend on the context: the same text may express a positive tone in one context and a negative one in another [203]. Therefore, when analyzing the sentiment of conversations, for example, replies in comments, it is very important to capture the context of the conversation in addition to the emotional reactions themselves. Researchers should pay attention to contextual sentiment classification when analyzing conversations.
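One simple way to capture context is to feed the preceding message and the reply to the model as a text pair, so both are encoded together. In the sketch below the model name and texts are illustrative, and the classification head would still need fine-tuning on annotated context-reply pairs.

```python
# A minimal sketch of contextual classification: encode (context, reply) as a
# paired input so the model can condition on the preceding message.
# Model name and texts are illustrative assumptions.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

context = "Команда опять проиграла в финале."
reply = "Ну конечно, чего ещё ожидать."   # its sentiment depends on the context

# The tokenizer joins the two segments with a separator token.
inputs = tokenizer(context, reply, truncation=True, return_tensors="pt")
print(model(**inputs).logits.argmax(dim=-1))
```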



2.6. Content analysis of less researched sources



A significant proportion of research operates on data from VKontakte, Twitter, LiveJournal and YouTube, although there are other popular social networks that can be used as a data source, for example, Odnoklassniki, Moi Mir and RuTube. Thus, researchers can pay attention to Odnoklassniki, because it is the second largest Russian social network, which is used by 42% of the country's population [98]. The platform is popular with users over 35, so it can be a useful source of opinions from older generations. Moreover, Odnoklassniki's representative statistics can be accessed through OK Data Science Lab, a platform developed by Odnoklassniki for research.



2.7. Automatic analysis of social media content as an alternative to traditional surveys



At present, the results of the analysis of online texts cannot be considered as a full-fledged alternative to classical approaches to assessing opinions based on mass polls [204]. To overcome this obstacle, a theoretical basis is needed to generalize the data to the level of larger population groups [205]. Traditional mass polling assumes the association of opinions with socio-demographic groups, and reliable demographic information is usually not available on social media. Researchers can use geolocation information, user profile data, and gender and age prediction systems [206] - [211] to compare their findings with traditional opinion polls.



2.8. Monitoring the sentiment index of the Russian-speaking segment of social networks



In a groundbreaking 2010 paper [212], Mislove et al. investigated the dynamics of sentiment over the course of the day by analyzing more than 300 million geolocated Twitter messages from the United States using a dictionary-based approach. Some interesting trends were noted, such as the highest level of happiness in the early morning and late evening, and weekends being much happier than weekdays. The revealed patterns were confirmed by a study of the mood of Brazilians on Twitter [213], which used naive Bayes sentiment classification [30]. Dzogang et al. also investigated circadian patterns in mood changes [214]. While such studies have already been carried out for many languages, Russian-language texts have so far received little attention [93], [137]. They could be explored more widely and deeply in terms of the amount of analyzed data, the quality of sentiment classification models, and the methods for calculating social indices.
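As an illustration of one possible index, the sketch below aggregates already-classified posts by hour of day and computes the share of positive minus negative messages with pandas; the input data frame is an invented placeholder.

```python
# A minimal sketch of a sentiment index: share of positive minus share of
# negative posts, aggregated by hour of day. The data frame is illustrative.
import pandas as pd

posts = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-05-01 08:15", "2020-05-01 08:40",
        "2020-05-01 21:05", "2020-05-01 21:30",
    ]),
    "label": ["positive", "negative", "positive", "positive"],
})

by_hour = posts.groupby(posts["timestamp"].dt.hour)["label"]
shares = by_hour.value_counts(normalize=True).unstack(fill_value=0)
index = shares.get("positive", 0) - shares.get("negative", 0)
print(index)  # sentiment index per hour of day
```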



Also, some studies have been devoted to the development of systems for monitoring emotions in Russian-speaking social networks, but the authors usually do not report the monitoring results. For example, researchers from ITMO University described an approach to assessing the emotional sentiment of public opinion [215], the authors of [216] considered the general principle of monitoring social networks using intelligent analysis of text messages, and in article [148] the authors described the development of software for monitoring public sentiment through Russian-language Twitter messages.



3. Conclusion



As we can see, there is already a good research base for the Russian language, covering a wide range of research objectives and analyzed sources. However, there are also a number of challenges and promising areas that should be considered when conducting new research.



4. Sources



A complete list of sources can be found here.


