Detection of toxic comments in Russian



Today, social media has become one of the main platforms for communication, both on the Internet and in real life. The freedom to express different points of view, including toxic, aggressive and offensive comments, can have long-term negative consequences for people's opinions and social cohesion. Therefore, one of the most important tasks of modern society is the development of tools for the automatic detection of toxic content on the Internet in order to reduce these negative consequences.



This article describes how we address this problem for the Russian language. As a data source, we used a dataset published anonymously on Kaggle, additionally checking the quality of its annotation. To build a classification model, we fine-tuned two versions of the Multilingual Universal Sentence Encoder, the multilingual Bidirectional Encoder Representations from Transformers, and ruBERT. The fine-tuned ruBERT model achieved the best classification result, F1 = 92.20%. We have released the trained models and code examples to the public.



1. Introduction



Today, the problem of identifying toxic comments is successfully addressed with advanced deep learning techniques [1], [35]. Although some works directly investigate the detection of insults, toxic and hate speech in Russian [2], [8], [17], there is only one publicly available dataset of toxic Russian-language comments [5]. It was published on Kaggle without any explanation of the annotation process, so for academic and practical purposes it may be unreliable without additional in-depth study.



This article is dedicated to the automatic detection of toxic comments in Russian. For this task, we checked the annotation of the Russian Language Toxic Comments Dataset [5]. Then, a classification model was created by fine-tuning the pretrained multilingual versions of the Multilingual Universal Sentence Encoder (M-USE) [48], the Bidirectional Encoder Representations from Transformers (M-BERT) [13] and ruBERT [22]. The most accurate model, ruBERT-Toxic, showed F1 = 92.20% on the binary classification of toxic comments. The resulting M-BERT and M-USE models can be downloaded from GitHub.



The structure of the article is as follows. In Section 2, we briefly describe other works on this topic, as well as the available Russian-language datasets. In Section 3, we give a general overview of the Russian Language Toxic Comments Dataset and describe the process of checking its annotation. In Section 4, we describe the fine-tuning of language models for the text classification task. In Section 5, we describe the classification experiment. Finally, we discuss the performance of our system and directions for future research.



2. Related work



Extensive work has been done to detect toxic comments on various data sources. For example, Prabowo and colleagues used Naive Bayes (NB), Support Vector Machine (SVM) and Random Forest Decision Tree (RFDT) classifiers to detect hate and offensive language on Indonesian Twitter [34]. The experiments showed an accuracy of 68.43% for the hierarchical approach with word-unigram features and the SVM model. Founta and her team [15] proposed a GRU-based deep neural network with pretrained GloVe embeddings for the classification of toxic texts. The model showed high accuracy on five datasets, with AUC ranging from 92% to 98%.



More and more workshops and shared tasks are dedicated to detecting toxic, hateful and offensive comments: for example, HatEval and OffensEval at SemEval-2019; HASOC at FIRE-2019; the Shared Task on the Identification of Offensive Language at GermEval-2018 and GermEval-2019; and TRAC at COLING-2018. The models used in these tasks range from traditional machine learning (e.g., SVM and logistic regression) to deep learning (RNN, LSTM, GRU and CNN architectures, CapsNet, including the attention mechanism [45], [49], as well as advanced models such as ELMo [31], BERT [13] and USE [9], [48]). A significant number of the teams that achieved good results [18], [24], [27], [28], [30], [36], [38] used embeddings from the listed pretrained language models. Since representations from pretrained models performed well in classification, they were widely used in subsequent studies. For example, researchers from the University of Lorraine performed binary and multi-class classification of Twitter messages using two approaches: training a DNN classifier with pretrained word embeddings, and fine-tuning a pretrained BERT model [14]. The second approach showed significantly better results compared to CNN and bidirectional LSTM neural networks based on fastText embeddings.



Although a significant number of studies [7], [33], [41] have examined toxic and aggressive behavior in Russian-speaking social networks, little attention has been paid to its automatic classification. Gordeev used convolutional neural networks and a random forest classifier (RFC) to detect aggression in English and Russian texts [17]. The corpus of messages annotated as aggressive contained about 1,000 messages in Russian and about as many in English, but it is not publicly available. The trained CNN model achieved 66.68% accuracy in the binary classification of Russian-language texts. Based on these results, the authors concluded that convolutional neural networks and deep learning approaches are more promising for identifying aggressive texts. Andrusyak et al. proposed an unsupervised probabilistic approach with a seed dictionary to classify abusive YouTube comments written in Ukrainian and Russian [2]. The authors published a manually labeled dataset of 2,000 comments, but it contains both Russian and Ukrainian texts, so it cannot be used directly for research on Russian-language text.



Several recent studies have focused on the automatic identification of attitudes towards migrants and ethnic groups in Russian-speaking social networks, including the identification of identity-based attacks. Bodrunova and co-authors studied 363,000 Russian-language LiveJournal posts on attitudes towards migrants from the post-Soviet republics in comparison with other nations [8]. It turned out that in Russian-language blogs migrants did not provoke significant discussion and were not subjected to the worst treatment, while representatives of North Caucasian and Central Asian nationalities were treated quite differently. A group of researchers led by Bessudnov found that Russians are traditionally more hostile to people from the Caucasus and Central Asia, while Ukrainians and Moldovans are generally accepted as potential neighbors [6]. According to the findings of the team led by Koltsova, attitudes towards representatives of Central Asian nationalities and Ukrainians are the most negative [19]. Although some academic research has focused on identifying toxic, offensive, and hate speech, none of the authors have made their Russian-language datasets publicly available. As far as we know, the Russian Language Toxic Comments Dataset [5] is the only publicly available set of toxic Russian-language comments. However, it was published on Kaggle without a description of the creation and annotation process, so without detailed study it is not recommended for use in academic and practical projects.



Since there is little research devoted to the detection of toxic Russian-language comments, we decided to evaluate deep learning models on the Russian Language Toxic Comments Dataset [5]. We are not aware of any classification studies based on this data source. The Multilingual BERT and Multilingual USE models are among the most widespread and successful in recent research projects, and they are the only ones that officially support Russian. We chose fine-tuning as the transfer learning approach because it gave the best classification results in recent studies [13], [22], [43], [48].



3. Dataset with toxic comments



The Russian Language Toxic Comments Dataset [5] is a collection of annotated comments from the Dvach and Pikabu websites. It was posted on Kaggle in 2019 and contains 14,412 comments, of which 4,826 are labeled toxic and 9,586 non-toxic. The average comment length is 175 characters; the minimum is 21 and the maximum is 7,403.
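For illustration, these statistics can be reproduced with a few lines of pandas. The file and column names (labeled.csv, comment, toxic) are our assumptions about the Kaggle release and should be checked against the actual download:

```python
# Sketch: inspecting the Kaggle dataset with pandas.
# File and column names ("labeled.csv", "comment", "toxic") are assumptions.
import pandas as pd

df = pd.read_csv("labeled.csv")          # downloaded from Kaggle [5]
print(df["toxic"].value_counts())        # expected: 9,586 non-toxic vs 4,826 toxic
lengths = df["comment"].str.len()
print(lengths.mean(), lengths.min(), lengths.max())  # ~175 / 21 / 7,403 characters
```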



To check the quality of the annotation, we manually re-annotated a portion of the comments and compared the results with the original labels using inter-annotator agreement. We decided to consider the existing annotation correct if a substantial or high level of inter-annotator agreement was reached.



First, we manually labeled 3,000 comments and compared the resulting class labels with the original ones. The annotation was performed by Russian-speaking workers of the Yandex.Toloka crowdsourcing platform, which has already been used in several academic studies of Russian-language texts [10], [29], [32], [44]. As annotation guidelines, we used the toxicity labeling instructions with additional attributes from the Jigsaw Toxic Comment Classification Challenge. The annotators were asked to detect toxicity in the texts and to indicate its level for each comment. To improve the accuracy of the labeling and limit the possibility of cheating, we used the following techniques:



  • we assigned annotators a skill level based on their answers to control tasks and banned those who answered incorrectly;
  • we restricted access to the tasks for those who answered too quickly;
  • we restricted access to the tasks for those who failed to enter the correct CAPTCHA several times in a row.


Each comment was annotated by 3-8 annotators using the dynamic overlap technique. The results were aggregated using the Dawid-Skene method [12], following the recommendations of Yandex.Toloka. The annotators showed a high level of inter-annotator agreement: Krippendorff's alpha was 0.81. Cohen's kappa between the original and our aggregated labels was 0.68, which corresponds to substantial inter-annotator agreement [11]. We therefore decided to consider the labeling of the dataset correct, especially considering the possible differences in the annotation instructions.
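Both agreement coefficients are easy to reproduce. Below is a minimal sketch with toy data, using scikit-learn for Cohen's kappa and the third-party krippendorff package for Krippendorff's alpha (our choice of tooling, not necessarily what Toloka uses internally):

```python
# Sketch: inter-annotator agreement on toy data.
import numpy as np
import krippendorff                      # pip install krippendorff (third-party package)
from sklearn.metrics import cohen_kappa_score

# Cohen's kappa [11] between the original Kaggle labels and our aggregated labels
# (placeholder values; on the real data we obtained 0.68).
original = [1, 0, 0, 1, 1, 0]
aggregated = [1, 0, 1, 1, 1, 0]
print(cohen_kappa_score(original, aggregated))

# Krippendorff's alpha across annotators: rows = annotators, columns = comments,
# np.nan marks comments an annotator did not see (on the real data: 0.81).
ratings = np.array([[1, 0, 0, np.nan],
                    [1, 0, 1, 0],
                    [np.nan, 0, 1, 0]])
print(krippendorff.alpha(reliability_data=ratings, level_of_measurement="nominal"))
```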



4. Machine learning models



4.1. Baseline approaches



For the baselines, we took one classical machine learning approach and one modern neural network approach. In both cases, we performed some preprocessing: we replaced URLs and nicknames with keywords, removed punctuation marks and converted uppercase letters to lowercase.
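A minimal sketch of this preprocessing is shown below; the exact regular expressions are our assumptions about what counts as a URL or a nickname:

```python
# Sketch of the preprocessing step; the regexes are illustrative assumptions.
import re
import string

def preprocess(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", " URL ", text)  # URLs -> keyword
    text = re.sub(r"@\w+", " USER ", text)                  # nicknames -> keyword
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    return text.lower()                                     # lowercase everything

print(preprocess("Смотри https://example.com, @vasya!"))
```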



First, we applied the Multinomial Naive Bayes (MNB) model, which has performed well in text classification problems [16], [40]. For this model we used Bag-of-Words and TF-IDF vectorization. The second model was a Bidirectional Long Short-Term Memory (BiLSTM) neural network. For the embedding layer, we pretrained Word2Vec embeddings (dim = 300) [25] on the RuTweetCorp collection of Russian-language Twitter messages [37]. On top of the Word2Vec embeddings we added two bidirectional LSTM layers, followed by a hidden fully connected layer and a sigmoid output layer. To reduce overfitting, we added Gaussian noise regularization layers and dropout layers to the network. We used the Adam optimizer with an initial learning rate of 0.001 and binary cross-entropy as the loss function. The model was trained with frozen embeddings for 10 epochs. We tried unfreezing the embeddings at different epochs while reducing the learning rate, but the results were worse, probably because of the size of the training set [4].
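The sketch below outlines both baselines under stated assumptions: hidden sizes, the noise level and dropout rates are placeholders, since the article does not specify them; the overall layout follows the description above.

```python
# Sketch of the two baselines. Layer sizes, noise and dropout rates are placeholders.
import tensorflow as tf
from tensorflow.keras import layers
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Baseline 1: Multinomial Naive Bayes over TF-IDF features.
mnb = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])

# Baseline 2: BiLSTM over frozen 300-d Word2Vec embeddings [25], [37].
def build_bilstm(embedding_matrix):
    vocab_size, dim = embedding_matrix.shape  # dim = 300
    model = tf.keras.Sequential([
        layers.Embedding(
            vocab_size, dim,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=False),                           # fixed embeddings
        layers.GaussianNoise(0.2),                      # noise regularization
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.3),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),            # hidden fully connected layer
        layers.Dense(1, activation="sigmoid"),          # binary output
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model  # trained for 10 epochs with embeddings kept frozen
```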



4.2. BERT Model



Two versions of the multilingual BERT-Base model are officially available, but only the Cased version is officially recommended. BERT-Base takes a sequence of at most 512 tokens and returns its representation. Tokenization is performed with WordPiece [46], with preliminary text normalization and punctuation splitting. Researchers from MIPT trained BERT-Base Cased further on Russian data and published ruBERT, a model for the Russian language [22]. We used both models, Multilingual BERT-Base Cased and ruBERT; each contains 12 transformer blocks with a hidden size of 768, 12 self-attention heads and 110 million parameters. Fine-tuning was performed with the parameters recommended in [43] and in the official repository: three training epochs, 10% warm-up steps, maximum sequence length 128, batch size 32, learning rate 5e-5.
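A minimal fine-tuning sketch with these hyperparameters, using the Hugging Face transformers Trainer, is given below. The checkpoint name DeepPavlov/rubert-base-cased and the pre-tokenized train_ds/test_ds datasets are assumptions of the sketch, not necessarily the authors' exact setup:

```python
# Sketch: fine-tuning ruBERT for binary toxicity classification.
# "DeepPavlov/rubert-base-cased" and train_ds/test_ds are assumptions of this sketch.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "DeepPavlov/rubert-base-cased", num_labels=2)

def tokenize(batch):
    # maximum sequence length 128, as stated above
    return tokenizer(batch["comment"], truncation=True,
                     padding="max_length", max_length=128)

args = TrainingArguments(
    output_dir="rubert-toxic",
    num_train_epochs=3,              # three training epochs
    per_device_train_batch_size=32,  # batch size 32
    learning_rate=5e-5,              # learning rate 5e-5
    warmup_ratio=0.1,                # 10% warm-up steps
)
# train_ds / test_ds: datasets already mapped through tokenize()
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=test_ds)
trainer.train()
```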



4.3. MUSE model



Multilingual USE-Trans takes a sequence of at most 100 tokens as input, and Multilingual USE-CNN takes a sequence of at most 256 tokens. SentencePiece [20] tokenization is used for all supported languages. We used the pretrained Multilingual USE-Trans, which supports 16 languages including Russian and contains a transformer encoder with 6 transformer layers, 8 attention heads, a filter size of 2048 and a hidden size of 512. We also used the pretrained Multilingual USE-CNN, which supports 16 languages including Russian and contains a CNN encoder with two CNN layers and filter widths of (1, 2, 3, 5). For both models we used the parameters recommended on the TensorFlow Hub pages: 100 training epochs, batch size 32, learning rate 3e-4.
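A sketch of this setup is shown below; the single sigmoid head on top of the 512-d sentence embedding is our assumption. The TensorFlow Hub URL points to the CNN variant; the Trans variant is published as universal-sentence-encoder-multilingual-large.

```python
# Sketch: fine-tuning Multilingual USE from TensorFlow Hub.
# The classification head is an assumption; hyperparameters follow the text.
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the custom ops M-USE needs

use_layer = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3",
    input_shape=[], dtype=tf.string, trainable=True)  # fine-tune the encoder too

model = tf.keras.Sequential([
    use_layer,                                  # raw strings -> 512-d embeddings
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_texts, train_labels, epochs=100, batch_size=32)
```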



5. Experiment



We compared the baseline and transfer learning approaches:



  • Multinomial Naive Bayes classifier;
  • the Bidirectional Long Short-Term Memory (BiLSTM) neural network;
  • multilingual version of Bidirectional Encoder Representations from Transformers (M-BERT);
  • ruBERT;
  • two versions of Multilingual Universal Sentence Encoder (M-USE).


The classification quality of the trained models on the test set (20%) is shown in the table below. All fine-tuned language models exceeded the baselines in precision, recall and F1-measure. ruBERT showed the best result, F1 = 92.20%.



Binary classification of toxic Russian-language comments:



System               P        R        F1
MNB                  87.01%   81.22%   83.21%
BiLSTM               86.56%   86.65%   86.59%
M-BERT-Base-Toxic    91.19%   91.10%   91.15%
ruBERT-Toxic         91.91%   92.51%   92.20%
M-USE-CNN-Toxic      89.69%   90.14%   89.91%
M-USE-Trans-Toxic    90.85%   91.92%   91.35%
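For reference, the precision, recall and F1 values in the table can be computed with scikit-learn; a minimal sketch with placeholder predictions:

```python
# Sketch: computing the table's metrics with scikit-learn.
# y_true / y_pred are placeholders for the test-set labels and model predictions.
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"P = {p:.2%}, R = {r:.2%}, F1 = {f1:.2%}")
```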


6. Conclusion



In this article, we have used two fine-tuned versions of the Multilingual Universal Sentence Encoder [48], the multilingual Bidirectional Encoder Representations from Transformers [13] and ruBERT [22] to identify toxic Russian-language comments. The fine-tuned ruBERT-Toxic showed the best classification result, F1 = 92.20%.



The resulting M-BERT and M-USE models are available on GitHub.



References



  1. Aken, B. van et al.: Challenges for toxic comment classification: An in-depth error analysis. In: Proceedings of the 2nd workshop on abusive language online (ALW2). pp. 33–42. Association for Computational Linguistics, Brussels, Belgium (2018).
  2. Andrusyak, B. et al.: Detection of abusive speech for mixed sociolects of russian and ukrainian languages. In: The 12th workshop on recent advances in slavonic natural languages processing, RASLAN 2018, karlova studanka, czech republic, december 7–9, 2018. pp. 77–84 (2018).
  3. Basile, V. et al.: SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in twitter. In: Proceedings of the 13th international workshop on semantic evaluation. pp. 54–63. Association for Computational Linguistics, Minneapolis, Minnesota, USA (2019).
  4. Baziotis, C. et al.: DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis. In: Proceedings of the 11th international workshop on semantic evaluation (SemEval-2017). pp. 747–754. Association for Computational Linguistics, Vancouver, Canada (2017).
  5. Belchikov, A.: Russian language toxic comments, https://www.kaggle.com/blackmoon/russian-language-toxic-comments.
  6. Bessudnov, A., Shcherbak, A.: Ethnic discrimination in multi-ethnic societies: Evidence from russia. European Sociological Review. (2019).
  7. Biryukova, E. V. et al.: READER’S comment in on-line magazine as a genre of internet discourse (by the material of the german and russian languages). Philological Sciences. Issues of Theory and Practice. 12, 1, 79–82 (2018).
  8. Bodrunova, S. S. et al.: Who’s bad? Attitudes toward resettlers from the post-soviet south versus other nations in the russian blogosphere. International Journal of Communication. 11, 23 (2017).
  9. Cer, D. M. et al.: Universal sentence encoder. ArXiv. abs/1803.11175, (2018).
  10. Chernyak, E. et al.: Char-rnn for word stress detection in east slavic languages. CoRR. abs/1906.04082, (2019).
  11. Cohen, J.: A coefficient of agreement for nominal scales. Educational and psychological measurement. 20, 1, 37–46 (1960).
  12. Dawid, A. P., Skene, A. M.: Maximum likelihood estimation of observer errorrates using the em algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics). 28, 1, 20–28 (1979).
  13. Devlin, J. et al.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers). pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019).
  14. d’Sa, A. G. et al.: BERT and fastText embeddings for automatic detection of toxic speech. In: SIIE 2020-information systems and economic intelligence. (2020).
  15. Founta, A. M. et al.: A unified deep learning architecture for abuse detection. In: Proceedings of the 10th acm conference on web science. pp. 105–114. Association for Computing Machinery, New York, NY, USA (2019).
  16. Frank, E., Bouckaert, R.: Naive bayes for text classification with unbalanced classes. In: Fürnkranz, J. et al. (eds.) Knowledge discovery in databases: PKDD 2006. pp. 503–510. Springer Berlin Heidelberg, Berlin, Heidelberg (2006).
  17. Gordeev, D.: Detecting state of aggression in sentences using cnn. In: International conference on speech and computer. pp. 240–245. Springer (2016).
  18. Indurthi, V. et al.: FERMI at SemEval-2019 task 5: Using sentence embeddings to identify hate speech against immigrants and women in twitter. In: Proceedings of the 13th international workshop on semantic evaluation. pp. 70–74. Association for Computational Linguistics, Minneapolis, Minnesota, USA (2019).
  19. Koltsova, O. et al.: FINDING and analyzing judgements on ethnicity in the russian-language social media. AoIR Selected Papers of Internet Research. (2017).
  20. Kudo, T., Richardson, J.: SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In: Proceedings of the 2018 conference on empirical methods in natural language processing: System demonstrations. pp. 66–71. Association for Computational Linguistics, Brussels, Belgium (2018).
  21. Kumar, R. et al. eds: Proceedings of the first workshop on trolling, aggression and cyberbullying (TRAC-2018). Association for Computational Linguistics, Santa Fe, New Mexico, USA (2018).
  22. Kuratov, Y., Arkhipov, M.: Adaptation of deep bidirectional multilingual transformers for Russian language. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference «Dialogue». pp. 333–340. RSUH, Moscow, Russia (2019).
  23. Lenhart, A. et al.: Online harassment, digital abuse, and cyberstalking in america. Data; Society Research Institute (2016).
  24. Liu, P. et al.: NULI at SemEval-2019 task 6: Transfer learning for offensive language detection using bidirectional transformers. In: Proceedings of the 13th international workshop on semantic evaluation. pp. 87–91. Association for Computational Linguistics, Minneapolis, Minnesota, USA (2019).
  25. Mikolov, T. et al.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th international conference on neural information processing systems—volume 2. pp. 3111–3119. Curran Associates Inc., Red Hook, NY, USA (2013).
  26. Mishra, P. et al.: Abusive language detection with graph convolutional networks. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: Human language technologies, volume 1 (long and short papers). pp. 2145–2150 (2019).
  27. Mishra, S., Mishra, S.: 3Idiots at HASOC 2019: Fine-tuning transformer neural networks for hate speech identification in indo-european languages. In: Working notes of FIRE 2019—forum for information retrieval evaluation, kolkata, india, december 12–15, 2019. pp. 208–213 (2019).
  28. Nikolov, A., Radivchev, V.: Nikolov-radivchev at SemEval-2019 task 6: Offensive tweet classification with BERT and ensembles. In: Proceedings of the 13th international workshop on semantic evaluation. pp. 691–695. Association for Computational Linguistics, Minneapolis, Minnesota, USA (2019).
  29. Panchenko, A. et al.: RUSSE’2018: A Shared Task on Word Sense Induction for the Russian Language. In: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference «Dialogue». pp. 547–564. RSUH, Moscow, Russia (2018).
  30. Paraschiv, A., Cercel, D.-C.: UPB at germeval-2019 task 2: BERT-based offensive language classification of german tweets. In: Preliminary proceedings of the 15th conference on natural language processing (konvens 2019). Erlangen, germany: German society for computational linguistics & language technology. pp. 396–402 (2019).
  31. Peters, M. et al.: Deep contextualized word representations. In: Proceedings of the 2018 conference of the north American chapter of the association for computational linguistics: Human language technologies, volume 1 (long papers). pp. 2227–2237. Association for Computational Linguistics, New Orleans, Louisiana (2018).
  32. Ponomareva, M. et al.: Automated word stress detection in Russian. In: Proceedings of the first workshop on subword and character level models in NLP. pp. 31–35. Association for Computational Linguistics, Copenhagen, Denmark (2017).
  33. Potapova, R., Komalova, L.: Lexico-semantical indices of «deprivation–aggression» modality correlation in social network discourse. In: International conference on speech and computer. pp. 493–502. Springer (2017).
  34. Prabowo, F. A. et al.: Hierarchical multi-label classification to identify hate speech and abusive language on indonesian twitter. In: 2019 6th international conference on information technology, computer and electrical engineering (icitacee). pp. 1–5 (2019).
  35. Risch, J., Krestel, R.: Toxic comment detection in online discussions. In: Deep learning-based approaches for sentiment analysis. pp. 85–109. Springer (2020).
  36. Risch, J. et al.: HpiDEDIS at germeval 2019: Offensive language identification using a german bert model. In: Preliminary proceedings of the 15th conference on natural language processing (konvens 2019). Erlangen, germany: German society for computational linguistics & language technology. pp. 403–408 (2019).
  37. Rubtsova, Y.: A method for development and analysis of short text corpus for the review classification task. In: Proceedings of conferences Digital Libraries: Advanced Methods and Technologies, Digital Collections (RCDL’2013). pp. 269–275 (2013).
  38. Ruiter, D. et al.: LSV-uds at HASOC 2019: The problem of defining hate. In: Working notes of FIRE 2019—forum for information retrieval evaluation, kolkata, india, december 12–15, 2019. pp. 263–270 (2019).
  39. Sambasivan, N. et al.: «They don’t leave us alone anywhere we go»: Gender and digital abuse in south asia. In: Proceedings of the 2019 chi conference on human factors in computing systems. Association for Computing Machinery, New York, NY, USA (2019).
  40. Sang-Bum Kim et al.: Some effective techniques for naive bayes text classification. IEEE Transactions on Knowledge and Data Engineering. 18, 11, 1457–1466 (2006).
  41. Shkapenko, T., Vertelova, I.: Hate speech markers in internet comments to translated articles from polish media. Political Linguistics. 70, 4, 104–111 (2018).
  42. Strus, J. M. et al.: Overview of GermEval task 2, 2019 shared task on the identification of offensive language. (2019).
  43. Sun, C. et al.: How to fine-tune bert for text classification? In: Sun, M. et al. (eds.) Chinese computational linguistics. pp. 194–206. Springer International Publishing, Cham (2019).
  44. Ustalov, D., Igushkin, S.: Sense inventory alignment using lexical substitutions and crowdsourcing. In: 2016 international fruct conference on intelligence, social media and web (ismw fruct). (2016).
  45. Vaswani, A. et al.: Attention is all you need. In: Proceedings of the 31st international conference on neural information processing systems. pp. 6000–6010. Curran Associates Inc., Red Hook, NY, USA (2017).
  46. Wu, Y. et al.: Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144. (2016).
  47. Yang, F. et al.: Exploring deep multimodal fusion of text and photo for hate speech classification. In: Proceedings of the third workshop on abusive language online. pp. 11–18. Association for Computational Linguistics, Florence, Italy (2019).
  48. Yang, Y. et al.: Multilingual universal sentence encoder for semantic retrieval. CoRR. abs/1907.04307, (2019).
  49. Yang, Z. et al.: Hierarchical attention networks for document classification. In: Proceedings of the 2016 conference of the north American chapter of the association for computational linguistics: Human language technologies. pp. 1480–1489. Association for Computational Linguistics, San Diego, California (2016).


