Children, Russian and R

A typical situation in today's school routine: at 22:00, a new assignment appears in the child's electronic diary. At best it is due the day after tomorrow, but usually tomorrow.







There are three reaction options:







  • not do it at all;
  • "not notice" it and put the problem off until later;
  • try to do it.


The second reaction is essentially identical to the first, since such tasks snowball quickly and leave no chance of ever clearing the backlog.







With the third option, in some cases even Russian-language tasks can be solved with R, bearing in mind that there are at most 15-20 minutes for everything: 5 minutes of "extreme programming" and 10-15 minutes of finishing touches. Once the problem is solved in principle, the write-up can be done in the morning.







This is a continuation of the series of previous publications.







What tasks are we trying to solve?



Naturally, an essay has to be thought through and written by yourself. But there is a certain class of tasks that looks like work for a robot and lends itself well to an algorithm.







Below are just generalized examples; many readers will surely find something to add.







Problem 1



(/). — . .







Problem 2



N



() ( ).







Problem 3



N



, '' , .







R



. «» .

N



( ), 5 .







№1. . , . . , .







library(tidyverse)
library(readr)
library(magrittr)
library(stringi)
library(udpipe)
library(tictoc)

# Dictionaries downloaded from
# http://www.speakrus.ru/dict/

# Word list from pldf-win.zip, 125723 entries
voc1_df <- here::here("data", "pldf-win.zip") %>%
  readr::read_delim(col_names = "word", delim = " ",
                    locale = locale("ru", encoding = "windows-1251"))

# Frequency word list from litf-win.zip, 162232 entries
voc2_df <- here::here("data", "litf-win.zip") %>%
  readr::read_delim(col_names = c("word", "freq"), delim = " ",
                  locale = locale("ru", encoding = "windows-1251")) %>%
  select(-freq)

# Word list from zdf-win.zip, 93392 entries
voc3_df <- here::here("data", "zdf-win.zip") %>%
  readr::read_delim(col_names = "word", delim = " ",
                    locale = locale("ru", encoding = "windows-1251"))

# Ozhegov dictionary from ozhegovw.zip, 1991, 61458 entries
voc4_df <- here::here("data", "ozhegovw.zip") %>%
  readr::read_delim(delim = "|", quote = "", locale = locale("ru", encoding = "windows-1251")) %>%
  select(word = VOCAB)

# Merge all word lists and drop duplicates
voc_df <- bind_rows(voc1_df, voc2_df, voc3_df, voc4_df) %>%
  distinct()

# --------------- udpipe
# ud_model <- udpipe_download_model(language = "russian")
ud_model <- udpipe_download_model(language = "russian-syntagrus")
# udpipe_annotate() expects a loaded model, not the download record
ud_model <- udpipe_load_model(ud_model$file_model)
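
The merge step above can be tried without the dictionary files. The following is only a minimal sketch on made-up data: `list_a`, `list_b` and `voc_toy` are hypothetical names, the English words stand in for the Russian word lists, and the trimming and lower-casing are an assumption added here, not something the original code does.

```r
library(dplyr)
library(stringi)
library(tibble)

# Hypothetical toy word lists standing in for the dictionary files
list_a <- tibble(word = c("House", "tree ", "river"))
list_b <- tibble(word = c("house", "stone"), freq = c(10, 3))

voc_toy <- bind_rows(list_a, select(list_b, -freq)) %>%
  # Assumption: normalize case and whitespace so that duplicates collapse
  mutate(word = stri_trans_tolower(stri_trim_both(word))) %>%
  filter(word != "") %>%
  distinct()

print(voc_toy)
```

Whether normalization is wanted depends on the task: for proper nouns, lower-casing would lose information.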
      
      





№2.







1. Find words of 7 letters in which the 1st letter is '' and the 3rd is ''







words_df <- voc_df %>%
  filter(stri_length(word) == 7) %>%
  filter(stri_sub(word, 3, 3) == "") %>%
  filter(stri_sub(word, 1, 1) == "")
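
The same positional filter can be wrapped in a small helper so that the length and the letter positions become parameters. This is a sketch under stated assumptions: `filter_by_letters` and `toy_voc` are hypothetical names, and the English words and the letters "p" and "t" are placeholders, not the letters from the original task.

```r
library(dplyr)
library(stringi)
library(tibble)

# Hypothetical toy vocabulary standing in for voc_df
toy_voc <- tibble(word = c("picture", "plastic", "pattern", "potency", "cabinet", "pint"))

# Hypothetical helper: keep words of length `len` that have
# the character chars[i] at position positions[i]
filter_by_letters <- function(df, len, positions, chars) {
  stopifnot(length(positions) == length(chars))
  out <- df %>% filter(stri_length(word) == len)
  for (i in seq_along(positions)) {
    pos <- positions[i]
    ch <- chars[i]
    out <- out %>% filter(stri_sub(word, pos, pos) == ch)
  }
  out
}

res <- filter_by_letters(toy_voc, len = 7, positions = c(1, 3), chars = c("p", "t"))
print(res)
```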
      
      





2. Find words made up only of the letters of " "







voc_df %>%
  filter(stri_detect_regex(tolower(word), "^[]+$")) %>%
  mutate(l = stri_length(word)) %>%
  arrange(desc(l)) %>%
  print(n = 400)
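
One caveat of the character-class approach: it accepts a word even if it uses a letter more often than the source word contains it. If the task forbids that, a letter-count check can be added on top. The sketch below assumes made-up data: `source_word`, `toy_voc` and `can_build` are hypothetical, and the English words are placeholders for the Russian ones.

```r
library(dplyr)
library(stringi)
library(tibble)

source_word <- "planet"  # hypothetical source word
toy_voc <- tibble(word = c("plan", "net", "lane", "pallet", "plant", "tent", "apple", "stone"))

# Character class built from the distinct letters of the source word
letters_set <- unique(strsplit(source_word, "")[[1]])
pattern <- stri_c("^[", stri_c(letters_set, collapse = ""), "]+$")

# Loose match, as in the code above: any of the allowed letters, repeats allowed
loose <- toy_voc %>%
  filter(stri_detect_regex(tolower(word), pattern)) %>%
  mutate(l = stri_length(word)) %>%
  arrange(desc(l))

# Strict check: each letter may be used at most as often as it occurs in the source
can_build <- function(word, source) {
  need <- table(strsplit(word, "")[[1]])
  have <- table(strsplit(source, "")[[1]])
  all(names(need) %in% names(have)) &&
    all(as.integer(need) <= as.integer(have[names(need)]))
}

strict <- loose %>%
  filter(vapply(word, can_build, logical(1), source = source_word, USE.NAMES = FALSE))

print(loose)
print(strict)
```

Here "pallet", "tent" and "apple" pass the loose filter but are dropped by the strict one, because each repeats a letter that occurs only once in the source word.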
      
      





3. Which nouns also exist in the dictionary with '' prepended?







tic("")
print(lubridate::now())
ann_tbl <- voc_df %>%
  mutate(ne_word = stri_c("", word)) %>%
  inner_join(voc_df, by = c("ne_word" = "word")) %>%
  # stri_trans_general(id ="Latin-ASCII")
  {udpipe_annotate(ud_model, x = .$word, trace = TRUE)} %>%
  as_tibble()
toc()

ne_tbl <- ann_tbl %>%
  filter(upos == "NOUN") %>%
  select(word = token) %>%
  #  
  filter(stri_length(word) > 3) %>%
  filter(!stri_detect_regex(word, "$")) %>%
  mutate(ne_word = stri_c("", word)) %>%
  sample_n(200) %T>%
  print(n = 200)
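
The core of this step is a self-join of the vocabulary: a word qualifies if the prefixed form is itself a dictionary word. The sketch below shows only that join on made-up data. The prefix "un", the English words and the names `toy_voc` and `pairs` are assumptions for illustration. The part-of-speech filtering with udpipe is left out, because it requires the downloaded model.

```r
library(dplyr)
library(stringi)
library(tibble)

prefix <- "un"  # hypothetical prefix, a stand-in for the one in the task
toy_voc <- tibble(word = c("happy", "unhappy", "known", "unknown", "table", "it", "unit"))

pairs <- toy_voc %>%
  mutate(prefixed = stri_c(prefix, word)) %>%
  # Keep only words whose prefixed form is also in the vocabulary
  inner_join(toy_voc, by = c("prefixed" = "word")) %>%
  # Same length threshold as in the code above, to drop very short stems
  filter(stri_length(word) > 3)

print(pairs)
```

The length filter removes accidental matches such as "it"/"unit", where the prefixed form is not actually derived from the shorter word.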
      
      





.. .. , — 99% , , . , .







P.S.







  1. kremlin.ru, .
  2. , . . , . 100% .
  3. , , . .
  4. « » , , .


Previous publication: "IT Service Health Monitoring by Means of R. A View from a Different Angle."







