I want to share one of my side projects; perhaps someone else will find it useful. In this article I describe how I set things up to read Musk's Twitter account in a place convenient for me, with Russian translations of the English-language tweets at hand.
Problem
A search for "parsing twitter without API" turns up twint.
For translation, besides google translate, there is fairseq from Facebook AI Research.
Both twint and fairseq are python packages.
Installing twint and downloading a user's tweets without the API:
pip3 install twint
twint -u <name_of_twitter_user> -o output.csv --csv --since 2020-01-01 --retweets
twint can be run from bash or called through its python API.
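The command above can also be assembled and launched from a python script. This is a minimal sketch that wraps the command line rather than twint's own python API; it uses only the flags shown above, and the helper names `build_twint_command` and `run_twint` are hypothetical.

```python
import shlex
import subprocess


def build_twint_command(username, output_csv, since=None, retweets=False):
    """Assemble the twint command line as an argument list (hypothetical helper)."""
    cmd = ["twint", "-u", username, "-o", output_csv, "--csv"]
    if since:
        # Date in the same YYYY-MM-DD form as in the command above.
        cmd += ["--since", since]
    if retweets:
        cmd.append("--retweets")
    return cmd


def run_twint(cmd):
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_twint_command("username", "output.csv",
                              since="2020-01-01", retweets=True)
    # Only print the command here; call run_twint(cmd) to actually execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems if a user name or file path contains unusual characters.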
A few more examples:
twint -u username -s pineapple
twint -u username --email --phone
twint -g="48.880048,2.385939,1km" -o file.csv --csv
Saving to Elasticsearch or SQLite:
twint -u username -es localhost:9200
twint -u username --database tweets.db
Followers, following, and favorites:
twint -u username --followers
twint -u username --following
twint -u username --favorites
The resulting csv contains, among others, the following fields:
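id - identifier of the tweet
conversation_id - identifier of the conversation (thread) the tweet belongs to
created_at - time the tweet was created
tweet - text of the tweet
mentions - users mentioned in the tweet
urls - links contained in the tweet (for example, to youtube)
photos - links to the attached photos
link - link to the tweet itself
reply_to - users the tweet replies to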
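To show how these fields can be consumed, here is a small sketch that reads such a csv with python's standard csv module. The sample row is made up for illustration, `load_tweets` is a hypothetical helper, and the delimiter is a parameter because it is an assumption that may not match every twint configuration.

```python
import csv
import io

# Subset of the columns listed above that this sketch keeps.
FIELDS = ["id", "conversation_id", "created_at", "tweet", "link"]


def load_tweets(fileobj, delimiter=","):
    """Read a twint-style csv and keep only the columns in FIELDS."""
    reader = csv.DictReader(fileobj, delimiter=delimiter)
    # Missing columns become empty strings instead of raising KeyError.
    return [{name: row.get(name, "") for name in FIELDS} for row in reader]


# Made-up sample data standing in for output.csv.
SAMPLE = (
    "id,conversation_id,created_at,tweet,link\n"
    '1,1,2020-01-01 00:00:00,"Hello, world",'
    "https://twitter.com/username/status/1\n"
)

if __name__ == "__main__":
    for t in load_tweets(io.StringIO(SAMPLE)):
        print(t["created_at"], t["tweet"], t["link"])
```

For a real file, open it with `open("output.csv", newline="", encoding="utf-8")` and pass the file object to `load_tweets`.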
For the translation I use fairseq, the library from Facebook AI Research.
pip install hydra-core
With pinned versions and the other dependencies:
pip install torch
pip install hydra-core==1.0.0 omegaconf==2.0.1
pip install fastBPE regex requests sacremoses subword_nmt
The models are loaded through pytorch (torch.hub), for example:
import torch
# Compare the results with English-Russian round-trip translation:
en2ru = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-ru.single_model',
                       tokenizer='moses', bpe='fastbpe')
ru2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.ru-en.single_model',
                       tokenizer='moses', bpe='fastbpe')
paraphrase = ru2en.translate(
    en2ru.translate('PyTorch Hub is an awesome interface!')
)
assert paraphrase == 'PyTorch is a great interface!'
On first use torch downloads and caches the model.
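A translator loaded this way can then be applied to the tweet texts. This is a minimal sketch, assuming only that the translator is a callable that takes a string and returns a string (`en2ru.translate` fits that shape). `translate_tweets` and `fake_translate` are hypothetical names, and the stand-in translator exists so the example runs without downloading a model.

```python
def translate_tweets(tweets, translate):
    """Return (original, translation) pairs, translating each distinct text once."""
    cache = {}
    result = []
    for text in tweets:
        if text not in cache:
            # Model calls are slow, so repeated texts reuse the cached result.
            cache[text] = translate(text)
        result.append((text, cache[text]))
    return result


def fake_translate(text):
    """Stand-in for en2ru.translate; it only tags the text, it does not translate."""
    return "[ru] " + text


if __name__ == "__main__":
    pairs = translate_tweets(["Hello", "Hello", "Mars"], fake_translate)
    for original, translated in pairs:
        print(original, "->", translated)
```

With the real model, pass `en2ru.translate` in place of `fake_translate`.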