Fast and easy text generation in any language using the Huggingface framework
As part of the "Machine Learning. Advanced" course, we have prepared a translation of this interesting material.
We also invite you to take part in the open webinar "Multi-armed bandits to optimize AB testing". Together with an expert, participants will analyze one of the most effective use cases for reinforcement learning and see how the AB testing problem can be reformulated as a Bayesian inference problem.
Introduction
Text generation is one of the most exciting tasks in Natural Language Processing (NLP). In recent years, large language models such as GPT-3 have attracted a lot of attention with remarkably fluent generated text, but they are not freely available to everyone.
GPT-2 is the openly available predecessor of GPT-3. It is distributed through the Transformers library from Huggingface. If you want a deeper look at how the model works under the hood, see, for example, GPT2 Pytorch.
With GPT-2 you can generate text in just a few lines of code! Below are the steps we will follow.
Step 1: Install the library
Step 2: Import the pipeline
Step 3: Create the text generation pipeline
Step 4: Define the prefix text
Step 5: Generate text
Bonus: Generate text in any language
Step 1: Install the library
We will use the Huggingface Transformers library, which runs on top of PyTorch. If you do not have PyTorch yet, install it first by following the instructions on its official website.
Once PyTorch is installed, install Huggingface Transformers with:
pip install transformers
Step 2: Import the pipeline
From Transformers, import the pipeline function:
from transformers import pipeline
The pipeline hides most of the complexity and gives us a simple high-level interface for generation.
Step 3: Create the text generation pipeline
Now create the pipeline for the text generation task:
text_generation = pipeline("text-generation")
By default this loads GPT-2, so we do not need to specify a model explicitly.
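If you prefer to be explicit, you can name the checkpoint yourself. The following is a minimal sketch, assuming the standard "gpt2" checkpoint from the model hub, which is what the default resolves to:

# pipeline was imported above: from transformers import pipeline
# Explicitly naming the checkpoint is equivalent to the default behaviour.
text_generation = pipeline("text-generation", model="gpt2")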
Step 4: Define the prefix text
Next, define the prefix text, i.e. the beginning of the phrase that the model will continue. For example:
The world is
prefix_text = "The world is"
Step 5: Generate text
Now we are ready to generate text! Run the following:
generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
The max_length parameter limits the output to 50 tokens. Here is the result:
The world is a better place if you’re a good person.
I’m not saying that you should be a bad person. I’m saying that you should be a good person.
I’m not saying that you should be a bad
As you can see, the model quickly starts to repeat itself. This is a well-known weakness of greedy decoding, which is what do_sample=False gives us. To get more varied and natural text, you can enable sampling (for example, top-k/top-p sampling) by passing extra parameters to the pipeline. For the full list of options, see the Huggingface documentation for TextGenerationPipeline.
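As a rough sketch (the parameter values below are illustrative, not taken from the original article), enabling sampling could look like this:

generated_text = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,  # sample instead of greedy decoding
    top_k=50,        # consider only the 50 most likely next tokens
    top_p=0.95,      # nucleus sampling: keep tokens covering 95% of the probability mass
)[0]
print(generated_text['generated_text'])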
Bonus: Generate text in any language
By default, the pipeline uses the English GPT-2 model, which is why the text above is in English. To generate text in another language, you need a model trained on that language. Fortunately, the Huggingface model hub already hosts pre-trained models for many languages (and you can also fine-tune your own), and they plug into the same pipeline.
As an example, let's generate text in Chinese using the Chinese GPT2 model published by CKIPLab. It uses a BERT tokenizer, so we import both classes:
from transformers import BertTokenizerFast, AutoModelWithLMHead
Load the tokenizer and the model:
tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')
model = AutoModelWithLMHead.from_pretrained('ckiplab/gpt2-base-chinese')
Create the pipeline with this model and tokenizer:
text_generation = pipeline("text-generation", model=model, tokenizer=tokenizer)
Define the prefix text, this time in Chinese:
我 想 要 去
prefix_text = "我 想 要 去"
(The prefix means "I want to go.")
Now generate text the same way as before:
generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
The result:
我 想 要 去 看 看 。 」 他 說 : 「 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們
(Roughly: "I want to go and take a look." He said: "We cannot say, we cannot say, we cannot say, ...")
Just as in English, the model soon starts to repeat itself; the same sampling tricks apply here as well.
In this way, you can generate text in any language for which a pre-trained model is available on the hub.
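As a rough illustration (the model id below is a hypothetical placeholder, not something mentioned in the article), switching languages only means swapping the tokenizer and the model:

from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline

# NOTE: "some-org/gpt2-your-language" is a hypothetical placeholder;
# substitute a real checkpoint from the Huggingface model hub.
tokenizer = AutoTokenizer.from_pretrained("some-org/gpt2-your-language")
model = AutoModelWithLMHead.from_pretrained("some-org/gpt2-your-language")
text_generation = pipeline("text-generation", model=model, tokenizer=tokenizer)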
That's it! As you can see, thanks to the simple API that Huggingface provides, text generation takes only a few lines of code. Here is the complete example as a Jupyter notebook:
In [1]:
from transformers import pipeline
In [ ]:
text_generation = pipeline("text-generation")
In [7]:
prefix_text = "The world is"
In [8]:
generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The world is a better place if you're a good person.
I'm not saying that you should be a bad person. I'm saying that you should be a good person.
I'm not saying that you should be a bad
That's all! I hope this short guide has shown how easy it is to get started with text generation. Try it with your own prefix text, experiment with the generation parameters, and explore models for other languages.
Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).
Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.
Transformers Github, Huggingface
Transformers Official Documentation, Huggingface
Pytorch Official Website, Facebook AI Research
Fan, Angela, Mike Lewis, and Yann Dauphin. “Hierarchical neural story generation.” arXiv preprint arXiv:1805.04833 (2018).
Welleck, Sean, et al. “Neural text generation with unlikelihood training.” arXiv preprint arXiv:1908.04319 (2019).
CKIPLab Transformers Github, Chinese Knowledge and Information Processing at the Institute of Information Science and the Institute of Linguistics of Academia Sinica
Sign up for the open webinar "Multi-armed bandits to optimize AB testing".