Generating text with GPT2 and PyTorch

Fast and easy text generation in any language using the Huggingface framework

This translation was prepared as part of the "Machine Learning. Advanced" course.









Introduction

Text generation is one of the most exciting applications of natural language processing (NLP). In recent years, huge language models such as GPT-3 have shown that machines can write text that is hard to tell apart from text written by a human. Models of that scale, however, are expensive to run and are not openly available to everyone.





GPT-2 is the predecessor of GPT-3 and is openly available through the Transformers library from Huggingface. In this article I will show how little code it takes to generate text with GPT-2 and PyTorch.





With GPT-2 you can start generating text in a matter of minutes, and, as a bonus, not only in English! Here is an outline of the steps:





  • Step 1: Install the library
  • Step 2: Import the pipeline
  • Step 3: Create the text generation pipeline
  • Step 4: Define the prefix text
  • Step 5: Generate the text
  • Bonus: Generate text in another language





Step 1: Install the library

We will use the Huggingface Transformers library, which runs on top of PyTorch. If you do not have PyTorch yet, install it first following the instructions on the official PyTorch website.
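For example, on most systems a CPU-only build can be installed with pip (this is only a sketch; the exact command for your OS and CUDA version is listed on the PyTorch website):

pip install torch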





Once PyTorch is installed, install Huggingface Transformers with:





pip install transformers
      
      



Step 2: Import the pipeline

From the Transformers library, import the pipeline function:





from transformers import pipeline
      
      



The pipeline function wraps a pre-trained model and its tokenizer behind a simple interface, so we do not have to handle them directly.





Step 3: Create the text generation pipeline

Now create the pipeline for text generation. A single line is enough:





text_generation = pipeline("text-generation")
      
      



By default this loads the pre-trained GPT-2 model, so we do not need to specify anything else.
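If you want a larger model, you can pass its name explicitly; for example, the gpt2-medium, gpt2-large and gpt2-xl checkpoints are also hosted on the Huggingface model hub (this is optional and not part of the steps above):

# Optional: use a larger GPT-2 checkpoint instead of the default "gpt2"
text_generation = pipeline("text-generation", model="gpt2-medium")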





Step 4: Define the prefix text

GPT-2 generates text as a continuation of a given prompt, so we need to define a prefix text for it to start from. For example:





The world is


prefix_text = "The world is"
      
      



Step 5: Generate the text

Finally, we are ready to generate text! Pass the prefix text to the pipeline together with the maximum length:





generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]

print(generated_text['generated_text'])
      
      



The max_length argument limits the generated text to 50 tokens. Running the code produces the following output:





The world is a better place if you’re a good person.

I’m not saying that you should be a bad person. I’m saying that you should be a good person.

I’m not saying that you should be a bad
      
      



The continuation is fairly coherent, but the model quickly starts to repeat itself. This is a known weakness of greedy decoding, which is what we get with do_sample=False. For more diverse and less repetitive output you can enable sampling (for example, top-k/top-p sampling) and tune the generation parameters; see the documentation of Huggingface's TextGenerationPipeline for the full list of options.
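As a rough sketch (the parameter values below are illustrative, not taken from the original article), sampling can be enabled like this:

# Sample from the distribution instead of greedy decoding for more varied output
generated_text = text_generation(
    prefix_text,
    max_length=50,
    do_sample=True,  # enable sampling
    top_k=50,        # consider only the 50 most likely next tokens
    top_p=0.95,      # nucleus sampling: smallest token set covering 95% of the probability
)[0]
print(generated_text['generated_text'])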





Bonus: Generate text in another language

By default the pipeline loads an English GPT-2 model, so to generate text in another language we need a model trained on that language. Fortunately, the Huggingface model hub hosts many community models (including GPT-2 models for other languages), and we can plug one in simply by passing the model and tokenizer to the pipeline.





In this example we will generate Chinese text using the GPT-2 model published by CKIPLab, which is paired with a Chinese BERT tokenizer.





First, import the required classes:





from transformers import BertTokenizerFast, AutoModelWithLMHead
      
      



Then load the tokenizer and the model:





tokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')

model = AutoModelWithLMHead.from_pretrained('ckiplab/gpt2-base-chinese')
      
      



Now create the text generation pipeline with this model and tokenizer:





text_generation = pipeline("text-generation", model=model, tokenizer=tokenizer)
      
      



As before, define a prefix text, this time in Chinese:





我 想 要 去

prefix_text = "我 想 要 去"
# "I want to go"
      
      



Generate the text exactly as before:





generated_text = text_generation(prefix_text, max_length=50, do_sample=False)[0]

print(generated_text['generated_text'])
      
      



The output looks like this:





我 想 要 去 看 看 。 」 他 說 : 「 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們 不 能 說, 我 們

# Roughly: "I want to go and have a look." He said: "We cannot say, we cannot say, we cannot say, ..."
      
      



Not bad, although, as with the English example, the model quickly starts repeating itself.





And that's it! Thanks to the simple API of the Huggingface Transformers library, generating text takes only a handful of lines of code. Here is the complete example as a Jupyter notebook:









In [1]:
from transformers import pipeline
 
In [ ]:
text_generation = pipeline("text-generation")
 
In [7]:
prefix_text = "The world is"
 
In [8]:
generated_text= text_generation(prefix_text, max_length=50, do_sample=False)[0]
print(generated_text['generated_text'])
 
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
 
The world is a better place if you're a good person.
 
I'm not saying that you should be a bad person. I'm saying that you should be a good person.
 
I'm not saying that you should be a bad

      
      



Congratulations, you have just generated text with GPT-2! Text generation is still an active area of research, and problems such as repetition and lack of diversity are far from solved; the papers listed below are a good starting point if you want to dig deeper. I hope this article was useful to you.





Brown, Tom B., et al. “Language models are few-shot learners.” arXiv preprint arXiv:2005.14165 (2020).





Radford, Alec, et al. “Language models are unsupervised multitask learners.” OpenAI blog 1.8 (2019): 9.





Transformers Github, Huggingface





Transformers Official Documentation, Huggingface





PyTorch Official Website, Facebook AI Research





Fan, Angela, Mike Lewis, and Yann Dauphin. “Hierarchical neural story generation.” arXiv preprint arXiv:1805.04833 (2018).





Welleck, Sean, et al. “Neural text generation with unlikelihood training.” arXiv preprint arXiv:1908.04319 (2019).





CKIPLab Transformers Github, Chinese Knowledge and Information Processing at the Institute of Information Science and the Institute of Linguistics of Academia Sinica





