
GPT-2 pre-training and text generation, implemented in TensorFlow 2.0

Originally implemented in TensorFlow 1.14 by OpenAI: "openai/gpt-2". OpenAI GPT-2 paper: "Language Models are Unsupervised Multitask Learners".

**This repository contains an OpenAI GPT-2 pre-training and sequence generation implementation in TensorFlow 2.0.**


Requirements

  • python >= 3.6
  • setuptools==41.0.1
  • ftfy==5.6
  • tqdm==4.32.1
  • Click==7.0
  • sentencepiece==0.1.83
  • tensorflow-gpu==2.3.0
  • numpy==1.16.4


$ git clone
$ cd gpt-2-tensorflow2.0
$ pip install -r requirements.txt
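
After installing the requirements, an optional sanity check (not part of the repo's scripts) confirms that the pinned TensorFlow 2 build can see your GPU:

```python
import tensorflow as tf

# Optional sanity check: the requirements pin tensorflow-gpu==2.3.0,
# and training expects at least one visible GPU.
print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```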

You can pre-train the model on the sample data included in this repository, or you can download the OpenWebText data using this GitHub repo.

Pre-training the model on the sample data available in the repository

$ python --help

  --data-dir TEXT        training data path  [default: /data/scraped]
  --vocab-size INTEGER   byte pair vocab size  [default: 24512]
  --min-seq-len INTEGER  minimum sequence length  [default: 15]
  --max-seq-len INTEGER  maximum sequence length  [default: 512]
  --help                 Show this message and exit.
>> python
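
The preprocessing step builds a byte-pair vocabulary with sentencepiece (pinned in the requirements). Roughly, it amounts to the sketch below; the input path and model prefix here are hypothetical, and only the vocab size mirrors the option above:

```python
import sentencepiece as spm

# Illustrative sketch only: train a BPE vocabulary on raw text.
# "data/scraped/sample.txt" and "bpe_model" are hypothetical names.
spm.SentencePieceTrainer.Train(
    "--input=data/scraped/sample.txt "
    "--model_prefix=bpe_model "
    "--vocab_size=24512 "
    "--model_type=bpe"
)

sp = spm.SentencePieceProcessor()
sp.Load("bpe_model.model")
print(sp.EncodeAsIds("Language models are unsupervised multitask learners"))
```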

Pre-training the model on OpenWebText or any other data

>> python --data-dir=data_directory --vocab-size=32000
$ python --help

  --num-layers INTEGER      No. of decoder layers  [default: 8]
  --embedding-size INTEGER  Embedding size  [default: 768]
  --num-heads INTEGER       Number of heads  [default: 8]
  --dff INTEGER             Filter Size  [default: 3072]
  --max-seq-len INTEGER     Seq length  [default: 515]
  --vocab-size INTEGER      Vocab size  [default: 24512]
  --optimizer TEXT          optimizer type  [default: adam]
  --batch-size INTEGER      batch size  [default: 8]
  --learning-rate FLOAT     learning rate  [default: 0.001]
  --graph-mode BOOLEAN      TF run mode  [default: False]
  --distributed BOOLEAN     distributed training  [default: False]
  --help                    Show this message and exit.
>> python \
  --num-layers=8 \
  --num-heads=8 \
  --dff=3072 \
  --embedding-size=768 \
  --batch-size=32
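
To make the flags above concrete, here is a rough sketch (not this repository's code) of a single GPT-2-style decoder block wired with --embedding-size, --num-heads, and --dff. It uses tf.keras.layers.MultiHeadAttention, which needs a newer TensorFlow than the pinned 2.3.0, so treat it as illustration only:

```python
import tensorflow as tf

EMBEDDING_SIZE, NUM_HEADS, DFF = 768, 8, 3072  # mirror the CLI defaults above

class DecoderBlock(tf.keras.layers.Layer):
    """Masked self-attention followed by a position-wise feed-forward network."""

    def __init__(self):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=NUM_HEADS, key_dim=EMBEDDING_SIZE // NUM_HEADS)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(DFF, activation=tf.nn.gelu),  # --dff is this filter size
            tf.keras.layers.Dense(EMBEDDING_SIZE),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-5)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-5)

    def call(self, x):
        # Causal mask: each position may only attend to itself and earlier tokens.
        seq_len = tf.shape(x)[1]
        causal_mask = tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
        h = self.norm1(x)
        x = x + self.attn(h, h, attention_mask=causal_mask)
        return x + self.ffn(self.norm2(x))

block = DecoderBlock()
print(block(tf.random.normal([2, 16, EMBEDDING_SIZE])).shape)  # (2, 16, 768)
```

The full model stacks --num-layers of these blocks on top of token and position embeddings.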

Distributed training on multiple GPUs.

>> python \
  --num-layers=8 \
  --num-heads=8 \
  --dff=3072 \
  --embedding-size=768 \
  --batch-size=32 \
  --learning-rate=5e-5 \
  --distributed=True
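
The --distributed flag corresponds to TensorFlow's data-parallel training. A minimal sketch of the underlying idea, assuming tf.distribute.MirroredStrategy (the placeholder model below is not the repository's):

```python
import tensorflow as tf

# Data-parallel training across all visible GPUs: variables are mirrored on
# every replica and gradients are all-reduced each step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Model and optimizer must be created inside the strategy scope.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])   # placeholder model
    optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)   # mirrors --learning-rate
```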

Start TensorBoard through the command line.

$ tensorboard --logdir /log
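
This assumes the training run writes tf.summary events under /log (point --logdir at wherever your logs actually live). Schematically, the log directory is populated like this:

```python
import tensorflow as tf

# Hypothetical illustration of how the /log directory gets populated;
# change the path if /log is not writable on your machine.
writer = tf.summary.create_file_writer("/log/train")
with writer.as_default():
    for step in range(100):
        tf.summary.scalar("loss", 1.0 / (step + 1), step=step)  # dummy loss values
writer.flush()
```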

After pre-training your model, you can generate sequences by giving the model some context. Open the notebook below, load the pre-trained model, and pass a context string to it; the notebook will return the generated sequence.

$ sequence_generator.ipynb
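
Under the hood, generation is an autoregressive sampling loop. The sketch below assumes `model` returns next-token logits of shape (batch, seq, vocab) and `sp` is the SentencePiece tokenizer from preprocessing; both are hypothetical handles you would load in the notebook:

```python
import tensorflow as tf

def generate(model, sp, context, max_new_tokens=64, temperature=1.0, top_k=40):
    """Top-k sampling loop; `model` and `sp` are loaded in the notebook."""
    ids = sp.EncodeAsIds(context)                                  # encode the context string
    for _ in range(max_new_tokens):
        logits = model(tf.constant([ids]))[0, -1] / temperature   # last-position logits
        values, indices = tf.math.top_k(logits, k=top_k)          # keep the top-k tokens
        choice = tf.random.categorical([values], num_samples=1)[0, 0]
        ids.append(int(indices[choice]))                           # append the sampled token
    return sp.DecodeIds(ids)

# Usage inside the notebook, after loading the pre-trained checkpoint:
# print(generate(model, sp, "The meaning of life is"))
```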


TO DO

1. Parallel Preprocessing.
2. Shared weights across layers.
3. Factorized embedding (see the sketch after this list).
4. Fine-Tuning wrapper.
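
For context on item 3: factorized embedding (in the ALBERT sense) uses a small embedding width and a projection up to the hidden size, so the vocab-size × hidden matrix shrinks to vocab-size × E plus E × hidden. A minimal sketch of the idea, not code from this repository:

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, HIDDEN_SIZE = 24512, 128, 768   # EMBED_DIM << HIDDEN_SIZE

class FactorizedEmbedding(tf.keras.layers.Layer):
    """Token ids -> small embedding -> linear projection to the hidden size."""

    def __init__(self):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.project = tf.keras.layers.Dense(HIDDEN_SIZE, use_bias=False)

    def call(self, token_ids):
        return self.project(self.embed(token_ids))

layer = FactorizedEmbedding()
print(layer(tf.constant([[1, 2, 3]])).shape)  # (1, 3, 768)
# ~24512*128 + 128*768 ≈ 3.2M parameters vs. ~24512*768 ≈ 18.8M unfactorized.
```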



  • Your issues and PRs are always welcome.



Computation Graph of GPT-2 Model.

Decoder Graph

