A PyTorch re-implementation of GPT, both training and inference. minGPT tries to be small, clean, interpretable and educational, as most of the currently available GPT model implementations can be a bit sprawling. GPT is not a complicated model and this implementation is appropriately about 300 lines of code (see mingpt/model.py). All that's going on is that a sequence of indices feeds into a Transformer, and a probability distribution over the next index in the sequence comes out. The majority of the complexity is just being clever with batching (both across examples and over sequence length) for efficiency.
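To make the batching over sequence length concrete: a single training sequence supplies a next-token target at every position at once, so one forward pass yields many training examples. A plain-Python sketch of the idea (illustrative only, not minGPT's actual code):

```python
# A sequence of T+1 token indices yields T next-token training examples in
# one pass: the input drops the last token, the target drops the first.
def make_example(seq):
    x = seq[:-1]  # what the model sees
    y = seq[1:]   # what it should predict at each position
    return x, y

x, y = make_example([5, 1, 4, 2])
# x = [5, 1, 4], y = [1, 4, 2]: predict 1 after 5, 4 after (5, 1), 2 after (5, 1, 4)
```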
note (Jan 2023): though I may continue to accept and change some details, minGPT is in a semi-archived state. For more recent developments see my rewrite nanoGPT. Basically, minGPT became referenced across a wide variety of places (notebooks, blogs, courses, books, etc.) which made me less willing to make the bigger changes I wanted to make to move the code forward. I also wanted to change the direction a bit, from a sole focus on education to something that is still simple and hackable but has teeth (reproduces medium-sized industry benchmarks, accepts some tradeoffs to gain runtime efficiency, etc).
The minGPT library is three files: mingpt/model.py contains the actual Transformer model definition, mingpt/bpe.py contains a mildly refactored Byte Pair Encoder that translates between text and sequences of integers exactly as OpenAI did in GPT, and mingpt/trainer.py is (GPT-independent) PyTorch boilerplate code that trains the model. Then there are a number of demos and projects that use the library:
- `projects/adder` trains a GPT from scratch to add numbers (inspired by the addition section in the GPT-3 paper)
- `projects/chargpt` trains a GPT to be a character-level language model on some input text file
- `demo.ipynb` shows a minimal usage of the `Trainer` in a notebook format on a simple sorting example
- `generate.ipynb` shows how one can load a pretrained GPT-2 and generate text given some prompt
If you want to `import mingpt` into your project:

```
git clone https://github.com/karpathy/minGPT.git
cd minGPT
pip install -e .
```
Here's how you'd instantiate a GPT-2 (124M param version):
```python
from mingpt.model import GPT
model_config = GPT.get_default_config()
model_config.model_type = 'gpt2'
model_config.vocab_size = 50257 # openai's model vocabulary
model_config.block_size = 1024  # openai's model block_size (i.e. input context length)
model = GPT(model_config)
```
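At inference time the model maps a sequence of indices to logits over the next index, and generation repeatedly samples from those logits. The sampling step with temperature and top-k (as used when generating text from a pretrained model) looks roughly like this — a pure-Python sketch of the idea, with a hypothetical function name, not minGPT's actual code:

```python
import math, random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    # Scale logits by temperature, optionally keep only the top-k entries,
    # softmax, then draw one index from the resulting distribution.
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= cutoff else float('-inf') for l in scaled]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs)[0]
```

With `top_k=1` this reduces to greedy decoding (always the argmax); higher temperatures flatten the distribution and make sampling more adventurous.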
And here's how you'd train it:
```python
# your subclass of torch.utils.data.Dataset that emits example
# torch LongTensor of lengths up to 1024, with integers from [0,50257)
train_dataset = YourDataset()

from mingpt.trainer import Trainer
train_config = Trainer.get_default_config()
train_config.learning_rate = 5e-4 # many possible options, see the file
train_config.max_iters = 1000
train_config.batch_size = 32
trainer = Trainer(train_config, model, train_dataset)
trainer.run()
```
See demo.ipynb for a more concrete example.
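The sorting demo trains on sequences formed by concatenating the input digits with their sorted order, masking the loss over the input half. A rough sketch of that data format — plain Python lists instead of LongTensors, hypothetical helper name; see demo.ipynb for the real dataset:

```python
def sort_example(inp):
    # Train on the concatenation [input digits | sorted digits]:
    # x is the sequence minus its last token, y is the sequence shifted by
    # one, with positions that merely "read" the input masked to -1 so the
    # loss is only computed where the model emits the sorted half.
    sol = sorted(inp)
    cat = inp + sol
    x = cat[:-1]
    y = cat[1:]
    y[:len(inp) - 1] = [-1] * (len(inp) - 1)
    return x, y

x, y = sort_example([2, 0, 1])
# x = [2, 0, 1, 0, 1], y = [-1, -1, 0, 1, 2]
```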
Unit test coverage is not super amazing just yet, but:

```
python -m unittest discover tests
```
Papers + some implementation notes: