Overview
This post aims to implement the transformer model and its variants from scratch. It is based on the following posts and repositories:
- The Annotated Transformer (Harvard NLP)
- The Illustrated Transformer (Jay Alammar)
- GitHub – karpathy/minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training. Andrej Karpathy also made a 2-hour video walking through how he builds the model.
- GitHub – karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. It is an optimized version of minGPT that can reproduce some mid-sized models, including a 1.3B GPT-2; pretraining a 124M GPT-2 took 4 days on 8 A100 GPUs (40 GB each).
- GitHub – nlp-with-transformers/notebooks: Jupyter notebooks for the Natural Language Processing with Transformers book
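To give a flavor of the kind of component this post builds from scratch, here is a minimal sketch of scaled dot-product attention, the core operation of the transformer. It assumes PyTorch (in the spirit of minGPT/nanoGPT); the function name and tensor shapes are illustrative, not taken from any of the references above.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v have shape (batch, heads, seq_len, head_dim).
    # Attention scores are scaled by sqrt(head_dim) to keep softmax gradients stable.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Positions where mask == 0 are excluded (e.g. causal masking in GPT).
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Tiny smoke test with random tensors.
q = k = v = torch.randn(1, 2, 4, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 2, 4, 8])
```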