Research Notes | Transformer from Scratch

Overview

This post aims to implement the transformer model and its variants from scratch. It is based on the following posts:

  1. The Annotated Transformer (Harvard NLP)
  2. The Illustrated Transformer (Jay Alammar)
  3. GitHub – karpathy/minGPT: A minimal PyTorch re-implementation of OpenAI GPT (Generative Pretrained Transformer) training. Andrej Karpathy also has a 2-hour video walking through how he builds the model.

    GitHub – karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. This is an optimized version of minGPT that can reproduce some mid-sized models, including a 1.3B-parameter GPT-2. Pretraining a 124M-parameter GPT-2 took about 4 days on 8 A100 GPUs (40 GB each).

  4. GitHub – nlp-with-transformers/notebooks: Jupyter notebooks for the Natural Language Processing with Transformers book
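To give a flavor of what "from scratch" means in these references, below is a minimal sketch of a single-head causal self-attention module in PyTorch (the framework used by minGPT/nanoGPT); the class name and dimensions here are illustrative assumptions, not code from any of the listed repositories.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention, kept deliberately small (illustrative sketch)."""
    def __init__(self, d_model: int, block_size: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        # Lower-triangular mask so each position attends only to itself and the past.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(C)            # (B, T, T) scaled dot-product scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                                            # (B, T, C) weighted sum of values

# Usage: a batch of 4 sequences, 8 tokens each, embedding size 32.
x = torch.randn(4, 8, 32)
out = CausalSelfAttention(d_model=32, block_size=8)(x)
print(out.shape)  # torch.Size([4, 8, 32])
```

The full models in the posts above add multiple heads, output projections, residual connections, layer normalization, and MLP blocks on top of this core idea.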