Research Notes | Transformer from Scratch

Overview

This post aims to implement the transformer model and its variants from scratch. It is based on the following posts:

  1. The Annotated Transformer (Harvard NLP)
  2. The Illustrated Transformer (Jay Alammar)
  3. GitHub – karpathy/minGPT: A minimal PyTorch re-implementation of OpenAI GPT (Generative Pretrained Transformer) training. Andrej Karpathy also has a 2-hour video walking through how he builds the model.

    GitHub – karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. This is an optimized version of minGPT that can reproduce some mid-sized models, including a 1.3B-parameter GPT-2. Pretraining a 124M-parameter GPT-2 took about 4 days on 8 A100 GPUs (40 GB each).

  4. GitHub – nlp-with-transformers/notebooks: Jupyter notebooks for the Natural Language Processing with Transformers book
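To give a flavor of what "from scratch" means in these references, below is a minimal sketch of a single-head causal self-attention module in PyTorch (the framework used by minGPT/nanoGPT); the class name and dimensions here are illustrative assumptions, not code from any of the listed repositories.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention, kept deliberately small (illustrative sketch)."""
    def __init__(self, d_model: int, block_size: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        # Lower-triangular mask so each position attends only to itself and the past.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(C)            # (B, T, T) scaled dot-product scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                                            # (B, T, C) weighted sum of values

# Usage: a batch of 4 sequences, 8 tokens each, embedding size 32.
x = torch.randn(4, 8, 32)
out = CausalSelfAttention(d_model=32, block_size=8)(x)
print(out.shape)  # torch.Size([4, 8, 32])
```

The full models in the posts above add multiple heads, output projections, residual connections, layer normalization, and MLP blocks on top of this core idea.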