LLM Training From Scratch | GPT-2 Tutorial
Building LLMs From Scratch
From 150MB to 12GB datasets. GPT-2 Small trained on 2.8B tokens. 134M parameters. 7x optimization speedup. Every lesson documented.
GPUburnout
Will Code for Tokens
134M Params · 2.8B Tokens · 7x Speedup