LLM Training From Scratch | GPT-2 Tutorial
Building LLMs From Scratch
From 150MB to 12GB datasets. GPT-2 Small trained on 2.8B tokens. 134M parameters. 7x optimization speedup. Every lesson documented.
GPUburnout
Will Code for Tokens
134M Params · 2.8B Tokens · 7x Speedup