The Geometry of Thought
August 18, 2025
My journey and reflections from building an LLM from scratch.
Chapter 0: Intro
I’ve always been fascinated by machine learning — its ability to process, to recognize, to almost understand.
When ChatGPT first came out, I stayed up all night experimenting with it. I asked absurd questions. I tried to jailbreak it. I tested its limits by bouncing between the dumbest and the smartest prompts I could think of.
Somewhere in the middle of that late-night frenzy, I had a realization: This wasn’t just a neat tool.
This was power. Profound power.
For the first time in human history, intelligence itself was about to become a cheap resource at our fingertips.
And I thought: if that’s true, I need to understand it at the deepest level.
That’s what led me to build a large language model from scratch.
It’s not just a toy model; it mirrors GPT-2 in structure and scale, even though I didn’t have the luxury of racks of GPUs or a billion-dollar budget. I wanted to walk through the entire process: from raw text to tokens, to embeddings, to self-attention blocks, to MLPs, to training loops and instruction fine-tuning.
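For a concrete picture of how those pieces stack together, here is a minimal sketch of a GPT-2-style decoder in PyTorch. To be clear, this is an illustration, not my actual code: the names `TinyGPT` and `Block` are stand-ins, though the default hyperparameters (12 layers, 768-dimensional embeddings, 12 heads, a 50,257-token vocabulary) do match GPT-2 small.

```python
# A minimal GPT-2-style decoder: token + position embeddings,
# causal self-attention, and an MLP, repeated n_layers times.
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-LayerNorm transformer block: attention + MLP, each with a residual."""
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # expand by 4x, as in GPT-2
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Causal mask: True entries are blocked, so each position
        # can only attend to itself and earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out               # residual around attention
        x = x + self.mlp(self.ln2(x))  # residual around the MLP
        return x

class TinyGPT(nn.Module):
    """Hypothetical name; hyperparameter defaults follow GPT-2 small."""
    def __init__(self, vocab_size=50257, context=1024, d_model=768, n_layers=12):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)  # token IDs -> vectors
        self.pos_emb = nn.Embedding(context, d_model)     # learned positions
        self.blocks = nn.ModuleList([Block(d_model) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # next-token logits over the vocabulary

# Tiny smoke test: 2 layers, a batch of 1 sequence of 16 random token IDs.
logits = TinyGPT(n_layers=2)(torch.randint(0, 50257, (1, 16)))  # shape (1, 16, 50257)
```

That whole stack, embeddings in, logits out, is the model; everything else is tokenization on one side and the training loop on the other.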
In the end, my model wasn’t fully trained, but that wasn’t the point. The point was the journey — and the surprising lessons that surfaced along the way.
This blog isn’t a tutorial. There are already plenty of excellent resources online for the “how.” Instead, this is the story of my journey: what I built, what surprised me, and the bigger questions that emerged.
Read Chapter 1: Embeddings Are Scary