• dmonterocrespo 2 hours ago
    Hi HN,

    I recently spent some time going through MiniMind, and it's a remarkably clean resource for understanding the modern LLM stack under the hood. It's a minimal, end-to-end implementation of a ~25M-parameter GPT-style model in pure PyTorch, designed to be trained from scratch on a single GPU.
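To get a feel for where ~25M parameters come from in a model this size, here's a rough back-of-envelope count for a small tied-embedding transformer. The dimensions below (vocab 6400, width 512, 8 layers, SwiGLU hidden 1408) are illustrative assumptions on my part, not MiniMind's actual config:

```python
def gpt_param_count(vocab=6400, dim=512, n_layers=8, ffn_hidden=1408):
    """Rough parameter count for a small decoder-only transformer
    with tied input/output embeddings and a SwiGLU FFN (hypothetical dims)."""
    emb = vocab * dim                 # token embedding, tied with the LM head
    attn = 4 * dim * dim              # Q, K, V, and output projections
    ffn = 3 * dim * ffn_hidden        # SwiGLU uses three weight matrices
    norms = 2 * dim                   # two RMSNorm gains per block
    per_layer = attn + ffn + norms
    return emb + n_layers * per_layer + dim  # + final norm

print(gpt_param_count())  # ~29M with these illustrative dims
```

Biases are omitted since most modern small LLMs drop them; the point is just that at this scale the embedding table and the FFN matrices dominate.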

    Instead of heavy abstractions, it uses straightforward PyTorch while still implementing modern architectural choices like RMSNorm, SwiGLU, RoPE, and even MoE variants. What makes it valuable is that it doesn't stop at the forward pass; the repo covers the entire training lifecycle. You can trace the data flow from tokenizer training and pretraining, right through to Supervised Fine-Tuning (SFT), LoRA, preference optimization (DPO/PPO), and distillation.
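As an example of how small these modern components really are: RMSNorm is just a scale by the reciprocal root-mean-square of the features, with a learned per-feature gain. A plain-Python sketch of the math (the repo implements this with PyTorch tensors, of course):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a single feature vector: x_i / rms(x) * g_i.
    Unlike LayerNorm, there is no mean subtraction and no bias."""
    ms = sum(v * v for v in x) / len(x)     # mean of squares
    inv = 1.0 / math.sqrt(ms + eps)         # reciprocal RMS
    return [v * inv * w for v, w in zip(x, weight)]
```

With unit gains, the output always has an RMS of ~1, which is the whole normalization effect; the per-feature `weight` is the only learned parameter.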

    It’s small enough to actually read the source code end-to-end, but realistic enough to serve as a baseline for architectural experiments rather than just a toy example.

    Curious if anyone here has used this (or similar minimal codebases) to test custom architecture modifications or train highly specialized small-scale models?

    I'm currently testing the pipeline locally on a PC with an RTX 4060, and it's a great fit for this kind of hardware.