• dmonterocrespo 2 hours ago
    Hi HN,

    I recently spent some time going through MiniMind, and it's a remarkably clean resource for understanding the modern LLM stack under the hood. It's a minimal, end-to-end implementation of a ~25M-parameter GPT-style model in pure PyTorch, designed to be trained from scratch on a single GPU.
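To get a feel for where ~25M parameters come from in a model this size, here's a rough back-of-envelope count for a small tied-embedding transformer. The dimensions below (vocab 6400, width 512, 8 layers, SwiGLU hidden 1408) are illustrative assumptions on my part, not MiniMind's actual config:

```python
def gpt_param_count(vocab=6400, dim=512, n_layers=8, ffn_hidden=1408):
    """Rough parameter count for a small decoder-only transformer
    with tied input/output embeddings and a SwiGLU FFN (hypothetical dims)."""
    emb = vocab * dim                 # token embedding, tied with the LM head
    attn = 4 * dim * dim              # Q, K, V, and output projections
    ffn = 3 * dim * ffn_hidden        # SwiGLU uses three weight matrices
    norms = 2 * dim                   # two RMSNorm gains per block
    per_layer = attn + ffn + norms
    return emb + n_layers * per_layer + dim  # + final norm

print(gpt_param_count())  # ~29M with these illustrative dims
```

Biases are omitted since most modern small LLMs drop them; the point is just that at this scale the embedding table and the FFN matrices dominate.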

    Instead of heavy abstractions, it uses straightforward PyTorch while still implementing modern architectural choices like RMSNorm, SwiGLU, RoPE, and even MoE variants. What makes it valuable is that it doesn't stop at the forward pass; the repo covers the entire training lifecycle. You can trace the data flow from tokenizer training and pretraining, right through to Supervised Fine-Tuning (SFT), LoRA, preference optimization (DPO/PPO), and distillation.
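As an example of how small these modern components really are: RMSNorm is just a scale by the reciprocal root-mean-square of the features, with a learned per-feature gain. A plain-Python sketch of the math (the repo implements this with PyTorch tensors, of course):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a single feature vector: x_i / rms(x) * g_i.
    Unlike LayerNorm, there is no mean subtraction and no bias."""
    ms = sum(v * v for v in x) / len(x)     # mean of squares
    inv = 1.0 / math.sqrt(ms + eps)         # reciprocal RMS
    return [v * inv * w for v, w in zip(x, weight)]
```

With unit gains, the output always has an RMS of ~1, which is the whole normalization effect; the per-feature `weight` is the only learned parameter.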

    It’s small enough to actually read the source code end-to-end, but realistic enough to serve as a baseline for architectural experiments rather than just a toy example.

    Curious if anyone here has used this (or similar minimal codebases) to test custom architecture modifications or train highly specialized small-scale models?

    I'm currently testing the pipeline locally on a PC with an RTX 4060, and it's a great fit for this kind of hardware.