microGPT-rs

The most atomic way to train and run inference for a GPT — in pure Rust.

A faithful Rust port of Andrej Karpathy's microgpt.py, the dependency-free Python implementation of a GPT model. This file is the complete algorithm. Everything else is just efficiency.

What is this?

This is a single-file, from-scratch implementation of a GPT language model in Rust, including:

  • 🧠 Custom Autograd Engine — arena-based computation graph with reverse-mode automatic differentiation
  • 🔢 Character-level Tokenizer — maps characters to token IDs, with a special BOS (Beginning of Sequence) token (see the sketch after this list)
  • 🏗️ GPT-2 Architecture — multi-head self-attention, MLP blocks, RMSNorm, residual connections, and KV-cache
  • ⚙️ Adam Optimizer — with bias correction and linear learning-rate decay
  • 🎲 Temperature-controlled Inference — autoregressive text generation with weighted sampling
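
Each of these fits in a few dozen lines. As a taste, here is a hedged sketch of the character-level tokenizer; the identifiers are illustrative, not the repo's actual names, and token ID 0 is reserved for BOS:

// Illustrative tokenizer sketch (not the repo's actual identifiers).
// Each distinct character gets an ID in 1..=N; ID 0 is the BOS token.
use std::collections::BTreeSet;

struct Tokenizer {
    chars: Vec<char>, // sorted; char at index i has token ID i + 1
}

impl Tokenizer {
    fn new(docs: &[String]) -> Self {
        let set: BTreeSet<char> = docs.iter().flat_map(|d| d.chars()).collect();
        Tokenizer { chars: set.into_iter().collect() }
    }
    fn vocab_size(&self) -> usize {
        self.chars.len() + 1 // +1 for BOS
    }
    fn encode(&self, s: &str) -> Vec<usize> {
        // panics on characters not seen at build time
        s.chars().map(|c| self.chars.binary_search(&c).unwrap() + 1).collect()
    }
    fn decode(&self, ids: &[usize]) -> String {
        ids.iter().filter(|&&id| id > 0).map(|&id| self.chars[id - 1]).collect()
    }
}

On the names dataset this scheme yields a vocab size of 27 (26 lowercase letters plus BOS), which matches the example output further down.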

The model trains on a dataset of ~32K names and learns to generate new, plausible-sounding names from scratch.

Architecture

GPT-2 (simplified) with:
├── Token Embedding       (vocab_size × n_embd)
├── Position Embedding    (block_size × n_embd)
├── RMSNorm (initial)
├── Transformer Block × n_layer
│   ├── Multi-Head Attention
│   │   ├── RMSNorm
│   │   ├── Q, K, V projections
│   │   ├── Scaled dot-product attention
│   │   ├── Output projection
│   │   └── Residual connection
│   └── MLP
│       ├── RMSNorm
│       ├── FC1 (n_embd → 4×n_embd) + ReLU
│       ├── FC2 (4×n_embd → n_embd)
│       └── Residual connection
└── LM Head              (vocab_size × n_embd)
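
Every normalization in the stack is RMSNorm rather than LayerNorm. In plain floats the operation is small enough to show inline (the real code builds it out of autograd ops on the tape, and this sketch omits any learnable gain):

// RMSNorm sketch: scale each vector by the reciprocal of its
// root-mean-square. The epsilon value here is illustrative.
fn rmsnorm(x: &[f64]) -> Vec<f64> {
    let eps = 1e-5;
    let mean_sq = x.iter().map(|v| v * v).sum::<f64>() / x.len() as f64;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * scale).collect()
}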

Hyperparameters (default)

Parameter        Value
n_layer          1
n_embd           16
block_size       16
n_head           4
head_dim         4
num_steps        1000
learning_rate    0.01
temperature      0.5
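
The values are mutually consistent: head_dim = n_embd / n_head = 16 / 4 = 4, and block_size caps how many positions the position embedding can address. Spelled out as Rust constants (illustrative; the actual file may declare them differently):

// Defaults from the table above, expressed as constants for clarity.
const N_LAYER: usize = 1;
const N_EMBD: usize = 16;
const BLOCK_SIZE: usize = 16;
const N_HEAD: usize = 4;
const HEAD_DIM: usize = N_EMBD / N_HEAD; // = 4
const NUM_STEPS: usize = 1000;
const LEARNING_RATE: f64 = 0.01;
const TEMPERATURE: f64 = 0.5;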

Quick Start

Prerequisites

  • Rust (stable toolchain)

Build & Run

# Debug build (fast compile, slower execution)
cargo run

# Release build (slower compile, much faster execution — recommended!)
cargo run --release

The program will:

  1. Download the names dataset (~200KB) on first run
  2. Train for 1000 steps, printing the loss
  3. Generate 20 hallucinated names
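
Step 3 is where the temperature default of 0.5 matters: logits are divided by the temperature before the softmax, and values below 1.0 sharpen the distribution toward likely characters. A minimal sketch of the idea (names are illustrative, and rand's WeightedIndex stands in for whatever weighted-sampling code the file actually uses):

// Temperature-controlled sampling sketch: softmax over scaled logits,
// then a weighted draw. Subtracting the max is for numerical stability.
use rand::distributions::{Distribution, WeightedIndex};

fn sample_token(logits: &[f64], temperature: f64, rng: &mut impl rand::Rng) -> usize {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let weights: Vec<f64> = logits.iter().map(|l| ((l - max) / temperature).exp()).collect();
    WeightedIndex::new(&weights).unwrap().sample(rng)
}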

Example Output

num docs: 32033
vocab size: 27
num params: 7451
step 1000 / 1000 | loss 2.1234
--- inference (new, hallucinated names) ---
sample  1: mara
sample  2: joline
sample  3: kaden
...

Performance

The Rust version uses an arena-based autograd tape instead of Python's reference-counted Value objects, providing:

  • No Rc/RefCell overhead — all values are stored contiguously in a Vec
  • Cache-friendly memory layout — sequential access patterns
  • Iterative topological sort — no recursion depth limits
  • Pre-allocated tape capacity — minimizes heap allocations during training
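
A toy version of the arena idea (illustrative, not the repo's actual API) looks like this: values, gradients, and ops live in parallel Vecs, and nodes name their parents by index. Because nodes are appended in creation order, this sketch can get away with a single reverse sweep instead of an explicit topological sort; the real implementation performs an iterative DFS as noted above.

// Arena-based autograd tape sketch with two ops. Plain indices replace
// Rc<RefCell<Value>> pointers; all storage is contiguous.
#[derive(Clone, Copy)]
enum Op { Leaf, Add(usize, usize), Mul(usize, usize) }

struct Tape { val: Vec<f64>, grad: Vec<f64>, op: Vec<Op> }

impl Tape {
    fn push(&mut self, v: f64, op: Op) -> usize {
        self.val.push(v);
        self.grad.push(0.0);
        self.op.push(op);
        self.val.len() - 1
    }
    fn add(&mut self, a: usize, b: usize) -> usize {
        let v = self.val[a] + self.val[b];
        self.push(v, Op::Add(a, b))
    }
    fn mul(&mut self, a: usize, b: usize) -> usize {
        let v = self.val[a] * self.val[b];
        self.push(v, Op::Mul(a, b))
    }
    fn backward(&mut self, root: usize) {
        for g in self.grad.iter_mut() { *g = 0.0; }
        self.grad[root] = 1.0;
        // Reverse creation order is a valid topological order here.
        for i in (0..=root).rev() {
            let g = self.grad[i];
            match self.op[i] {
                Op::Leaf => {}
                Op::Add(a, b) => { self.grad[a] += g; self.grad[b] += g; }
                Op::Mul(a, b) => {
                    self.grad[a] += g * self.val[b];
                    self.grad[b] += g * self.val[a];
                }
            }
        }
    }
}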

For maximum performance, always build with cargo run --release, which enables LTO and a single codegen unit.

Differences from the Python Version

Aspect     Python                        Rust
Autograd   Value with Rc-like semantics  Arena-based Tape with index handles
Memory     GC-managed                    Pre-allocated contiguous vectors
Backward   Recursive topo sort           Iterative DFS (stack-safe)
RNG        random.gauss                  rand_distr::Normal
HTTP       urllib                        curl / powershell (no deps)
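
For instance, Gaussian weight initialization on the Rust side can use rand_distr directly (the mean and standard deviation below are assumptions, not the repo's actual init values):

// Draw n weights from N(0, 0.02^2) -- parameters are illustrative.
use rand::Rng;
use rand_distr::{Distribution, Normal};

fn init_weights(n: usize, rng: &mut impl Rng) -> Vec<f64> {
    let normal = Normal::new(0.0, 0.02).unwrap();
    (0..n).map(|_| normal.sample(rng)).collect()
}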

Project Structure

microGPT/
├── Cargo.toml          # Rust project manifest
├── src/
│   └── main.rs         # The complete algorithm (single file)
├── microgpt.py         # Original Python version by @karpathy
├── README.md
├── LICENSE
└── .gitignore

Credits

Original microgpt.py and algorithm by Andrej Karpathy (@karpathy); this repository is a faithful Rust port.

License

MIT License — see LICENSE for details.
