The most atomic way to train and run inference for a GPT — in pure Rust.
A faithful Rust port of Andrej Karpathy's microgpt.py, the dependency-free Python implementation of a GPT model. This file is the complete algorithm. Everything else is just efficiency.
This is a single-file, from-scratch implementation of a GPT language model in Rust, including:
- 🧠 Custom Autograd Engine — arena-based computation graph with reverse-mode automatic differentiation (sketched right after this list)
- 🔢 Character-level Tokenizer — maps characters to token IDs with a special BOS (Beginning of Sequence) token (see the tokenizer sketch below)
- 🏗️ GPT-2 Architecture — multi-head self-attention, MLP blocks, RMSNorm, residual connections, and KV-cache
- ⚡ Adam Optimizer — with bias correction and linear learning rate decay (see the sketch after the hyperparameter table)
- 🎲 Temperature-controlled Inference — autoregressive text generation with weighted sampling (see the sketch after the sample output)
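To make the autograd bullet concrete, here is a minimal sketch of an arena-based tape under assumed names (`Tape`, `Node`, `ValId` are illustrative; the actual code in src/main.rs differs in detail). Every value is a plain index into one flat `Vec`, and each node records its parents plus the local derivatives needed for the chain rule:

```rust
// Minimal sketch of an arena-based autograd tape (hypothetical names).

#[derive(Clone, Copy)]
struct ValId(usize); // a plain index handle instead of Rc<RefCell<...>>

struct Node {
    data: f64,
    grad: f64,
    parents: [Option<ValId>; 2],
    local_grads: [f64; 2], // d(output)/d(parent), captured at forward time
}

struct Tape {
    nodes: Vec<Node>, // contiguous storage: cache-friendly, no reference counting
}

impl Tape {
    fn with_capacity(cap: usize) -> Self {
        Tape { nodes: Vec::with_capacity(cap) } // pre-allocated tape capacity
    }

    fn push(&mut self, data: f64, parents: [Option<ValId>; 2], local_grads: [f64; 2]) -> ValId {
        self.nodes.push(Node { data, grad: 0.0, parents, local_grads });
        ValId(self.nodes.len() - 1)
    }

    fn leaf(&mut self, data: f64) -> ValId {
        self.push(data, [None, None], [0.0, 0.0])
    }

    fn add(&mut self, a: ValId, b: ValId) -> ValId {
        let d = self.nodes[a.0].data + self.nodes[b.0].data;
        self.push(d, [Some(a), Some(b)], [1.0, 1.0]) // d(a+b)/da = d(a+b)/db = 1
    }

    fn mul(&mut self, a: ValId, b: ValId) -> ValId {
        let (da, db) = (self.nodes[a.0].data, self.nodes[b.0].data);
        self.push(da * db, [Some(a), Some(b)], [db, da]) // d(ab)/da = b, d(ab)/db = a
    }

    // Reverse-mode sweep. In this sketch, nodes are appended in evaluation
    // order, which is already topological, so a reverse scan propagates
    // gradients correctly. (The real implementation builds an explicit
    // iterative topological sort; see the sketch in the performance notes.)
    fn backward(&mut self, root: ValId) {
        self.nodes[root.0].grad = 1.0;
        for i in (0..=root.0).rev() {
            let g = self.nodes[i].grad;
            for k in 0..2 {
                if let Some(p) = self.nodes[i].parents[k] {
                    let delta = self.nodes[i].local_grads[k] * g;
                    self.nodes[p.0].grad += delta;
                }
            }
        }
    }
}
```

Usage: `let y = t.mul(a, b); t.backward(y);` leaves dy/da (the value of b) in `t.nodes[a.0].grad`.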
The model trains on a dataset of ~32K names and learns to generate new, plausible-sounding names from scratch.
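Character-level tokenization needs only a handful of lines. A hedged sketch (`Tokenizer`, `encode`, and `decode` are illustrative names, not necessarily the ones in src/main.rs): the vocabulary is every distinct character in the corpus, sorted, with id 0 reserved for BOS. That is consistent with the vocab size of 27 in the sample output further down (26 lowercase letters plus BOS):

```rust
use std::collections::BTreeSet;

/// Hypothetical sketch of a character-level tokenizer with a BOS token.
struct Tokenizer {
    chars: Vec<char>, // sorted; token id = index + 1, id 0 is BOS
}

const BOS: usize = 0;

impl Tokenizer {
    /// Build the vocabulary from every distinct character in the corpus.
    fn new(corpus: &str) -> Self {
        // BTreeSet deduplicates and iterates in sorted order.
        let set: BTreeSet<char> = corpus.chars().filter(|c| !c.is_whitespace()).collect();
        Tokenizer { chars: set.into_iter().collect() }
    }

    fn vocab_size(&self) -> usize {
        self.chars.len() + 1 // +1 for the BOS token
    }

    /// Encode one document (one name), prefixed with BOS.
    /// e.g. with a-z: encode("emma") == [0, 5, 13, 13, 1]
    fn encode(&self, doc: &str) -> Vec<usize> {
        let mut ids = vec![BOS];
        ids.extend(doc.chars().map(|c| {
            self.chars.binary_search(&c).expect("char not in vocab") + 1
        }));
        ids
    }

    fn decode(&self, ids: &[usize]) -> String {
        ids.iter()
            .filter(|&&id| id != BOS)
            .map(|&id| self.chars[id - 1])
            .collect()
    }
}
```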
GPT-2 (simplified) with:
├── Token Embedding (vocab_size × n_embd)
├── Position Embedding (block_size × n_embd)
├── RMSNorm (initial)
├── Transformer Block × n_layer
│   ├── Multi-Head Attention
│   │   ├── RMSNorm
│   │   ├── Q, K, V projections
│   │   ├── Scaled dot-product attention
│   │   ├── Output projection
│   │   └── Residual connection
│   └── MLP
│       ├── RMSNorm
│       ├── FC1 (n_embd → 4×n_embd) + ReLU
│       ├── FC2 (4×n_embd → n_embd)
│       └── Residual connection
└── LM Head (vocab_size × n_embd)
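RMSNorm appears at three points in the tree above. Unlike LayerNorm, it neither subtracts the mean nor adds a bias; it only divides by the root mean square, usually followed by a learned per-dimension gain. A plain-f64 sketch of the formula (the real version is built from autograd ops so the normalization itself is differentiated):

```rust
/// RMSNorm over one activation vector: x_i / sqrt(mean(x^2) + eps).
/// Plain-f64 sketch; a learned gain may be applied afterwards.
fn rmsnorm(x: &[f64], eps: f64) -> Vec<f64> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f64>() / x.len() as f64;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * scale).collect()
}
```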
| Parameter | Value |
|---|---|
| n_layer | 1 |
| n_embd | 16 |
| block_size | 16 |
| n_head | 4 |
| head_dim | 4 |
| num_steps | 1000 |
| learning_rate | 0.01 |
| temperature | 0.5 |
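The Adam bullet from the feature list, spelled out with the two table entries that feed it (learning_rate as the base step size, num_steps as the decay horizon). A sketch under assumed hyperparameters: BETA1, BETA2, and EPS here are the textbook Adam defaults, not necessarily the constants in src/main.rs, and the decay schedule shown is one common reading of "linear learning rate decay":

```rust
/// One Adam step with bias correction and linear learning-rate decay.
/// Sketch with assumed constants; see src/main.rs for the real values.
fn adam_step(
    params: &mut [f64],
    grads: &[f64],
    m: &mut [f64],    // first-moment (mean of gradients) estimate
    v: &mut [f64],    // second-moment (mean of squared gradients) estimate
    step: usize,      // 1-based optimizer step
    num_steps: usize, // 1000 in the table above
    base_lr: f64,     // 0.01 in the table above
) {
    const BETA1: f64 = 0.9;   // assumed textbook default
    const BETA2: f64 = 0.999; // assumed textbook default
    const EPS: f64 = 1e-8;    // assumed textbook default

    // Linear decay: from base_lr at step 1 down towards 0 at num_steps.
    let lr = base_lr * (1.0 - (step as f64 - 1.0) / num_steps as f64);

    let t = step as f64;
    for i in 0..params.len() {
        m[i] = BETA1 * m[i] + (1.0 - BETA1) * grads[i];
        v[i] = BETA2 * v[i] + (1.0 - BETA2) * grads[i] * grads[i];
        // Bias correction: m and v start at zero, so early estimates
        // are scaled up to be unbiased.
        let m_hat = m[i] / (1.0 - BETA1.powf(t));
        let v_hat = v[i] / (1.0 - BETA2.powf(t));
        params[i] -= lr * m_hat / (v_hat.sqrt() + EPS);
    }
}
```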
- Rust (stable toolchain)
# Debug build (fast compile, slower execution)
cargo run
# Release build (slower compile, much faster execution — recommended!)
cargo run --release

The program will:
- Download the names dataset (~200KB) on first run
- Train for 1000 steps, printing the loss
- Generate 20 hallucinated names
num docs: 32033
vocab size: 27
num params: 7451
step 1000 / 1000 | loss 2.1234
--- inference (new, hallucinated names) ---
sample 1: mara
sample 2: joline
sample 3: kaden
...
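Temperature divides every logit before the softmax, so the 0.5 in the table doubles every logit gap, which exactly squares the odds ratio between any two tokens and biases generation towards safer, more name-like continuations (values above 1 would flatten the distribution instead). A self-contained sketch of temperature-controlled weighted sampling, using the rand crate's `Rng::gen`; the port's actual RNG plumbing may differ:

```rust
use rand::Rng;

/// Sample the next token id from raw logits at a given temperature.
/// Sketch: softmax(logits / temperature), then a weighted draw.
fn sample(logits: &[f64], temperature: f64, rng: &mut impl Rng) -> usize {
    // Scale, then exponentiate; subtracting the max keeps exp() stable.
    let scaled: Vec<f64> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let weights: Vec<f64> = scaled.iter().map(|l| (l - max).exp()).collect();
    let total: f64 = weights.iter().sum();

    // Weighted draw by inverse CDF: walk the weights until u is used up.
    let mut u = rng.gen::<f64>() * total;
    for (i, w) in weights.iter().enumerate() {
        u -= w;
        if u <= 0.0 {
            return i;
        }
    }
    weights.len() - 1 // floating-point edge case: fall back to the last id
}
```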
The Rust version uses an arena-based autograd tape instead of Python's reference-counted Value objects, providing:
- No Rc/RefCell overhead — all values are stored contiguously in a Vec
- Cache-friendly memory layout — sequential access patterns
- Iterative topological sort — no recursion depth limits (sketched below)
- Pre-allocated tape capacity — minimizes heap allocations during training
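The iterative topological sort is the stack-based replacement for the recursive traversal the Python version uses: on deep graphs recursion would overflow the call stack, while an explicit Vec cannot. A sketch, reusing the hypothetical Tape/ValId layout from the autograd sketch near the top of this README:

```rust
/// Iterative post-order DFS over the graph rooted at `root`, producing a
/// topological order with an explicit stack instead of recursion.
/// Reuses the hypothetical Tape/ValId sketch; the real code differs.
fn topo_order(tape: &Tape, root: ValId) -> Vec<usize> {
    let mut order = Vec::new();
    let mut visited = vec![false; tape.nodes.len()];
    // Each entry: (node index, have its parents been pushed yet?)
    let mut stack = vec![(root.0, false)];
    while let Some((i, expanded)) = stack.pop() {
        if expanded {
            order.push(i); // all inputs emitted first: post-order
        } else if !visited[i] {
            visited[i] = true;
            stack.push((i, true)); // revisit after the parents
            for p in tape.nodes[i].parents.iter().flatten() {
                stack.push((p.0, false));
            }
        }
    }
    order // the backward pass walks this in reverse, root first
}
```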
For maximum performance, always use cargo run --release, which enables LTO and a single codegen unit.
| Aspect | Python | Rust |
|---|---|---|
| Autograd | Value with Rc-like semantics | Arena-based Tape with index handles |
| Memory | GC-managed | Pre-allocated contiguous vectors |
| Backward | Recursive topo sort | Iterative DFS (stack-safe) |
| RNG | random.gauss | rand_distr::Normal |
| HTTP | urllib | curl / powershell (no deps) |
microGPT/
├── Cargo.toml        # Rust project manifest
├── src/
│   └── main.rs       # The complete algorithm (single file)
├── microgpt.py       # Original Python version by @karpathy
├── README.md
├── LICENSE
└── .gitignore
- Original Python implementation by Andrej Karpathy — microgpt.py
- Rust port — this repository
MIT License — see LICENSE for details.