Commit df44f34

docs: update CLAUDE.md
1 parent 1fc7d16 commit df44f34

8 files changed

Lines changed: 661 additions & 453 deletions

File tree

CLAUDE.md

Lines changed: 83 additions & 453 deletions
Large diffs are not rendered by default.

docs/architecture.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Architecture

```
┌────────────────────────────────────────────────────────────────┐
│  TypeScript layer - 6 packages                                 │
│    @mlx-node/lm      Inference, ChatSession, streaming         │
│    @mlx-node/trl     GRPO/SFT training, datasets               │
│    @mlx-node/vlm     VLM, OCR, document pipelines              │
│    @mlx-node/server  HTTP server (/v1/responses, /v1/messages) │
│    @mlx-node/cli     mlx download, mlx convert, mlx launch     │
│    @mlx-node/core    Native addon (NAPI bindings)              │
├────────────────────────────────────────────────────────────────┤
│  Rust compute layer - 5 workspace crates                       │
│    mlx-core          Models, training, ops, vision (all NAPI)  │
│    mlx-paged-attn    PagedAttention + Metal kernels            │
│    mlx-sys           Low-level MLX FFI bridge (cpp + headers)  │
│    mlx-db            SQLite training persistence               │
│    mlx-tui           mlx-train Ratatui binary (no library deps)│
├────────────────────────────────────────────────────────────────┤
│  C++ bridge → Compiled forward paths                           │
│    ~300 FFI declarations, compiled decode via mlx::compile     │
├────────────────────────────────────────────────────────────────┤
│  MLX → Metal / Accelerate GPU backend                          │
└────────────────────────────────────────────────────────────────┘
```

## Package dependency chain

```
@mlx-node/core            (Rust/NAPI native addon)
├── @mlx-node/lm          inference, models, streaming, tools, profiling
│   ├── @mlx-node/trl     training (GRPO, SFT, datasets, rewards)
│   ├── @mlx-node/vlm     vision (VLM, OCR, document pipeline)
│   └── @mlx-node/server  HTTP server (SessionRegistry, /v1/* endpoints)
└── @mlx-node/cli         depends on core + lm + server
```

`mlx-tui` is the workspace binary crate (the Ratatui-based `mlx-train` TUI). It is a workspace member, but no other crate depends on it, so it is built separately via `cargo build -p mlx-tui`. `@mlx-node/internal-tools` lives in the root `devDependencies` and is not part of the runtime chain.
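
One way to read the tree above is as a dependency graph in which each package must be built after everything it depends on. The following is a minimal sketch under that reading; the `deps` map is transcribed by hand from the tree, and `buildOrder` is a hypothetical helper, not something the repository is known to ship.

```typescript
// Dependency edges transcribed from the tree above: package -> packages it depends on.
// This map is an interpretation of the diagram, not data read from package.json.
const deps: Record<string, string[]> = {
  "@mlx-node/core": [],
  "@mlx-node/lm": ["@mlx-node/core"],
  "@mlx-node/trl": ["@mlx-node/lm"],
  "@mlx-node/vlm": ["@mlx-node/lm"],
  "@mlx-node/server": ["@mlx-node/lm"],
  "@mlx-node/cli": ["@mlx-node/core", "@mlx-node/lm", "@mlx-node/server"],
};

// Depth-first topological sort: every package appears after its dependencies.
function buildOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();
  const visit = (pkg: string): void => {
    if (state.get(pkg) === "done") return;
    if (state.get(pkg) === "visiting") throw new Error(`dependency cycle at ${pkg}`);
    state.set(pkg, "visiting");
    for (const dep of graph[pkg] ?? []) visit(dep);
    state.set(pkg, "done");
    order.push(pkg);
  };
  for (const pkg of Object.keys(graph)) visit(pkg);
  return order;
}

const order = buildOrder(deps);
console.log(order.join(" -> "));
```

With this map, `@mlx-node/core` comes first and `@mlx-node/cli` last, which matches the idea that the native addon sits at the root of the runtime chain.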

## Repository layout

```
mlx-node/
├── Cargo.toml              workspace manifest (5 crates)
├── package.json            npm workspaces (6 packages + examples)
├── vite.config.ts          Vitest + Oxlint + Oxfmt config
├── tsconfig.json           TypeScript project references

├── crates/
│   ├── mlx-sys/            MLX C/C++ FFI bridge - see ffi-cpp.md
│   ├── mlx-core/           All NAPI exports: models, training, ops, vision
│   ├── mlx-paged-attn/     PagedAttention + Metal shaders - see paged-cache.md
│   ├── mlx-db/             SQLite training persistence
│   └── mlx-tui/            mlx-train Ratatui binary (standalone)

├── packages/
│   ├── core/               @mlx-node/core (native addon + .d.cts)
│   ├── lm/                 @mlx-node/lm
│   │   └── src/
│   │       ├── chat-session.ts   ChatSession<M> cross-model wrapper
│   │       ├── stream.ts         Session-aware models + callback→AsyncGenerator bridge
│   │       ├── profiling.ts      JS profiling API
│   │       ├── models/           loadModel, loadSession, configs
│   │       └── tools/            Tool definition types
│   ├── trl/                @mlx-node/trl (trainers/, data/, utils/)
│   ├── vlm/                @mlx-node/vlm (models/, pipeline/)
│   ├── server/             @mlx-node/server
│   │   └── src/
│   │       ├── endpoints/            /v1/responses, /v1/messages
│   │       └── session-registry.ts   SessionRegistry - owns ChatSession lifetimes
│   └── cli/                @mlx-node/cli - see cli.md

├── __test__/               TypeScript tests
└── examples/               lm.ts, vlm-inference.ts, paddle-ocr-pipeline.ts, tool-use-example.ts, grpo/, sft/
```

## Build flow

| Command | Output |
| ---------------------------------- | ---------------------------------------------------------------------------------------------- |
| `yarn build` | Runs `yarn build:native && yarn build:ts` |
| `yarn build:native` | `packages/core/index.cjs`, `mlx-core.darwin-arm64.node`, `mlx.metallib`, `paged_attn.metallib` |
| `yarn build:ts` | `packages/*/dist/` via `tsc -b` (project references) |
| `yarn typecheck` | TypeScript type-check only (no emitted files) |
| `cargo build --release -p mlx-tui` | `mlx-train` TUI binary |

`yarn build:native` is the **canonical native build**: it runs the napi-rs pipeline through `packages/core/build.ts` (executed via `oxnode`). Running `cargo build` directly does **not** produce the `.node` addon.

## Adding a new native operation

1. Add FFI declaration in `crates/mlx-sys/src/lib.rs`.
2. Add C++ bridge function in the appropriate `crates/mlx-sys/src/mlx_*.cpp` file (see [ffi-cpp.md](ffi-cpp.md) for which file owns what).
3. Add a Rust wrapper in `crates/mlx-core/src/` with `#[napi]` exports.
4. Run `yarn build:native` to regenerate NAPI bindings and `packages/core/index.d.cts`.
5. Add tests using TypedArray helpers.

If you added a **new** `.cpp` file, run `rm -rf target/release/build/mlx-sys-*` once. The `cc` crate caches the source-file list across builds and won't pick up new files otherwise.
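
Step 5 mentions TypedArray helpers without showing them. As a rough sketch of what such a test helper could look like, the block below compares two `Float32Array` values within an absolute tolerance. The names `maxAbsDiff` and `assertClose` are hypothetical, and the `output` array is a stand-in for data read back from a native op, since the addon's API is not shown here.

```typescript
// Hypothetical test helpers (not taken from the repository): element-wise
// comparison of two Float32Array values within an absolute tolerance.
function maxAbsDiff(actual: Float32Array, expected: Float32Array): number {
  if (actual.length !== expected.length) {
    throw new Error(`length mismatch: ${actual.length} vs ${expected.length}`);
  }
  let worst = 0;
  for (let i = 0; i < actual.length; i++) {
    worst = Math.max(worst, Math.abs((actual[i] ?? 0) - (expected[i] ?? 0)));
  }
  return worst;
}

function assertClose(actual: Float32Array, expected: Float32Array, tol = 1e-5): void {
  const diff = maxAbsDiff(actual, expected);
  // Written as !(diff <= tol) so that a NaN difference also fails.
  if (!(diff <= tol)) throw new Error(`max abs diff ${diff} exceeds tolerance ${tol}`);
}

// Stand-in for the output of a native op; a real test would read it back from the addon.
const output = Float32Array.from([1, 2, 3].map((x) => x * 2));
assertClose(output, Float32Array.from([2, 4, 6]));
```

A tolerance-based comparison is usually preferable to exact equality for GPU results, where reduced-precision dtypes and kernel ordering can change the last few bits.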

## Adding a TypeScript utility

1. Pick the package by responsibility: `lm` (inference), `trl` (training), `vlm` (vision), `server` (HTTP), `cli` (CLI).
2. Add to `packages/<pkg>/src/`, export from `packages/<pkg>/src/index.ts`.
3. Run `yarn build:ts && yarn typecheck`.

docs/cli.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# CLI (`@mlx-node/cli`)

The `mlx` binary is built from `packages/cli/` and exposes three top-level commands: `download`, `convert`, and `launch`.

## `mlx download`

### Models

```bash
mlx download model --model Qwen/Qwen3-0.6B
```

| Flag | Default | Purpose |
| ---------------- | ----------------- | ------------------------------------------------------ |
| `-m`, `--model` | `Qwen/Qwen3-0.6B` | HuggingFace model id |
| `-g`, `--glob` | | Filename pattern filter (download only matching files) |
| `--set-token` | | Store HuggingFace credentials |
| `-o`, `--output` | | Output directory |

### Datasets

```bash
mlx download dataset
```

Default dataset: `openai/gsm8k`. Parquet inputs are automatically converted to JSONL via `convertParquetToJsonl()`.

| Flag | Default | Purpose |
| ------------------ | -------------- | ---------------------- |
| `-d`, `--dataset` | `openai/gsm8k` | HuggingFace dataset id |
| `-r`, `--revision` | | Dataset revision |
| `-o`, `--output` | | Output directory |
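
For readers unfamiliar with the target format, JSONL stores one JSON object per line. The sketch below shows only that serialization step on in-memory rows; it does not reproduce `convertParquetToJsonl()`, whose signature is not given here, and the helper names and sample rows are illustrative assumptions.

```typescript
// A row as it might come out of a Parquet reader; the field names are illustrative.
type Row = Record<string, unknown>;

// Serialize rows as JSONL: one JSON object per line, each newline-terminated.
function rowsToJsonl(rows: Row[]): string {
  return rows.map((row) => JSON.stringify(row) + "\n").join("");
}

// Parse JSONL back into rows, skipping blank lines.
function jsonlToRows(text: string): Row[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Row);
}

const rows: Row[] = [
  { question: "What is 2 + 3?", answer: "5" },
  { question: "What is 4 * 6?", answer: "24" },
];
const jsonl = rowsToJsonl(rows);
console.log(jsonl);
```

Because each line is an independent JSON document, a training loader can stream the file line by line instead of parsing it whole.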

## `mlx convert`

The convert command uses `--input` / `--output` (not `--model`).

### Dtype conversion

```bash
mlx convert --input ./model --output ./model-bf16 --dtype bf16
```

### Quantization (affine, default)

```bash
mlx convert --input ./model --output ./model-q --quantize --q-recipe mixed_4_6
```

| Flag | Purpose |
| ------------------ | ------------------------------------------------------------------------------- |
| `-i`, `--input` | Source model directory (required) |
| `-o`, `--output` | Output directory (required) |
| `-d`, `--dtype` | Target dtype: `float32` / `float16` / `bfloat16` |
| `-q`, `--quantize` | Enable quantization |
| `--q-recipe` | One of `mixed_2_6`, `mixed_3_4`, `mixed_3_6`, `mixed_4_6`, `qwen3_5`, `unsloth` |
| `--q-mode` | `affine` (default) or `mxfp8` |
| `--imatrix-path` | Path to imatrix file for AWQ pre-scaling |
| `--mmproj` | Vision-encoder conversion path |
| `-v`, `--verbose` | Verbose logging |
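
When driving `mlx convert` from a script, it can help to validate option values before spawning the process. The following is a sketch under assumptions: the recipe and mode lists are copied from the table above, while `ConvertOptions` and `convertArgs` are hypothetical names, and the real CLI may accept or reject combinations that this helper does not model.

```typescript
// Allowed values copied from the flag table; everything else here is hypothetical.
const Q_RECIPES: readonly string[] = ["mixed_2_6", "mixed_3_4", "mixed_3_6", "mixed_4_6", "qwen3_5", "unsloth"];
const Q_MODES: readonly string[] = ["affine", "mxfp8"];

interface ConvertOptions {
  input: string;
  output: string;
  dtype?: string;
  quantize?: boolean;
  qRecipe?: string;
  qMode?: string;
}

// Build the argument list for `mlx convert`, rejecting values the table does not list.
function convertArgs(opts: ConvertOptions): string[] {
  const args = ["convert", "--input", opts.input, "--output", opts.output];
  if (opts.dtype !== undefined) args.push("--dtype", opts.dtype);
  if (opts.quantize) args.push("--quantize");
  if (opts.qRecipe !== undefined) {
    if (!Q_RECIPES.includes(opts.qRecipe)) throw new Error(`unknown --q-recipe: ${opts.qRecipe}`);
    args.push("--q-recipe", opts.qRecipe);
  }
  if (opts.qMode !== undefined) {
    if (!Q_MODES.includes(opts.qMode)) throw new Error(`unknown --q-mode: ${opts.qMode}`);
    args.push("--q-mode", opts.qMode);
  }
  return args;
}

const argv = convertArgs({ input: "./model", output: "./model-q", quantize: true, qRecipe: "mixed_4_6" });
console.log(["mlx", ...argv].join(" "));
```

The resulting array can be passed to a process spawner as-is, which avoids shell quoting issues with paths that contain spaces.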

### GGUF → SafeTensors

```bash
mlx convert --input ./model.gguf --output ./model-mlx
```

Auto-detected by the `.gguf` extension. Supports BF16, F16, F32, Q4_0, Q4_1, Q8_0 source quantization types.

### Model-type auto-detection

The converter auto-detects model families and applies family-specific sanitization passes:

- `qwen3_5`, `qwen3_5_moe`
- `gemma4`
- `paddleocr-vl`, `qianfan-ocr`
- `pp-lcnet-ori`, `uvdoc`

Sharded models are also supported (parses `model.safetensors.index.json`).
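
For context, a `model.safetensors.index.json` file in the Hugging Face layout maps each tensor name to the shard file that stores it under a `weight_map` key. The sketch below groups tensors by shard from such an index; it illustrates the file format only, not the converter's actual code, and the tensor names and `tensorsByShard` helper are made up for the example.

```typescript
// Shape of a sharded-checkpoint index: tensor name -> shard filename.
interface SafetensorsIndex {
  metadata?: Record<string, unknown>;
  weight_map: Record<string, string>;
}

// Group tensor names by the shard file that stores them.
function tensorsByShard(index: SafetensorsIndex): Map<string, string[]> {
  const shards = new Map<string, string[]>();
  for (const [tensor, file] of Object.entries(index.weight_map)) {
    const names = shards.get(file) ?? [];
    names.push(tensor);
    shards.set(file, names);
  }
  return shards;
}

// Illustrative index contents; a real converter would read this from disk.
const index = JSON.parse(`{
  "metadata": { "total_size": 1234 },
  "weight_map": {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}`) as SafetensorsIndex;

const shards = tensorsByShard(index);
for (const [file, names] of shards) console.log(file, names.length);
```

Grouping by shard first lets a loader open each shard file once rather than once per tensor.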

Foreign weight formats: Paddle `.pdiparams`, PyTorch `.pkl`.

## `mlx launch claude`

Launches the local `@mlx-node/server` and spawns Claude Code against it. This is the entry point for using MLX-Node as a Claude Code backend. The "serve" terminology in commit messages refers to internal server components only; there is no `mlx serve` command.

docs/ffi-cpp.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
# C++ FFI bridge

The bridge between MLX (C++) and the NAPI/Rust layer lives in `crates/mlx-sys/`. The Rust side declares the FFI surface in `lib.rs`; the C++ side implements each declaration across topical `.cpp` files compiled by the `cc` crate.

## File inventory

`crates/mlx-sys/src/`:

| File | Purpose |
| ------------------------ | ----------------------------------------------------------------------------- |
| `mlx_array_ops.cpp` | Array construction, arithmetic, indexing, dtype-safe scalar ops |
| `mlx_advanced_ops.cpp` | quantized_matmul, gather_qmm, conv2d, FP8 dequant, PaddleOCR forward |
| `mlx_nn_ops.cpp` | NN ops, data extraction, random, math |
| `mlx_fused_ops.cpp` | Fused SwiGLU MLP and supporting ops |
| `mlx_misc_ops.cpp` | Synchronization, compiled sampling helpers |
| `mlx_stream.cpp` | Stream/device management, memory limits |
| `mlx_autograd.cpp` | `value_and_grad` integration |
| `mlx_gated_delta.cpp` | Metal GDN kernel opaque handles and shader indexing |
| `mlx_qwen35.cpp` | Compiled Qwen3.5 dense forward (uses `mlx::core::compile`) |
| `mlx_qwen35_moe.cpp` | Compiled Qwen3.5 MoE forward with expert routing (uses `mlx::core::compile`) |
| `mlx_qwen35_vlm.cpp` | Qwen3.5 VLM prefill: runs the full LM forward over text+vision embeddings and stores caches; the compiled decode path then resumes from those caches |
| `mlx_qwen35_common.h` | Shared compiled-forward helpers: linear_proj, attn, GDN, RoPE |
| `mlx_common.h` | FFI macros, error handling, array conversion |
| `mlx_common_weights.cpp` | Common weight storage for compiled forward passes |
| `mlx_paged_dispatch.cpp` | C++ paged-attention kernel dispatch |
| `mlx_paged_ops.cpp` | `PagedKVWrite` / `PagedAttention` custom MLX ops (largest file in the bridge) |
| `mlx_paged_profile.cpp` | Profile-run helpers for auto-sizing the block pool |

`crates/mlx-sys/src/lib.rs` is the FFI declaration root (~300 `pub fn` wrappers around `unsafe extern "C-unwind"` blocks).

## Compiled forward paths

Qwen3.5 dense + MoE decode use `mlx::core::compile` to cache the forward graph: trace once, reuse via `compile_replace`. Key design points:

- Pre-allocated KV caches passed in as compile inputs
- `fast::rope` invoked with an array-valued offset
- `slice_update` invoked with an array start index
- Path only enabled when `mlx_qwen35_weight_count() > 0`

```
mlx_qwen35.cpp        dense compiled decode (mlx::core::compile)
mlx_qwen35_moe.cpp    MoE compiled decode + expert routing (mlx::core::compile)
mlx_qwen35_vlm.cpp    VLM prefill - stores caches that the compiled decode path resumes from
mlx_qwen35_common.h   shared helpers (linear_proj, attn, GDN, RoPE)
```
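
The design points above are consistent with a general property of trace caches: a value baked into the traced graph becomes part of the cache key, while a value passed as a runtime input does not. The toy model below illustrates that idea only. It is written in TypeScript for consistency with the rest of the docs, `TraceCache` is an invented class, and nothing here is MLX's actual API or a claim about how `mlx::core::compile` keys its cache internally.

```typescript
// Toy model of a trace cache (an illustration, not MLX's API): a "graph" is
// traced once per distinct key and reused for every later call with that key.
type Graph = (x: number, offset: number) => number;

class TraceCache {
  traces = 0;
  private graphs = new Map<string, Graph>();

  run(key: string, trace: () => Graph, x: number, offset: number): number {
    let graph = this.graphs.get(key);
    if (graph === undefined) {
      this.traces++; // the expensive step, taken only on a cache miss
      graph = trace();
      this.graphs.set(key, graph);
    }
    return graph(x, offset);
  }
}

// Variant A: the offset is a constant baked into the graph, so it is part of the key.
const baked = new TraceCache();
for (let step = 0; step < 4; step++) {
  baked.run(`decode:offset=${step}`, () => (x) => x + step, 10, step);
}

// Variant B: the offset is a runtime input, so one traced graph serves every step.
const dynamic = new TraceCache();
for (let step = 0; step < 4; step++) {
  dynamic.run("decode", () => (x, offset) => x + offset, 10, step);
}

console.log(baked.traces, dynamic.traces); // 4 1
```

In this model, variant A retraces on every decode step, while variant B traces once. That is the motivation one would expect for passing the RoPE offset and the `slice_update` start index as arrays rather than as scalars.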

### Pitfalls

- `mlx::core::array` has **no default constructor**: initialize via `mlx_array_from_scalar(...)` or other helpers.
- `int32` is not in scope inside inner namespaces; use `mlx::core::int32`.
- Adding a **new** `.cpp` file requires `rm -rf target/release/build/mlx-sys-*` once; the `cc` crate caches its source-file list across builds and won't pick up new files otherwise.

### Env vars

| Var | Effect |
| ----------------------- | ----------------------------------------------------------------------- |
| `MLX_NO_COMPILE=1` | Disables the compiled forward path; falls back to per-step Rust forward |
| `MLX_EVAL_ALL_CACHES=1` | Reverts to eval-all-caches strategy (vs. the default token-only eval) |
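
To make the two switches concrete, the sketch below interprets them from an environment-style map and reports which decode mode would result. It is a reading of the table only: `readDecodeFlags` and `DecodeFlags` are hypothetical names, and treating only the value `"1"` as enabled is an assumption, since the native layer may parse these variables differently.

```typescript
// Hypothetical helper (not from the repository): interpret the two flags from a
// process-style environment map. Assumes only the documented value "1" enables a flag.
interface DecodeFlags {
  compiledForward: boolean;
  evalStrategy: "token-only" | "all-caches";
}

function readDecodeFlags(env: Record<string, string | undefined>): DecodeFlags {
  return {
    compiledForward: env["MLX_NO_COMPILE"] !== "1",
    evalStrategy: env["MLX_EVAL_ALL_CACHES"] === "1" ? "all-caches" : "token-only",
  };
}

// Defaults: compiled forward path on, token-only eval.
console.log(readDecodeFlags({}));
// Debugging setup: compiled path off, every cache evaluated.
console.log(readDecodeFlags({ MLX_NO_COMPILE: "1", MLX_EVAL_ALL_CACHES: "1" }));
```

In a Node.js script the same function can be called with `process.env`, which is handy for logging the effective mode at startup when comparing compiled and uncompiled runs.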

## Process-wide globals

Compiled paths use process-wide globals in `crates/mlx-core/src/models/qwen3_5/model.rs`:

- `DENSE_COMPILED_MUTEX: std::sync::Mutex<()>` serializes dense compiled-path access
- `COMPILED_WEIGHTS_RWLOCK: std::sync::RwLock<()>` is read-locked during compiled forward and write-locked during weight load

The paged-cache code path bypasses both locks entirely (see [paged-cache.md](paged-cache.md) for the compile-lockout contract).

## Metal shaders

`crates/mlx-paged-attn/metal/`:

| File | Purpose |
| --------------------------------- | ------------------------------------- |
| `attention/paged_attention.metal` | Paged-attention kernel |
| `cache/reshape_and_cache.metal` | KV cache reshape operations |
| `cache/copy_blocks.metal` | Block copy for paged cache management |
| `float8.metal` | FP8 type conversions and helpers |
| `utils.metal` | Common Metal utilities |

`crates/mlx-sys/build.rs` compiles `.metal` sources into `paged_attn.metallib` and copies both `paged_attn.metallib` and `mlx.metallib` into `target/<profile>/` and `target/<profile>/deps/` so integration tests discover them.
