
Commit e1be645

gHashTag and claude committed
feat(opt-pa01): PagedAttention — 4-10x memory efficiency, CoW block sharing, 14 tests
paged_attention.zig (947 lines): vLLM-style block-based KV cache manager
- BlockPool: pre-allocated block pool with LIFO free stack
- BlockTable: fixed-size per-sequence page mapping (no ArrayList)
- Copy-on-Write: ref-counted block sharing for beam search fork
- PagedKVCacheManager: multi-sequence lifecycle (create/append/fork/remove)
- Full attention: Q@K^T dot product, softmax, weighted V sum
- Memory analysis: 4x paged savings, 64x with ternary compression
- 14 tests: config, block lifecycle, CoW, fork, attention, exhaustion
- Zig 0.15.2 compatible (zero std.ArrayList usage)
- build.zig: test-paged-attention step wired

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
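The message describes the pool, the per-sequence table, and the copy-on-write fork only in outline. The following is an illustrative sketch under stated assumptions, not the code in `paged_attention.zig`: the pool size, the struct fields, and the helpers `fork` and `makeWritable` are hypothetical names chosen for this example, and the copy of the K/V payload is left as a comment. It shows the three ideas the message names: a LIFO free stack, a fixed-size block table, and ref-counted sharing that is split on the first write.

```zig
const std = @import("std");

// Hypothetical sizes for illustration; the commit does not state its defaults.
const num_blocks = 8;
const max_blocks_per_seq = 4;

const BlockPool = struct {
    free_stack: [num_blocks]u32 = undefined,
    free_top: usize = 0,
    ref_count: [num_blocks]u32 = [_]u32{0} ** num_blocks,

    fn init() BlockPool {
        var pool = BlockPool{};
        // Push ids in reverse so block 0 is on top of the LIFO stack.
        var i: usize = num_blocks;
        while (i > 0) {
            i -= 1;
            pool.free_stack[pool.free_top] = @intCast(i);
            pool.free_top += 1;
        }
        return pool;
    }

    // Pop a free block; null means the pool is exhausted.
    fn alloc(self: *BlockPool) ?u32 {
        if (self.free_top == 0) return null;
        self.free_top -= 1;
        const id = self.free_stack[self.free_top];
        self.ref_count[id] = 1;
        return id;
    }

    fn retain(self: *BlockPool, id: u32) void {
        self.ref_count[id] += 1;
    }

    // Drop one reference; the last release pushes the block back on the stack.
    fn release(self: *BlockPool, id: u32) void {
        self.ref_count[id] -= 1;
        if (self.ref_count[id] == 0) {
            self.free_stack[self.free_top] = id;
            self.free_top += 1;
        }
    }
};

// Fixed-size page mapping: logical block index -> physical block id.
const BlockTable = struct {
    blocks: [max_blocks_per_seq]u32 = undefined,
    len: usize = 0,
};

// Fork shares every parent block by bumping its ref count; nothing is copied.
fn fork(pool: *BlockPool, parent: *const BlockTable) BlockTable {
    const child = parent.*;
    for (child.blocks[0..child.len]) |id| pool.retain(id);
    return child;
}

// Copy-on-write: before writing logical block `idx`, give this table a
// private block if the current one is shared. Returns the writable id.
fn makeWritable(pool: *BlockPool, table: *BlockTable, idx: usize) ?u32 {
    const id = table.blocks[idx];
    if (pool.ref_count[id] == 1) return id; // sole owner: write in place
    const fresh = pool.alloc() orelse return null;
    // A real cache would copy the K/V payload of `id` into `fresh` here.
    pool.release(id);
    table.blocks[idx] = fresh;
    return fresh;
}

pub fn main() void {
    var pool = BlockPool.init();
    var parent = BlockTable{};
    parent.blocks[0] = pool.alloc().?;
    parent.len = 1;
    var child = fork(&pool, &parent);
    const before = child.blocks[0];
    const after = makeWritable(&pool, &child, 0).?;
    std.debug.print("shared block {d} -> private block {d}\n", .{ before, after });
}
```

In this sketch a fork costs one ref-count increment per block, and a physical copy happens only when a beam actually writes into a block it still shares, which is the usual argument for why copy-on-write keeps beam search cheap in memory.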
1 parent fb6eb05 commit e1be645

3 files changed

Lines changed: 1030 additions & 6 deletions


.ralph/TECH_TREE.md

Lines changed: 7 additions & 6 deletions
@@ -20,6 +20,8 @@
 ## ✅ Recently Completed
 | ID | Name | Branch | Gain |
 |----|------|--------|------|
+|**OPT-PA01**|**PagedAttention**|**optimization**|**paged_attention.zig (947 lines): vLLM-style block KV cache, CoW block sharing, 14 tests, 4-10x memory efficiency, beam search fork, pool lifecycle, attention Q@K^T+softmax+V, memory analysis (64x with ternary), build.zig wired**|
+|----|------|--------|------|
 |**OPT-T02**|**Ternary Matrix Multiplication**|**optimization**|**ternary_matmul.zig (851 lines): 10x matmul speedup (no multiply), scalar+SIMD8+SIMD16+batch4 kernels, matmat, 3 quant modes, per-row scales, 15.9x compression, cosine accuracy, 15 tests**|
 |----|------|--------|------|
 |**OPT-T03**|**Ternary KV Cache**|**optimization**|**ternary_kv_cache.zig (729 lines): 16x compression proof, full attention pipeline, SIMD ternaryDot, 4 quant modes, 13 tests, cosine accuracy validation**|
@@ -87,7 +89,6 @@
 |OPT-C01|KV Cache Compression|optimization|5-16x cache compression|
 |OPT-S01|Speculative Decoding|optimization|2-3x generation speed|
 |OPT-B01|Continuous Batching|optimization|2-3x throughput|
-|OPT-PA01|PagedAttention|optimization|4-10x memory efficiency|
 
 ## 🔒 Locked (waiting for dependencies)
 | ID | Name | Branch | Needs (missing) |
@@ -104,20 +105,20 @@
 |Core|3|4|75%|
 |Inference|2|5|40%|
 |Deployment|2|4|50%|
-|**Optimization**|**14**|**14**|**100%**|
+|**Optimization**|**15**|**15**|**100%**|
 |Hardware|0|3|0%|
 |**Math**|**5**|**5**|**100%**|
 |**Development**|**3**|**3**|**100%**|
 |**Symbolic**|**5**|**5**|**100%**|
 |Visualization|1|1|100%|
 |**Nexus**|**10**|**10**|**100%**|
 |Multilingual|3|3|100%|
-|**Total**|**45**|**54**|**83%**|
+|**Total**|**46**|**55**|**84%**|
 
 ## 🎯 Recommended Next (highest ROI)
-1. **OPT-PA01** PagedAttention — 4-10x memory efficiency, combines with OPT-T03 for 64x
-2. **OPT-B01** Continuous Batching — 2-3x throughput
-3. **OPT-S01** Speculative Decoding — 2-3x generation speed
+1. **OPT-B01** Continuous Batching — 2-3x throughput, combines with PagedAttention
+2. **OPT-S01** Speculative Decoding — 2-3x generation speed
+3. **INF-001** GGUF Parser — load any GGUF model, unlocks real inference
 
 ---
 φ² + 1/φ² = 3 | TRINITY
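The "4-10x memory efficiency" and "64x with ternary" figures in the entries above are not derived anywhere in this diff. One plausible reading, sketched here with hypothetical numbers (a 2048-token context limit, a 500-token sequence, 16-token blocks) and a hypothetical helper `pagedSlots`, is that a contiguous cache reserves the full context length for every sequence while a paged cache reserves only the whole blocks a sequence touches, and that OPT-T03's stated 16x ternary compression multiplies on top of that, giving 4 × 16 = 64. The multiplication assumes the two savings are independent.

```zig
const std = @import("std");

// Slots a paged cache reserves: whole blocks, rounded up (hypothetical helper).
fn pagedSlots(seq_len: usize, block_size: usize) usize {
    const blocks = (seq_len + block_size - 1) / block_size; // ceiling division
    return blocks * block_size;
}

pub fn main() void {
    // Hypothetical workload, not taken from the commit.
    const max_context: usize = 2048; // a contiguous cache reserves this per sequence
    const seq_len: usize = 500;
    const block_size: usize = 16;
    const ternary_gain: usize = 16; // OPT-T03's stated 16x compression

    const paged = pagedSlots(seq_len, block_size); // 32 blocks -> 512 slots
    const paging_gain = max_context / paged; // 2048 / 512 = 4
    // Assumes the two savings are independent and multiply.
    const combined = paging_gain * ternary_gain; // 4 * 16 = 64
    std.debug.print("paged={d} slots, paging={d}x, combined={d}x\n", .{ paged, paging_gain, combined });
}
```

Under this reading the paging factor depends on how far actual sequence lengths fall below the limit, which would explain why the entries quote a range rather than a single number.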

build.zig

Lines changed: 13 additions & 0 deletions
@@ -1830,4 +1830,17 @@ pub fn build(b: *std.Build) void {
     const gen_ternary_matmul_step = b.step("test-ternary-matmul", "Test OPT-T02 Ternary Matrix Multiplication 10x speedup");
     gen_ternary_matmul_step.dependOn(&run_gen_ternary_matmul_tests.step);
     test_step.dependOn(&run_gen_ternary_matmul_tests.step);
+
+    // Generated Paged Attention tests (OPT-PA01: 4-10x memory efficiency)
+    const gen_paged_attn_tests = b.addTest(.{
+        .root_module = b.createModule(.{
+            .root_source_file = b.path("generated/paged_attention.zig"),
+            .target = target,
+            .optimize = optimize,
+        }),
+    });
+    const run_gen_paged_attn_tests = b.addRunArtifact(gen_paged_attn_tests);
+    const gen_paged_attn_step = b.step("test-paged-attention", "Test OPT-PA01 PagedAttention 4-10x memory efficiency");
+    gen_paged_attn_step.dependOn(&run_gen_paged_attn_tests.step);
+    test_step.dependOn(&run_gen_paged_attn_tests.step);
 }
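The commit message lists the attention path of the new module as "Q@K^T dot product, softmax, weighted V sum". A single-query, single-head sketch of that path over a block table follows. The toy sizes, the `Block` layout, and the `pagedAttention` signature are assumptions made for this example, as is the 1/sqrt(head_dim) scaling (standard scaled dot-product attention; the commit does not mention a scale). Token `t` is read from physical block `block_table[t / block_size]`, row `t % block_size`, so the K/V rows of a sequence never need to be contiguous.

```zig
const std = @import("std");

// Hypothetical toy sizes, not the commit's configuration.
const head_dim = 2;
const block_size = 2;

// One physical KV block: block_size rows of head_dim floats.
const Block = [block_size][head_dim]f32;

// Single-query attention over a sequence whose tokens live in scattered
// physical blocks, located through `block_table`.
fn pagedAttention(
    q: [head_dim]f32,
    k_blocks: []const Block,
    v_blocks: []const Block,
    block_table: []const u32,
    seq_len: usize,
    scores: []f32, // scratch buffer, len >= seq_len
) [head_dim]f32 {
    const scale = 1.0 / @sqrt(@as(f32, head_dim)); // assumed scaling
    var max_score: f32 = -std.math.inf(f32);
    // 1) scores[t] = (q . k_t) * scale, with k_t found via the block table.
    for (0..seq_len) |t| {
        const k = k_blocks[block_table[t / block_size]][t % block_size];
        var dot: f32 = 0;
        for (0..head_dim) |d| dot += q[d] * k[d];
        scores[t] = dot * scale;
        max_score = @max(max_score, scores[t]);
    }
    // 2) Softmax, subtracting the max for numerical stability.
    var sum: f32 = 0;
    for (scores[0..seq_len]) |*s| {
        s.* = @exp(s.* - max_score);
        sum += s.*;
    }
    // 3) Weighted sum of the value rows, again via the block table.
    var out = [_]f32{0} ** head_dim;
    for (0..seq_len) |t| {
        const v = v_blocks[block_table[t / block_size]][t % block_size];
        const w = scores[t] / sum;
        for (0..head_dim) |d| out[d] += w * v[d];
    }
    return out;
}

pub fn main() void {
    const zero: Block = .{ .{ 0, 0 }, .{ 0, 0 } };
    const k_blocks = [_]Block{ zero, zero }; // all-zero keys: every score ties
    const v_blocks = [_]Block{
        .{ .{ 6, 0 }, .{ 99, 99 } }, // physical block 0 (row 1 unused)
        .{ .{ 0, 3 }, .{ 3, 3 } }, // physical block 1
    };
    // Logical block 0 -> physical 1, logical block 1 -> physical 0.
    const block_table = [_]u32{ 1, 0 };
    var scores: [4]f32 = undefined;
    const out = pagedAttention(.{ 1, 2 }, &k_blocks, &v_blocks, &block_table, 3, &scores);
    std.debug.print("out = {d:.3} {d:.3}\n", .{ out[0], out[1] });
}
```

With all-zero keys the scores tie, so the output is the plain mean of the three logical V rows (0,3), (3,3) and (6,0), roughly (3, 2). Reading the blocks in physical order instead would pull in the unused (99,99) row, which is why the indirection through the table matters.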
