Commit b8ef49c

unamedkr and claude committed
Initial release: TurboQuant.cpp v0.1.0

Cross-platform C/C++ library for extreme KV cache compression in LLM inference.
Achieves 7.5x memory reduction with 99.5% attention accuracy.

Core:
- 7 quantization types: PolarQuant (3/4b), QJL (1b), TurboQuant (3/4b), Uniform (2/4b)
- Direct attention kernels: QJL Hamming distance, PolarQuant cos/sin LUT
- Self-contained block formats with ONNX-compliant LSB-first bit packing
- O(1) type traits dispatch table (inspired by llama.cpp)

Cache:
- Paged KV cache with block-table mapping (inspired by vLLM)
- Progressive compression: 3-tier automatic quality degradation by age
- Copy-on-Write for beam search (ref_count based)
- Value cache quantization (uniform 4-bit)

Backends:
- CPU Generic (reference C11)
- ARM NEON optimized (5.7x speedup over generic)
- x86 AVX2 stubs
- CUDA kernels (7 files, syntactically complete)
- Metal compute shaders (7 files, syntactically complete)

Quality:
- 11 test suites, 100% pass rate
- ASan + UBSan + TSan clean
- Roundtrip MSE: 0.0014, Attention cosine: 0.998
- Cross-platform CI (Linux x86_64 + macOS arm64)

Performance (Apple M-series):
- Quantize: 2.87M elements/ms
- Attention: 331K queries/sec
- Compression: 7.53x
- SIMD speedup: 5.74x

Integration:
- llama.cpp plugin interface
- vLLM integration scaffold
- Python ctypes bindings
- 4 example programs (standalone, A/B test, real model demo, llama.cpp)

Developed by QuantumAI Inc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0 parents  commit b8ef49c

102 files changed

Lines changed: 19634 additions & 0 deletions

.claude/commands/develop.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
---
2+
description: Autonomous development — implement the next WBS item using the Karpathy loop
3+
argument-hint: Optional specific module to work on (e.g., polar, qjl, foundation)
4+
---
5+
6+
# Develop
7+
8+
Autonomous single-agent development loop following the Karpathy AutoResearch pattern.
9+
10+
## Protocol
11+
12+
You are an autonomous development agent for TurboQuant.cpp.
13+
Follow this loop exactly:
14+
15+
### Step 1: Assess
16+
- Run `bash score.sh --quick` to see current score
17+
- Read `docs/wbs_v0.1.md` to find the next unchecked `- [ ]` item
18+
19+
If the user specified a module ($ARGUMENTS), focus only on WBS items related to that module.
20+
21+
### Step 2: Implement
22+
- Read `program.md` and `CLAUDE.md` for specifications
23+
- Read the relevant reference code in `refs/` before implementing
24+
- Implement the WBS item (create/edit files)
25+
- Follow module ownership rules from CLAUDE.md — only modify files you own
26+
27+
### Step 3: Verify
28+
- Run `bash score.sh --quick`
29+
- If score improved or stayed the same: proceed
30+
- If score dropped: revert your changes and try a different approach
31+
- Ensure all tests pass: `cd build && ctest --output-on-failure`
32+
33+
### Step 4: Commit
34+
- Mark the WBS item as `[x]` in `docs/wbs_v0.1.md`
35+
- Stage only the files you changed (not refs/, not .score_history)
36+
- Commit with a descriptive message
37+
38+
### Step 5: Report
39+
- Show the user: what was implemented, score before → after, next item
40+
41+
### Rules
42+
- ONE WBS item per invocation. Small, correct, incremental.
43+
- Never modify files in `refs/`, `program.md`, or `score.sh`
44+
- Always read reference code before implementing algorithms
45+
- If build fails, fix the build before doing anything else
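
The Assess step in `develop.md` amounts to locating the first unchecked checkbox in the WBS file. A minimal sketch follows, assuming the items are Markdown task-list entries (`- [ ]`), possibly indented. The helper name `next_wbs_item` is hypothetical and not part of this commit.

```shell
# Print the first unchecked WBS item as "<line number>:<text>".
# Assumption: items are Markdown task-list entries, optionally indented.
# Exit status is non-zero when every item is already checked.
next_wbs_item() {
    # $1: path to the WBS file (docs/wbs_v0.1.md in this repository)
    grep -n -m 1 -E '^[[:space:]]*- \[ \]' "$1"
}

# Usage:
#   next_wbs_item docs/wbs_v0.1.md
```

The non-zero exit status gives the agent a simple stop condition once no unchecked item remains.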

.claude/commands/harness.md

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
1+
---
2+
description: Launch the hierarchical harness (Karpathy loop + ClawTeam parallel agents)
3+
argument-hint: Optional target score (default 0.9) or "single" for single-agent mode
4+
---
5+
6+
# Harness
7+
8+
Launch the full Hierarchical Harness that combines the Karpathy AutoResearch loop with ClawTeam multi-agent parallelism.
9+
10+
## How It Works
11+
12+
The harness has an Outer Loop (you, the Leader) and Inner Loops (spawned workers):
13+
14+
```
15+
You (Leader):
16+
score → identify bottleneck → delegate modules → merge gate → repeat
17+
18+
Workers (in isolated worktrees):
19+
each runs: score → modify own module → score → report back
20+
```
21+
22+
## Execution
23+
24+
### Step 1: Score and assess phase
25+
26+
Run `bash score.sh` and determine the current phase:
27+
28+
| Score | Phase | Action |
29+
|-------|-------|--------|
30+
| < 0.05 | Foundation | YOU do it directly (single agent) |
31+
| 0.05 ~ 0.30 | Core Algorithms | Spawn parallel workers: polar, qjl, uniform |
32+
| 0.30 ~ 0.60 | Advanced | Spawn parallel workers: turbo, cache, simd-neon, bench |
33+
| > 0.60 | Fine-tuning | YOU do it directly (precision matters) |
34+
35+
### Step 2: For Foundation / Fine-tuning phases (single agent)
36+
37+
Use the `/develop` command pattern — implement one WBS item at a time.
38+
39+
### Step 3: For parallel phases, spawn ClawTeam workers
40+
41+
$ARGUMENTS can override the target score (default: 0.9).
42+
43+
```bash
44+
# Create team
45+
clawteam team spawn-team tq-dev -d "TurboQuant.cpp development"
46+
47+
# Spawn workers for each independent module
48+
clawteam spawn --team tq-dev --agent-name polar --workspace --repo . \
49+
--task "Implement PolarQuant in src/core/tq_polar.c. Read refs/PolarQuant/models/modeling_llama_polar.py for algorithm. Write tests/test_polar.cpp. Run bash score.sh --quick after changes. Only modify: src/core/tq_polar.*, tests/test_polar.*"
50+
51+
clawteam spawn --team tq-dev --agent-name qjl --workspace --repo . \
52+
--task "Implement QJL in src/core/tq_qjl.c. Read refs/QJL/models/llama2_utils_qjl.py for algorithm. Write tests/test_qjl.cpp. Run bash score.sh --quick after changes. Only modify: src/core/tq_qjl.*, tests/test_qjl.*"
53+
```
54+
55+
### Step 4: Wait and merge gate
56+
57+
```bash
58+
# Wait for all workers
59+
clawteam task wait tq-dev --timeout 1800
60+
61+
# Merge gate: merge each worker one-by-one
62+
# For each worker branch:
63+
# 1. git merge <branch> --no-edit
64+
# 2. bash score.sh --quick
65+
# 3. If score dropped: git reset --hard ORIG_HEAD  # also undoes a fast-forward merge
66+
# 4. If score OK: continue
67+
```
68+
69+
### Step 5: Loop back to Step 1
70+
71+
Repeat until the target score is reached.
72+
73+
## Key Rules
74+
75+
- Workers must only modify files in their module ownership (see CLAUDE.md)
76+
- Merge gate ALWAYS checks score after each merge — revert if it drops
77+
- Foundation and fine-tuning phases are always single-agent (safer)
78+
- Monitor workers: `clawteam board attach tq-dev`
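
The score-to-phase table in `harness.md` can be expressed as a small lookup. This is a sketch only. The function name `phase_for_score` is hypothetical, the phase identifiers are the ones listed in the `spawn-team.md` argument hint, and the handling of the exact boundaries (0.30 and 0.60) is an assumption because the table does not specify it.

```shell
# Map a score in [0, 1] to a development phase.
# Assumption: 0.05 and 0.30 belong to the higher phase, 0.60 still counts
# as "advanced" (the table only says "> 0.60" for fine-tuning).
phase_for_score() {
    # $1: numeric score, e.g. the content of .score
    awk -v s="$1" 'BEGIN {
        if      (s < 0.05)  print "foundation"
        else if (s < 0.30)  print "algorithms"
        else if (s <= 0.60) print "advanced"
        else                print "finetune"
    }'
}

# Usage:
#   phase_for_score "$(cat .score)"
```

Using `awk` for the comparison avoids the integer-only arithmetic of `[ ... ]` in POSIX shells.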

.claude/commands/merge-gate.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
---
2+
description: Merge worker branches one-by-one with score-based accept/reject
3+
argument-hint: Team name (e.g., tq-alg)
4+
---
5+
6+
# Merge Gate
7+
8+
Safely merge completed ClawTeam worker branches into main, reverting any merge that causes a score drop.
9+
10+
## Protocol
11+
12+
The team name is: $ARGUMENTS
13+
14+
If no team name provided, list available branches with `git branch -a | grep clawteam`.
15+
16+
### Step 1: Record baseline score
17+
18+
```bash
19+
bash score.sh --quick
20+
```
21+
22+
Save the score as `baseline_score`.
23+
24+
### Step 2: List worker branches
25+
26+
```bash
27+
git branch -a | grep "clawteam/$ARGUMENTS"
28+
```
29+
30+
### Step 3: For each worker branch, sequentially:
31+
32+
```
33+
a. Save current HEAD:
34+
pre_merge=$(git rev-parse HEAD)
35+
36+
b. Attempt merge:
37+
git merge <branch> --no-edit -m "Merge <worker> results"
38+
39+
c. If merge conflict:
40+
git merge --abort
41+
Report: "<worker> has merge conflicts — skipping"
42+
Continue to next worker
43+
44+
d. Score check:
45+
bash score.sh --quick
46+
new_score=$(cat .score)
47+
48+
e. Decision:
49+
If new_score >= baseline_score:
50+
Report: "<worker> merged OK (score: baseline → new_score)"
51+
Update baseline_score = new_score
52+
Else:
53+
Report: "<worker> REVERTED (score dropped: baseline → new_score)"
54+
git reset --hard $pre_merge
55+
```
56+
57+
### Step 4: Final report
58+
59+
- Run `bash score.sh` (full evaluation)
60+
- Show which workers were merged and which were reverted
61+
- Show final score vs original baseline
62+
- Suggest next action based on new score
63+
64+
### Rules
65+
66+
- ALWAYS merge one worker at a time, never batch
67+
- ALWAYS check score after each merge
68+
- ALWAYS revert if score drops — no exceptions
69+
- Order preference: merge simpler modules first (uniform → polar → qjl → turbo → cache → simd → bench)
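
The decision in Step 3e compares two decimal scores, which plain shell `[ ... ]` tests cannot do. The sketch below shows one way to implement that comparison, assuming `.score` holds a single plain number. The helper names `score_ge` and `gate_decision` are hypothetical.

```shell
# Succeed when the new score is greater than or equal to the baseline.
# awk performs the comparison numerically, so 0.5 and 0.50 are equal.
score_ge() {
    # $1: new score, $2: baseline score
    awk -v new="$1" -v base="$2" 'BEGIN { exit !(new >= base) }'
}

# Print the action the merge gate would take for one worker.
gate_decision() {
    # $1: baseline score, $2: score measured after the merge
    if score_ge "$2" "$1"; then echo "keep"; else echo "revert"; fi
}

# Usage inside the per-branch loop (git commands as in the protocol):
#   pre_merge=$(git rev-parse HEAD)
#   git merge "$branch" --no-edit || { git merge --abort; continue; }
#   bash score.sh --quick
#   if [ "$(gate_decision "$baseline_score" "$(cat .score)")" = "revert" ]; then
#       git reset --hard "$pre_merge"
#   else
#       baseline_score=$(cat .score)
#   fi
```

Resetting to the saved `pre_merge` commit works for both merge commits and fast-forward merges, which is why the protocol records it before merging.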

.claude/commands/score.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
---
2+
description: Run the 5-dimension scoring harness and display results
3+
---
4+
5+
# Score
6+
7+
Run the TurboQuant.cpp scoring harness to measure project completeness across 5 dimensions.
8+
9+
## Steps
10+
11+
1. Run `bash score.sh` (full evaluation) using the Bash tool
12+
2. Read the `.score` file for the numeric score
13+
3. Present the results to the user in a clear summary:
14+
- Total score (X.XXXX / 1.0000)
15+
- Each dimension's percentage (structure, correctness, quality, performance, integration)
16+
- The LOWEST scoring dimension (this is the bottleneck)
17+
- Specific items scoring 0 that could be improved next
18+
4. If `.score_history` exists, show the trend (improving/declining/stagnant)
19+
5. Suggest the single highest-impact next action based on the score breakdown
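
Step 4 of `score.md` asks for a trend derived from `.score_history`, whose format this commit does not show. As an illustration only, the sketch below assumes one run per line with the numeric score as the last whitespace-separated field, and compares the two most recent entries. The function name `score_trend` and the file layout are both assumptions.

```shell
# Classify the trend from the last two entries of a score history file.
# Assumption: one run per line, score in the last field; blank lines ignored.
score_trend() {
    # $1: path to the history file (.score_history in this repository)
    awk 'NF { prev = cur; cur = $NF + 0; n++ }
         END {
             if (n < 2)            print "unknown"
             else if (cur > prev)  print "improving"
             else if (cur < prev)  print "declining"
             else                  print "stagnant"
         }' "$1"
}

# Usage:
#   score_trend .score_history
```

Comparing only the last two entries is the simplest choice. A longer window would smooth out noise but needs a threshold that the source does not define.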

.claude/commands/spawn-team.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
1+
---
2+
description: Spawn ClawTeam parallel workers for the current development phase
3+
argument-hint: Optional phase override (foundation, algorithms, advanced, finetune)
4+
---
5+
6+
# Spawn Team
7+
8+
Spawn a team of parallel ClawTeam workers, each in an isolated git worktree, to work on independent modules simultaneously.
9+
10+
## Steps
11+
12+
### Step 1: Determine current phase
13+
14+
Run `bash score.sh --quick` and read `.score` to determine the phase.
15+
16+
If user specified a phase ($ARGUMENTS), use that instead.
17+
18+
### Step 2: Spawn workers based on phase
19+
20+
Execute the appropriate clawteam commands:
21+
22+
#### Phase: foundation (score < 0.05)
23+
Do NOT spawn workers. Tell the user: "Foundation phase should be done with `/develop foundation` (single agent). The project needs CMakeLists.txt, headers, and type definitions before parallel work can begin."
24+
25+
#### Phase: algorithms (score 0.05 ~ 0.30)
26+
```bash
27+
clawteam team spawn-team tq-alg -d "TurboQuant core algorithms"
28+
29+
clawteam spawn --team tq-alg --agent-name polar --workspace --repo . \
30+
--task "Implement PolarQuant algorithm. Read CLAUDE.md for full context. Read refs/PolarQuant/models/modeling_llama_polar.py lines 135-157 and refs/PolarQuant/models/kernel4group.py lines 14-81 for the algorithm. Create src/core/tq_polar.c with tq_polar_quantize_ref(), tq_polar_dequantize_ref(), tq_polar_attention_ref(). Create tests/test_polar.cpp with Google Test. Run bash score.sh --quick to verify. ONLY modify: src/core/tq_polar.*, tests/test_polar.*"
31+
32+
clawteam spawn --team tq-alg --agent-name qjl --workspace --repo . \
33+
--task "Implement QJL algorithm. Read CLAUDE.md for full context. Read refs/QJL/models/llama2_utils_qjl.py lines 7-185 for the algorithm. Create src/core/tq_qjl.c with tq_qjl_init_projection(), tq_qjl_quantize_ref(), tq_qjl_detect_outliers(), tq_qjl_attention_ref(). Create tests/test_qjl.cpp with Google Test. Run bash score.sh --quick to verify. ONLY modify: src/core/tq_qjl.*, tests/test_qjl.*"
34+
35+
clawteam spawn --team tq-alg --agent-name uniform --workspace --repo . \
36+
--task "Implement uniform baseline and value quantization. Read CLAUDE.md for full context. Create src/core/tq_uniform.c (min-max 2/4-bit), src/core/tq_value_quant.c (value cache quantization). Create tests/test_uniform.cpp and tests/test_value.cpp. Run bash score.sh --quick to verify. ONLY modify: src/core/tq_uniform.*, src/core/tq_value_quant.*, tests/test_uniform.*, tests/test_value.*"
37+
```
38+
39+
#### Phase: advanced (score 0.30 ~ 0.60)
40+
```bash
41+
clawteam team spawn-team tq-adv -d "TurboQuant advanced features"
42+
43+
clawteam spawn --team tq-adv --agent-name turbo --workspace --repo . \
44+
--task "Implement TurboQuant composite (PolarQuant + QJL). Read CLAUDE.md. Create src/core/tq_turbo.c combining polar stage 1 + qjl residual stage 2. Create tests/test_turbo.cpp. ONLY modify: src/core/tq_turbo.*, tests/test_turbo.*"
45+
46+
clawteam spawn --team tq-adv --agent-name cache --workspace --repo . \
47+
--task "Implement paged cache and progressive compression. Read CLAUDE.md. Read refs/vllm/csrc/cache_kernels.cu for patterns. Create src/cache/tq_paged_cache.c and src/cache/tq_progressive.c with tests. ONLY modify: src/cache/**, tests/test_paged_cache.*, tests/test_progressive.*"
48+
49+
clawteam spawn --team tq-adv --agent-name simd --workspace --repo . \
50+
--task "Implement NEON and AVX2 optimized kernels. Read CLAUDE.md. Read refs/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c for NEON patterns. Create src/backend/cpu/tq_generic.c, tq_neon.c, tq_avx2.c, tq_cpu_dispatch.c. ONLY modify: src/backend/cpu/**"
51+
52+
clawteam spawn --team tq-adv --agent-name bench --workspace --repo . \
53+
--task "Create benchmarks and specs. Read CLAUDE.md. Create bench/tq_bench.cpp (output: quantize_throughput=N, attention_throughput=N, compression_ratio=N, simd_speedup=N). Create bench/tq_quality.cpp (output: roundtrip_mse=N, attention_cosine=N, cross_platform=pass/fail). Create spec/tq_format_v1.md and spec/tq_operators_v1.md. ONLY modify: bench/**, spec/**"
54+
```
55+
56+
#### Phase: finetune (score > 0.60)
57+
Do NOT spawn workers. Tell the user: "Fine-tuning phase is best done with `/develop` (single agent) for precision. Focus on the lowest-scoring dimension."
58+
59+
### Step 3: Monitor
60+
61+
Tell the user how to monitor:
62+
```bash
63+
clawteam board attach <team-name> # Live tmux view
64+
clawteam task list <team-name> # Task status
65+
watch -n 30 bash score.sh --quick # Score tracking
66+
```
67+
68+
### Step 4: After workers complete
69+
70+
Tell the user to run the merge gate:
71+
```bash
72+
# Wait for completion
73+
clawteam task wait <team-name> --timeout 1800
74+
75+
# Then merge each worker's branch one-by-one:
76+
# git merge clawteam/<team>/<worker> --no-edit
77+
# bash score.sh --quick
78+
#   If score dropped: git reset --hard ORIG_HEAD  # also undoes a fast-forward merge
79+
```
80+
81+
Or suggest running `/harness` which automates the merge gate.
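
Each worker task in `spawn-team.md` ends with an "ONLY modify" list, but nothing in the commands enforces it. The sketch below shows how the leader might check a worker branch against its allowed patterns before merging. The function name `ownership_violations` is hypothetical, and the patterns are treated as shell globs, in which `*` also matches `/`.

```shell
# Read changed paths on stdin and print those matching none of the allowed
# glob patterns passed as arguments. Returns 1 if any violation is found.
# Note: uses plain (global) variables to stay POSIX-compatible.
ownership_violations() {
    status=0
    while IFS= read -r path; do
        allowed=no
        for pat in "$@"; do
            # An unquoted $pat is interpreted as a glob pattern by `case`.
            case "$path" in
                $pat) allowed=yes; break ;;
            esac
        done
        if [ "$allowed" = no ]; then
            printf '%s\n' "$path"
            status=1
        fi
    done
    return "$status"
}

# Usage (branch naming as in Step 4; "main" as merge base is an assumption):
#   git diff --name-only main...clawteam/tq-alg/polar |
#       ownership_violations 'src/core/tq_polar.*' 'tests/test_polar.*'
```

Running this before the score check would reject a worker that touched another module's files even if the score happened to stay the same.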

.github/workflows/ci.yml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
name: CI
2+
3+
on:
4+
push:
5+
branches: [main, develop]
6+
pull_request:
7+
branches: [main]
8+
9+
jobs:
10+
build-and-test:
11+
strategy:
12+
fail-fast: false
13+
matrix:
14+
include:
15+
- os: ubuntu-latest
16+
arch: x86_64
17+
cmake_extra: ""
18+
- os: macos-latest
19+
arch: arm64
20+
cmake_extra: ""
21+
22+
runs-on: ${{ matrix.os }}
23+
name: ${{ matrix.os }} (${{ matrix.arch }})
24+
25+
steps:
26+
- name: Checkout
27+
uses: actions/checkout@v4
28+
29+
- name: Configure CMake
30+
run: >
31+
cmake -B build
32+
-DCMAKE_BUILD_TYPE=Release
33+
-DTQ_BUILD_TESTS=ON
34+
-DTQ_BUILD_BENCH=ON
35+
${{ matrix.cmake_extra }}
36+
37+
- name: Build
38+
run: cmake --build build --config Release -j$(nproc 2>/dev/null || sysctl -n hw.ncpu)
39+
40+
- name: Run tests
41+
run: ctest --test-dir build --output-on-failure --timeout 120
42+
43+
- name: Upload test results
44+
if: failure()
45+
uses: actions/upload-artifact@v4
46+
with:
47+
name: test-results-${{ matrix.os }}-${{ matrix.arch }}
48+
path: build/Testing/
49+
retention-days: 7
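
The workflow's configure, build and test steps can be reproduced locally with the same commands. The sketch below wraps them in two hypothetical helpers, `cpu_count` and `ci_local`. The fallback to 1 core is an addition that the workflow itself does not have. Note that `ctest --test-dir` requires CMake 3.20 or newer.

```shell
# Number of CPU cores: nproc on Linux, sysctl on macOS, 1 as a last resort.
cpu_count() {
    nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1
}

# Same configure/build/test sequence as the CI job, stopping at the first
# failing step.
ci_local() {
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DTQ_BUILD_TESTS=ON -DTQ_BUILD_BENCH=ON &&
    cmake --build build --config Release -j"$(cpu_count)" &&
    ctest --test-dir build --output-on-failure --timeout 120
}

# Usage, from the repository root:
#   ci_local
```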

.gitignore

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
# Build
2+
build/
3+
build-*/
4+
cmake-build-*/
5+
*.o
6+
*.a
7+
*.so
8+
*.dylib
9+
*.dll
10+
11+
# IDE
12+
.vscode/
13+
.idea/
14+
*.xcodeproj/
15+
*.xcworkspace/
16+
compile_commands.json
17+
18+
# OS
19+
.DS_Store
20+
Thumbs.db
21+
22+
# TurboQuant harness
23+
.score
24+
.score_history
25+
.logs/
26+
27+
# Python
28+
__pycache__/
29+
*.pyc
30+
*.egg-info/
31+
dist/
32+
*.whl
33+
34+
# Test artifacts
35+
Testing/
36+
spec/test_vectors/*.bin
37+
38+
# Etc.
39+
refs/
