quantumaikr
diff --git a/.claude/commands/develop.md b/.claude/commands/develop.md
Lines changed: 45 additions & 0 deletions
diff --git a/.claude/commands/harness.md b/.claude/commands/harness.md
Lines changed: 78 additions & 0 deletions
diff --git a/.claude/commands/merge-gate.md b/.claude/commands/merge-gate.md
Lines changed: 69 additions & 0 deletions
diff --git a/.claude/commands/score.md b/.claude/commands/score.md
Lines changed: 19 additions & 0 deletions
diff --git a/.claude/commands/spawn-team.md b/.claude/commands/spawn-team.md
Lines changed: 81 additions & 0 deletions
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
Lines changed: 49 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,45 @@
+---
+description: Autonomous development — implement the next WBS item using the Karpathy loop
+argument-hint: Optional specific module to work on (e.g., polar, qjl, foundation)
+---
+
+# Develop
+
+Autonomous single-agent development loop following the Karpathy AutoResearch pattern.
+
+## Protocol
+
+You are an autonomous development agent for TurboQuant.cpp.
+Follow this loop exactly:
+
+### Step 1: Assess
+- Run `bash score.sh --quick` to see current score
+- Read `docs/wbs_v0.1.md` to find the next unchecked `- [ ]` item
+
+If the user specified a module ($ARGUMENTS), focus only on WBS items related to that module.
+
+### Step 2: Implement
+- Read `program.md` and `CLAUDE.md` for specifications
+- Read the relevant reference code in `refs/` before implementing
+- Implement the WBS item (create/edit files)
+- Follow module ownership rules from CLAUDE.md — only modify files you own
+
+### Step 3: Verify
+- Run `bash score.sh --quick`
+- If score improved or stayed the same: proceed
+- If score dropped: revert your changes and try a different approach
+- Ensure all tests pass: `cd build && ctest --output-on-failure`
+
+### Step 4: Commit
+- Mark the WBS item as `[x]` in `docs/wbs_v0.1.md`
+- Stage only the files you changed (not refs/, not .score_history)
+- Commit with a descriptive message
+
+### Step 5: Report
+- Show the user: what was implemented, score before → after, next item
+
+### Rules
+- ONE WBS item per invocation. Small, correct, incremental.
+- Never modify files in `refs/`, `program.md`, or `score.sh`
+- Always read reference code before implementing algorithms
+- If build fails, fix the build before doing anything else
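
Reviewer sketch for Step 1 of `develop.md`: the lookup of the next unchecked WBS item could be done with a single `grep`. This is a minimal sketch, assuming the WBS file uses task-list lines of the form `- [ ]` / `- [x]`, possibly indented. The function name `next_wbs_item` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the first unchecked
# task-list line of a WBS file as "<line-number>:<text>".
# ASSUMPTION: items look like "- [ ] ..." and may be indented.
next_wbs_item() {
  local wbs_file=${1:-docs/wbs_v0.1.md}
  # -n prefixes the line number, -m1 stops after the first match.
  # grep exits non-zero when every item is already checked.
  grep -n -m1 -E '^[[:space:]]*- \[ \]' "$wbs_file"
}
```

Calling `next_wbs_item` with no argument reads `docs/wbs_v0.1.md`; a non-zero exit status means no unchecked item remains.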
@@ -0,0 +1,78 @@
+---
+description: Launch the hierarchical harness (Karpathy loop + ClawTeam parallel agents)
+argument-hint: Optional target score (default 0.9) or "single" for single-agent mode
+---
+
+# Harness
+
+Launch the full Hierarchical Harness that combines the Karpathy AutoResearch loop with ClawTeam multi-agent parallelism.
+
+## How It Works
+
+The harness has an Outer Loop (you, the Leader) and Inner Loops (spawned workers):
+
+```
+You (Leader):
+  score → identify bottleneck → delegate modules → merge gate → repeat
+
+Workers (in isolated worktrees):
+  each runs: score → modify own module → score → report back
+```
+
+## Execution
+
+### Step 1: Score and assess phase
+
+Run `bash score.sh` and determine the current phase:
+
+| Score | Phase | Action |
+|-------|-------|--------|
+| < 0.05 | Foundation | YOU do it directly (single agent) |
+| 0.05 to < 0.30 | Core Algorithms | Spawn parallel workers: polar, qjl, uniform |
+| 0.30 to 0.60 | Advanced | Spawn parallel workers: turbo, cache, simd-neon, bench |
+| > 0.60 | Fine-tuning | YOU do it directly (precision matters) |
+
+### Step 2: For Foundation / Fine-tuning phases (single agent)
+
+Use the `/develop` command pattern — implement one WBS item at a time.
+
+### Step 3: For parallel phases, spawn ClawTeam workers
+
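+$ARGUMENTS can override the target score (default: 0.9); per the argument hint, `single` selects single-agent mode, in which case follow Step 2 instead of spawning.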
+
+```bash
+# Create team
+clawteam team spawn-team tq-dev -d "TurboQuant.cpp development"
+
+# Spawn workers for each independent module (polar and qjl shown; uniform follows the same pattern, see /spawn-team)
+clawteam spawn --team tq-dev --agent-name polar --workspace --repo . \
+  --task "Implement PolarQuant in src/core/tq_polar.c. Read refs/PolarQuant/models/modeling_llama_polar.py for algorithm. Write tests/test_polar.cpp. Run bash score.sh --quick after changes. Only modify: src/core/tq_polar.*, tests/test_polar.*"
+
+clawteam spawn --team tq-dev --agent-name qjl --workspace --repo . \
+  --task "Implement QJL in src/core/tq_qjl.c. Read refs/QJL/models/llama2_utils_qjl.py for algorithm. Write tests/test_qjl.cpp. Run bash score.sh --quick after changes. Only modify: src/core/tq_qjl.*, tests/test_qjl.*"
+```
+
+### Step 4: Wait and merge gate
+
+```bash
+# Wait for all workers
+clawteam task wait tq-dev --timeout 1800
+
+# Merge gate: merge each worker one-by-one
+# For each worker branch:
+#   1. pre_merge=$(git rev-parse HEAD); git merge <branch> --no-edit
+#   2. bash score.sh --quick
+#   3. If score dropped: git reset --hard "$pre_merge"  (HEAD~1 is wrong after a fast-forward merge)
+#   4. If score OK: continue
+```
+
+### Step 5: Loop back to Step 1
+
+Repeat until the target score is reached.
+
+## Key Rules
+
+- Workers must only modify files in their module ownership (see CLAUDE.md)
+- Merge gate ALWAYS checks score after each merge — revert if it drops
+- Foundation and fine-tuning phases are always single-agent (safer)
+- Monitor workers: `clawteam board attach tq-dev`
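
Reviewer sketch for the phase table in Step 1 of `harness.md`: the score-to-phase mapping could be computed rather than read off by eye. This is a minimal sketch under stated assumptions: the function name `phase_for_score` is hypothetical, the phase labels are borrowed from the `spawn-team` argument hint, and treating a score of exactly 0.30 as "advanced" is an assumption, since the table does not say which row owns that boundary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): map a score in [0, 1]
# to a harness phase name.
# ASSUMPTION: a score of exactly 0.30 counts as "advanced".
phase_for_score() {
  awk -v s="$1" 'BEGIN {
    s += 0                        # force a numeric comparison
    if      (s <  0.05) print "foundation"
    else if (s <  0.30) print "algorithms"
    else if (s <= 0.60) print "advanced"
    else                print "finetune"
  }'
}
```

A possible use, assuming `.score` holds a single number as the merge-gate protocol implies: `phase=$(phase_for_score "$(cat .score)")`.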
@@ -0,0 +1,69 @@
+---
+description: Merge worker branches one-by-one with score-based accept/reject
+argument-hint: Team name (e.g., tq-alg)
+---
+
+# Merge Gate
+
+Safely merge completed ClawTeam worker branches into main, reverting any merge that causes a score drop.
+
+## Protocol
+
+The team name is: $ARGUMENTS
+
+If no team name is provided, list available branches with `git branch -a | grep clawteam`.
+
+### Step 1: Record baseline score
+
+```bash
+bash score.sh --quick
+```
+
+Save the score as `baseline_score`.
+
+### Step 2: List worker branches
+
+```bash
+git branch -a | grep "clawteam/$ARGUMENTS"
+```
+
+### Step 3: For each worker branch, sequentially:
+
+```
+a. Save current HEAD:
+   pre_merge=$(git rev-parse HEAD)
+
+b. Attempt merge:
+   git merge <branch> --no-edit -m "Merge <worker> results"
+
+c. If merge conflict:
+   git merge --abort
+   Report: "<worker> has merge conflicts — skipping"
+   Continue to next worker
+
+d. Score check:
+   bash score.sh --quick
+   new_score=$(cat .score)
+
+e. Decision:
+   If new_score >= baseline_score:
+     Report: "<worker> merged OK (score: baseline → new_score)"
+     Update baseline_score = new_score
+   Else:
+     Report: "<worker> REVERTED (score dropped: baseline → new_score)"
+     git reset --hard $pre_merge
+```
+
+### Step 4: Final report
+
+- Run `bash score.sh` (full evaluation)
+- Show which workers were merged and which were reverted
+- Show final score vs original baseline
+- Suggest next action based on new score
+
+### Rules
+
+- ALWAYS merge one worker at a time, never batch
+- ALWAYS check score after each merge
+- ALWAYS revert if score drops — no exceptions
+- Order preference: merge simpler modules first (uniform → polar → qjl → turbo → cache → simd → bench)
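
Reviewer sketch for Step 3e of `merge-gate.md`: the scores are decimals, and the shell's `[ ... -ge ... ]` test only compares integers, so the accept/revert decision needs a numeric comparison from another tool. The sketch below uses `awk` for that. The function name `gate_decision` is hypothetical, and it assumes both arguments are plain decimal numbers such as the contents of `.score`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): decide whether a merge
# survives the gate. Prints "keep" if NEW >= BASELINE, otherwise "revert".
# awk performs the comparison because [ ... -ge ... ] handles integers only.
gate_decision() {
  awk -v new="$1" -v base="$2" 'BEGIN {
    if (new + 0 >= base + 0)      # "+ 0" forces numeric, not string, compare
      print "keep"
    else
      print "revert"
  }'
}
```

In the protocol's terms, a `revert` result is the case where `git reset --hard $pre_merge` runs, and a `keep` result is the case where `baseline_score` is updated to `new_score`.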
@@ -0,0 +1,19 @@
+---
+description: Run the 5-dimension scoring harness and display results
+---
+
+# Score
+
+Run the TurboQuant.cpp scoring harness to measure project completeness across 5 dimensions.
+
+## Steps
+
+1. Run `bash score.sh` (full evaluation) using the Bash tool
+2. Read the `.score` file for the numeric score
+3. Present the results to the user in a clear summary:
+   - Total score (X.XXXX / 1.0000)
+   - Each dimension's percentage (structure, correctness, quality, performance, integration)
+   - The LOWEST scoring dimension (this is the bottleneck)
+   - Specific items scoring 0 that could be improved next
+4. If `.score_history` exists, show the trend (improving/declining/stagnant)
+5. Suggest the single highest-impact next action based on the score breakdown
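
Reviewer sketch for step 4 of `score.md`: the trend label could be derived from the last two history entries. The format of `.score_history` is not shown in this diff, so this sketch assumes one numeric score per line, oldest first. That assumption and the function name `score_trend` are both hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): classify the score trend.
# ASSUMPTION: the history file has one numeric score per line, oldest first;
# the real .score_history format is not shown in this diff.
score_trend() {
  local history_file=${1:-.score_history}
  tail -n 2 "$history_file" | awk '
    { prev = cur; cur = $1 + 0; n++ }   # keep the last two values seen
    END {
      if (n < 2)           print "unknown"
      else if (cur > prev) print "improving"
      else if (cur < prev) print "declining"
      else                 print "stagnant"
    }'
}
```

With fewer than two entries the function prints `unknown` rather than guessing a direction.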
@@ -0,0 +1,81 @@
+---
+description: Spawn ClawTeam parallel workers for the current development phase
+argument-hint: Optional phase override (foundation, algorithms, advanced, finetune)
+---
+
+# Spawn Team
+
+Spawn a team of parallel ClawTeam workers, each in an isolated git worktree, to work on independent modules simultaneously.
+
+## Steps
+
+### Step 1: Determine current phase
+
+Run `bash score.sh --quick` and read `.score` to determine the phase.
+
+If the user specified a phase ($ARGUMENTS), use that instead.
+
+### Step 2: Spawn workers based on phase
+
+Execute the appropriate clawteam commands:
+
+#### Phase: foundation (score < 0.05)
+Do NOT spawn workers. Tell the user: "Foundation phase should be done with `/develop foundation` (single agent). The project needs CMakeLists.txt, headers, and type definitions before parallel work can begin."
+
+#### Phase: algorithms (score 0.05 ~ 0.30)
+```bash
+clawteam team spawn-team tq-alg -d "TurboQuant core algorithms"
+
+clawteam spawn --team tq-alg --agent-name polar --workspace --repo . \
+  --task "Implement PolarQuant algorithm. Read CLAUDE.md for full context. Read refs/PolarQuant/models/modeling_llama_polar.py lines 135-157 and refs/PolarQuant/models/kernel4group.py lines 14-81 for the algorithm. Create src/core/tq_polar.c with tq_polar_quantize_ref(), tq_polar_dequantize_ref(), tq_polar_attention_ref(). Create tests/test_polar.cpp with Google Test. Run bash score.sh --quick to verify. ONLY modify: src/core/tq_polar.*, tests/test_polar.*"
+
+clawteam spawn --team tq-alg --agent-name qjl --workspace --repo . \
+  --task "Implement QJL algorithm. Read CLAUDE.md for full context. Read refs/QJL/models/llama2_utils_qjl.py lines 7-185 for the algorithm. Create src/core/tq_qjl.c with tq_qjl_init_projection(), tq_qjl_quantize_ref(), tq_qjl_detect_outliers(), tq_qjl_attention_ref(). Create tests/test_qjl.cpp with Google Test. Run bash score.sh --quick to verify. ONLY modify: src/core/tq_qjl.*, tests/test_qjl.*"
+
+clawteam spawn --team tq-alg --agent-name uniform --workspace --repo . \
+  --task "Implement uniform baseline and value quantization. Read CLAUDE.md for full context. Create src/core/tq_uniform.c (min-max 2/4-bit), src/core/tq_value_quant.c (value cache quantization). Create tests/test_uniform.cpp and tests/test_value.cpp. Run bash score.sh --quick to verify. ONLY modify: src/core/tq_uniform.*, src/core/tq_value_quant.*, tests/test_uniform.*, tests/test_value.*"
+```
+
+#### Phase: advanced (score 0.30 ~ 0.60)
+```bash
+clawteam team spawn-team tq-adv -d "TurboQuant advanced features"
+
+clawteam spawn --team tq-adv --agent-name turbo --workspace --repo . \
+  --task "Implement TurboQuant composite (PolarQuant + QJL). Read CLAUDE.md. Create src/core/tq_turbo.c combining polar stage 1 + qjl residual stage 2. Create tests/test_turbo.cpp. ONLY modify: src/core/tq_turbo.*, tests/test_turbo.*"
+
+clawteam spawn --team tq-adv --agent-name cache --workspace --repo . \
+  --task "Implement paged cache and progressive compression. Read CLAUDE.md. Read refs/vllm/csrc/cache_kernels.cu for patterns. Create src/cache/tq_paged_cache.c and src/cache/tq_progressive.c with tests. ONLY modify: src/cache/**, tests/test_paged_cache.*, tests/test_progressive.*"
+
+clawteam spawn --team tq-adv --agent-name simd --workspace --repo . \
+  --task "Implement NEON and AVX2 optimized kernels. Read CLAUDE.md. Read refs/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c for NEON patterns. Create src/backend/cpu/tq_generic.c, tq_neon.c, tq_avx2.c, tq_cpu_dispatch.c. ONLY modify: src/backend/cpu/**"
+
+clawteam spawn --team tq-adv --agent-name bench --workspace --repo . \
+  --task "Create benchmarks and specs. Read CLAUDE.md. Create bench/tq_bench.cpp (output: quantize_throughput=N, attention_throughput=N, compression_ratio=N, simd_speedup=N). Create bench/tq_quality.cpp (output: roundtrip_mse=N, attention_cosine=N, cross_platform=pass/fail). Create spec/tq_format_v1.md and spec/tq_operators_v1.md. ONLY modify: bench/**, spec/**"
+```
+
+#### Phase: finetune (score > 0.60)
+Do NOT spawn workers. Tell the user: "Fine-tuning phase is best done with `/develop` (single agent) for precision. Focus on the lowest-scoring dimension."
+
+### Step 3: Monitor
+
+Tell the user how to monitor:
+```bash
+clawteam board attach <team-name>     # Live tmux view
+clawteam task list <team-name>         # Task status
+watch -n 30 bash score.sh --quick      # Score tracking
+```
+
+### Step 4: After workers complete
+
+Tell the user to run the merge gate:
+```bash
+# Wait for completion
+clawteam task wait <team-name> --timeout 1800
+
+# Then merge each worker's branch one-by-one:
+# pre_merge=$(git rev-parse HEAD); git merge clawteam/<team>/<worker> --no-edit
+# bash score.sh --quick
+# If score dropped: git reset --hard "$pre_merge"  (HEAD~1 is wrong after a fast-forward merge)
+```
+
+Or suggest running `/merge-gate <team-name>`, which automates these steps; `/harness` runs the same gate as part of its full loop.
@@ -0,0 +1,49 @@
+name: CI
+
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main]
+
+jobs:
+  build-and-test:
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - os: ubuntu-latest
+            arch: x86_64
+            cmake_extra: ""
+          - os: macos-latest
+            arch: arm64
+            cmake_extra: ""
+
+    runs-on: ${{ matrix.os }}
+    name: ${{ matrix.os }} (${{ matrix.arch }})
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Configure CMake
+        run: >
+          cmake -B build
+          -DCMAKE_BUILD_TYPE=Release
+          -DTQ_BUILD_TESTS=ON
+          -DTQ_BUILD_BENCH=ON
+          ${{ matrix.cmake_extra }}
+
+      - name: Build
+        run: cmake --build build --config Release -j$(nproc 2>/dev/null || sysctl -n hw.ncpu)
+
+      - name: Run tests
+        run: ctest --test-dir build --output-on-failure --timeout 120
+
+      - name: Upload test results
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: test-results-${{ matrix.os }}-${{ matrix.arch }}
+          path: build/Testing/
+          retention-days: 7
@@ -0,0 +1,39 @@
+# Build
+build/
+build-*/
+cmake-build-*/
+*.o
+*.a
+*.so
+*.dylib
+*.dll
+
+# IDE
+.vscode/
+.idea/
+*.xcodeproj/
+*.xcworkspace/
+compile_commands.json
+
+# OS
+.DS_Store
+Thumbs.db
+
+# TurboQuant harness
+.score
+.score_history
+.logs/
+
+# Python
+__pycache__/
+*.pyc
+*.egg-info/
+dist/
+*.whl
+
+# Test artifacts
+Testing/
+spec/test_vectors/*.bin
+
+# Etc.
+refs/