Skip to content

Commit f343724

Browse files
committed
D2.1 token-agreement harness scaffold (117/117 tests, +13 new)
First Phase 2 deliverable — scaffold of the I11 cert gate harness. The PR #219#220 lesson landed as a typed-rejection wall: the stub result carries stub:true + backend:"stub" so no client can confuse Phase 0 stub output for a real measurement. crates/cognitive-shader-driver/src/token_agreement.rs (~320 LOC): ReferenceModel { path, path_hash, stub_token_count } ::load(&Path) -> Result<Self, TokenAgreementError> D2.1 stub: validates path exists, hashes display; does NOT parse safetensors yet. D2.2 replaces with real loader driven by auto_detect::detect() → ModelFingerprint. ::stub(tag, n_tokens) — builds stub model without touching fs TokenAgreementError: ModelPathMissing { path } EmptyPromptSet TokenCountMismatch { reference, candidate } NotImplementedYet { what } ← measure_full() until D2.2 TopKAgreement { top1_matches, top5_matches, total_positions, divergence_positions: Vec<u32> } ::compare(ref: &[Vec<u32>], cand: &[Vec<u32>]) -> Result<Self> Position-by-position: top1 = r[0] == c[0]; top5 = r[0] in c[..5]. Records divergence positions for failure-mode analysis (late-sequence drift vs random errors). ::top1_rate() / top5_rate() -> f32 ::meets_cert_gate() -> bool (top1 ≥ 0.99 AND top5 ≥ 0.999) ::aggregate(per_prompt) — sums counters; concatenates divergence with per-prompt offset so failures stay localised TokenAgreementHarness: ::new(reference, baseline, candidate, n_tokens) ::measure_stub() -> WireTokenAgreementResult { stub:true, .. } ::measure_full() -> NotImplementedYet (D2.2 scope) Tests (13 new): - reference_model_stub_builds_without_filesystem - reference_model_load_missing_path_yields_typed_error - topk_compare_identical_streams_is_perfect (full cert gate pass) - topk_compare_all_different_fails_cert_gate - topk_top5_matches_when_top1_misses_but_in_top5 (ref top-1 = 7; cand has 7 at position 3 in top-5 → top5 counts) - topk_mismatched_stream_lengths_yield_typed_error - topk_aggregate_sums_counters_and_offsets_divergence (prompt 2's divergence at pos 4 → aggregate pos 14 after prompt 1's 10) - cert_gate_passes_at_exact_thresholds (990/1000 = 0.99, 999/1000 = 0.999 — both boundaries hit) - cert_gate_fails_when_top1_below_threshold_even_if_top5_passes - cert_gate_fails_when_top5_below_threshold_even_if_top1_passes - harness_measure_stub_returns_machine_checkable_stub_flag (stub:true enforced; backend="stub"; all rates 0.0; zero latencies) - harness_measure_full_returns_not_implemented_pointing_at_d22 - harness_measure_stub_rejects_zero_n_tokens Board hygiene (CLAUDE.md Mandatory rule): STATUS_BOARD.md D2.1 Queued → In PR Phase state: Phase 0 ✅ complete (D0.1-D0.7 all shipped) Phase 1 scaffold ✅ (D1.1, D1.2, D1.3 shipped; D1.1b queued) Phase 2 ⏳ D2.1 (this PR), D2.2 + D2.3 queued Rules honored: Rule D — Measurement set comes from Wire DTOs (D0.2 WireTokenAgreement) Rule E — TopKAgreement exposes object-methods (top1_rate, meets_cert_gate) Rule F — No serialization between stages; per-prompt Vec<Vec<u32>> token streams are plain Rust owned; the serde happens at D2.3 handler entry / exit only https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
1 parent 6bed7ae commit f343724

3 files changed

Lines changed: 439 additions & 1 deletion

File tree

.claude/board/STATUS_BOARD.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@ afterwards is a JIT kernel, not a rebuild. Plan path:
7070

7171
| D-id | Title | Status | PR / Evidence |
7272
|---|---|---|---|
73-
| D2.1 | Reference-model loader (ndarray safetensors) | **Queued** | target ~180 LOC |
73+
| D2.1 | Token-agreement harness scaffold (reference model stub + top-k comparator + stub result) | **In PR** | branch — `ReferenceModel::{load, stub}` + `TokenAgreementError` + `TopKAgreement::{compare, top1_rate, top5_rate, meets_cert_gate, aggregate}` + `TokenAgreementHarness::{measure_stub, measure_full}` + 13 tests. Real safetensors load + decode loop defer to D2.2. |
7474
| D2.2 | Decode-and-compare loop (top-k, per-layer MSE) | **Queued** | target ~220 LOC |
7575
| D2.3 | Handler wiring for `/v1/shader/token-agreement` | **Queued** | target ~60 LOC |
7676

crates/cognitive-shader-driver/src/lib.rs

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -138,6 +138,13 @@ pub mod rotation_kernel;
138138
#[cfg(feature = "serve")]
139139
pub mod decode_kernel;
140140

141+
// D2.1 — token-agreement harness scaffold (I11 cert gate infra).
142+
// Reference model loader stub + top-k comparator + stub result with
143+
// machine-checkable `stub:true` flag. D2.2 adds real safetensors decode.
144+
// LAB-ONLY.
145+
#[cfg(feature = "serve")]
146+
pub mod token_agreement;
147+
141148
// Axum REST server. LAB-ONLY.
142149
#[cfg(feature = "serve")]
143150
pub mod serve;

0 commit comments

Comments
 (0)