Commit f343724
D2.1 token-agreement harness scaffold (117/117 tests, +13 new)
First Phase 2 deliverable — scaffold of the I11 cert gate harness.
The PR #219 → #220 lesson landed as a typed-rejection wall: the
stub result carries stub:true + backend:"stub" so no client can
confuse Phase 0 stub output for a real measurement.
crates/cognitive-shader-driver/src/token_agreement.rs (~320 LOC):
ReferenceModel { path, path_hash, stub_token_count }
::load(&Path) -> Result<Self, TokenAgreementError>
D2.1 stub: validates that the path exists and hashes its display string; does NOT
parse safetensors yet. D2.2 replaces with real loader driven
by auto_detect::detect() → ModelFingerprint.
::stub(tag, n_tokens) — builds stub model without touching fs
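A minimal sketch of the D2.1 stub loader, for illustration only. Several details are assumptions: `path_hash` is taken to be a `u64` from std's `DefaultHasher` over the path's display string, `stub()` uses a hypothetical `stub://{tag}` pseudo-path, and `LoadError` is a trimmed stand-in for TokenAgreementError with only the one variant `load()` needs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical, trimmed error type; the commit's real enum has more variants.
#[derive(Debug, PartialEq)]
pub enum LoadError {
    ModelPathMissing { path: PathBuf },
}

#[derive(Debug)]
pub struct ReferenceModel {
    pub path: PathBuf,
    /// Hash of the path's display string, not of the file contents (assumption).
    pub path_hash: u64,
    pub stub_token_count: u32,
}

impl ReferenceModel {
    /// Stub loader: checks that the path exists and hashes its display form.
    /// It never opens or parses a safetensors file.
    pub fn load(path: &Path) -> Result<Self, LoadError> {
        if !path.exists() {
            return Err(LoadError::ModelPathMissing { path: path.to_path_buf() });
        }
        Ok(Self {
            path: path.to_path_buf(),
            path_hash: hash_str(&path.display().to_string()),
            stub_token_count: 0, // assumption: real default unknown
        })
    }

    /// Builds a stub model without touching the filesystem.
    pub fn stub(tag: &str, n_tokens: u32) -> Self {
        Self {
            path: PathBuf::from(format!("stub://{tag}")), // hypothetical pseudo-path
            path_hash: hash_str(tag),
            stub_token_count: n_tokens,
        }
    }
}

/// Hashes a string with std's default (SipHash-based) hasher.
fn hash_str(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}
```

Because `stub()` never calls into `std::fs`, tests can build models on any machine, and a missing file surfaces as a typed variant rather than a panic.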
TokenAgreementError:
ModelPathMissing { path }
EmptyPromptSet
TokenCountMismatch { reference, candidate }
NotImplementedYet { what } ← returned by measure_full() until D2.2
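A sketch of how the four variants might be rendered for callers. The field types (`usize` counts, a `&'static str` for `what`) and the message wording are assumptions; only the variant and field names come from this commit.

```rust
use std::fmt;
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenAgreementError {
    ModelPathMissing { path: PathBuf },
    EmptyPromptSet,
    TokenCountMismatch { reference: usize, candidate: usize },
    /// Returned by measure_full() until D2.2 lands.
    NotImplementedYet { what: &'static str },
}

impl fmt::Display for TokenAgreementError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ModelPathMissing { path } => {
                write!(f, "reference model path missing: {}", path.display())
            }
            Self::EmptyPromptSet => write!(f, "prompt set is empty"),
            Self::TokenCountMismatch { reference, candidate } => write!(
                f,
                "token count mismatch: reference {reference}, candidate {candidate}"
            ),
            Self::NotImplementedYet { what } => write!(f, "not implemented yet: {what}"),
        }
    }
}

// Lets the error flow through `?` into Box<dyn Error> at handler boundaries.
impl std::error::Error for TokenAgreementError {}
```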
TopKAgreement { top1_matches, top5_matches, total_positions,
divergence_positions: Vec<u32> }
::compare(ref: &[Vec<u32>], cand: &[Vec<u32>]) -> Result<Self>
Position-by-position: top1 = r[0] == c[0]; top5 = r[0] in c[..5].
Records divergence positions for failure-mode analysis
(late-sequence drift vs random errors).
::top1_rate() / top5_rate() -> f32
::meets_cert_gate() -> bool (top1 ≥ 0.99 AND top5 ≥ 0.999)
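A minimal sketch of the comparison and gate described above, assuming each inner `Vec<u32>` holds the ranked top-k token ids at one position. Two choices here are assumptions rather than confirmed behavior: a divergence is recorded on a top-1 miss, and the candidate list is read with `take(5)` so a list shorter than five cannot panic the way `c[..5]` would. `CompareError` is a hypothetical stand-in for TokenAgreementError.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CompareError {
    TokenCountMismatch { reference: usize, candidate: usize },
}

#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct TopKAgreement {
    pub top1_matches: u32,
    pub top5_matches: u32,
    pub total_positions: u32,
    pub divergence_positions: Vec<u32>,
}

impl TopKAgreement {
    pub fn compare(reference: &[Vec<u32>], candidate: &[Vec<u32>]) -> Result<Self, CompareError> {
        if reference.len() != candidate.len() {
            return Err(CompareError::TokenCountMismatch {
                reference: reference.len(),
                candidate: candidate.len(),
            });
        }
        let mut out = Self::default();
        for (pos, (r, c)) in reference.iter().zip(candidate).enumerate() {
            out.total_positions += 1;
            // Assumption: an empty reference list at a position counts as a miss.
            let Some(&ref_top1) = r.first() else {
                out.divergence_positions.push(pos as u32);
                continue;
            };
            // top1: reference top-1 equals candidate top-1.
            if c.first() == Some(&ref_top1) {
                out.top1_matches += 1;
            } else {
                out.divergence_positions.push(pos as u32);
            }
            // top5: reference top-1 appears anywhere in the candidate's first five.
            if c.iter().take(5).any(|&t| t == ref_top1) {
                out.top5_matches += 1;
            }
        }
        Ok(out)
    }

    fn rate(&self, matches: u32) -> f32 {
        if self.total_positions == 0 {
            0.0 // an empty comparison can never pass the gate
        } else {
            matches as f32 / self.total_positions as f32
        }
    }

    pub fn top1_rate(&self) -> f32 { self.rate(self.top1_matches) }
    pub fn top5_rate(&self) -> f32 { self.rate(self.top5_matches) }

    pub fn meets_cert_gate(&self) -> bool {
        self.top1_rate() >= 0.99 && self.top5_rate() >= 0.999
    }
}
```

With reference `[7, 1, 2, 3, 4]` and candidate `[9, 8, 6, 7, 5]` at one position, top-1 misses but 7 sits at index 3 of the candidate's top five, so `top5_matches` is 1 and position 0 is recorded as a divergence.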
::aggregate(per_prompt): sums counters; concatenates divergence positions,
shifted by a per-prompt offset, so failures stay localised
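A sketch of the offset bookkeeping, assuming `aggregate` takes a slice of per-prompt results and that each prompt's offset is the running sum of earlier prompts' `total_positions`. The slice signature is an assumption; the struct mirrors the fields listed above.

```rust
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct TopKAgreement {
    pub top1_matches: u32,
    pub top5_matches: u32,
    pub total_positions: u32,
    pub divergence_positions: Vec<u32>,
}

impl TopKAgreement {
    /// Sums counters and re-bases each prompt's divergence positions onto one
    /// global axis, so position p of prompt k maps to p + (positions before k).
    pub fn aggregate(per_prompt: &[TopKAgreement]) -> TopKAgreement {
        let mut out = TopKAgreement::default();
        for p in per_prompt {
            let offset = out.total_positions; // positions consumed by earlier prompts
            out.top1_matches += p.top1_matches;
            out.top5_matches += p.top5_matches;
            out.divergence_positions
                .extend(p.divergence_positions.iter().map(|&d| d + offset));
            out.total_positions += p.total_positions;
        }
        out
    }
}
```

This reproduces the case in the test list below: prompt 1 spans 10 positions, so prompt 2's divergence at position 4 lands at aggregate position 14.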
TokenAgreementHarness:
::new(reference, baseline, candidate, n_tokens)
::measure_stub() -> WireTokenAgreementResult { stub:true, .. }
::measure_full() -> NotImplementedYet (D2.2 scope)
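A sketch of the typed stub wall from the harness side. Everything beyond `stub`, `backend` and the zeroed rates and latencies is hypothetical: the latency field names, the `ZeroTokens` variant, placing the zero-`n_tokens` check in `measure_stub`, and the `is_real_measurement` guard a client might apply. The reference/baseline/candidate arguments to `new` are omitted for brevity.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct WireTokenAgreementResult {
    pub stub: bool,
    pub backend: String,
    pub top1_rate: f32,
    pub top5_rate: f32,
    pub reference_latency_ms: f64, // hypothetical field name
    pub candidate_latency_ms: f64, // hypothetical field name
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HarnessError {
    ZeroTokens, // hypothetical variant for the zero-n_tokens rejection
    NotImplementedYet { what: &'static str },
}

/// Simplified harness: only n_tokens is kept for this sketch.
pub struct TokenAgreementHarness {
    n_tokens: u32,
}

impl TokenAgreementHarness {
    pub fn new(n_tokens: u32) -> Self {
        Self { n_tokens }
    }

    /// Returns a result that is machine-checkably a stub: stub = true,
    /// backend = "stub", all rates 0.0, zero latencies.
    pub fn measure_stub(&self) -> Result<WireTokenAgreementResult, HarnessError> {
        if self.n_tokens == 0 {
            return Err(HarnessError::ZeroTokens);
        }
        Ok(WireTokenAgreementResult {
            stub: true,
            backend: "stub".to_string(),
            top1_rate: 0.0,
            top5_rate: 0.0,
            reference_latency_ms: 0.0,
            candidate_latency_ms: 0.0,
        })
    }

    /// D2.2 scope: refuses with a typed error instead of fabricating numbers.
    pub fn measure_full(&self) -> Result<WireTokenAgreementResult, HarnessError> {
        Err(HarnessError::NotImplementedYet { what: "measure_full: real loader lands in D2.2" })
    }
}

/// Hypothetical client-side guard: reject anything flagged as stub output.
pub fn is_real_measurement(r: &WireTokenAgreementResult) -> bool {
    !r.stub && r.backend != "stub"
}
```

Checking both the boolean and the backend string means a client that only inspects one of the two fields still refuses the stub.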
Tests (13 new):
- reference_model_stub_builds_without_filesystem
- reference_model_load_missing_path_yields_typed_error
- topk_compare_identical_streams_is_perfect (full cert gate pass)
- topk_compare_all_different_fails_cert_gate
- topk_top5_matches_when_top1_misses_but_in_top5
(ref top-1 = 7; cand has 7 at position 3 in top-5 → top5 counts)
- topk_mismatched_stream_lengths_yield_typed_error
- topk_aggregate_sums_counters_and_offsets_divergence
(prompt 2's divergence at pos 4 → aggregate pos 14 after prompt 1's 10)
- cert_gate_passes_at_exact_thresholds
(990/1000 = 0.99, 999/1000 = 0.999 — both boundaries hit)
- cert_gate_fails_when_top1_below_threshold_even_if_top5_passes
- cert_gate_fails_when_top5_below_threshold_even_if_top1_passes
- harness_measure_stub_returns_machine_checkable_stub_flag
(stub:true enforced; backend="stub"; all rates 0.0; zero latencies)
- harness_measure_full_returns_not_implemented_pointing_at_d22
- harness_measure_stub_rejects_zero_n_tokens
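The exact-threshold test relies on f32 arithmetic landing precisely on 0.99 and 0.999. A sketch of why that holds, assuming the rates are computed as `matches as f32 / total as f32`: both integers are exactly representable in f32, and IEEE 754 division is correctly rounded, so 990/1000 yields the same f32 as the literal `0.99` (likewise for 999/1000 and `0.999`).

```rust
/// Assumed rate computation: match count over total positions, in f32.
pub fn rate(matches: u32, total: u32) -> f32 {
    if total == 0 {
        0.0
    } else {
        // Exact integer inputs + correctly rounded division: the result is the
        // nearest f32 to the true ratio, the same value a decimal literal parses to.
        matches as f32 / total as f32
    }
}

/// Cert gate thresholds as stated in this commit: top1 ≥ 0.99 AND top5 ≥ 0.999.
pub fn meets_cert_gate(top1: f32, top5: f32) -> bool {
    top1 >= 0.99 && top5 >= 0.999
}
```

One position below either boundary (989/1000 or 998/1000) is far larger than f32 spacing near 1.0, so those cases fail the gate unambiguously.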
Board hygiene (CLAUDE.md Mandatory rule):
STATUS_BOARD.md D2.1 Queued → In PR
Phase state:
Phase 0 ✅ complete (D0.1-D0.7 all shipped)
Phase 1 scaffold ✅ (D1.1, D1.2, D1.3 shipped; D1.1b queued)
Phase 2 ⏳ D2.1 (this PR), D2.2 + D2.3 queued
Rules honored:
Rule D — Measurement set comes from Wire DTOs (D0.2 WireTokenAgreement)
Rule E — TopKAgreement exposes object-methods (top1_rate, meets_cert_gate)
Rule F — No serialization between stages; per-prompt Vec<Vec<u32>>
token streams are plain Rust owned; the serde happens at
D2.3 handler entry / exit only
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh1
parent 6bed7ae
3 files changed
Lines changed: 439 additions & 1 deletion