Problem
This is a positive observation captured in the friction report, not a bug. Filing it as an issue so the lesson isn't lost.
Background
In paper-memorytime-mirage iter-1, the rehearsal_subset (h-main arm, seed 42, both schedulers) ran at the campaign's locked parameters. Both Token-WFQ and KV-time-greedy produced memorytime_share_ratio ≈ 1.06 — vastly below the predicted 3.0×. Rather than reporting null findings, the agent ran a diagnostic D=1 probe, which produced ρ_mt ≈ 4.378 under WFQ. From the contrast, it correctly diagnosed two campaign-author errors:
- D=8 puts the system in a decode-dominated regime where memory-time ∝ P·D. Because that term is linear in P, equal-mean prompt lengths (mean P_A = mean P_B) mask the variance signal. Recommended: D=1.
- K=1M blocks makes the bucket inoperative: ω·K = 450K versus ~152 actual occupancy, so the bucket never binds. Recommended: K ≤ 1000.
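The second diagnosis is plain arithmetic and is cheap to check before locking. Below is a minimal Python sketch; the function name and the "headroom" ratio are hypothetical (not part of nous), ω = 0.45 is back-calculated from the issue's own figures (ω·K = 450K at K = 1M), and ~152 is the occupancy reported in iter-1.

```python
# Hypothetical pre-lock check: can the bucket ever bind?
# omega (ω) and K follow the issue's notation; "headroom" and the function
# name are illustrative only, not part of nous.


def bucket_headroom(omega: float, k_blocks: int, observed_occupancy: float) -> float:
    """Ratio of the bucket's capacity (ω·K) to the occupancy actually observed.

    A headroom far above 1 means occupancy never approaches the cap,
    so the bucket is effectively inoperative.
    """
    if observed_occupancy <= 0:
        raise ValueError("observed_occupancy must be positive")
    return (omega * k_blocks) / observed_occupancy


# ω inferred from the issue: ω·K = 450K at K = 1M.
OMEGA = 450_000 / 1_000_000
OBSERVED = 152  # approximate occupancy reported in iter-1

for k in (1_000_000, 1_000):
    print(f"K={k:>9,}: headroom ≈ {bucket_headroom(OMEGA, k, OBSERVED):,.1f}x")
```

At K = 1M the cap sits thousands of times above observed occupancy; the recommended K ≤ 1000 brings it within a small multiple of it.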
The findings.json discrepancy_analysis was a clean post-mortem. The agent confirmed apparatus correctness (zero conservation violations, WFQ counter balance ratio 1.003) before declaring REFUTED, with a diagnostic_note recommending specific parameter fixes for iter-2.
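The ordering is the point: apparatus checks gate the scientific verdict. A minimal Python sketch of that gate, for illustration only; the field names, the "APPARATUS_FAULT"/"CONSISTENT" labels, and both thresholds are hypothetical and are not nous's actual findings.json schema.

```python
from dataclasses import dataclass


@dataclass
class RehearsalResult:
    conservation_violations: int
    wfq_counter_balance: float  # assumed: 1.0 means perfectly balanced counters
    observed_ratio: float       # memorytime_share_ratio measured in rehearsal
    predicted_ratio: float      # closed-form prediction locked in the campaign


BALANCE_TOLERANCE = 0.01  # hypothetical: accept counter balance within ±1%
REFUTE_FRACTION = 0.5     # hypothetical: observed below half the prediction


def verdict(r: RehearsalResult) -> str:
    # Step 1: rule out a broken apparatus before interpreting the science.
    if r.conservation_violations != 0:
        return "APPARATUS_FAULT"
    if abs(r.wfq_counter_balance - 1.0) > BALANCE_TOLERANCE:
        return "APPARATUS_FAULT"
    # Step 2: only with a trusted apparatus, compare observation to prediction.
    if r.observed_ratio < REFUTE_FRACTION * r.predicted_ratio:
        return "REFUTED"
    return "CONSISTENT"


# Iter-1 figures from this issue.
iter1 = RehearsalResult(conservation_violations=0, wfq_counter_balance=1.003,
                        observed_ratio=1.06, predicted_ratio=3.0)
print(verdict(iter1))
```

Because the apparatus checks run first, a low ratio is only read as a refutation of the campaign's parameters once the harness itself has been ruled out.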
Why this matters
This is the affirmative case for the rehearsal mechanism. The campaign author made two non-trivial workload-design errors that pre-run review did not catch. Iter-1 surfaced both with diagnostic precision, recommended fixes, and confirmed that the underlying mechanism is real (4.38× mirage at D=1). Without rehearsal, iter-2 would have produced null results at full scale.
Desired behavior
Capture this lesson in nous's documentation, in two places:
- Methodology docs (the page that explains experiment_spec.rehearsal_subset and the iter-1-as-rehearsal pattern): add a worked example illustrating the affirmative case. Show how a diagnostic-mode rehearsal can both (a) refute the campaign author's stated parameters and (b) recommend specific fixes, without escalating to full-scale iter-2.
- Campaign-authoring guide: add a "unit-check the closed-form prediction against your locked parameters" step before locking. In the paper-memorytime-mirage case, evaluating C_KV(P=1024, D=8) / C_KV(P=mixture, D=8) under realistic π/δ would have shown ratio ≈ 1.06 (decode dominates), revealing the D=8 error pre-run. This step would have eliminated one of the two errors before iter-1 ran.
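A minimal Python sketch of what the pre-lock unit check could look like. It uses a hypothetical toy cost model in which prefill contributes a term quadratic in P and each decode step a term linear in P. The functional form, the names c_kv/expected_c_kv/share_ratio, the reading of π and δ as per-prompt-token prefill time and per-step decode time, the bimodal mixture, and the parameter values are all assumptions for illustration; the campaign's actual C_KV may differ, and this sketch is not meant to reproduce the 1.06 figure.

```python
# Hypothetical toy model of per-request KV memory-time, C_KV(P, D).
# pi: assumed prefill time per prompt token; delta: assumed time per decode step.
# Neither the model nor the values below come from the campaign.


def c_kv(p: float, d: int, pi: float, delta: float) -> float:
    # Prefill: KV grows linearly from 0 to P over P*pi seconds -> area pi*P^2/2.
    prefill = pi * p * p / 2
    # Decode: step i (i = 1..D) holds P + i tokens for delta seconds.
    decode = delta * (d * p + d * (d + 1) / 2)
    return prefill + decode


def expected_c_kv(mixture, d, pi, delta):
    # mixture: list of (prompt_length, probability) pairs.
    return sum(w * c_kv(p, d, pi, delta) for p, w in mixture)


def share_ratio(mixture, fixed_p, d, pi, delta):
    """Expected memory-time of a mixed-P tenant over a fixed-P tenant."""
    return expected_c_kv(mixture, d, pi, delta) / c_kv(fixed_p, d, pi, delta)


MIX = [(256, 0.5), (1792, 0.5)]  # hypothetical mixture with mean 1024
assert abs(sum(p * w for p, w in MIX) - 1024) < 1e-9  # equal means, as in the campaign
PI, DELTA = 1e-4, 2e-2           # placeholder values, not measurements

for d in (1, 8, 64):
    print(f"D={d:>2}: ratio = {share_ratio(MIX, 1024, d, PI, DELTA):.3f}")
```

Because the decode term is linear in P, equal-mean prompt distributions contribute identical decode memory-time; only the quadratic prefill term carries the variance signal, so the ratio drifts toward 1 as D grows. Evaluating the ratio at the locked D, before locking, shows whether the predicted effect can survive that dilution.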
Suggested implementation sketch
- Add a "Rehearsal as scientific instrument" section to the methodology docs with the paper-memorytime-mirage iter-1 worked example (the D=8 → D=1 and K=1M → K=1000 diagnoses).
- Add a "Pre-lock unit check" step to the campaign-authoring guide.
- Cross-link from the rehearsal_subset schema doc to the worked example.
Acceptance criteria
Severity
N/A — positive case, recorded for completeness. Documentation-only.
Source
friction-report.md F14, paper-memorytime-mirage campaign (2026-05).
Part of friction-report tracking issue #245.