Problem
A subtle but common experimental-implementation failure mode: an apparatus invariant passes while the underlying attribution it's supposed to validate is wrong.
In paper-memorytime-mirage iter-1, the per-tenant memory-time meter (BLIS sim/kvtime/meter.go) did this:
- Conservation invariant: Σ_RequestMap == UsedBlocks · BlockSize. ✅ Always passes.
- Per-tenant attribution: walked runningBatch, not RequestMap.
- Author's own comment: RequestMap may also contain requests NOT in runningBatch, i.e., orphans (preempted/swapped requests holding KV blocks).
- Result: orphan requests counted toward UsedBlocks · BlockSize (right-hand side of the conservation check) but were NOT attributed to any tenant in the Accumulated per-tenant meter. So per-tenant A_i(t) was silently undercounted, while the conservation check passed.
The conservation check validated the upstream total, not the attribution the experiment depended on. This is a generalizable pattern: invariants must validate the variable the experiment cares about, not a related variable that's easier to compute.
This was caught only by external code review during iter-1 — exactly the kind of bug that's invisible to a self-contained run because the meter's own invariant says everything's fine.
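The failure mode described above can be reproduced in a few lines. The Go sketch below is not the code from sim/kvtime/meter.go: the Request type, the blockSize value, and both helper functions are hypothetical stand-ins, and it assumes each request contributes its block count times the block size to the total. It shows a totals-level conservation check passing while one tenant is undercounted because of a single orphan.

```go
package main

import "fmt"

// Request is a hypothetical stand-in for a simulator request holding KV blocks.
type Request struct {
	ID     string
	Tenant string
	Blocks int
}

// blockSize is an assumed value chosen only for illustration.
const blockSize = 16

// conservationHolds mirrors the totals-level check: the sum over the request
// map must equal usedBlocks * blockSize.
func conservationHolds(requestMap map[string]Request, usedBlocks int) bool {
	total := 0
	for _, r := range requestMap {
		total += r.Blocks * blockSize
	}
	return total == usedBlocks*blockSize
}

// attributeFromBatch walks only the running batch, which is the bug pattern:
// requests that hold blocks but are not running are never attributed.
func attributeFromBatch(runningBatch []Request) map[string]int {
	perTenant := make(map[string]int)
	for _, r := range runningBatch {
		perTenant[r.Tenant] += r.Blocks * blockSize
	}
	return perTenant
}

func main() {
	r1 := Request{ID: "r1", Tenant: "A", Blocks: 4}
	r2 := Request{ID: "r2", Tenant: "B", Blocks: 2}
	orphan := Request{ID: "r3", Tenant: "B", Blocks: 3} // preempted, still holds blocks

	requestMap := map[string]Request{"r1": r1, "r2": r2, "r3": orphan}
	runningBatch := []Request{r1, r2}
	usedBlocks := 9

	fmt.Println("conservation holds:", conservationHolds(requestMap, usedBlocks))
	fmt.Println("attributed to B:", attributeFromBatch(runningBatch)["B"])
}
```

In this sketch tenant B holds 5 blocks across r2 and the orphan, yet only the 2 blocks of r2 are attributed to it, and the totals-level check has no way to notice.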
Desired behavior
The methodology prompt for the EXECUTE_ANALYZE phase (where the agent implements apparatus / instrumentation code) should include an "apparatus discipline" section:
Apparatus invariants must validate the ATTRIBUTION the experiment depends on, not just an upstream total. When the experiment's claim is "per-tenant A_i is correct," your invariant must compare per-tenant A_i (your attribution variable) against an independent per-tenant ground truth — not against the totals-level ground truth. A check that compares Σ tenants against Σ everything will pass even if individual tenants are mis-attributed (e.g., orphan-attribution bugs, swap-out gaps).
When designing an invariant, ask: if the bug I want to catch were present, would this invariant fail? If the bug-of-interest involves attribution among items, your invariant must distinguish per-item, not just sum.
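One way to apply the question above is to inject the bug deliberately and confirm that the invariant fails. The Go sketch below is illustrative only: Holder, attribute, and checkPerTenant are hypothetical names, and the per-tenant ground truth is assumed to be derivable from the full set of block holders rather than from the running batch.

```go
package main

import (
	"fmt"
	"sort"
)

// Holder is a hypothetical record of one request's KV usage.
type Holder struct {
	Tenant  string
	Bytes   int
	Running bool // false models a preempted or swapped request that still holds blocks
}

// attribute sums usage per tenant. With onlyRunning set, it reproduces the
// bug class of interest by skipping holders that are not running.
func attribute(holders []Holder, onlyRunning bool) map[string]int {
	out := make(map[string]int)
	for _, h := range holders {
		if onlyRunning && !h.Running {
			continue
		}
		out[h.Tenant] += h.Bytes
	}
	return out
}

// checkPerTenant compares the meter's attribution with a per-tenant ground
// truth and returns an error listing every tenant that disagrees.
func checkPerTenant(attributed, truth map[string]int) error {
	seen := make(map[string]bool)
	var bad []string
	for _, m := range []map[string]int{attributed, truth} {
		for tenant := range m {
			if !seen[tenant] && attributed[tenant] != truth[tenant] {
				bad = append(bad, tenant)
			}
			seen[tenant] = true
		}
	}
	if len(bad) == 0 {
		return nil
	}
	sort.Strings(bad) // map iteration order is unspecified, so sort for stable messages
	return fmt.Errorf("per-tenant mismatch for tenants %v", bad)
}

func main() {
	holders := []Holder{
		{Tenant: "A", Bytes: 64, Running: true},
		{Tenant: "B", Bytes: 32, Running: true},
		{Tenant: "B", Bytes: 48, Running: false}, // orphan
	}
	truth := attribute(holders, false)

	// Fault injection: the invariant is only useful if it fails when the bug is present.
	fmt.Println("buggy meter:", checkPerTenant(attribute(holders, true), truth))
	fmt.Println("fixed meter:", checkPerTenant(attribute(holders, false), truth))
}
```

The comparison runs over the union of tenants from both maps, so a tenant that is missing entirely from the attribution is reported rather than skipped.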
Suggested implementation sketch
- Add the section above to orchestrator/prompts/execute_analyze.md (or wherever the EXECUTE_ANALYZE methodology prompt is assembled), under a heading like "Apparatus discipline".
- Include a worked example of the bug pattern (the runningBatch vs RequestMap case is generalizable; describe in domain-neutral terms: "if your meter walks set A but conservation compares set B's total, a mismatch between A and B is invisible").
- Add a checklist entry to the apparatus-design portion of the methodology prompt: "For each invariant, identify the bug class it catches. If the bug class is 'attribution among items', the invariant must be per-item."
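For the domain-neutral worked example, a set-coverage check may be the simplest illustration: it makes the gap between set A (what the meter walks) and set B (what the conservation total covers) visible directly. The Go sketch below is one possible form; the function name uncovered and the use of string IDs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// uncovered returns the IDs present in the conserved set (set B) that the
// meter never walked (set A), sorted for stable output.
func uncovered(walked, conserved []string) []string {
	inWalked := make(map[string]bool, len(walked))
	for _, id := range walked {
		inWalked[id] = true
	}
	var missing []string
	for _, id := range conserved {
		if !inWalked[id] {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	walked := []string{"r1", "r2"}          // set A: what the meter iterates
	conserved := []string{"r1", "r2", "r3"} // set B: what the total is computed over
	fmt.Println("not walked:", uncovered(walked, conserved))
}
```

A non-empty result means some items contribute to the conserved total without ever being attributed, which is the condition a sum-only invariant cannot detect.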
Acceptance criteria
Severity
HIGH — would have produced silently wrong per-tenant memory-time, which is the paper's primary metric.
Source
friction-report.md F7, paper-memorytime-mirage campaign (2026-05).
Part of friction-report tracking issue #245.