[F7] Methodology: apparatus invariants must validate ATTRIBUTION, not just upstream totals #252

@sriumcp

Problem

A subtle but common experimental-implementation failure mode: an apparatus invariant passes while the underlying attribution it's supposed to validate is wrong.

In paper-memorytime-mirage iter-1, the per-tenant memory-time meter (BLIS sim/kvtime/meter.go) did this:

  • Conservation invariant: Σ_RequestMap == UsedBlocks · BlockSize. ✅ Always passes.
  • Per-tenant attribution: walked runningBatch, not RequestMap.
  • Author's own comment: RequestMap may also contain requests NOT in runningBatch — i.e., orphans (preempted/swapped requests holding KV blocks).
  • Result: orphan requests counted toward UsedBlocks · BlockSize (the right-hand side of the conservation check) but were NOT attributed to any tenant in the Accumulated per-tenant meter. Per-tenant A_i(t) was therefore silently undercounted while the conservation check still passed.

The conservation check validated the upstream total, not the attribution the experiment depended on. This is a generalizable pattern: invariants must validate the variable the experiment cares about, not a related variable that's easier to compute.

This was caught only by external code review during iter-1 — exactly the kind of bug that's invisible to a self-contained run because the meter's own invariant says everything's fine.
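The failure mode above can be reproduced in a few lines. The following is a minimal sketch, not the actual BLIS meter code: the `Request` type, the `blockSize` value, and both function names are hypothetical stand-ins chosen for illustration. It shows a conservation check over the request map passing while an attribution that walks only the running batch drops an orphan's blocks.

```go
package main

import "fmt"

// Request is a hypothetical stand-in for a simulator request holding KV blocks.
type Request struct {
	ID     string
	Tenant string
	Blocks int64 // KV blocks currently held
}

// blockSize is an assumed value, used only to make the arithmetic concrete.
const blockSize = 16

// conservationHolds compares the total over requestMap with usedBlocks*blockSize.
// It says nothing about which tenant each block is charged to.
func conservationHolds(requestMap map[string]Request, usedBlocks int64) bool {
	var total int64
	for _, r := range requestMap {
		total += r.Blocks * blockSize
	}
	return total == usedBlocks*blockSize
}

// attributeRunning mimics the buggy pattern: it walks only the running batch,
// so any request that holds blocks but is not running is never attributed.
func attributeRunning(runningBatch []Request) map[string]int64 {
	perTenant := make(map[string]int64)
	for _, r := range runningBatch {
		perTenant[r.Tenant] += r.Blocks * blockSize
	}
	return perTenant
}

func main() {
	r1 := Request{ID: "r1", Tenant: "tenantA", Blocks: 4}
	r2 := Request{ID: "r2", Tenant: "tenantB", Blocks: 2}
	orphan := Request{ID: "r3", Tenant: "tenantB", Blocks: 3} // preempted, still holds blocks

	requestMap := map[string]Request{"r1": r1, "r2": r2, "r3": orphan}
	runningBatch := []Request{r1, r2}

	fmt.Println("conservation holds:", conservationHolds(requestMap, 9))
	fmt.Println("attributed:", attributeRunning(runningBatch))
	// tenantB actually holds (2+3)*16 = 80, but only 32 is attributed.
}
```

In this sketch the conservation check reports true, yet tenantB is credited with 32 instead of 80, which is the undercount described above.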

Desired behavior

The methodology prompt for the EXECUTE_ANALYZE phase (where the agent implements apparatus / instrumentation code) should include an "apparatus discipline" section:

Apparatus invariants must validate the ATTRIBUTION the experiment depends on, not just an upstream total. When the experiment's claim is "per-tenant A_i is correct," your invariant must compare per-tenant A_i (your attribution variable) against an independent per-tenant ground truth — not against the totals-level ground truth. A check that compares Σ tenants against Σ everything will pass even if individual tenants are mis-attributed (e.g., orphan-attribution bugs, swap-out gaps).

When designing an invariant, ask: if the bug I want to catch were present, would this invariant fail? If the bug-of-interest involves attribution among items, your invariant must distinguish per-item, not just sum.
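To make the per-item requirement concrete, here is a hedged sketch of an invariant that compares each tenant's attributed value against an independently computed per-tenant ground truth. The function names `perTenantMismatches` and `sum` are hypothetical, and how the ground truth is obtained (for example, by walking block ownership directly) is an assumption left to the apparatus author. The example deliberately uses a mis-attribution that leaves the total unchanged, so a sum-level check passes and only the per-tenant check fails.

```go
package main

import (
	"fmt"
	"sort"
)

// perTenantMismatches returns, in sorted order, the tenants whose attributed
// value differs from the independent ground truth. A tenant present in only
// one of the maps is compared against zero.
func perTenantMismatches(attributed, truth map[string]int64) []string {
	seen := make(map[string]bool, len(truth))
	var bad []string
	for tenant, want := range truth {
		seen[tenant] = true
		if attributed[tenant] != want {
			bad = append(bad, tenant)
		}
	}
	for tenant, got := range attributed {
		if !seen[tenant] && got != 0 {
			bad = append(bad, tenant)
		}
	}
	sort.Strings(bad)
	return bad
}

// sum is the totals-level quantity that a conservation-style check compares.
func sum(m map[string]int64) int64 {
	var total int64
	for _, v := range m {
		total += v
	}
	return total
}

func main() {
	truth := map[string]int64{"tenantA": 64, "tenantB": 80}
	// 16 units are charged to the wrong tenant; the total is unchanged.
	attributed := map[string]int64{"tenantA": 80, "tenantB": 64}

	fmt.Println("totals equal:", sum(attributed) == sum(truth))
	fmt.Println("mismatched tenants:", perTenantMismatches(attributed, truth))
}
```

Seeding a known mis-attribution like this and confirming that the invariant reports it is one way to answer the question "would this invariant fail if the bug were present?" before trusting the check.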

Suggested implementation sketch

  1. Add the section above to orchestrator/prompts/execute_analyze.md (or wherever the EXECUTE_ANALYZE methodology prompt is assembled), under a heading like "Apparatus discipline".
  2. Include a worked example of the bug pattern (the runningBatch vs RequestMap case is generalizable; describe in domain-neutral terms: "if your meter walks set A but conservation compares set B's total, a mismatch between A and B is invisible").
  3. Add a checklist entry to the apparatus-design portion of the methodology prompt: "For each invariant, identify the bug class it catches. If the bug class is 'attribution among items', the invariant must be per-item."
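For the worked example in step 2, the domain-neutral form of the bug could be illustrated with a coverage check along these lines. This is a sketch under assumptions: `unwalked`, `walkedA`, and `conservedB` are hypothetical names, and items are identified by plain string IDs. It reports every item that contributes to set B's total but was never visited by the meter's walk over set A.

```go
package main

import (
	"fmt"
	"sort"
)

// unwalked returns, in sorted order, the IDs that are present in the conserved
// set B but were never visited by the meter's walk over set A.
func unwalked(walkedA []string, conservedB map[string]int64) []string {
	visited := make(map[string]bool, len(walkedA))
	for _, id := range walkedA {
		visited[id] = true
	}
	var missing []string
	for id := range conservedB {
		if !visited[id] {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Set B: everything the conservation check sums over.
	conservedB := map[string]int64{"r1": 64, "r2": 32, "r3": 48}
	// Set A: what the meter actually walks when attributing.
	walkedA := []string{"r1", "r2"}

	fmt.Println("items in B never walked:", unwalked(walkedA, conservedB))
}
```

A non-empty result means the two sets have diverged, which a totals-only comparison over B cannot reveal.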

Acceptance criteria

  • EXECUTE_ANALYZE methodology prompt includes an "Apparatus discipline" section with the attribution-vs-total distinction and a worked example.
  • The prompt's apparatus-design checklist asks the agent to identify the bug class each invariant catches.
  • The F7 row in the friction-report tracking issue (#245) is checked off.

Severity

HIGH — would have produced silently wrong per-tenant memory-time, which is the paper's primary metric.

Source

friction-report.md F7, paper-memorytime-mirage campaign (2026-05).


Part of friction-report tracking issue #245.

Labels: documentation, enhancement, friction-report
