[F3] Distinguish Occam-by-breadth vs Occam-by-depth in rehearsal_subset schema #248

@sriumcp

Description

Problem

nous's DESIGN intent is "iter-1 < iter-2 cost" (Occam's razor). The schema-blessed mechanism for narrowing iter-1 is experiment_spec.rehearsal_subset, which exposes only seeds and arms; that is, it lets the agent reduce breadth (fewer cells). It does NOT expose any knob for depth (smaller cells, e.g., shorter duration_seconds or lower concurrency_per_tenant).

When the agent wants a cheaper iter-1 but has only breadth-narrowing knobs, it bakes depth shrinkage directly into verified_parameters. This is dangerous because scale-dependent apparatus checks silently become invalid:

  • An empirical-PMF histogram on a 10s window may not stabilize.
  • A "backlog-nonempty ≥ 99.9% post-warmup" check may pass trivially under low concurrency.
  • A 30-second sliding-window arrival-curve check loses statistical power.

The right principle: retain physics validation with simplicity, instead of sacrificing physics for the sake of simplicity. Occam should narrow what's tested, not weaken what each test means.

Concrete instance

In paper-memorytime-mirage iter-1, the agent shrank duration_seconds from the campaign's locked 600 down to 60 (and concurrency_per_tenant from 32 to 8) to get a faster iter-1 — silently invalidating the workload-distribution histogram and backlog-nonempty checks at the iter-1 scale.
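
As a rough illustration of how this kind of silent shrinkage could be detected, the sketch below compares a campaign's locked parameters against an iteration's verified_parameters. It is a sketch under assumptions, not existing nous code: the function name, the DEPTH_CLASS_PARAMS set, and the flat dict shape of both inputs are hypothetical. Only the two parameter names and their values come from the instance above.

```python
# Hypothetical set of depth-class parameters; the issue names only these two.
DEPTH_CLASS_PARAMS = {"duration_seconds", "concurrency_per_tenant"}


def find_silent_depth_shrinkage(locked: dict, verified: dict) -> dict:
    """Return {param: (locked_value, verified_value)} for each depth-class
    parameter whose verified value is smaller than the campaign's locked value."""
    shrunk = {}
    for name in sorted(DEPTH_CLASS_PARAMS):
        if name in locked and name in verified and verified[name] < locked[name]:
            shrunk[name] = (locked[name], verified[name])
    return shrunk


# Values taken from the paper-memorytime-mirage iter-1 instance.
locked = {"duration_seconds": 600, "concurrency_per_tenant": 32}
iter1 = {"duration_seconds": 60, "concurrency_per_tenant": 8}

for param, (before, after) in find_silent_depth_shrinkage(locked, iter1).items():
    print(f"{param}: locked={before}, iter-1={after}")
```

A non-empty result means depth was reduced outside rehearsal_subset, which is exactly the case this issue wants to surface.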

Desired behavior

Extend rehearsal_subset to a richer schema that distinguishes the two modes:

rehearsal_subset:
  seeds: [42]                 # narrows breadth — preserves cell physics
  arms: [h-main]              # narrows breadth — preserves cell physics
  depth_overrides:            # narrows depth — invalidates scale-dependent checks
    duration_seconds: 120
    concurrency_per_tenant: 8
    invalidates_checks:       # author MUST declare which checks become invalid
      - workload-distribution-histogram
      - backlog-nonempty-99.9

If any depth-class parameter is overridden, the depth_overrides block must include a non-empty invalidates_checks list. This forces the agent (or campaign author) to be explicit about which apparatus guarantees they are surrendering.
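
A minimal sketch of that rule as a validator function follows. It assumes the rehearsal_subset block has already been parsed into a plain dict; the names validate_rehearsal_subset, BundleValidationError, and DEPTH_CLASS_PARAMS are hypothetical and not part of the current nous validator.

```python
# Hypothetical set of depth-class parameters; the issue names only these two.
DEPTH_CLASS_PARAMS = {"duration_seconds", "concurrency_per_tenant"}


class BundleValidationError(ValueError):
    """Raised when a bundle violates the rehearsal_subset rules (hypothetical)."""


def validate_rehearsal_subset(rehearsal_subset: dict) -> None:
    """Reject depth overrides that do not declare which checks they invalidate."""
    depth = rehearsal_subset.get("depth_overrides")
    if not depth:
        return  # Breadth-only narrowing (seeds/arms): nothing to declare.

    overridden = sorted(DEPTH_CLASS_PARAMS.intersection(depth))
    if not overridden:
        return  # No depth-class parameter is actually overridden.

    checks = depth.get("invalidates_checks")
    if not isinstance(checks, list) or not checks:
        raise BundleValidationError(
            f"depth_overrides changes {overridden} but invalidates_checks "
            "is missing or empty"
        )
    if not all(isinstance(c, str) and c.strip() for c in checks):
        raise BundleValidationError(
            "invalidates_checks must contain non-empty check names"
        )


# The example from this issue passes validation.
validate_rehearsal_subset({
    "seeds": [42],
    "arms": ["h-main"],
    "depth_overrides": {
        "duration_seconds": 120,
        "concurrency_per_tenant": 8,
        "invalidates_checks": [
            "workload-distribution-histogram",
            "backlog-nonempty-99.9",
        ],
    },
})
```

The sketch only checks that a declaration exists. Verifying that the listed names match real apparatus checks would need a check registry, which is outside what this issue specifies.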

Suggested implementation sketch

  1. Extend the bundle schema to add rehearsal_subset.depth_overrides with a required invalidates_checks sub-field.
  2. In the methodology prompt for DESIGN, add a paragraph distinguishing breadth vs depth shrinkage, with the worked example above.
  3. The bundle validator rejects a bundle that overrides any depth-class parameter in depth_overrides without an explicit, non-empty invalidates_checks list.
  4. The findings synthesizer marks any check listed in invalidates_checks as "not run at design scale" rather than "passed/failed".
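
Step 4 could look roughly like the sketch below. It assumes the synthesizer receives raw check outcomes as a name-to-boolean mapping, which is an assumption about the internal data shape; the function name and the seed-determinism check are hypothetical and used only for illustration.

```python
NOT_RUN = "not run at design scale"


def synthesize_check_statuses(raw_results: dict, invalidated: list) -> dict:
    """Map each check to 'passed', 'failed', or 'not run at design scale'.

    raw_results: {check_name: bool} as produced at rehearsal scale (assumed shape).
    invalidated: the invalidates_checks list from rehearsal_subset.depth_overrides.
    """
    invalidated_set = set(invalidated)
    statuses = {}
    for name, passed in raw_results.items():
        if name in invalidated_set:
            # The raw outcome is deliberately discarded: at reduced depth
            # it says nothing about the design-scale guarantee.
            statuses[name] = NOT_RUN
        else:
            statuses[name] = "passed" if passed else "failed"
    # Declared-invalid checks that produced no raw result are still reported.
    for name in invalidated:
        statuses.setdefault(name, NOT_RUN)
    return statuses


statuses = synthesize_check_statuses(
    raw_results={
        "workload-distribution-histogram": True,  # trivially "passed" at small scale
        "seed-determinism": True,                 # hypothetical scale-independent check
    },
    invalidated=["workload-distribution-histogram", "backlog-nonempty-99.9"],
)
for check, status in statuses.items():
    print(f"{check}: {status}")
```

Overwriting a raw pass with the "not run" status is the point of the design: a check that passes trivially at reduced depth should not be readable as evidence in the findings.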

Acceptance criteria

  • Bundle schema documents depth_overrides and invalidates_checks.
  • Validator fails a bundle that overrides any depth-class param without listing invalidated checks.
  • Methodology prompt for DESIGN includes the breadth-vs-depth distinction with example.
  • The F3 row in the friction-report tracking issue (#245) is checked off.

Severity

MEDIUM: scale-dependent checks were silently invalidated at iter-1 in the paper-memorytime-mirage campaign.

Source

friction-report.md F3, paper-memorytime-mirage campaign (2026-05).


Part of friction-report tracking issue #245.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), friction-report (From external campaign-author friction reports)
