[F20] Schema: workload yaml deviations need locked_workload + workload_changes_from_canonical #265

@sriumcp

Problem

The DESIGN agent can emit a workload yaml (under bundle.inputs/*.yaml) whose content fields (per-tenant prompt distributions, decode-token settings, concurrency) deviate substantially from the campaign's stated canonical workload — and nothing in nous's schema or watchdog flow surfaces this deviation.

bundle.experiment_spec.verified_parameters (the field that watchdogs typically diff against the campaign) tracks scalar config values. It does NOT cover workload yaml content, because workload distributions live in inputs/*.yaml files referenced from the bundle, not in the bundle itself.

In paper-memorytime-mirage iter-3, the design agent rewrote the workload from the locked equal-mean-different-variance construction (P_A=1024, P_B mixture, D=1) to a unit-length construction (P_A=4000, P_B=128, D=8). The verified_parameters block looked correct (model, concurrency, duration, warmup all matched). The watchdog saw no deviation. The pivot was scientifically defensible — the unit-length construction produces a much stronger and cleaner mirage (28.84× vs ~4.4×) — but the audit trail did not capture the deviation as a deliberate decision.
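To make "equal-mean-different-variance" concrete, here is a small sketch that compares the two constructions. It assumes the "P_B mixture" is the empirical_pmf shown in the locked_workload example in this issue (values [100, 4720], probs [0.8, 0.2]); the helper name pmf_mean_std is hypothetical and not part of nous.

```python
import math

def pmf_mean_std(values, probs):
    """Mean and standard deviation of a discrete distribution."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, math.sqrt(var)

# Locked construction: tenant-A is a constant, tenant-B is a two-point mixture
# (mixture values assumed from the locked_workload example in this issue).
mean_a, std_a = pmf_mean_std([1024], [1.0])
mean_b, std_b = pmf_mean_std([100, 4720], [0.8, 0.2])

# Iter-3 rewrite: both tenants are constants with different lengths.
mean_a3, std_a3 = pmf_mean_std([4000], [1.0])
mean_b3, std_b3 = pmf_mean_std([128], [1.0])

print(f"locked: A mean={mean_a:.0f} std={std_a:.0f} | B mean={mean_b:.0f} std={std_b:.0f}")
print(f"iter-3: A mean={mean_a3:.0f} std={std_a3:.0f} | B mean={mean_b3:.0f} std={std_b3:.0f}")
```

Under the locked construction the two tenants share a mean prompt length of 1024 tokens and differ only in variance. The iter-3 rewrite drops the variance contrast and separates the means instead, which is a different experiment even though every scalar in verified_parameters is unchanged.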

In this case the undeclared rewrite salvaged the campaign. In general, the same gap could mask rewrites that shift the experimental physics without alerting anyone.

Desired behavior

Two changes:

(1) A locked_workload block in campaign.yaml, parallel to F1's locked_parameters:

locked_workload:
  tenants:
    tenant-A:
      input_distribution: {type: constant, value: 1024}
      output_distribution: {type: constant, value: 1}
      concurrency: 32
    tenant-B:
      input_distribution: {type: empirical_pmf, values: [100, 4720], probs: [0.8, 0.2]}
      output_distribution: {type: constant, value: 1}
      concurrency: 32

The validator diffs bundle.inputs/<workload>.yaml against campaign.locked_workload. Mismatches → hard fail (analogous to F1's locked_parameters).
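A minimal sketch of the diff step, assuming both documents have already been parsed into plain dicts (for example with a YAML loader) and that the workload yaml uses the same tenants layout as locked_workload. The function names flatten and diff_workload are hypothetical; nothing here reflects the existing validator.

```python
MISSING = "<missing>"  # placeholder for a field present on only one side

def flatten(node, prefix=""):
    """Flatten nested dicts into {"a.b.c": leaf} so fields compare by path."""
    if not isinstance(node, dict):
        return {prefix: node}
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

def diff_workload(locked, actual):
    """Return (path, locked_value, actual_value) for every field that differs."""
    want, got = flatten(locked), flatten(actual)
    mismatches = []
    for path in sorted(want.keys() | got.keys()):
        a, b = want.get(path, MISSING), got.get(path, MISSING)
        if a != b:
            mismatches.append((path, a, b))
    return mismatches

# Illustrative data: tenant-A from the locked block vs. the iter-3 rewrite.
locked = {"tenants": {"tenant-A": {
    "input_distribution": {"type": "constant", "value": 1024},
    "output_distribution": {"type": "constant", "value": 1},
    "concurrency": 32,
}}}
actual = {"tenants": {"tenant-A": {
    "input_distribution": {"type": "constant", "value": 4000},
    "output_distribution": {"type": "constant", "value": 8},
    "concurrency": 32,
}}}

mismatches = diff_workload(locked, actual)
for path, a, b in mismatches:
    print(f"{path}: locked={a} actual={b}")
```

Lists such as values and probs are treated as leaves and compared whole, so reordering a pmf counts as a mismatch. Whether that is desirable is a design choice left open here.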

(2) A workload_changes_from_canonical field on bundle.yaml, which the design agent must populate whenever it deliberately rewrites the workload away from the campaign's canonical workload:

workload_changes_from_canonical:
  rationale: "Pivoted to unit-length construction (paper §10.1), which produces a much stronger and cleaner mirage (~28x) than the locked equal-mean construction (~4x)."
  diff:
    - {tenant: tenant-A, field: input_distribution.value, from: 1024, to: 4000}
    - {tenant: tenant-B, field: input_distribution, from: empirical_pmf, to: constant}
    - {tenant: both, field: output_distribution.value, from: 1, to: 8}

Change (1) acts as a hard constraint; a deliberate deviation requires (2) to be populated and approved at HUMAN_DESIGN_GATE. Under --auto-approve, the deviation appears in the F4 gate-summary diff.
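One way the interaction between (1) and (2) could be decided, as a sketch only. It assumes observed mismatches arrive as (tenant, field) pairs, that a declared field covers its sub-fields (declaring input_distribution covers input_distribution.value), and that tenant: both expands to every tenant. The function names and the returned status strings are hypothetical.

```python
def is_covered(tenant, field, declared):
    """True if a declared (tenant, field) names this field or a parent of it."""
    return any(
        t == tenant and (field == f or field.startswith(f + "."))
        for t, f in declared
    )

def gate_decision(observed, declaration, tenants, auto_approve=False):
    """observed: set of (tenant, field) pairs that differ from locked_workload.
    declaration: parsed workload_changes_from_canonical block, or None."""
    if not observed:
        return "pass"
    if not declaration or not str(declaration.get("rationale", "")).strip():
        return "fail: undeclared workload deviation"
    declared = set()
    for entry in declaration.get("diff", []):
        targets = tenants if entry["tenant"] == "both" else [entry["tenant"]]
        declared.update((t, entry["field"]) for t in targets)
    undeclared = sorted(p for p in observed if not is_covered(p[0], p[1], declared))
    if undeclared:
        names = ", ".join(f"{t}.{f}" for t, f in undeclared)
        return f"fail: undeclared fields {names}"
    if auto_approve:
        return "flag in gate summary"
    return "hold for HUMAN_DESIGN_GATE"

# Illustrative data mirroring the declaration shown above.
tenants = ["tenant-A", "tenant-B"]
declaration = {
    "rationale": "Pivoted to unit-length construction.",
    "diff": [
        {"tenant": "tenant-A", "field": "input_distribution.value"},
        {"tenant": "tenant-B", "field": "input_distribution"},
        {"tenant": "both", "field": "output_distribution.value"},
    ],
}
observed = {
    ("tenant-A", "input_distribution.value"),
    ("tenant-B", "input_distribution.type"),
    ("tenant-B", "output_distribution.value"),
}
print(gate_decision(observed, declaration, tenants))
```

A stricter variant could also require that the declared from/to values match what the diff actually observed, so that a stale declaration cannot cover a later rewrite.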

Suggested implementation sketch

  1. Extend campaign schema with locked_workload (analogous to locked_parameters).
  2. Validator diffs the workload yaml content against the canonical; mismatches → fail unless workload_changes_from_canonical declares the deviation explicitly.
  3. Update DESIGN methodology prompt to require workload_changes_from_canonical when rewriting workload.
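For step 1, a rough sketch of what structural validation of the locked_workload block itself might look like before any diffing happens. Only the two distribution types that appear in this issue (constant and empirical_pmf) are handled; the function names and error strings are hypothetical, and the real schema may support more types.

```python
import math

def validate_distribution(dist, where):
    """Return a list of error strings for one distribution spec."""
    kind = dist.get("type")
    if kind == "constant":
        if not isinstance(dist.get("value"), int) or dist["value"] < 1:
            return [f"{where}: constant needs a positive integer value"]
    elif kind == "empirical_pmf":
        values, probs = dist.get("values", []), dist.get("probs", [])
        if not values or len(values) != len(probs):
            return [f"{where}: values and probs must be non-empty and equal length"]
        if not math.isclose(sum(probs), 1.0):
            return [f"{where}: probs must sum to 1"]
    else:
        return [f"{where}: unsupported distribution type {kind!r}"]
    return []

def validate_locked_workload(block):
    """Return all structural errors found in a parsed locked_workload block."""
    errors = []
    tenants = block.get("tenants") or {}
    if not tenants:
        errors.append("locked_workload.tenants is empty")
    for name, spec in tenants.items():
        for key in ("input_distribution", "output_distribution"):
            if key not in spec:
                errors.append(f"{name}: missing {key}")
            else:
                errors.extend(validate_distribution(spec[key], f"{name}.{key}"))
        if not isinstance(spec.get("concurrency"), int) or spec["concurrency"] < 1:
            errors.append(f"{name}: concurrency must be a positive integer")
    return errors

# Illustrative data: tenant-B from the locked_workload example in this issue.
example = {"tenants": {"tenant-B": {
    "input_distribution": {"type": "empirical_pmf", "values": [100, 4720], "probs": [0.8, 0.2]},
    "output_distribution": {"type": "constant", "value": 1},
    "concurrency": 32,
}}}
print(validate_locked_workload(example))
```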

Acceptance criteria

  • locked_workload schema field exists and is validated against bundle.inputs/*.yaml.
  • workload_changes_from_canonical schema field allows declared deviation with rationale.
  • Methodology prompt instructs DESIGN to use it when applicable.
  • The F20 row in the friction-report tracking issue is checked off.

Severity

MEDIUM. In the iter-3 case the undeclared rewrite salvaged the campaign; in general the same gap could mask undeclared workload deviations.

Source

friction-report.md F20, paper-memorytime-mirage campaign (2026-05). Depends on F1 for the locked-X infrastructure.


Part of friction-report tracking issue #245.

Labels: enhancement, friction-report
