Problem
The DESIGN agent can emit a workload yaml (under bundle.inputs/*.yaml) whose content fields (per-tenant prompt distributions, decode-token settings, concurrency) deviate substantially from the campaign's stated canonical workload — and nothing in nous's schema or watchdog flow surfaces this deviation.
bundle.experiment_spec.verified_parameters (the field that watchdogs typically diff against the campaign) tracks scalar config values. It does NOT cover workload yaml content, because workload distributions live in inputs/*.yaml files referenced from the bundle, not in the bundle itself.
In paper-memorytime-mirage iter-3, the design agent rewrote the workload from the locked equal-mean-different-variance construction (P_A=1024, P_B mixture, D=1) to a unit-length construction (P_A=4000, P_B=128, D=8). The verified_parameters block looked correct (model, concurrency, duration, warmup all matched). The watchdog saw no deviation. The pivot was scientifically defensible — the unit-length construction produces a much stronger and cleaner mirage (28.84× vs ~4.4×) — but the audit trail did not capture the deviation as a deliberate decision.
In this case the undeclared pivot salvaged the campaign. In general, the same gap could mask undeclared rewrites that drift the experimental physics without alerting anyone.
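To make the "equal-mean-different-variance" label concrete, here is a small worked check using the tenant-B mixture values quoted in the locked_workload example in this issue. The helper name pmf_mean_var is illustrative only and is not part of nous.

```python
from fractions import Fraction
import math

def pmf_mean_var(values, probs):
    """Return (mean, variance) of a discrete distribution, computed exactly."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, var

# Locked construction: tenant-A is a constant 1024-token prompt,
# tenant-B is the two-point mixture {100 w.p. 0.8, 4720 w.p. 0.2}.
mean_a, var_a = pmf_mean_var([1024], [Fraction(1)])
mean_b, var_b = pmf_mean_var([100, 4720], [Fraction(4, 5), Fraction(1, 5)])
std_b = math.sqrt(var_b)

print(mean_a, var_a)         # 1024 0
print(mean_b, var_b, std_b)  # 1024 3415104 1848.0
```

Both tenants have a mean prompt length of 1024 tokens, but tenant-B has a standard deviation of 1848 tokens while tenant-A has none. The iter-3 rewrite (P_A=4000, P_B=128) drops the equal-mean property entirely, which is exactly the kind of change a scalar-only diff cannot see.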
Desired behavior
Two changes:
(1) A locked_workload block in campaign.yaml, parallel to F1's locked_parameters:
    locked_workload:
      tenants:
        tenant-A:
          input_distribution: {type: constant, value: 1024}
          output_distribution: {type: constant, value: 1}
          concurrency: 32
        tenant-B:
          input_distribution: {type: empirical_pmf, values: [100, 4720], probs: [0.8, 0.2]}
          output_distribution: {type: constant, value: 1}
          concurrency: 32
The validator diffs bundle.inputs/<workload>.yaml against campaign.locked_workload. Mismatches → hard fail (analogous to F1's locked_parameters).
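A minimal sketch of the diff step follows. It assumes the workload yaml, once parsed, mirrors the tenants layout of locked_workload, which this issue does not confirm. The names flatten and diff_workload are hypothetical, and the dicts stand in for parsed YAML so the example has no dependencies. The "actual" values are the iter-3 rewrite described above, with concurrency assumed unchanged at 32.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into {"a.b": leaf} so fields compare by path."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            out.update(flatten(value, f"{prefix}.{key}" if prefix else key))
        return out
    return {prefix: node}

def diff_workload(locked, actual):
    """Return one record per (tenant, field) where actual differs from locked."""
    mismatches = []
    locked_tenants = locked.get("tenants", {})
    actual_tenants = actual.get("tenants", {})
    for tenant in sorted(set(locked_tenants) | set(actual_tenants)):
        want = flatten(locked_tenants.get(tenant, {}))
        got = flatten(actual_tenants.get(tenant, {}))
        for field in sorted(set(want) | set(got)):
            if want.get(field) != got.get(field):
                mismatches.append({"tenant": tenant, "field": field,
                                   "from": want.get(field), "to": got.get(field)})
    return mismatches

const = lambda v: {"type": "constant", "value": v}
locked = {"tenants": {
    "tenant-A": {"input_distribution": const(1024),
                 "output_distribution": const(1), "concurrency": 32},
    "tenant-B": {"input_distribution": {"type": "empirical_pmf",
                                        "values": [100, 4720], "probs": [0.8, 0.2]},
                 "output_distribution": const(1), "concurrency": 32},
}}
# Hypothetical parsed content of the iter-3 workload yaml.
actual = {"tenants": {
    "tenant-A": {"input_distribution": const(4000),
                 "output_distribution": const(8), "concurrency": 32},
    "tenant-B": {"input_distribution": const(128),
                 "output_distribution": const(8), "concurrency": 32},
}}

mismatches = diff_workload(locked, actual)
for m in mismatches:
    print(m)
```

Any non-empty result would be a hard fail under (1). Comparing flattened field paths keeps each mismatch record close to the shape of the diff entries proposed in (2).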
(2) A workload_changes_from_canonical field on bundle.yaml, which the design agent must populate when it deliberately rewrites the workload away from the campaign's canonical workload:
    workload_changes_from_canonical:
      rationale: "Pivoted to unit-length construction (paper §10.1) which produces a 28x cleaner mirage than the locked equal-mean construction (4x)."
      diff:
        - {tenant: tenant-A, field: input_distribution.value, from: 1024, to: 4000}
        - {tenant: tenant-B, field: input_distribution, from: empirical_pmf, to: constant}
        - {tenant: both, field: output_distribution.value, from: 1, to: 8}
The locked_workload block in (1) acts as a hard constraint; a deliberate deviation requires (2) to be populated and approved at HUMAN_DESIGN_GATE. Under --auto-approve, the deviation appears in the F4 gate-summary diff.
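One way the interaction between (1) and (2) could be decided is sketched below. This is an illustration under stated assumptions, not nous's actual gate logic: the function name gate_decision and the four outcome strings are hypothetical, "both" is treated as a wildcard tenant because the example diff uses it that way, and coverage is checked only at the level of the top-level field name.

```python
def gate_decision(mismatches, declaration, auto_approve=False):
    """Map validator mismatches plus an optional declaration to a gate outcome."""
    if not mismatches:
        return "pass"
    # A deviation with no declaration, or with an empty rationale, is undeclared.
    if not declaration or not (declaration.get("rationale") or "").strip():
        return "fail"
    declared = declaration.get("diff") or []

    def covered(m):
        top = m["field"].split(".")[0]
        return any(d["tenant"] in (m["tenant"], "both")
                   and d["field"].split(".")[0] == top
                   for d in declared)

    # Every observed mismatch must be explained by some declared diff entry.
    if not all(covered(m) for m in mismatches):
        return "fail"
    return "report_in_gate_summary" if auto_approve else "needs_human_approval"

mismatches = [
    {"tenant": "tenant-A", "field": "input_distribution.value", "from": 1024, "to": 4000},
    {"tenant": "tenant-A", "field": "output_distribution.value", "from": 1, "to": 8},
]
declaration = {
    "rationale": "Pivoted to unit-length construction (paper §10.1).",
    "diff": [
        {"tenant": "tenant-A", "field": "input_distribution.value", "from": 1024, "to": 4000},
        {"tenant": "both", "field": "output_distribution.value", "from": 1, "to": 8},
    ],
}

print(gate_decision([], None))                                    # pass
print(gate_decision(mismatches, None))                            # fail
print(gate_decision(mismatches, declaration))                     # needs_human_approval
print(gate_decision(mismatches, declaration, auto_approve=True))  # report_in_gate_summary
```

Requiring every mismatch to be covered means a partially declared rewrite still fails, so the declaration cannot be used to wave through changes it does not mention.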
Suggested implementation sketch
- Extend campaign schema with locked_workload (analogous to locked_parameters).
- Validator diffs the workload yaml content against the canonical; mismatches → fail unless workload_changes_from_canonical declares the deviation explicitly.
- Update DESIGN methodology prompt to require workload_changes_from_canonical when rewriting workload.
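For the schema extension in the first bullet, a structural check on the locked_workload block might look like the sketch below. It is an assumption-laden illustration: nous's real schema mechanism is not described in this issue, the function names are hypothetical, and only the two distribution types that appear in the example (constant and empirical_pmf) are handled.

```python
import math

REQUIRED_TENANT_KEYS = {"input_distribution", "output_distribution", "concurrency"}

def validate_distribution(dist, where):
    """Check one distribution mapping; return a list of error strings."""
    kind = dist.get("type")
    if kind == "constant":
        value = dist.get("value")
        if not isinstance(value, int) or value <= 0:
            return [f"{where}: constant needs a positive integer value"]
    elif kind == "empirical_pmf":
        values, probs = dist.get("values") or [], dist.get("probs") or []
        if not values or len(values) != len(probs):
            return [f"{where}: values and probs must be non-empty and equal length"]
        if not math.isclose(sum(probs), 1.0):
            return [f"{where}: probs must sum to 1"]
    else:
        return [f"{where}: unknown distribution type {kind!r}"]
    return []

def validate_locked_workload(block):
    """Return all structural errors found in a parsed locked_workload block."""
    errors = []
    tenants = block.get("tenants") or {}
    if not tenants:
        errors.append("locked_workload.tenants must list at least one tenant")
    for name, spec in tenants.items():
        missing = REQUIRED_TENANT_KEYS - set(spec)
        if missing:
            errors.append(f"{name}: missing {sorted(missing)}")
            continue
        for key in ("input_distribution", "output_distribution"):
            errors.extend(validate_distribution(spec[key], f"{name}.{key}"))
        if not isinstance(spec["concurrency"], int) or spec["concurrency"] <= 0:
            errors.append(f"{name}.concurrency must be a positive integer")
    return errors

good = {"tenants": {"tenant-B": {
    "input_distribution": {"type": "empirical_pmf",
                           "values": [100, 4720], "probs": [0.8, 0.2]},
    "output_distribution": {"type": "constant", "value": 1},
    "concurrency": 32,
}}}
bad = {"tenants": {"tenant-B": {
    "input_distribution": {"type": "empirical_pmf",
                           "values": [100, 4720], "probs": [0.8, 0.3]},
    "output_distribution": {"type": "constant", "value": 1},
    "concurrency": 32,
}}}

print(validate_locked_workload(good))  # []
print(validate_locked_workload(bad))   # ['tenant-B.input_distribution: probs must sum to 1']
```

Validating the block's structure at campaign load time, before any bundle exists, would surface a malformed canonical workload early rather than as a confusing diff failure later.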
Acceptance criteria
- locked_workload schema field exists and is validated against bundle.inputs/*.yaml.
- workload_changes_from_canonical schema field allows declared deviation with rationale.
Severity
MEDIUM: in the iter-3 case it salvaged the campaign; in general it could mask deviation.
Source
friction-report.md F20, paper-memorytime-mirage campaign (2026-05). Depends on F1 for the locked-X infrastructure.
Part of friction-report tracking issue #245.