Commit b6ba8a6

Smith authored and committed
feat: enhance model resolution rules and add deterministic mode checks in scenarios and tests
1 parent 817b0e8 commit b6ba8a6

12 files changed

Lines changed: 158 additions & 18 deletions

Orchestrator.agent.md

Lines changed: 5 additions & 5 deletions
@@ -79,7 +79,7 @@ Do NOT use `vscode/askQuestions` for questions answerable from codebase evidence
 - Use `revision_mode: initial_create` when no active plan exists.
 - Use `revision_mode: in_place_update` for ordinary PLAN_REVIEW fixes to an active draft/current plan. The payload-selected path is `active_plan_path`, and Planner must return the same `plan_path`.
 - Use `revision_mode: new_artifact_supersession` only for accepted-baseline replacement, user-requested new artifacts, material invalidation, or independent citation needs. The payload-selected path is `existing_plan_path`, and the new Planner output should set `revision_of` to that prior path.
-- Apply the Universal Model Resolution Rule before every Planner dispatch. For replan/update dispatches, the outer `agent/runSubagent` call must include the resolved outer `model`, and the Planner payload must include payload-level `model`, `trace_id`, review-loop `iteration_index`, `revision_mode`, `revision_reason`, and exactly the selected path field for the mode: `active_plan_path` for `in_place_update` or `existing_plan_path` for `new_artifact_supersession`.
+- Apply the Universal Model Resolution Rule before every Planner dispatch. For replan/update dispatches, deterministic mode must include the resolved outer `model`, while auto mode intentionally omits outer `model`; the Planner payload must include `runtime_model_mode`, payload-level `model` when deterministic mode requires it, `trace_id`, review-loop `iteration_index`, `revision_mode`, `revision_reason`, and exactly the selected path field for the mode: `active_plan_path` for `in_place_update` or `existing_plan_path` for `new_artifact_supersession`.
 - Serialize write-capable Planner revisions by `(trace_id, active_plan_path)`. Never run two write-capable Planner updates to the same plan in parallel; parallel review agents may read the same `plan_path` but must not edit it.
 - Phase 3 structural validation is not behavior-complete. `cd evals && npm run test:structural` confirms schema structure and legacy compatibility only; Phase 4 owns conditional enforcement behavior tests and scenario migration for `revision_mode`, selected path fields, `trace_id`, and `iteration_index`.

@@ -178,7 +178,7 @@ This rule covers all dispatch paths without exception: Plan Review Gate reviewer

 ### Dispatch Tool-Call Contract (Required Fields)

-Every `agent/runSubagent` call must include these outer tool-call fields:
+Every `agent/runSubagent` call must follow these outer tool-call envelope rules:
 - **`agentName`** — the verified target-agent field (string). Placing the agent name only inside prompt prose or a delegation payload is non-compliant.
 - **`model`** — mode-conditional outer runtime selector from the Universal Model Resolution Rule. In deterministic mode (opt-in for pinned dispatch), pass the resolved primary as the outer `model` field and never omit it. In auto mode (the default), omit the outer `model` field intentionally so Copilot selects the model automatically.
 - **Prompt/context payload** — scope, deliverables, and relevant context references.
@@ -265,7 +265,7 @@ For `CodeReviewer-subagent`, `PlanAuditor-subagent`, and `AssumptionVerifier-sub
 - If a legacy phase omits `executor_agent`, do not infer silently. Route the plan back through `REPLAN` to Planner and stop the implementation batch until the phase is reissued with an explicit executor.
 - Build a `phase_task_card` for executor payloads when the phase has `phase_task_card_path`, the plan uses `resource_profile: small_local`, or `governance/runtime-policy.json` `resource_profiles.small_local.require_phase_task_card` applies. The card must include objective, allowed files, forbidden areas, context artifacts, validation commands, acceptance checks, max changed files, and escalation rule.
 - When `phase_task_card` budgets are exceeded, do not widen the phase silently. Route to Planner with `needs_replan` or stop with `NEEDS_INPUT` according to the card's escalation rule.
-- **Model Resolution:** Apply the Universal Model Resolution Rule (see Execution Protocol preamble above) before delegating execution: look up `phase.executor_agent` in `agent_role_index`, resolve `roles[role].by_tier[complexity_tier]`, and pass the resolved primary model as the `model` parameter. If the tier entry is `{ "inherit_from": "default" }`, use the role's default `primary`. Only pass a fallback list if `agent/runSubagent` explicitly supports one.
+- **Model Resolution:** Apply the Universal Model Resolution Rule (see Execution Protocol preamble above) before delegating execution: look up `phase.executor_agent` in `agent_role_index`, resolve `roles[role].by_tier[complexity_tier]`, and derive the primary model from the tier entry or role default. In deterministic mode, pass that resolved primary as the outer `model` parameter. In auto mode, omit the outer `model` parameter intentionally so platform auto-selection can choose the runtime model. Only pass a fallback list if `agent/runSubagent` explicitly supports one.
 - Delegate execution to the declared executor agent.
 - Verification Build Gate: after the implementation subagent reports completion, verify build success. Either confirm the execution report includes `build.state: PASS`, or if build evidence is absent or ambiguous, run the project's build command directly. If the build fails, route through Failure Classification Handling before proceeding.
 - Delegate to CodeReviewer-subagent for phase code review (apply Universal Model Resolution Rule). Code review is mandatory for all complexity tiers — see `governance/runtime-policy.json → review_pipeline_by_tier.code_review`. Pass the changed files list, phase scope, and executor agent execution report.
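The Model Resolution bullet in this hunk describes a lookup chain: agent name to role, role to tier entry, tier entry to primary model. The following is a minimal sketch of that chain, not the project's implementation. The governance object shape, the `SMALL`/`LARGE` tier keys, the `example-large-model` name, and the function name `resolvePrimaryModel` are all assumptions made for illustration; only the field names `agent_role_index`, `roles`, `by_tier`, `primary`, and `inherit_from` come from the rule text, and the `Planner` / `capable-planner` / `GPT-5.5 (copilot)` values come from the scenario file in this commit.

```javascript
// Hypothetical governance data. Only the field names are taken from the rule text.
const governance = {
  agent_role_index: { Planner: 'capable-planner' },
  roles: {
    'capable-planner': {
      primary: 'GPT-5.5 (copilot)',
      by_tier: {
        SMALL: { inherit_from: 'default' },        // assumed tier key
        LARGE: { primary: 'example-large-model' }, // placeholder model name
      },
    },
  },
};

// Resolve the primary model for an agent at a given complexity tier.
function resolvePrimaryModel(gov, agentName, complexityTier) {
  const roleName = gov.agent_role_index[agentName];
  if (!roleName) throw new Error(`no role mapped for agent ${agentName}`);
  const role = gov.roles[roleName];
  if (!role) throw new Error(`role ${roleName} is not defined`);

  const tierEntry = complexityTier ? role.by_tier?.[complexityTier] : undefined;
  // Missing tier context and an explicit inherit marker both resolve to the
  // role's top-level primary (treating an unknown tier key the same way is
  // an assumption of this sketch).
  if (!tierEntry || tierEntry.inherit_from === 'default') return role.primary;
  return tierEntry.primary ?? role.primary;
}

console.log(resolvePrimaryModel(governance, 'Planner', undefined));
console.log(resolvePrimaryModel(governance, 'Planner', 'LARGE'));
```

The resolution step is the same in both modes; what differs is whether the result is then passed as the outer `model` parameter (deterministic) or left off the call (auto).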
@@ -341,7 +341,7 @@ When a subagent returns a `failure_classification`, Orchestrator routes automati
 | `escalate` | STOP — transition to `WAITING_APPROVAL`, present to user | 0 |
 | `model_unavailable` | Retry the same agent up to `retry_budgets.model_unavailable_max` times; on exhaustion, escalate to user via `WAITING_APPROVAL` | retry_budgets.model_unavailable_max |

-If retry limit is exhausted, escalate to user with accumulated failure evidence. For all dispatch actions in this table (retry or replan), apply the Universal Model Resolution Rule to resolve the `model` parameter — including needs_replan Planner dispatch. A `needs_replan` Planner dispatch that updates an active plan must follow Planner Revision Modes: include outer `model`, payload-level `model`, `trace_id`, review-loop `iteration_index`, `revision_mode`, `revision_reason`, and either `active_plan_path` for `in_place_update` or `existing_plan_path` for `new_artifact_supersession`.
+If retry limit is exhausted, escalate to user with accumulated failure evidence. For all dispatch actions in this table (retry or replan), apply the Universal Model Resolution Rule — including needs_replan Planner dispatch. A `needs_replan` Planner dispatch that updates an active plan must follow Planner Revision Modes: include outer `model` only in deterministic mode, omit outer `model` in auto mode, include payload-level `model` when deterministic mode requires it, and include `runtime_model_mode`, `trace_id`, review-loop `iteration_index`, `revision_mode`, `revision_reason`, and either `active_plan_path` for `in_place_update` or `existing_plan_path` for `new_artifact_supersession`.

 ### Diagnosis Packet (MEDIUM/LARGE — Fixable Retries)

@@ -410,4 +410,4 @@ Use `plans/templates/plan-document-template.md` for full authoring rules. Inline
 - No batching of todo completions across phases. Each completion is a separate `#todos` call, made at the moment of phase verification — not aggregated for later flushing.
 - No phase work may resume after a context compaction or session restart without first reconciling the `#todos` state against actual plan-artifact reality.
 - If uncertain and cannot verify safely: `ABSTAIN`.
-- No `agent/runSubagent` dispatch may omit the `model` parameter. Every dispatch must apply the Universal Model Resolution Rule from Execution Protocol.
+- No deterministic-mode `agent/runSubagent` dispatch may omit the outer `model` parameter. Auto-mode dispatches intentionally omit outer `model` and must carry `runtime_model_mode: auto` in the payload. Every dispatch must apply the Universal Model Resolution Rule from Execution Protocol.
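The mode-conditional envelope rule that this file now states in several places can be sketched as a small builder. This is an illustration under assumptions, not the actual `agent/runSubagent` call shape, which the diff does not show: the function name `buildDispatchCall`, the `payload` property name, and the choice to always mirror the model into the payload in deterministic mode are hypothetical.

```javascript
// Build the outer tool-call fields for a dispatch (hypothetical shape).
function buildDispatchCall({ agentName, runtimeModelMode, resolvedPrimary, payload = {} }) {
  if (!agentName) throw new Error('agentName is required as an outer field');

  const call = {
    agentName,
    // The payload always records which mode the dispatch ran under.
    payload: { ...payload, runtime_model_mode: runtimeModelMode },
  };

  if (runtimeModelMode === 'deterministic') {
    if (!resolvedPrimary) throw new Error('deterministic mode requires a resolved outer model');
    call.model = resolvedPrimary;         // outer runtime selector, never omitted
    call.payload.model = resolvedPrimary; // payload-level copy (assumed to be required here)
  } else if (runtimeModelMode !== 'auto') {
    throw new Error(`unknown runtime_model_mode: ${runtimeModelMode}`);
  }
  // In auto mode the outer `model` key is intentionally absent.
  return call;
}

const pinned = buildDispatchCall({
  agentName: 'Planner',
  runtimeModelMode: 'deterministic',
  resolvedPrimary: 'GPT-5.5 (copilot)',
  payload: { trace_id: 'trace-1', revision_mode: 'in_place_update' },
});
const auto = buildDispatchCall({ agentName: 'Planner', runtimeModelMode: 'auto' });
console.log('model' in pinned, 'model' in auto);
```

Leaving the key off entirely in auto mode, rather than setting it to `null` or an empty string, keeps the two modes distinguishable by a simple presence check, which is how the scenario file models them (`model_present: true` or `false`).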

evals/drift-checks.mjs

Lines changed: 9 additions & 0 deletions
@@ -304,6 +304,9 @@ export function validateModelResolutionScenarioNegatives(scenario) {

   const missingOuterModel = byId.get('missing-outer-model');
   if (missingOuterModel) {
+    if (missingOuterModel?.input_context?.runtime_model_mode !== 'deterministic') {
+      errors.push('missing-outer-model: input runtime_model_mode must be deterministic');
+    }
     if (missingOuterModel?.broken_dispatch?.outer_fields?.agentName_present !== true) {
       errors.push('missing-outer-model: outer agentName must be present so the failure is isolated to model');
     }
@@ -317,6 +320,9 @@ export function validateModelResolutionScenarioNegatives(scenario) {

   const payloadOnlyModel = byId.get('payload-only-model');
   if (payloadOnlyModel) {
+    if (payloadOnlyModel?.input_context?.runtime_model_mode !== 'deterministic') {
+      errors.push('payload-only-model: input runtime_model_mode must be deterministic');
+    }
     if (payloadOnlyModel?.broken_dispatch?.outer_fields?.model_present !== false) {
       errors.push('payload-only-model: outer model must be absent');
     }
@@ -391,6 +397,9 @@ export function validateModelResolutionScenarioNegatives(scenario) {

   const omittedDueMissingTier = byId.get('omitted-model-due-missing-tier-context');
   if (omittedDueMissingTier) {
+    if (omittedDueMissingTier?.input_context?.runtime_model_mode !== 'deterministic') {
+      errors.push('omitted-model-due-missing-tier-context: input runtime_model_mode must be deterministic');
+    }
     if (omittedDueMissingTier?.input_context?.complexity_tier_present !== false) {
       errors.push('omitted-model-due-missing-tier-context: complexity_tier_present must be false');
     }
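The three hunks above add the same guard to three negative cases. One possible way to express that shared check is a small loop over the affected case ids. The sketch below is only an illustration of the pattern and is not code from this commit; the names `DETERMINISTIC_ONLY_CASES` and `checkDeterministicMarkers` are hypothetical, while the case ids and error strings are copied from the diff.

```javascript
// Case ids that the diff requires to be marked as deterministic-mode inputs.
const DETERMINISTIC_ONLY_CASES = [
  'missing-outer-model',
  'payload-only-model',
  'omitted-model-due-missing-tier-context',
];

// Append one error per present case whose input_context lacks the marker.
function checkDeterministicMarkers(byId, errors) {
  for (const caseId of DETERMINISTIC_ONLY_CASES) {
    const negativeCase = byId.get(caseId);
    if (!negativeCase) continue; // absent cases are skipped, as in the per-case guards
    if (negativeCase.input_context?.runtime_model_mode !== 'deterministic') {
      errors.push(`${caseId}: input runtime_model_mode must be deterministic`);
    }
  }
  return errors;
}

// Usage with a toy lookup map: the second case is missing its marker.
const byId = new Map([
  ['missing-outer-model', { input_context: { runtime_model_mode: 'deterministic' } }],
  ['payload-only-model', { input_context: {} }],
]);
const errors = checkDeterministicMarkers(byId, []);
console.log(errors);
```

The committed version keeps the checks inline next to each case's other assertions, which keeps every case's rules readable in one place at the cost of some repetition.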

evals/scenarios/orchestrator-model-resolution.json

Lines changed: 13 additions & 6 deletions
@@ -249,7 +249,10 @@
       },
       {
         "case_id": "missing-outer-model",
-        "description": "Broken dispatch supplies the outer agentName but omits the outer model field entirely.",
+        "description": "Broken deterministic-mode dispatch supplies the outer agentName but omits the outer model field entirely.",
+        "input_context": {
+          "runtime_model_mode": "deterministic"
+        },
         "broken_dispatch": {
           "outer_fields": {
             "agentName_present": true,
@@ -263,14 +266,17 @@
         "expected": {
           "rejected": true,
           "violates": "missing_outer_model",
-          "reason": "Every internal dispatch must pass the governance-resolved primary model as the outer model parameter.",
+          "reason": "Deterministic-mode internal dispatch must pass the governance-resolved primary model as the outer model parameter.",
           "offline_detection_scope": "structural_contract",
           "live_runtime_assertion": false
         }
       },
       {
         "case_id": "payload-only-model",
-        "description": "Broken dispatch carries a payload-level model for audit context but omits the outer model runtime selector.",
+        "description": "Broken deterministic-mode dispatch carries a payload-level model for audit context but omits the outer model runtime selector.",
+        "input_context": {
+          "runtime_model_mode": "deterministic"
+        },
         "broken_dispatch": {
           "outer_fields": {
             "agentName_present": true,
@@ -368,10 +374,11 @@
       },
       {
         "case_id": "omitted-model-due-missing-tier-context",
-        "description": "Broken dispatch omits model because no complexity_tier is available yet instead of using the target role top-level primary.",
+        "description": "Broken deterministic-mode dispatch omits model because no complexity_tier is available yet instead of using the target role top-level primary.",
         "input_context": {
           "target_agent": "Planner",
           "role": "capable-planner",
+          "runtime_model_mode": "deterministic",
           "complexity_tier_present": false,
           "dispatch_family": "initial_planner_dispatch"
         },
@@ -387,7 +394,7 @@
           "violates": "omitted_model_missing_tier_context",
           "resolution_when_tier_missing": "top_level_primary",
           "resolved_primary_model": "GPT-5.5 (copilot)",
-          "reason": "Missing tier context changes the resolution source, not the requirement to pass an outer model field.",
+          "reason": "In deterministic mode, missing tier context changes the resolution source, not the requirement to pass an outer model field.",
           "offline_detection_scope": "structural_contract",
           "live_runtime_assertion": false
         }
@@ -402,4 +409,4 @@
     "reference_cases_documented": 10,
     "negative_cases_documented": 7
   }
-}
+}

evals/tests/drift-detection.test.mjs

Lines changed: 12 additions & 1 deletion
@@ -783,6 +783,7 @@ console.log('\n=== Check #6c — model resolution scenario negative cases ===');
       },
       {
         case_id: 'missing-outer-model',
+        input_context: { runtime_model_mode: 'deterministic' },
         broken_dispatch: {
           outer_fields: { agentName_present: true, model_present: false },
           payload_fields: { model_present: false },
@@ -791,6 +792,7 @@ console.log('\n=== Check #6c — model resolution scenario negative cases ===');
       },
       {
         case_id: 'payload-only-model',
+        input_context: { runtime_model_mode: 'deterministic' },
         broken_dispatch: {
           outer_fields: { agentName_present: true, model_present: false },
           payload_fields: { model_present: true },
@@ -820,7 +822,7 @@ console.log('\n=== Check #6c — model resolution scenario negative cases ===');
       },
       {
         case_id: 'omitted-model-due-missing-tier-context',
-        input_context: { complexity_tier_present: false },
+        input_context: { runtime_model_mode: 'deterministic', complexity_tier_present: false },
         broken_dispatch: { outer_fields: { agentName_present: true, model_present: false } },
         expected: { rejected: true, violates: 'omitted_model_missing_tier_context', resolution_when_tier_missing: 'top_level_primary', resolved_primary_model: 'GPT-5.5 (copilot)', offline_detection_scope: 'structural_contract', live_runtime_assertion: false },
       },
@@ -854,6 +856,15 @@ console.log('\n=== Check #6c — model resolution scenario negative cases ===');
   `ok=${payloadOnlyConflated.ok}, errors=${JSON.stringify(payloadOnlyConflated.errors)}`
 );

+const missingDeterministicModeScenario = JSON.parse(JSON.stringify(validScenario));
+delete missingDeterministicModeScenario.input.negative_cases.find(c => c.case_id === 'missing-outer-model').input_context.runtime_model_mode;
+const missingDeterministicMode = validateModelResolutionScenarioNegatives(missingDeterministicModeScenario);
+check(
+  'negative: deterministic missing-outer-model case without deterministic marker is flagged',
+  missingDeterministicMode.ok === false && missingDeterministicMode.errors.some(e => e.includes('missing-outer-model')),
+  `ok=${missingDeterministicMode.ok}, errors=${JSON.stringify(missingDeterministicMode.errors)}`
+);
+
 const autoModeRejectedScenario = JSON.parse(JSON.stringify(validScenario));
 autoModeRejectedScenario.input.negative_cases.find(c => c.case_id === 'auto-mode-missing-outer-model-allowed').expected.rejected = true;
 const autoModeRejected = validateModelResolutionScenarioNegatives(autoModeRejectedScenario);
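The new test follows a clone-then-mutate pattern: deep-copy a known-good fixture, break exactly one field, and confirm the validator reports it. The sketch below isolates that pattern so it can run standalone. It is a simplified illustration, not the repository's test: `validateStub` is a stand-in for `validateModelResolutionScenarioNegatives`, and `mutateCase` is a hypothetical helper that also guards against a misspelled `case_id`, since calling `find(...)` with no match returns `undefined` and the property access that follows would throw a `TypeError`.

```javascript
// Stand-in validator: flags any negative case without the deterministic marker.
function validateStub(scenario) {
  const errors = [];
  for (const c of scenario.input.negative_cases) {
    if (c.input_context?.runtime_model_mode !== 'deterministic') {
      errors.push(`${c.case_id}: input runtime_model_mode must be deterministic`);
    }
  }
  return { ok: errors.length === 0, errors };
}

// Deep-copy the fixture via a JSON round trip, then apply one targeted mutation.
function mutateCase(scenario, caseId, mutate) {
  const clone = JSON.parse(JSON.stringify(scenario));
  const target = clone.input.negative_cases.find(c => c.case_id === caseId);
  if (!target) throw new Error(`fixture has no case_id ${caseId}`);
  mutate(target);
  return clone;
}

// Minimal known-good fixture (a reduced version of the shape used in the diff).
const validScenario = {
  input: {
    negative_cases: [
      { case_id: 'missing-outer-model', input_context: { runtime_model_mode: 'deterministic' } },
    ],
  },
};

const mutated = mutateCase(validScenario, 'missing-outer-model', c => {
  delete c.input_context.runtime_model_mode;
});
const result = validateStub(mutated);
console.log(result.ok, result.errors);
```

The JSON round trip is sufficient here because the fixture is plain JSON-compatible data, and it leaves `validScenario` untouched so later checks can reuse it.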
