Smithbox-ai
diff --git a/Orchestrator.agent.md b/Orchestrator.agent.md
Lines changed: 5 additions & 5 deletions
diff --git a/evals/drift-checks.mjs b/evals/drift-checks.mjs
Lines changed: 9 additions & 0 deletions
diff --git a/evals/scenarios/orchestrator-model-resolution.json b/evals/scenarios/orchestrator-model-resolution.json
Lines changed: 13 additions & 6 deletions
diff --git a/evals/tests/drift-detection.test.mjs b/evals/tests/drift-detection.test.mjs
Lines changed: 12 additions & 1 deletion
@@ -79,7 +79,7 @@ Do NOT use `vscode/askQuestions` for questions answerable from codebase evidence
 - Use `revision_mode: initial_create` when no active plan exists.
 - Use `revision_mode: in_place_update` for ordinary PLAN_REVIEW fixes to an active draft/current plan. The payload-selected path is `active_plan_path`, and Planner must return the same `plan_path`.
 - Use `revision_mode: new_artifact_supersession` only for accepted-baseline replacement, user-requested new artifacts, material invalidation, or independent citation needs. The payload-selected path is `existing_plan_path`, and the new Planner output should set `revision_of` to that prior path.
-- Apply the Universal Model Resolution Rule before every Planner dispatch. For replan/update dispatches, the outer `agent/runSubagent` call must include the resolved outer `model`, and the Planner payload must include payload-level `model`, `trace_id`, review-loop `iteration_index`, `revision_mode`, `revision_reason`, and exactly the selected path field for the mode: `active_plan_path` for `in_place_update` or `existing_plan_path` for `new_artifact_supersession`.
+- Apply the Universal Model Resolution Rule before every Planner dispatch. For replan/update dispatches, deterministic mode must include the resolved outer `model`, while auto mode intentionally omits outer `model`; the Planner payload must include `runtime_model_mode`, payload-level `model` when deterministic mode requires it, `trace_id`, review-loop `iteration_index`, `revision_mode`, `revision_reason`, and exactly the selected path field for the mode: `active_plan_path` for `in_place_update` or `existing_plan_path` for `new_artifact_supersession`.
 - Serialize write-capable Planner revisions by `(trace_id, active_plan_path)`. Never run two write-capable Planner updates to the same plan in parallel; parallel review agents may read the same `plan_path` but must not edit it.
 - Phase 3 structural validation is not behavior-complete. `cd evals && npm run test:structural` confirms schema structure and legacy compatibility only; Phase 4 owns conditional enforcement behavior tests and scenario migration for `revision_mode`, selected path fields, `trace_id`, and `iteration_index`.
 
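The replan/update payload rule in this `Orchestrator.agent.md` hunk can be sketched as a validator. This is a minimal illustration, not code from the repository: `validatePlannerRevisionPayload` and `PATH_FIELD_BY_MODE` are hypothetical names, the payload is assumed to be a plain object keyed by the field names in the prose, and payload-level `model` is treated as required whenever `runtime_model_mode` is `deterministic`, which is one reading of "when deterministic mode requires it".

```javascript
// Hypothetical sketch: validatePlannerRevisionPayload and PATH_FIELD_BY_MODE
// are illustrative names, not repository code.
const PATH_FIELD_BY_MODE = {
  in_place_update: 'active_plan_path',
  new_artifact_supersession: 'existing_plan_path',
};

function validatePlannerRevisionPayload(p) {
  const errors = [];
  // initial_create is not a replan/update dispatch, so this rule does not apply.
  if (p.revision_mode === 'initial_create') return errors;

  const selected = PATH_FIELD_BY_MODE[p.revision_mode];
  if (!selected) return [`unknown revision_mode: ${p.revision_mode}`];

  // Fields every replan/update payload carries (0 is a valid iteration_index).
  for (const field of ['runtime_model_mode', 'trace_id', 'iteration_index', 'revision_reason']) {
    if (p[field] === undefined) errors.push(`missing ${field}`);
  }
  // Assumption: deterministic mode always requires payload-level model.
  if (p.runtime_model_mode === 'deterministic' && !p.model) {
    errors.push('deterministic mode requires payload-level model');
  }
  // Exactly the selected path field: it must be present and the other absent.
  for (const field of Object.values(PATH_FIELD_BY_MODE)) {
    const present = p[field] !== undefined;
    if (field === selected && !present) errors.push(`missing ${field}`);
    if (field !== selected && present) errors.push(`${field} not allowed for ${p.revision_mode}`);
  }
  return errors;
}

console.log(validatePlannerRevisionPayload({
  runtime_model_mode: 'auto',
  trace_id: 'trace-1',
  iteration_index: 1,
  revision_mode: 'in_place_update',
  revision_reason: 'PLAN_REVIEW fixes',
  active_plan_path: 'plans/example-plan.md',
}));
// → []
```

Keying the path field off `revision_mode` makes "exactly the selected path field" a single lookup, so a payload that carries both `active_plan_path` and `existing_plan_path` is rejected rather than silently resolved.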
@@ -178,7 +178,7 @@ This rule covers all dispatch paths without exception: Plan Review Gate reviewer
 
 ### Dispatch Tool-Call Contract (Required Fields)
 
-Every `agent/runSubagent` call must include these outer tool-call fields:
+Every `agent/runSubagent` call must follow these outer tool-call envelope rules:
 - **`agentName`** — the verified target-agent field (string). Placing the agent name only inside prompt prose or a delegation payload is non-compliant.
 - **`model`** — mode-conditional outer runtime selector from the Universal Model Resolution Rule. In deterministic mode (opt-in for pinned dispatch), pass the resolved primary as the outer `model` field and never omit it. In auto mode (the default), omit the outer `model` field intentionally so Copilot selects the model automatically.
 - **Prompt/context payload** — scope, deliverables, and relevant context references.
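The mode-conditional envelope rules above can be illustrated with a small builder. This is a sketch under assumptions: `buildRunSubagentCall` is a hypothetical helper, and the `payload` key stands in for the prompt/context payload because the real `agent/runSubagent` parameter names are not shown in this diff.

```javascript
// Hypothetical envelope builder for agent/runSubagent (sketch only).
// `payload` stands in for the prompt/context payload; the real tool's
// parameter names may differ.
function buildRunSubagentCall({ agentName, mode, resolvedPrimary, payload }) {
  if (!agentName) throw new Error('agentName must be an outer field');
  if (mode !== 'deterministic' && mode !== 'auto') {
    throw new Error(`unknown runtime_model_mode: ${mode}`);
  }
  // The payload always records which mode produced this envelope.
  const call = { agentName, payload: { ...payload, runtime_model_mode: mode } };
  if (mode === 'deterministic') {
    if (!resolvedPrimary) throw new Error('deterministic mode needs a resolved primary model');
    call.model = resolvedPrimary; // pinned outer runtime selector
  }
  // In auto mode the outer `model` key is deliberately absent so Copilot chooses.
  return call;
}

const pinned = buildRunSubagentCall({
  agentName: 'Planner',
  mode: 'deterministic',
  resolvedPrimary: 'GPT-5.5 (copilot)',
  payload: { scope: 'phase 2' },
});
const automatic = buildRunSubagentCall({ agentName: 'Planner', mode: 'auto', payload: { scope: 'phase 2' } });
console.log('model' in pinned, 'model' in automatic); // → true false
```

Leaving the key out entirely, rather than setting `model: undefined`, keeps the auto-mode envelope distinguishable from a deterministic dispatch that forgot to resolve a model.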
@@ -265,7 +265,7 @@ For `CodeReviewer-subagent`, `PlanAuditor-subagent`, and `AssumptionVerifier-sub
    - If a legacy phase omits `executor_agent`, do not infer silently. Route the plan back through `REPLAN` to Planner and stop the implementation batch until the phase is reissued with an explicit executor.
    - Build a `phase_task_card` for executor payloads when the phase has `phase_task_card_path`, the plan uses `resource_profile: small_local`, or `governance/runtime-policy.json` `resource_profiles.small_local.require_phase_task_card` applies. The card must include objective, allowed files, forbidden areas, context artifacts, validation commands, acceptance checks, max changed files, and escalation rule.
    - When `phase_task_card` budgets are exceeded, do not widen the phase silently. Route to Planner with `needs_replan` or stop with `NEEDS_INPUT` according to the card's escalation rule.
-   - **Model Resolution:** Apply the Universal Model Resolution Rule (see Execution Protocol preamble above) before delegating execution: look up `phase.executor_agent` in `agent_role_index`, resolve `roles[role].by_tier[complexity_tier]`, and pass the resolved primary model as the `model` parameter. If the tier entry is `{ "inherit_from": "default" }`, use the role's default `primary`. Only pass a fallback list if `agent/runSubagent` explicitly supports one.
+   - **Model Resolution:** Apply the Universal Model Resolution Rule (see Execution Protocol preamble above) before delegating execution: look up `phase.executor_agent` in `agent_role_index`, resolve `roles[role].by_tier[complexity_tier]`, and derive the primary model from the tier entry or role default. In deterministic mode, pass that resolved primary as the outer `model` parameter. In auto mode, omit the outer `model` parameter intentionally so platform auto-selection can choose the runtime model. Only pass a fallback list if `agent/runSubagent` explicitly supports one.
    - Delegate execution to the declared executor agent.
    - Verification Build Gate: after the implementation subagent reports completion, verify build success. Either confirm the execution report includes `build.state: PASS`, or if build evidence is absent or ambiguous, run the project's build command directly. If the build fails, route through Failure Classification Handling before proceeding.
    - Delegate to CodeReviewer-subagent for phase code review (apply Universal Model Resolution Rule). Code review is mandatory for all complexity tiers — see `governance/runtime-policy.json → review_pipeline_by_tier.code_review`. Pass the changed files list, phase scope, and executor agent execution report.
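The lookup order in the Model Resolution step can be sketched as follows. The policy shape is an assumption for illustration: `agent_role_index` is taken to map an agent name to a role name, `roles[role].primary` is taken to be both the role default and the "top-level primary" used when no tier is known, and the `MEDIUM`/`LARGE` tier keys and `example-large-model` are placeholders. `resolvePrimaryModel` is a hypothetical name. The function runs in both modes; only deterministic mode forwards its result as the outer `model`.

```javascript
// Sketch of the tier-aware primary-model lookup. The policy shape
// (agent_role_index maps agent -> role; roles[role].primary is the role
// default) is an assumption for illustration.
function resolvePrimaryModel(policy, agentName, complexityTier) {
  const role = policy.agent_role_index[agentName];
  const roleDef = role && policy.roles[role];
  if (!roleDef) throw new Error(`no role definition for agent ${agentName}`);

  // No tier context yet: use the role's top-level primary.
  if (complexityTier === undefined) return roleDef.primary;

  const entry = roleDef.by_tier?.[complexityTier];
  if (!entry) throw new Error(`no by_tier entry for ${role}/${complexityTier}`);
  // { "inherit_from": "default" } defers to the role default primary.
  return entry.inherit_from === 'default' ? roleDef.primary : entry.primary;
}

// Hypothetical policy fragment; 'example-large-model' is a placeholder.
const policy = {
  agent_role_index: { Planner: 'capable-planner' },
  roles: {
    'capable-planner': {
      primary: 'GPT-5.5 (copilot)',
      by_tier: { MEDIUM: { inherit_from: 'default' }, LARGE: { primary: 'example-large-model' } },
    },
  },
};
console.log(resolvePrimaryModel(policy, 'Planner'));          // → GPT-5.5 (copilot)
console.log(resolvePrimaryModel(policy, 'Planner', 'LARGE')); // → example-large-model
```

A tier that is present but has no `by_tier` entry throws here instead of silently falling back, since the prose only defines the `inherit_from` and missing-tier cases.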
@@ -341,7 +341,7 @@ When a subagent returns a `failure_classification`, Orchestrator routes automati
 | `escalate` | STOP — transition to `WAITING_APPROVAL`, present to user | 0 |
 | `model_unavailable` | Retry the same agent up to `retry_budgets.model_unavailable_max` times; on exhaustion, escalate to user via `WAITING_APPROVAL` | retry_budgets.model_unavailable_max |
 
-If retry limit is exhausted, escalate to user with accumulated failure evidence. For all dispatch actions in this table (retry or replan), apply the Universal Model Resolution Rule to resolve the `model` parameter — including needs_replan Planner dispatch. A `needs_replan` Planner dispatch that updates an active plan must follow Planner Revision Modes: include outer `model`, payload-level `model`, `trace_id`, review-loop `iteration_index`, `revision_mode`, `revision_reason`, and either `active_plan_path` for `in_place_update` or `existing_plan_path` for `new_artifact_supersession`.
+If retry limit is exhausted, escalate to user with accumulated failure evidence. For all dispatch actions in this table (retry or replan), apply the Universal Model Resolution Rule — including needs_replan Planner dispatch. A `needs_replan` Planner dispatch that updates an active plan must follow Planner Revision Modes: include outer `model` only in deterministic mode, omit outer `model` in auto mode, include payload-level `model` when deterministic mode requires it, and include `runtime_model_mode`, `trace_id`, review-loop `iteration_index`, `revision_mode`, `revision_reason`, and either `active_plan_path` for `in_place_update` or `existing_plan_path` for `new_artifact_supersession`.
 
 ### Diagnosis Packet (MEDIUM/LARGE — Fixable Retries)
 
@@ -410,4 +410,4 @@ Use `plans/templates/plan-document-template.md` for full authoring rules. Inline
 - No batching of todo completions across phases. Each completion is a separate `#todos` call, made at the moment of phase verification — not aggregated for later flushing.
 - No phase work may resume after a context compaction or session restart without first reconciling the `#todos` state against actual plan-artifact reality.
 - If uncertain and cannot verify safely: `ABSTAIN`.
-- No `agent/runSubagent` dispatch may omit the `model` parameter. Every dispatch must apply the Universal Model Resolution Rule from Execution Protocol.
+- No deterministic-mode `agent/runSubagent` dispatch may omit the outer `model` parameter. Auto-mode dispatches intentionally omit outer `model` and must carry `runtime_model_mode: auto` in the payload. Every dispatch must apply the Universal Model Resolution Rule from Execution Protocol.
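The revised invariant can be checked mechanically on a finished dispatch. In this sketch `checkDispatchInvariant` and every violation name except `missing_outer_model` (which matches the scenario file's `violates` value) are hypothetical, and the envelope is assumed to carry the mode in `payload.runtime_model_mode`. Flagging an outer `model` in auto mode is a stricter reading of "intentionally omit" than the invariant itself states.

```javascript
// Hypothetical lint for the dispatch invariant (sketch; field names assumed).
function checkDispatchInvariant(call) {
  const violations = [];
  if (!call.agentName) violations.push('missing_outer_agentName');
  const mode = call.payload?.runtime_model_mode;
  if (mode === 'deterministic') {
    if (!call.model) violations.push('missing_outer_model');
  } else if (mode === 'auto') {
    if ('model' in call) violations.push('auto_mode_outer_model_present');
  } else {
    // An omitted outer model is only acceptable with an explicit auto marker.
    violations.push('missing_runtime_model_mode');
  }
  return violations;
}

console.log(checkDispatchInvariant({ agentName: 'Planner', payload: { runtime_model_mode: 'auto' } }));
// → []
console.log(checkDispatchInvariant({ agentName: 'Planner', payload: { runtime_model_mode: 'deterministic' } }));
// → [ 'missing_outer_model' ]
```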
@@ -304,6 +304,9 @@ export function validateModelResolutionScenarioNegatives(scenario) {
 
   const missingOuterModel = byId.get('missing-outer-model');
   if (missingOuterModel) {
+    if (missingOuterModel?.input_context?.runtime_model_mode !== 'deterministic') {
+      errors.push('missing-outer-model: input runtime_model_mode must be deterministic');
+    }
     if (missingOuterModel?.broken_dispatch?.outer_fields?.agentName_present !== true) {
       errors.push('missing-outer-model: outer agentName must be present so the failure is isolated to model');
     }
@@ -317,6 +320,9 @@ export function validateModelResolutionScenarioNegatives(scenario) {
 
   const payloadOnlyModel = byId.get('payload-only-model');
   if (payloadOnlyModel) {
+    if (payloadOnlyModel?.input_context?.runtime_model_mode !== 'deterministic') {
+      errors.push('payload-only-model: input runtime_model_mode must be deterministic');
+    }
     if (payloadOnlyModel?.broken_dispatch?.outer_fields?.model_present !== false) {
       errors.push('payload-only-model: outer model must be absent');
     }
@@ -391,6 +397,9 @@ export function validateModelResolutionScenarioNegatives(scenario) {
 
   const omittedDueMissingTier = byId.get('omitted-model-due-missing-tier-context');
   if (omittedDueMissingTier) {
+    if (omittedDueMissingTier?.input_context?.runtime_model_mode !== 'deterministic') {
+      errors.push('omitted-model-due-missing-tier-context: input runtime_model_mode must be deterministic');
+    }
     if (omittedDueMissingTier?.input_context?.complexity_tier_present !== false) {
       errors.push('omitted-model-due-missing-tier-context: complexity_tier_present must be false');
     }
 
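The three blocks added to `evals/drift-checks.mjs` repeat one check with a different case id. A table-driven form is one possible refactor, not what this PR does; `DETERMINISTIC_NEGATIVE_CASES` and `checkDeterministicMarkers` are hypothetical names, and `byId` is assumed to be the same `case_id`-keyed `Map` used inside `validateModelResolutionScenarioNegatives`. The error strings match the inline versions.

```javascript
// Case ids whose input must be marked deterministic (taken from the hunks above).
const DETERMINISTIC_NEGATIVE_CASES = [
  'missing-outer-model',
  'payload-only-model',
  'omitted-model-due-missing-tier-context',
];

// Hypothetical helper: the same checks as the three inline blocks, in one loop.
function checkDeterministicMarkers(byId, errors) {
  for (const caseId of DETERMINISTIC_NEGATIVE_CASES) {
    const negativeCase = byId.get(caseId);
    // Absent cases are skipped, mirroring the `if (missingOuterModel)` style guards.
    if (!negativeCase) continue;
    if (negativeCase.input_context?.runtime_model_mode !== 'deterministic') {
      errors.push(`${caseId}: input runtime_model_mode must be deterministic`);
    }
  }
  return errors;
}

const byId = new Map([
  ['missing-outer-model', { input_context: { runtime_model_mode: 'deterministic' } }],
  ['payload-only-model', { input_context: {} }],
]);
console.log(checkDeterministicMarkers(byId, []).join('\n'));
// → payload-only-model: input runtime_model_mode must be deterministic
```

Adding a future deterministic-only negative case then means appending one id to the list instead of copying another guarded block.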
@@ -249,7 +249,10 @@
       },
       {
         "case_id": "missing-outer-model",
-        "description": "Broken dispatch supplies the outer agentName but omits the outer model field entirely.",
+        "description": "Broken deterministic-mode dispatch supplies the outer agentName but omits the outer model field entirely.",
+        "input_context": {
+          "runtime_model_mode": "deterministic"
+        },
         "broken_dispatch": {
           "outer_fields": {
             "agentName_present": true,
@@ -263,14 +266,17 @@
         "expected": {
           "rejected": true,
           "violates": "missing_outer_model",
-          "reason": "Every internal dispatch must pass the governance-resolved primary model as the outer model parameter.",
+          "reason": "Deterministic-mode internal dispatch must pass the governance-resolved primary model as the outer model parameter.",
           "offline_detection_scope": "structural_contract",
           "live_runtime_assertion": false
         }
       },
       {
         "case_id": "payload-only-model",
-        "description": "Broken dispatch carries a payload-level model for audit context but omits the outer model runtime selector.",
+        "description": "Broken deterministic-mode dispatch carries a payload-level model for audit context but omits the outer model runtime selector.",
+        "input_context": {
+          "runtime_model_mode": "deterministic"
+        },
         "broken_dispatch": {
           "outer_fields": {
             "agentName_present": true,
@@ -368,10 +374,11 @@
       },
       {
         "case_id": "omitted-model-due-missing-tier-context",
-        "description": "Broken dispatch omits model because no complexity_tier is available yet instead of using the target role top-level primary.",
+        "description": "Broken deterministic-mode dispatch omits model because no complexity_tier is available yet instead of using the target role top-level primary.",
         "input_context": {
           "target_agent": "Planner",
           "role": "capable-planner",
+          "runtime_model_mode": "deterministic",
           "complexity_tier_present": false,
           "dispatch_family": "initial_planner_dispatch"
         },
@@ -387,7 +394,7 @@
           "violates": "omitted_model_missing_tier_context",
           "resolution_when_tier_missing": "top_level_primary",
           "resolved_primary_model": "GPT-5.5 (copilot)",
-          "reason": "Missing tier context changes the resolution source, not the requirement to pass an outer model field.",
+          "reason": "In deterministic mode, missing tier context changes the resolution source, not the requirement to pass an outer model field.",
           "offline_detection_scope": "structural_contract",
           "live_runtime_assertion": false
         }
@@ -402,4 +409,4 @@
     "reference_cases_documented": 10,
     "negative_cases_documented": 7
   }
-}
+}
@@ -783,6 +783,7 @@ console.log('\n=== Check #6c — model resolution scenario negative cases ===');
         },
         {
           case_id: 'missing-outer-model',
+          input_context: { runtime_model_mode: 'deterministic' },
           broken_dispatch: {
             outer_fields: { agentName_present: true, model_present: false },
             payload_fields: { model_present: false },
@@ -791,6 +792,7 @@ console.log('\n=== Check #6c — model resolution scenario negative cases ===');
         },
         {
           case_id: 'payload-only-model',
+          input_context: { runtime_model_mode: 'deterministic' },
           broken_dispatch: {
             outer_fields: { agentName_present: true, model_present: false },
             payload_fields: { model_present: true },
@@ -820,7 +822,7 @@ console.log('\n=== Check #6c — model resolution scenario negative cases ===');
         },
         {
           case_id: 'omitted-model-due-missing-tier-context',
-          input_context: { complexity_tier_present: false },
+          input_context: { runtime_model_mode: 'deterministic', complexity_tier_present: false },
           broken_dispatch: { outer_fields: { agentName_present: true, model_present: false } },
           expected: { rejected: true, violates: 'omitted_model_missing_tier_context', resolution_when_tier_missing: 'top_level_primary', resolved_primary_model: 'GPT-5.5 (copilot)', offline_detection_scope: 'structural_contract', live_runtime_assertion: false },
         },
@@ -854,6 +856,15 @@ console.log('\n=== Check #6c — model resolution scenario negative cases ===');
     `ok=${payloadOnlyConflated.ok}, errors=${JSON.stringify(payloadOnlyConflated.errors)}`
   );
 
+  const missingDeterministicModeScenario = JSON.parse(JSON.stringify(validScenario));
+  delete missingDeterministicModeScenario.input.negative_cases.find(c => c.case_id === 'missing-outer-model').input_context.runtime_model_mode;
+  const missingDeterministicMode = validateModelResolutionScenarioNegatives(missingDeterministicModeScenario);
+  check(
+    'negative: deterministic missing-outer-model case without deterministic marker is flagged',
+    missingDeterministicMode.ok === false && missingDeterministicMode.errors.some(e => e.includes('missing-outer-model')),
+    `ok=${missingDeterministicMode.ok}, errors=${JSON.stringify(missingDeterministicMode.errors)}`
+  );
+
   const autoModeRejectedScenario = JSON.parse(JSON.stringify(validScenario));
   autoModeRejectedScenario.input.negative_cases.find(c => c.case_id === 'auto-mode-missing-outer-model-allowed').expected.rejected = true;
   const autoModeRejected = validateModelResolutionScenarioNegatives(autoModeRejectedScenario);
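The new test repeats the clone-then-`find` setup used by its neighbours. A small helper could factor that out. `withMutatedCase` is a hypothetical name, and `structuredClone` (a global since Node.js 17) is swapped in for the `JSON.parse(JSON.stringify(...))` round-trip, which is equivalent for plain JSON data. Unlike the inline `find(...).input_context` chain, which fails with a generic `TypeError` when a case id is misspelled, the helper names the missing case.

```javascript
// Hypothetical test helper: deep-clone a scenario, then mutate one negative case.
function withMutatedCase(scenario, caseId, mutate) {
  const copy = structuredClone(scenario); // Node.js 17+; leaves the original untouched
  const target = copy.input.negative_cases.find(c => c.case_id === caseId);
  if (!target) throw new Error(`negative case not found: ${caseId}`);
  mutate(target);
  return copy;
}

// Minimal stand-in for validScenario, reduced to the one case being mutated.
const scenario = {
  input: {
    negative_cases: [{ case_id: 'missing-outer-model', input_context: { runtime_model_mode: 'deterministic' } }],
  },
};
const mutated = withMutatedCase(scenario, 'missing-outer-model', c => {
  delete c.input_context.runtime_model_mode;
});
console.log(mutated.input.negative_cases[0].input_context, scenario.input.negative_cases[0].input_context.runtime_model_mode);
// → {} deterministic
```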