auto-route candidate enumeration collapses multi-tier subscription harnesses to DefaultModel only (claude → opus-4.7 always wins, sonnet never scored) #6

@easel

Summary

Under policy=default (the auto-route mode used when no --harness / --provider / --model is pinned), Fizeau collapses each subprocess harness to one scoring candidate — its DefaultModel. Multi-tier harnesses like claude (which advertises sonnet, opus, haiku as siblings under one auth/subscription) therefore only ever offer their default tier to the auto-route scorer. Sonnet and Haiku can never win on cost vs Opus even though the catalog has full power/cost metadata for all three.

Concrete symptom from a downstream caller (DDx): every policy=default dispatch lands on claude/opus-4.7 because registry.go:51 sets DefaultModel = "opus-4.7". With a 2207-token implementation prompt at role=implementer, opus won with a score of 100.5. When the same dispatch is rerun with --model sonnet-4.6 pinned, sonnet scores 196.0, so it would have beaten opus by 95.5 points had it been enumerated.

Evidence

Repro: a downstream ddx work dispatch with policy=default against the live catalog.

Run 1 (no model pin, policy=default): the routing_decision event includes exactly one claude candidate:

{
  "harness": "claude",
  "model": "opus-4.7",
  "score": 100.5,
  "eligible": true,
  "score_components": {
    "base": 100, "context_headroom": 30, "cost": -22.5,
    "performance": -5, "power": -2, "quota_health": 5,
    "utilization": -5
  }
}

There is no row for sonnet, not even one marked eligible: false with a filter_reason. It is dropped silently.
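A quick way to check for this symptom is to decode the candidates and list what was enumerated per harness. The sketch below is only illustrative: the candidate struct copies the fields visible in the excerpt above, while the wrapping of candidates in a bare JSON array and the helper names (parseCandidates, modelsFor, componentSum) are assumptions, not Fizeau's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// candidate mirrors the fields visible in the routing_decision excerpt.
type candidate struct {
	Harness    string             `json:"harness"`
	Model      string             `json:"model"`
	Score      float64            `json:"score"`
	Eligible   bool               `json:"eligible"`
	Components map[string]float64 `json:"score_components"`
}

// sample is the single claude candidate from Run 1, wrapped in an array
// (the array wrapper is an assumption about the event layout).
const sample = `[{
  "harness": "claude", "model": "opus-4.7", "score": 100.5, "eligible": true,
  "score_components": {
    "base": 100, "context_headroom": 30, "cost": -22.5,
    "performance": -5, "power": -2, "quota_health": 5, "utilization": -5
  }
}]`

// parseCandidates decodes a JSON array of candidates.
func parseCandidates(raw []byte) ([]candidate, error) {
	var cs []candidate
	err := json.Unmarshal(raw, &cs)
	return cs, err
}

// modelsFor returns the models enumerated for one harness, in event order.
func modelsFor(cs []candidate, harness string) []string {
	var models []string
	for _, c := range cs {
		if c.Harness == harness {
			models = append(models, c.Model)
		}
	}
	return models
}

// componentSum adds up score_components so it can be compared with Score.
func componentSum(c candidate) float64 {
	var sum float64
	for _, v := range c.Components {
		sum += v
	}
	return sum
}

func main() {
	cs, err := parseCandidates([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	models := modelsFor(cs, "claude")
	fmt.Printf("claude candidates: %v\n", models)
	if len(models) == 1 {
		fmt.Println("only one tier enumerated for claude")
	}
	fmt.Printf("component sum %.1f, reported score %.1f\n", componentSum(cs[0]), cs[0].Score)
}
```

The components in the excerpt sum to the reported 100.5, so the score itself is internally consistent; the problem is the missing sibling rows, not the arithmetic.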

Run 2 (same prompt, --model sonnet-4.6 pinned):

17:33:25 readiness route fiz/anthropic/claude-sonnet-4.6
         provider=openrouter reason=policy=default; score=196.0

Sonnet scores 196.0. So the cost-aware scorer works correctly when it can see sonnet. The pin also exposed a second bug: Fizeau routes the pinned sonnet through fiz/openrouter (the catalog's openrouter_id), not through the claude subscription harness — so pinning by model name does not produce a same-harness alternative tier even when one exists on the same auth/subscription.

Root cause (tentative — pointers, not a patch)

The auto-route enumeration at internal/harnesses/registry.go:51-52 and service_routing.go:1172-1189 (v0.12.2 module path: github.com/easel/fizeau/internal/harnesses/registry.go, service_routing.go) goes:

// service_routing.go:1172-1189 (auto-route candidate add)
for _, h := range entries {
    if h.DefaultModel != "" {
        add(h.DefaultModel, true, status)      // adds opus-4.7 for claude
    }
    for _, modelID := range h.SupportedModels {
        add(modelID, true, status)              // SHOULD add sonnet-4.6 too
    }
    ...
}

subprocessHarnessModelIDs("claude", cfg) at service_models.go:91-101 returns the full ["sonnet", "sonnet-4.6", "opus", "opus-4.7", "claude-sonnet-4-6"] set. So sonnet should be reaching the candidate pool. The evidence says it doesn't — either:

  • The downstream aggregation/dedup is keeping one model per {harness, provider} key (picking the DefaultModel as representative), or
  • The eligibility map keyed by modelID at line ~1170 is collapsing sonnet entries by family before the candidate list is emitted, or
  • h.SupportedModels is being populated empty at the call site that builds entries (different from the metadata_billing.go:71 path).

I didn't read deeply enough into routing.Inputs construction to pin down which. The routing_decision evidence is unambiguous, though: no sonnet row at all in the candidates array, not even excluded. A row with eligible: false, filter_reason: ... would be diagnostic; silent absence is consistent with "never enumerated."
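To make the first hypothesis concrete, here is a minimal sketch of how a dedup keyed only by {harness, provider} would produce exactly this symptom. Everything in it is hypothetical: the cand type, the dedupBy helper, and the "subscription" provider label are illustrative stand-ins, not code from Fizeau.

```go
package main

import "fmt"

// cand is a hypothetical, stripped-down routing candidate.
type cand struct {
	Harness, Provider, Model string
}

// dedupBy keeps the first candidate seen for each key and drops the rest.
func dedupBy(cs []cand, key func(cand) string) []cand {
	seen := make(map[string]bool)
	var out []cand
	for _, c := range cs {
		k := key(c)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, c)
	}
	return out
}

// coarseKey ignores the model, so sibling tiers collide.
func coarseKey(c cand) string { return c.Harness + "|" + c.Provider }

// fineKey includes the model, so each tier survives.
func fineKey(c cand) string { return c.Harness + "|" + c.Provider + "|" + c.Model }

// enumerated lists the default model first, as the add loop above does.
func enumerated() []cand {
	return []cand{
		{"claude", "subscription", "opus-4.7"},
		{"claude", "subscription", "sonnet-4.6"},
	}
}

func main() {
	fmt.Println("keyed by harness+provider:      ", dedupBy(enumerated(), coarseKey))
	fmt.Println("keyed by harness+provider+model:", dedupBy(enumerated(), fineKey))
}
```

Because the default model is added first, a first-wins dedup on the coarse key leaves only opus-4.7 and emits no excluded row for sonnet, which matches the silent absence seen in the event.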

Catalog is not the gap

I checked the embedded manifest at internal/modelcatalog/catalog/models.yaml:

sonnet-4.6:
  family: claude-sonnet
  power: 8
  cost_input_per_m: 3.0
  cost_output_per_m: 15.0
  context_window: 1000000
  surfaces:
    agent.anthropic: sonnet-4.6
    claude-code: sonnet-4.6

Complete. Power is set (8 vs opus 10), cost is one fifth of opus ($3/$15 vs $15/$75 per million input/output tokens), and the claude-code surface matches. This rules out the power_missing exclusion path (the one that drops openrouter/anthropic/claude-haiku-4.5 with filter_reason: "power_missing").

Why this matters

The whole point of cost-aware auto-routing is to let cheap models do cheap work and reserve expensive models for hard work. Today policy=default on a multi-tier subscription harness silently always picks the most expensive tier:

  • 2207-token implementation prompt → opus (would be sonnet if enumerated)
  • short status-check prompt → opus (would be haiku if enumerated)

The cost gap is large: opus costs 5× sonnet on both input and output tokens ($15/$75 vs $3/$15 per million). For a project running ddx work continuously, defaulting every dispatch to opus is materially wrong.
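As a rough worked example, the input-side cost of the 2207-token prompt can be computed from the per-million prices quoted above. Output tokens are left out because the issue does not record a response length; inputCostUSD is a hypothetical helper for illustration.

```go
package main

import "fmt"

// inputCostUSD converts a token count and a per-million-token price to dollars.
func inputCostUSD(tokens int, pricePerMillion float64) float64 {
	return float64(tokens) * pricePerMillion / 1e6
}

func main() {
	const promptTokens = 2207 // prompt size from the dispatch described above

	sonnet := inputCostUSD(promptTokens, 3.0) // sonnet-4.6 cost_input_per_m
	opus := inputCostUSD(promptTokens, 15.0)  // opus input price quoted above

	fmt.Printf("sonnet input cost: $%.6f\n", sonnet)
	fmt.Printf("opus input cost:   $%.6f\n", opus)
	fmt.Printf("ratio:             %.1fx\n", opus/sonnet)
}
```

The per-dispatch difference is small in absolute terms, but the 5× ratio applies to every dispatch that could have run on the cheaper tier.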

Repro

  1. Configure Fizeau with the claude harness (subscription path, no model pin).
  2. Send any execute request with policy=default, no --harness/--provider/--model.
  3. Inspect the routing_decision event — only claude/opus-4.7 appears under claude. No sonnet, no haiku.
  4. Resend the same prompt with --model sonnet-4.6. The route goes to fiz/openrouter/anthropic/claude-sonnet-4.6 with score 196.0, not to the claude subscription path.

Suggested fix direction (not prescriptive)

For multi-tier subscription harnesses, enumerate one candidate per tier (opus, sonnet, haiku) with that tier's catalog power/cost, all routing through the same harness. Let cost-aware scoring pick the cheapest tier that meets power_hint_fit for the prompt. Today the harness behaves like "one model, take it or leave it"; it should behave like "one auth, multiple tiers."

If aggregation-by-harness is intentional for some other reason, the alternative is to make the harness configurable to expose multiple DefaultModels per role/power-band, and have ddx pass a power hint that selects the right one.

Caller

DDx CLI v? — see https://github.com/erik-labianca/ddx (or wherever appropriate). The DDx side does not pre-resolve routing knobs (per CONTRACT-003 / FEAT-010) and passes policy=default through verbatim.
