feat(skills): orchestrator tier 2 — multi-model routing, DAG, domain memory, follow-up Q&A by vanterx · Pull Request #6 · vanterx/mssql-performance-skills

vanterx · 2026-05-17T11:52:17Z

Summary

Stacked on tier 1 (PR #5). Adds four cost-and-intelligence primitives without changing the trust model (strictly offline preserved).

Primitive	What it does
V5 Multi-model cost routing	Haiku for triage/classification/Q&A, Sonnet for synthesis and deep dive, Opus for the adversarial pass. Cuts cost ~40% vs all-Sonnet at no quality loss on high-stakes phases. Override via `--model-tier {economy\|standard\|maximum}` and `--no-adversarial`.
V6 Skill-graph DAG	Replaces fixed phase ordering with a dynamic dependency DAG built from artifact types and probe findings. Probes that depend on each other sequence correctly; everything else runs in parallel. Dynamic edges open during the walk based on findings (e.g., S9 sniffing in sqlplan-review opens an edge to query-store-review).
V9 Domain memory	Per-instance facts (MAXDOP, cores, AG topology, partitioning, RCSI status, trace flags) loaded from `~/.mssql-perf-review/instances/.json`. Validates every recommendation against the facts: redundant recommendations rejected, environment-aware escalators applied. Orchestrator never writes facts.json silently.
V10 Follow-up Q&A	After the report, the orchestrator stays in the session to answer questions ("why this index ordering?", "why was MAXDOP not recommended?"). Most follow-ups are free — they read from the in-context evidence chain. Five-category question taxonomy with when-to-probe rules.

Changes

New references (4):

`skills/mssql-performance-review/references/model-routing.md` — tier-to-phase mapping, override flags, cost profile
`skills/mssql-performance-review/references/skill-dag.md` — DAG construction rules, walk algorithm, dynamic edge catalogue
`skills/mssql-performance-review/references/domain-memory.md` — facts.json schema, rejection/escalation catalogue, staleness handling
`skills/mssql-performance-review/references/followup-qa.md` — question taxonomy, refusal patterns, answer format

Updated:

`skills/mssql-performance-review/SKILL.md` — primitive list grows from 4 to 8; new sections describe DAG dispatch, model routing, domain memory loading, follow-up Q&A behavior. 363 lines (under 1000-line cap).
`skills/mssql-performance-review/references/README.md` — index entries for all 4 new references.

Trust model unchanged

Strictly offline — no SQL Server contact
Domain memory is user-managed (read-only from orchestrator)
Multi-model routing affects cost only, not trust surface

Test plan

`bash scripts/verify-docs.sh` — 31 PASS, 0 WARN, 0 FAIL
SKILL.md within 1000-line cap (363 lines)
All 4 new references linked from references/README.md with load-when guidance
Adversarial pass remains mandatory even on `--model-tier economy` (downgrades only to Sonnet, never to Haiku)

Base branch

This PR targets `feat/perf-review-orchestrator-v4-tier1` (stacked). After tier 1 (PR #5) merges to main, retarget this PR to main and merge.

Roadmap

Tier	Status
tier 1	PR #5 — open
tier 2	this PR
tier 3	(next PR — stacked on this branch)

🤖 Generated with Claude Code

…memory, follow-up Q&A Builds on tier 1 (PR #5). Adds four cost-and-intelligence primitives without changing the trust model (strictly offline). - V5 Multi-model cost routing: Haiku for triage/classification/Q&A, Sonnet for synthesis and deep dive, Opus for the adversarial pass. Cuts cost ~40% vs all-Sonnet at no quality loss on high-stakes phases. Override via --model-tier {economy|standard|maximum} and --no-adversarial. - V6 Skill-graph DAG: replaces fixed phase ordering with a dynamic dependency DAG built from artifact types and probe findings. Probes that depend on each other sequence correctly; everything else runs in parallel. Dynamic edges open during the walk based on findings (e.g., S9 sniffing in sqlplan-review opens an edge to query-store-review for plan-instability). --phases flag reverts to tier-1 fixed-phase behavior. - V9 Domain memory: per-instance facts (MAXDOP, cores, AG topology, partitioning, RCSI status, trace flags, user_notes) loaded from a user-managed JSON at ~/.mssql-perf-review/instances/<server>.json. Validates every recommendation against the facts: redundant recommendations rejected, environment-aware escalators applied (partition alignment, AG primary, Standard edition LOB rebuild). Stale facts (>90 days) trigger a downgrade warning. Orchestrator never writes facts.json silently. - V10 Follow-up Q&A: after the report, the orchestrator stays in the session to answer questions ("why this index ordering?", "why was MAXDOP not recommended?"). Most follow-ups are free — they read from the in-context evidence chain. Five-category question taxonomy, when-to-probe rules, refusal patterns, and structured answer format with evidence citation. Added: - skills/mssql-performance-review/references/model-routing.md - skills/mssql-performance-review/references/skill-dag.md - skills/mssql-performance-review/references/domain-memory.md - skills/mssql-performance-review/references/followup-qa.md - references/README.md updated to list and explain the 4 new references - SKILL.md: primitive list grows from 4 to 8; new sections describe DAG dispatch, model routing, domain memory loading, follow-up Q&A behavior - SKILL.md still well under the 1000-line cap (363 lines) verify-docs.sh: 31 PASS, 0 WARN, 0 FAIL

vanterx

Spec contradiction: adversarial pass minimum model tier

In references/model-routing.md, three statements disagree on whether the adversarial pass can use Sonnet:

Line 80 (Override rules): "Adversarial pass is always at least Sonnet..."
Line 106 (Safeguard 2): "...the adversarial pass (always Opus or higher)..."
Line 108 (Safeguard 3): "Adversarial pass cannot be downgraded. Even on --model-tier economy, the adversarial check runs on Opus."

Safeguards 2 and 3 agree (Opus minimum). The override rules line permits Sonnet. SKILL.md line 126 also agrees with the safeguards: "Opus 4.7 (cannot be downgraded even on --model-tier economy — quality-critical)".

Severity: Medium — if the LLM follows the override rules section rather than the safeguards section, it will use Sonnet for the adversarial pass on economy tier, breaking the stated quality guarantee.

Fix: Change line 80 from "at least Sonnet" to "at least Opus" to match the rest of the specification.

Resolves PR #6 review comment. Line 80 of model-routing.md said 'Adversarial pass is always at least Sonnet ... Opus is the default', contradicting: - Line 106: 'adversarial pass (always Opus or higher)' - Line 108: 'Adversarial pass cannot be downgraded. Even on --model-tier economy, the adversarial check runs on Opus.' - SKILL.md line 126: 'Opus 4.7 (cannot be downgraded even on --model-tier economy — quality-critical)' The contradiction risked an LLM following the override-rules section using Sonnet for the adversarial pass on economy tier, breaking the stated quality guarantee. Rewrote line 80 to align with the other safeguards: 'Adversarial pass is always Opus ... cannot be downgraded by any tier flag.' verify-docs.sh: 31 PASS.

vanterx · 2026-05-18T06:53:26Z

Fixed in commit afa8241. Line 80 of model-routing.md now says "Adversarial pass is always Opus (Haiku and Sonnet miss counterfactuals reliably); cannot be downgraded by any tier flag — this is the most important quality guarantee." Consistent with lines 106, 108, and SKILL.md line 126.

vanterx

Stale forward-reference to domain memory

references/risk-rubric.md line 57:

Domain memory escalators are checked once tier-2 introduces facts.json (see backlog plan v4). In tier 1, assume defaults from this table apply.

This PR introduces domain-memory.md in the same references/ directory, making this line stale. It describes domain memory as a future tier-2 feature when the file it references is now present. The risk-rubric should mention references/domain-memory.md by name (similar to how SKILL.md loads domain-memory.md when a facts file is present).

Severity: Low — the LLM loading risk-rubric.md will still apply the escalator table (line 55 already covers domain memory); this is a forward-reference that's now incorrect since the referenced feature has landed.

Fix: Replace the sentence with something like "Domain memory escalators are checked against references/domain-memory.md when a facts file is loaded for the target instance."

Resolves PR #6 review comment. Line 57 of risk-rubric.md previously said: 'Domain memory escalators are checked once tier-2 introduces facts.json (see backlog plan v4). In tier 1, assume defaults from this table apply.' This forward-reference became stale when tier 2 (this PR) introduced references/domain-memory.md. Replaced with a direct reference to the now-present file and the canonical facts.json path: 'Domain memory escalators are checked against references/domain-memory.md when a facts file is loaded for the target instance (~/.mssql-perf-review/instances/<server>.json). When no facts file is present, the defaults in this table apply.' verify-docs.sh: 31 PASS.

vanterx · 2026-05-18T07:12:48Z

Fixed in commit b891d6f. risk-rubric.md line 57 now reads:

Domain memory escalators are checked against references/domain-memory.md when a facts file is loaded for the target instance (~/.mssql-perf-review/instances/<server>.json). When no facts file is present, the defaults in this table apply.

This matches how SKILL.md loads the file (now-present, no longer a forward-reference) and gives the canonical facts.json path.

vanterx

Tautological model routing description

model-routing.md line 53:

| User reports "previous review missed the obvious problem" | maximum (escalates the adversarial pass to Opus from Opus, plus all reasoning to Opus) |

The adversarial pass already runs on Opus in --model-tier standard (line 14). The parenthetical "to Opus from Opus" is a tautology — it describes no actual escalation. The intent of --model-tier maximum is to escalate synthesis and deep-dive phases to Opus (correctly described on line 24 as "Sonnet for triage; Opus for all reasoning phases").

Severity: Low — documentation, no runtime impact.

Fix: Rewrite the parenthetical to describe what maximum tier adds vs standard (e.g., "adversarial already runs Opus; also escalates synthesis and deep dive to Opus") or simply omit the redundant phrase.

vanterx

Bogus flag syntax in follow-up Q&A reference

followup-qa.md line 152:

`--model-tier maximum --no-adversarial=false`

--no-adversarial is defined as a boolean flag (no value) throughout model-routing.md (line 25: "Skip the Opus adversarial pass"). --no-adversarial=false is not valid syntax for a flag — this looks like a --flag=<value> pattern but --no-adversarial was never defined to take a value. The adversarial pass is on by default; re-dispatching a follow-up probe without --no-adversarial would already run it.

Severity: Low — if an LLM tries to parse this literally and passes --no-adversarial=false as a single flag token, it may not match any defined flag. In practice, the Q&A instructions say "dispatch a targeted adversarial probe" — so the flag isn't actually needed for behavior.

Fix: Change to just --model-tier maximum (since --no-adversarial was never passed and adversarial is on by default).

1. Tautological 'to Opus from Opus' in model-routing.md line 53. The user-situation table row for 'previous review missed the obvious problem' described --model-tier maximum as 'escalates the adversarial pass to Opus from Opus'. Since adversarial already runs Opus on --model-tier standard (line 14), the parenthetical was tautological. Rewrote to describe what maximum actually adds: synthesis and deep-dive escalate to Opus (adversarial unchanged). 2. Bogus flag syntax in followup-qa.md line 152. The follow-up Q&A reference described a dispatch as '--model-tier maximum --no-adversarial=false'. --no-adversarial is a boolean flag (model-routing.md line 25), not a --flag=<value> token; '--no-adversarial=false' is not valid syntax. Since adversarial is on by default, --no-adversarial isn't needed at all for an adversarial-emphasising follow-up. Simplified to just '--model-tier maximum' with a parenthetical noting adversarial is the default. verify-docs.sh: 31 PASS.

vanterx · 2026-05-18T07:32:42Z

Both nits fixed in commit 62badd8:

Tautological description — model-routing.md line 53 now reads: "maximum (adversarial already runs Opus on standard; this also escalates synthesis and deep dive to Opus)" — describes what maximum actually adds instead of the no-op "to Opus from Opus".
Bogus flag syntax — followup-qa.md line 152 simplified to just --model-tier maximum (with a parenthetical noting adversarial is on by default). Dropped the invalid --no-adversarial=false token.

Tier 2 merged forward into tier 3 (db2947e) so PR #7 stays consistent.

vanterx

Off-by-one error in skills-skipped count

skill-dag.md line 148 in Example 2 states "10 skills skipped. 4 ran." but the count is wrong. There are 15 sub-skills total. The explicit "Skills NOT run" list covers 9 skills. Two additional skills — sqlstats-review and sqlplan-batch — are also not run in this scenario but are omitted from the listing. The correct count is 11 skills skipped, 4 ran.

Severity: Low — documentation inconsistency; doesn't affect runtime behavior.

… add docs/ gitignore entry references/README.md described model-routing.md as "adversarial always runs on at least Sonnet" — inconsistent with the fix landed in model-routing.md itself (PR #6 nit). Updated to match: "always runs on Opus and cannot be downgraded". .gitignore: add docs/ entry (was present locally, never committed).

vanterx commented May 17, 2026

View reviewed changes

vanterx commented May 18, 2026

View reviewed changes

vanterx deleted the branch feat/perf-review-orchestrator-v4-tier1 May 18, 2026 08:53

vanterx closed this May 18, 2026

vanterx mentioned this pull request May 18, 2026

feat(skills): orchestrator tier 2 — multi-model routing, DAG, domain memory, follow-up Q&A #8

Merged

5 tasks

vanterx deleted the feat/perf-review-orchestrator-v4-tier2 branch May 18, 2026 08:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(skills): orchestrator tier 2 — multi-model routing, DAG, domain memory, follow-up Q&A#6

feat(skills): orchestrator tier 2 — multi-model routing, DAG, domain memory, follow-up Q&A#6
vanterx wants to merge 4 commits into
feat/perf-review-orchestrator-v4-tier1from
feat/perf-review-orchestrator-v4-tier2

vanterx commented May 17, 2026

Uh oh!

vanterx left a comment

Uh oh!

vanterx commented May 18, 2026

Uh oh!

vanterx left a comment

Uh oh!

vanterx commented May 18, 2026

Uh oh!

vanterx left a comment

Uh oh!

vanterx left a comment

Uh oh!

vanterx commented May 18, 2026

Uh oh!

vanterx left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vanterx commented May 17, 2026

Summary

Changes

Trust model unchanged

Test plan

Base branch

Roadmap

Uh oh!

vanterx left a comment

Choose a reason for hiding this comment

Spec contradiction: adversarial pass minimum model tier

Uh oh!

vanterx commented May 18, 2026

Uh oh!

vanterx left a comment

Choose a reason for hiding this comment

Stale forward-reference to domain memory

Uh oh!

vanterx commented May 18, 2026

Uh oh!

vanterx left a comment

Choose a reason for hiding this comment

Tautological model routing description

Uh oh!

vanterx left a comment

Choose a reason for hiding this comment

Bogus flag syntax in follow-up Q&A reference

Uh oh!

vanterx commented May 18, 2026

Uh oh!

vanterx left a comment

Choose a reason for hiding this comment

Off-by-one error in skills-skipped count

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant