Skip to content

feat(skills): orchestrator tier 2 β€” multi-model routing, DAG, domain memory, follow-up Q&A#6

Closed
vanterx wants to merge 4 commits into
feat/perf-review-orchestrator-v4-tier1from
feat/perf-review-orchestrator-v4-tier2
Closed

feat(skills): orchestrator tier 2 β€” multi-model routing, DAG, domain memory, follow-up Q&A#6
vanterx wants to merge 4 commits into
feat/perf-review-orchestrator-v4-tier1from
feat/perf-review-orchestrator-v4-tier2

Conversation

@vanterx
Copy link
Copy Markdown
Owner

@vanterx vanterx commented May 17, 2026

Summary

Stacked on tier 1 (PR #5). Adds four cost-and-intelligence primitives without changing the trust model (strictly offline preserved).

Primitive What it does
V5 Multi-model cost routing Haiku for triage/classification/Q&A, Sonnet for synthesis and deep dive, Opus for the adversarial pass. Cuts cost ~40% vs all-Sonnet at no quality loss on high-stakes phases. Override via `--model-tier {economy|standard|maximum}` and `--no-adversarial`.
V6 Skill-graph DAG Replaces fixed phase ordering with a dynamic dependency DAG built from artifact types and probe findings. Probes that depend on each other sequence correctly; everything else runs in parallel. Dynamic edges open during the walk based on findings (e.g., S9 sniffing in sqlplan-review opens an edge to query-store-review).
V9 Domain memory Per-instance facts (MAXDOP, cores, AG topology, partitioning, RCSI status, trace flags) loaded from `~/.mssql-perf-review/instances/.json`. Validates every recommendation against the facts: redundant recommendations rejected, environment-aware escalators applied. Orchestrator never writes facts.json silently.
V10 Follow-up Q&A After the report, the orchestrator stays in the session to answer questions ("why this index ordering?", "why was MAXDOP not recommended?"). Most follow-ups are free β€” they read from the in-context evidence chain. Five-category question taxonomy with when-to-probe rules.

Changes

New references (4):

  • `skills/mssql-performance-review/references/model-routing.md` β€” tier-to-phase mapping, override flags, cost profile
  • `skills/mssql-performance-review/references/skill-dag.md` β€” DAG construction rules, walk algorithm, dynamic edge catalogue
  • `skills/mssql-performance-review/references/domain-memory.md` β€” facts.json schema, rejection/escalation catalogue, staleness handling
  • `skills/mssql-performance-review/references/followup-qa.md` β€” question taxonomy, refusal patterns, answer format

Updated:

  • `skills/mssql-performance-review/SKILL.md` β€” primitive list grows from 4 to 8; new sections describe DAG dispatch, model routing, domain memory loading, follow-up Q&A behavior. 363 lines (under 1000-line cap).
  • `skills/mssql-performance-review/references/README.md` β€” index entries for all 4 new references.

Trust model unchanged

  • Strictly offline β€” no SQL Server contact
  • Domain memory is user-managed (read-only from orchestrator)
  • Multi-model routing affects cost only, not trust surface

Test plan

  • `bash scripts/verify-docs.sh` β€” 31 PASS, 0 WARN, 0 FAIL
  • SKILL.md within 1000-line cap (363 lines)
  • All 4 new references linked from references/README.md with load-when guidance
  • Adversarial pass remains mandatory even on `--model-tier economy` (downgrades only to Sonnet, never to Haiku)

Base branch

This PR targets `feat/perf-review-orchestrator-v4-tier1` (stacked). After tier 1 (PR #5) merges to main, retarget this PR to main and merge.

Roadmap

Tier Status
tier 1 PR #5 β€” open
tier 2 this PR
tier 3 (next PR β€” stacked on this branch)

πŸ€– Generated with Claude Code

…memory, follow-up Q&A

Builds on tier 1 (PR #5). Adds four cost-and-intelligence primitives without
changing the trust model (strictly offline).

- V5 Multi-model cost routing: Haiku for triage/classification/Q&A,
  Sonnet for synthesis and deep dive, Opus for the adversarial pass.
  Cuts cost ~40% vs all-Sonnet at no quality loss on high-stakes phases.
  Override via --model-tier {economy|standard|maximum} and --no-adversarial.

- V6 Skill-graph DAG: replaces fixed phase ordering with a dynamic
  dependency DAG built from artifact types and probe findings. Probes that
  depend on each other sequence correctly; everything else runs in parallel.
  Dynamic edges open during the walk based on findings (e.g., S9 sniffing in
  sqlplan-review opens an edge to query-store-review for plan-instability).
  --phases flag reverts to tier-1 fixed-phase behavior.

- V9 Domain memory: per-instance facts (MAXDOP, cores, AG topology,
  partitioning, RCSI status, trace flags, user_notes) loaded from a
  user-managed JSON at ~/.mssql-perf-review/instances/<server>.json.
  Validates every recommendation against the facts: redundant recommendations
  rejected, environment-aware escalators applied (partition alignment, AG
  primary, Standard edition LOB rebuild). Stale facts (>90 days) trigger a
  downgrade warning. Orchestrator never writes facts.json silently.

- V10 Follow-up Q&A: after the report, the orchestrator stays in the session
  to answer questions ("why this index ordering?", "why was MAXDOP not
  recommended?"). Most follow-ups are free β€” they read from the in-context
  evidence chain. Five-category question taxonomy, when-to-probe rules,
  refusal patterns, and structured answer format with evidence citation.

Added:
- skills/mssql-performance-review/references/model-routing.md
- skills/mssql-performance-review/references/skill-dag.md
- skills/mssql-performance-review/references/domain-memory.md
- skills/mssql-performance-review/references/followup-qa.md
- references/README.md updated to list and explain the 4 new references
- SKILL.md: primitive list grows from 4 to 8; new sections describe DAG
  dispatch, model routing, domain memory loading, follow-up Q&A behavior
- SKILL.md still well under the 1000-line cap (363 lines)

verify-docs.sh: 31 PASS, 0 WARN, 0 FAIL
Copy link
Copy Markdown
Owner Author

@vanterx vanterx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec contradiction: adversarial pass minimum model tier

In references/model-routing.md, three statements disagree on whether the adversarial pass can use Sonnet:

  • Line 80 (Override rules): "Adversarial pass is always at least Sonnet..."
  • Line 106 (Safeguard 2): "...the adversarial pass (always Opus or higher)..."
  • Line 108 (Safeguard 3): "Adversarial pass cannot be downgraded. Even on --model-tier economy, the adversarial check runs on Opus."

Safeguards 2 and 3 agree (Opus minimum). The override rules line permits Sonnet. SKILL.md line 126 also agrees with the safeguards: "Opus 4.7 (cannot be downgraded even on --model-tier economy β€” quality-critical)".

Severity: Medium β€” if the LLM follows the override rules section rather than the safeguards section, it will use Sonnet for the adversarial pass on economy tier, breaking the stated quality guarantee.

Fix: Change line 80 from "at least Sonnet" to "at least Opus" to match the rest of the specification.

Resolves PR #6 review comment. Line 80 of model-routing.md said
'Adversarial pass is always at least Sonnet ... Opus is the default',
contradicting:
- Line 106: 'adversarial pass (always Opus or higher)'
- Line 108: 'Adversarial pass cannot be downgraded. Even on
  --model-tier economy, the adversarial check runs on Opus.'
- SKILL.md line 126: 'Opus 4.7 (cannot be downgraded even on
  --model-tier economy β€” quality-critical)'

The contradiction risked an LLM following the override-rules section
using Sonnet for the adversarial pass on economy tier, breaking the
stated quality guarantee. Rewrote line 80 to align with the other
safeguards: 'Adversarial pass is always Opus ... cannot be downgraded
by any tier flag.'

verify-docs.sh: 31 PASS.
@vanterx
Copy link
Copy Markdown
Owner Author

vanterx commented May 18, 2026

Fixed in commit afa8241. Line 80 of model-routing.md now says "Adversarial pass is always Opus (Haiku and Sonnet miss counterfactuals reliably); cannot be downgraded by any tier flag β€” this is the most important quality guarantee." Consistent with lines 106, 108, and SKILL.md line 126.

Copy link
Copy Markdown
Owner Author

@vanterx vanterx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale forward-reference to domain memory

references/risk-rubric.md line 57:

Domain memory escalators are checked once tier-2 introduces facts.json (see backlog plan v4). In tier 1, assume defaults from this table apply.

This PR introduces domain-memory.md in the same references/ directory, making this line stale. It describes domain memory as a future tier-2 feature when the file it references is now present. The risk-rubric should mention references/domain-memory.md by name (similar to how SKILL.md loads domain-memory.md when a facts file is present).

Severity: Low β€” the LLM loading risk-rubric.md will still apply the escalator table (line 55 already covers domain memory); this is a forward-reference that's now incorrect since the referenced feature has landed.

Fix: Replace the sentence with something like "Domain memory escalators are checked against references/domain-memory.md when a facts file is loaded for the target instance."

Resolves PR #6 review comment. Line 57 of risk-rubric.md previously said:

  'Domain memory escalators are checked once tier-2 introduces facts.json
   (see backlog plan v4). In tier 1, assume defaults from this table apply.'

This forward-reference became stale when tier 2 (this PR) introduced
references/domain-memory.md. Replaced with a direct reference to the
now-present file and the canonical facts.json path:

  'Domain memory escalators are checked against references/domain-memory.md
   when a facts file is loaded for the target instance
   (~/.mssql-perf-review/instances/<server>.json). When no facts file is
   present, the defaults in this table apply.'

verify-docs.sh: 31 PASS.
@vanterx
Copy link
Copy Markdown
Owner Author

vanterx commented May 18, 2026

Fixed in commit b891d6f. risk-rubric.md line 57 now reads:

Domain memory escalators are checked against references/domain-memory.md when a facts file is loaded for the target instance (~/.mssql-perf-review/instances/<server>.json). When no facts file is present, the defaults in this table apply.

This matches how SKILL.md loads the file (now-present, no longer a forward-reference) and gives the canonical facts.json path.

Copy link
Copy Markdown
Owner Author

@vanterx vanterx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tautological model routing description

model-routing.md line 53:

| User reports "previous review missed the obvious problem" | maximum (escalates the adversarial pass to Opus from Opus, plus all reasoning to Opus) |

The adversarial pass already runs on Opus in --model-tier standard (line 14). The parenthetical "to Opus from Opus" is a tautology β€” it describes no actual escalation. The intent of --model-tier maximum is to escalate synthesis and deep-dive phases to Opus (correctly described on line 24 as "Sonnet for triage; Opus for all reasoning phases").

Severity: Low β€” documentation, no runtime impact.

Fix: Rewrite the parenthetical to describe what maximum tier adds vs standard (e.g., "adversarial already runs Opus; also escalates synthesis and deep dive to Opus") or simply omit the redundant phrase.

Copy link
Copy Markdown
Owner Author

@vanterx vanterx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bogus flag syntax in follow-up Q&A reference

followup-qa.md line 152:

`--model-tier maximum --no-adversarial=false`

--no-adversarial is defined as a boolean flag (no value) throughout model-routing.md (line 25: "Skip the Opus adversarial pass"). --no-adversarial=false is not valid syntax for a flag β€” this looks like a --flag=<value> pattern but --no-adversarial was never defined to take a value. The adversarial pass is on by default; re-dispatching a follow-up probe without --no-adversarial would already run it.

Severity: Low β€” if an LLM tries to parse this literally and passes --no-adversarial=false as a single flag token, it may not match any defined flag. In practice, the Q&A instructions say "dispatch a targeted adversarial probe" β€” so the flag isn't actually needed for behavior.

Fix: Change to just --model-tier maximum (since --no-adversarial was never passed and adversarial is on by default).

1. Tautological 'to Opus from Opus' in model-routing.md line 53.
   The user-situation table row for 'previous review missed the obvious
   problem' described --model-tier maximum as 'escalates the adversarial
   pass to Opus from Opus'. Since adversarial already runs Opus on
   --model-tier standard (line 14), the parenthetical was tautological.
   Rewrote to describe what maximum actually adds: synthesis and
   deep-dive escalate to Opus (adversarial unchanged).

2. Bogus flag syntax in followup-qa.md line 152.
   The follow-up Q&A reference described a dispatch as
   '--model-tier maximum --no-adversarial=false'. --no-adversarial is
   a boolean flag (model-routing.md line 25), not a --flag=<value>
   token; '--no-adversarial=false' is not valid syntax. Since
   adversarial is on by default, --no-adversarial isn't needed at all
   for an adversarial-emphasising follow-up. Simplified to just
   '--model-tier maximum' with a parenthetical noting adversarial is
   the default.

verify-docs.sh: 31 PASS.
@vanterx
Copy link
Copy Markdown
Owner Author

vanterx commented May 18, 2026

Both nits fixed in commit 62badd8:

  1. Tautological description β€” model-routing.md line 53 now reads: "maximum (adversarial already runs Opus on standard; this also escalates synthesis and deep dive to Opus)" β€” describes what maximum actually adds instead of the no-op "to Opus from Opus".

  2. Bogus flag syntax β€” followup-qa.md line 152 simplified to just --model-tier maximum (with a parenthetical noting adversarial is on by default). Dropped the invalid --no-adversarial=false token.

Tier 2 merged forward into tier 3 (db2947e) so PR #7 stays consistent.

Copy link
Copy Markdown
Owner Author

@vanterx vanterx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-by-one error in skills-skipped count

skill-dag.md line 148 in Example 2 states "10 skills skipped. 4 ran." but the count is wrong. There are 15 sub-skills total. The explicit "Skills NOT run" list covers 9 skills. Two additional skills β€” sqlstats-review and sqlplan-batch β€” are also not run in this scenario but are omitted from the listing. The correct count is 11 skills skipped, 4 ran.

Severity: Low β€” documentation inconsistency; doesn't affect runtime behavior.

@vanterx vanterx deleted the branch feat/perf-review-orchestrator-v4-tier1 May 18, 2026 08:53
@vanterx vanterx closed this May 18, 2026
@vanterx vanterx deleted the feat/perf-review-orchestrator-v4-tier2 branch May 18, 2026 08:57
vanterx added a commit that referenced this pull request May 18, 2026
… add docs/ gitignore entry

references/README.md described model-routing.md as "adversarial always runs on
at least Sonnet" β€” inconsistent with the fix landed in model-routing.md itself
(PR #6 nit). Updated to match: "always runs on Opus and cannot be downgraded".

.gitignore: add docs/ entry (was present locally, never committed).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant