feat(dsql): enhance query plan explainability with type coercion detection, rewrites, and workflow extraction#162
Conversation
8e33741 to
8261713
Compare
amaksimo
left a comment
There was a problem hiding this comment.
I have a few general commets:
- We should use positive language throughout (llm can confuse DO with DO NOT when we trim context)
- We should try to use RFC language more frequently throughout
- We should break up the references in the query-plan folder as some of the files are very long
07b6baa to
6f97294
Compare
PR #162 — Review Summary
Reviewed at head SHA
Reviewer scope. This review covered the diff at the head SHA (21 files, +1131 / −36) — the new query-plan workflow extraction, type-coercion detection, 11-pattern rewrite library, and eval pair. Prior
|
…vals Correctness fixes (review items 1-5): - awslabs#1: push-computation-to-constant — use NUMERIC column 'amount' to avoid integer division non-equivalence - awslabs#2: not-in-to-not-exists — add NULL semantics warning (NOT EXISTS does not preserve NOT IN's NULL-propagation; MUST confirm with user) - awslabs#3/awslabs#4: subquery-unnesting — prefer EXISTS form (true semi-join); document uniqueness precondition for JOIN+DISTINCT alternative - awslabs#5: subquery-unnesting-scalar — add COALESCE(s_count, 0) for COUNT/SUM (LEFT JOIN returns NULL, scalar returns 0) Dangling reference fixes (review items 6-8): - awslabs#6: workflow.md trigger table — "Phase 5" → reassessment re-entry - awslabs#7: Replace all "implicit cast compatibility matrix" references with "pg_amop query in catalog-queries.md" - awslabs#8: plan-interpretation.md L202 — fix cast-vs-operator contradiction Structural fixes (review items 9-14, 24): - awslabs#9: Hedge "integer family" claim with "at time of writing" + verify - awslabs#10: amopmethod=10003 — add provenance comment and verification SQL - awslabs#11: catalog-queries.md TOC — add 3 missing sections - awslabs#12: plan-interpretation.md TOC — add Type Coercion section - awslabs#13: SKILL.md — explicitly delegate routing to workflow.md - awslabs#24: workflow.md — remove em dashes from headings for clean anchors Other fixes (review items 21-23): - awslabs#21: reltuples-estimate — add staleness warning (MUST warn user) - awslabs#22: catalog-queries — add safe_query.build() note for placeholders - awslabs#23: "Skip when" → "SHOULD skip when" in all rewrite files Eval improvements (review items 14, 16): - awslabs#14: README — add query_plan_rewrite_evals to directory tree and eval section - awslabs#16: Add evals 206-210 covering LEFT JOIN, computation push, NOT IN with NULL warning, nested UNION ALL, and negative case (OR across different columns) - awslabs#7 (eval): Update eval 201 expectation — pg_amop instead of matrix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- awslabs#17: Downgrade eval results to qualitative comparison, record model and version, note n=1 and recommend n>=3 for production confidence - awslabs#18: SKILL.md is 281 lines (will update PR body) - awslabs#20: Strengthen awsknowledge fallback to MUST — refuse fallback when recommendation depends on exact limit value - awslabs#21: Already addressed in prior commit (reltuples staleness) - awslabs#15: Document manual-only status and future Python converter direction (per anwesham-lab's suggestion for deterministic rewrites) - awslabs#19: MCP mirror PR noted as follow-up in PR body Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…vals Correctness fixes (review items 1-5): - awslabs#1: push-computation-to-constant — use NUMERIC column 'amount' to avoid integer division non-equivalence - awslabs#2: not-in-to-not-exists — add NULL semantics warning (NOT EXISTS does not preserve NOT IN's NULL-propagation; MUST confirm with user) - awslabs#3/awslabs#4: subquery-unnesting — prefer EXISTS form (true semi-join); document uniqueness precondition for JOIN+DISTINCT alternative - awslabs#5: subquery-unnesting-scalar — add COALESCE(s_count, 0) for COUNT/SUM (LEFT JOIN returns NULL, scalar returns 0) Dangling reference fixes (review items 6-8): - awslabs#6: workflow.md trigger table — "Phase 5" → reassessment re-entry - awslabs#7: Replace all "implicit cast compatibility matrix" references with "pg_amop query in catalog-queries.md" - awslabs#8: plan-interpretation.md L202 — fix cast-vs-operator contradiction Structural fixes (review items 9-14, 24): - awslabs#9: Hedge "integer family" claim with "at time of writing" + verify - awslabs#10: amopmethod=10003 — add provenance comment and verification SQL - awslabs#11: catalog-queries.md TOC — add 3 missing sections - awslabs#12: plan-interpretation.md TOC — add Type Coercion section - awslabs#13: SKILL.md — explicitly delegate routing to workflow.md - awslabs#24: workflow.md — remove em dashes from headings for clean anchors Other fixes (review items 21-23): - awslabs#21: reltuples-estimate — add staleness warning (MUST warn user) - awslabs#22: catalog-queries — add safe_query.build() note for placeholders - awslabs#23: "Skip when" → "SHOULD skip when" in all rewrite files Eval improvements (review items 14, 16): - awslabs#14: README — add query_plan_rewrite_evals to directory tree and eval section - awslabs#16: Add evals 206-210 covering LEFT JOIN, computation push, NOT IN with NULL warning, nested UNION ALL, and negative case (OR across different columns) - awslabs#7 (eval): Update eval 201 expectation — pg_amop instead of matrix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
122b2a3 to
1178334
Compare
- awslabs#17: Downgrade eval results to qualitative comparison, record model and version, note n=1 and recommend n>=3 for production confidence - awslabs#18: SKILL.md is 281 lines (will update PR body) - awslabs#20: Strengthen awsknowledge fallback to MUST — refuse fallback when recommendation depends on exact limit value - awslabs#21: Already addressed in prior commit (reltuples staleness) - awslabs#15: Document manual-only status and future Python converter direction (per anwesham-lab's suggestion for deterministic rewrites) - awslabs#19: MCP mirror PR noted as follow-up in PR body Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR #162 — Multi-agent review (revised after empirical validation on a live DSQL cluster)Reviewed at head SHA
Findings dropped after re-validation (full transcripts available on request):
Empirical context (run on live DSQL cluster):
🤖 Generated with Claude Code If this code review was useful, please react with 👍. Otherwise, react with 👎. |
|
I think it's worth using our self-review skill and doing a couple of explicit passes with subagents deployed from the code-review and pr-toolkit-review plugins to get to an explicit convergence state that you can audit and post to log changes being made. |
…evals - Revert broken relative links in SKILL.md (mcp/.mcp.json, scripts/) back to correct ../../ paths - Rename blacklisted_customers → excluded_customers and blacklist → exclusion_list for inclusive language compliance - Fix stale 'implicit cast compatibility matrix' → pg_amop in eval results - Add eval results for 206–210 (LEFT JOIN, computation push, NOT IN NULL warning, nested UNION ALL, negative OR case) - Include hooks.json update
Addresses amaksimo's review comment: after rebase onto main, the Workflow 9 section lost its link to workflow.md and the rewrite index files, making all 16 new query-plan files unreachable orphans. - Add workflow.md as the entry gate in the reference table - Add query-rewrites-generic.md and query-rewrites-dsql-specific.md - Update Workflow 9 section to load workflow.md instead of listing the 4 Phase-0 files directly (workflow.md handles that routing)
…wslabs#8, awslabs#9 - awslabs#4: COALESCE rule in subquery-unnesting-scalar.md restricted to COUNT only; SUM/MAX/MIN return NULL on empty sets in both forms - awslabs#5: verify-comment in catalog-queries.md changed from amname='btree' to amname='btree_index' (DSQL uses btree_index AM) - awslabs#7: workflow.md context disambiguation removes psql fallback offer, now says no MCP means no plan capture (consistent with Safety) - awslabs#8: split-large-joins.md example expanded from 7 to 11 tables, exceeding the stated DP threshold of 10; CTEs project explicit cols - awslabs#9: plan-interpretation.md recommendation template changed ::float to ::integer (only integer-family cross-type operators registered)
Lift the substitution rule to the file preamble with per-position
guidance: identifier positions (FROM, GROUP BY) use ident(); string-
literal positions (WHERE = '{schema}', IN ('{table}')) use allow()
or regex(). The prior note incorrectly prescribed ident() for all
positions, which would produce invalid SQL in WHERE clauses.
…ncorrelated Delete redundant in-subquery-to-exists.md — same input shape and same rewrite as subquery-unnesting-uncorrelated.md. Merge the 'large result set' trigger and 'small static set' skip condition into the canonical file. Update index and eval 200 to route exclusively there.
- awslabs#1: Strip surrounding quotes from all placeholders in catalog-queries so safe_query helpers (which emit their own quotes) don't double-quote. Add worked safe_query.build() example to preamble. - awslabs#2: DP threshold changed from 10 to 8 (validated: SHOW join_collapse_limit = 8 on live DSQL). Agent now instructed to SHOW the value rather than hardcoding. - awslabs#3: Remove pg_stat_user_tables.last_analyze cross-check (DSQL never populates it). Guard reltuples with GREATEST(..., 0) for the -1 sentinel on never-analyzed tables. - awslabs#4: Fix '11 generic' to '10 generic' in SKILL.md reference table.
cc44474 to
1d39000
Compare
- Add Three-Layer Filter Model (Index Cond / Storage Filter / Query Processor Filter) with optimization table to plan-interpretation.md - Add Fixing Storage Lookups guidance (INCLUDE columns) with example - Add Cost Number Interpretation (startup ~100 is normal in DSQL) - Add DPU Interpretation (Read DPU as primary signal, optimization loop) - Add CTE late materialization as DSQL-specific rewrite pattern (defer Storage Lookups past LIMIT) - Update workflow.md Phase 1: recommend plain EXPLAIN first for expensive queries before EXPLAIN ANALYZE VERBOSE
krokoko
left a comment
There was a problem hiding this comment.
Please bump the plugin version in the required files, thanks !
Done! |
fda83ba to
c207942
Compare
Summary
references/query-plan/workflow.md(SKILL.md: 246 → 249 LOC)plan-interpretation.md, indexed column type queries incatalog-queries.mdquery-rewrites/, plus 2 DSQL-specific rewrites (reltuples estimate, split large joins)Validation
validate-size.py: 249 lines (good, under 300 limit)validate-references.py: 0 broken links, 0 new orphansEval Results
Manual qualitative comparison (n=1, Claude Opus 4.6). Full results in
tools/evals/databases-on-aws/dsql/query_plan_rewrite_eval_results.md:Follow-ups
awslabs/mcpsrc/aurora-dsql-mcp-server/skills/dsql-skill/needs to be synced with these changes (workflow.md, query-rewrites/ split, updated catalog-queries.md, plan-interpretation.md). Will open companion PR after this merges.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.
🤖 Generated with Claude Code