Skip to content

Commit 665a238

Browse files
Morlejclaude
andcommitted
fix(dsql): address PR awslabs#162 review — correctness, references, evals
Correctness fixes (review items 1-5): - awslabs#1: push-computation-to-constant — use NUMERIC column 'amount' to avoid integer division non-equivalence - awslabs#2: not-in-to-not-exists — add NULL semantics warning (NOT EXISTS does not preserve NOT IN's NULL-propagation; MUST confirm with user) - awslabs#3/awslabs#4: subquery-unnesting — prefer EXISTS form (true semi-join); document uniqueness precondition for JOIN+DISTINCT alternative - awslabs#5: subquery-unnesting-scalar — add COALESCE(s_count, 0) for COUNT/SUM (LEFT JOIN returns NULL, scalar returns 0) Dangling reference fixes (review items 6-8): - awslabs#6: workflow.md trigger table — "Phase 5" → reassessment re-entry - awslabs#7: Replace all "implicit cast compatibility matrix" references with "pg_amop query in catalog-queries.md" - awslabs#8: plan-interpretation.md L202 — fix cast-vs-operator contradiction Structural fixes (review items 9-14, 24): - awslabs#9: Hedge "integer family" claim with "at time of writing" + verify - awslabs#10: amopmethod=10003 — add provenance comment and verification SQL - awslabs#11: catalog-queries.md TOC — add 3 missing sections - awslabs#12: plan-interpretation.md TOC — add Type Coercion section - awslabs#13: SKILL.md — explicitly delegate routing to workflow.md - awslabs#24: workflow.md — remove em dashes from headings for clean anchors Other fixes (review items 21-23): - awslabs#21: reltuples-estimate — add staleness warning (MUST warn user) - awslabs#22: catalog-queries — add safe_query.build() note for placeholders - awslabs#23: "Skip when" → "SHOULD skip when" in all rewrite files Eval improvements (review items 14, 16): - awslabs#14: README — add query_plan_rewrite_evals to directory tree and eval section - awslabs#16: Add evals 206-210 covering LEFT JOIN, computation push, NOT IN with NULL warning, nested UNION ALL, and negative case (OR across different columns) - awslabs#7 (eval): Update eval 201 expectation — pg_amop instead of matrix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 6f97294 commit 665a238

19 files changed

Lines changed: 184 additions & 63 deletions

plugins/databases-on-aws/skills/dsql/SKILL.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -111,8 +111,10 @@ sampled in [mcp/.mcp.json](mcp/.mcp.json)
111111

112112
### Query Plan Explainability (modular):
113113

114-
**When:** MUST load [query-plan/workflow.md](references/query-plan/workflow.md) at Workflow 8 entry — it gates the remaining files
115-
**Contains:** Trigger criteria, context disambiguation, routing, phased workflow, and references to: [plan-interpretation.md](references/query-plan/plan-interpretation.md), [catalog-queries.md](references/query-plan/catalog-queries.md), [guc-experiments.md](references/query-plan/guc-experiments.md), [report-format.md](references/query-plan/report-format.md), [query-rewrites-generic.md](references/query-plan/query-rewrites-generic.md), [query-rewrites-dsql-specific.md](references/query-plan/query-rewrites-dsql-specific.md)
114+
#### [query-plan/workflow.md](references/query-plan/workflow.md)
115+
116+
**When:** MUST load at Workflow 8 entry — it gates all other query-plan files
117+
**Contains:** Trigger criteria, context disambiguation, routing, phased workflow (Phase 0–4). Workflow.md specifies which reference files to load at each phase — follow its loading instructions rather than loading all files upfront
116118

117119
### SQL Compatibility Validation:
118120

plugins/databases-on-aws/skills/dsql/references/query-plan/catalog-queries.md

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,10 @@ Exact SQL for interrogating optimizer statistics and actual cardinalities agains
99
3. [Index Definitions](#index-definitions)
1010
4. [Actual Row Counts](#actual-row-counts)
1111
5. [Actual Distinct Counts](#actual-distinct-counts)
12-
6. [Value Distribution Analysis](#value-distribution-analysis)
12+
6. [Column Types for Predicate Columns](#column-types-for-predicate-columns)
13+
7. [B-Tree Cross-Type Operator Support](#b-tree-cross-type-operator-support)
14+
8. [Indexed Column Types](#indexed-column-types)
15+
9. [Value Distribution Analysis](#value-distribution-analysis)
1316

1417
---
1518

@@ -105,6 +108,8 @@ Compare against `pg_stats.n_distinct`:
105108

106109
## Column Types for Predicate Columns
107110

111+
MUST substitute `'{schema}'`, `'{table}'`, and `'{col}'` placeholders in the queries below via `safe_query.build()` with `ident()` — see input-validation.md.
112+
108113
Retrieve the declared types for columns used in WHERE predicates and JOIN conditions, to detect type coercion index bypass (see plan-interpretation.md):
109114

110115
```sql
@@ -120,7 +125,7 @@ WHERE c.table_schema = '{schema}'
120125
AND c.column_name IN ('{col1}', '{col2}');
121126
```
122127

123-
Cross-reference the column type against predicate literals visible in the EXPLAIN output. When the types differ, check the implicit cast compatibility matrix in plan-interpretation.md to determine whether the mismatch prevents index usage.
128+
Cross-reference the column type against predicate literals visible in the EXPLAIN output. When the types differ, use the B-Tree Cross-Type Operator Support query below to determine whether the mismatch prevents index usage.
124129

125130
## B-Tree Cross-Type Operator Support
126131

@@ -133,6 +138,8 @@ SELECT DISTINCT
133138
FROM pg_amop ao
134139
JOIN pg_type lt ON lt.oid = ao.amoplefttype
135140
JOIN pg_type rt ON rt.oid = ao.amoprighttype
141+
-- 10003 is DSQL's B-Tree OID (PG mainline is 403).
142+
-- Verify with: SELECT oid FROM pg_am WHERE amname = 'btree'
136143
WHERE ao.amopmethod = 10003
137144
AND ao.amoplefttype != ao.amoprighttype
138145
ORDER BY lt.typname, rt.typname;
@@ -148,6 +155,7 @@ SELECT EXISTS (
148155
FROM pg_amop ao
149156
JOIN pg_type lt ON lt.oid = ao.amoplefttype
150157
JOIN pg_type rt ON rt.oid = ao.amoprighttype
158+
-- 10003 = DSQL B-Tree OID; verify with: SELECT oid FROM pg_am WHERE amname = 'btree'
151159
WHERE ao.amopmethod = 10003
152160
AND lt.typname = '{predicate_type}'
153161
AND rt.typname = '{column_type}'

plugins/databases-on-aws/skills/dsql/references/query-plan/plan-interpretation.md

Lines changed: 8 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,8 @@
1111
7. [Hash Table Resizing](#hash-table-resizing)
1212
8. [High-Loop Storage Lookups](#high-loop-storage-lookups)
1313
9. [Anomalous Values](#anomalous-values)
14-
10. [Projections and Row Width](#projections-and-row-width)
14+
10. [Type Coercion and Index Bypass](#type-coercion-and-index-bypass)
15+
11. [Projections and Row Width](#projections-and-row-width)
1516

1617
---
1718

@@ -194,14 +195,14 @@ Flag this condition when **all** of the following are true:
194195
1. An index exists whose leading column matches a WHERE predicate column
195196
2. The plan uses a Full Scan or Seq Scan on that table instead of an Index Scan
196197
3. The predicate literal's type differs from the indexed column's declared type
197-
4. The type pair is **not** in the implicit cast compatibility matrix below
198+
4. The `pg_amop` query in catalog-queries.md (B-Tree Cross-Type Operator Support) returns no row for the type pair
198199

199200
### Why It Happens
200201

201-
DSQL (like PostgreSQL) can only use a B-Tree index when the comparison operator's input types match the index's operator class. When a predicate supplies a value of a different type:
202+
DSQL (like PostgreSQL) can only use a B-Tree index when a cross-type B-Tree operator is registered in `pg_amop` for the (predicate-type, column-type) pair. When a predicate supplies a value of a different type:
202203

203-
- If an implicit cast exists from the predicate type to the column type, the planner applies it transparently and can still use the index
204-
- If no implicit cast exists, the planner must apply a per-row cast or comparison function that cannot use the index's ordering — resulting in a full scan
204+
- If a cross-type B-Tree operator is registered (verify via the `pg_amop` query in catalog-queries.md), the index can be used
205+
- If no cross-type operator is registered, the planner MUST apply a per-row cast or comparison function that cannot use the index's ordering — resulting in a full scan
205206

206207
This is particularly surprising to users because the query returns correct results (the cast happens at execution time, row by row) but performance degrades dramatically on large tables.
207208

@@ -211,7 +212,7 @@ Rather than relying on a static matrix, query `pg_amop` directly on the cluster
211212

212213
The key insight: DSQL's B-Tree access method (amopmethod `10003`) only supports index scans when a registered operator exists for the specific (left-type, right-type) pair. If no operator is registered for the pair, the index cannot be used — regardless of whether a general-purpose implicit cast exists in `pg_cast`.
213214

214-
In practice, cross-type index support is limited to the integer family (smallint, integer, bigint — all combinations). All other indexed types (text, numeric, uuid, timestamp, date, boolean, etc.) require an exact type match between the predicate and the indexed column for the index to be usable.
215+
At time of writing, cross-type index support is limited to the integer family (smallint, integer, bigint — all combinations). All other indexed types (text, numeric, uuid, timestamp, date, boolean, etc.) require an exact type match. MUST verify via the `pg_amop` query in catalog-queries.md before asserting this to a user, as DSQL MAY add cross-type operator families in future releases.
215216

216217
### Quantifying Impact
217218

@@ -238,7 +239,7 @@ To confirm this pattern, cross-reference:
238239
1. The column type from `pg_attribute` or `information_schema.columns` (see catalog-queries.md)
239240
2. The index definition from `pg_indexes`
240241
3. The predicate literal in the EXPLAIN output (visible in `Filter:` or `Index Cond:` lines)
241-
4. The implicit cast matrix above
242+
4. The `pg_amop` query in catalog-queries.md (B-Tree Cross-Type Operator Support)
242243

243244
## Projections and Row Width
244245

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/flatten-union-all.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ When a query contains UNION ALL nested inside another UNION ALL, flatten all bra
44

55
**SHOULD apply when:** All set operations are UNION ALL (no deduplication).
66

7-
**Skip when:** Any branch uses UNION (deduplicating), which MUST remain distinct.
7+
**SHOULD skip when:** Any branch uses UNION (deduplicating), which MUST remain distinct.
88

99
```sql
1010
-- Original

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/in-subquery-to-exists.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ When a column is compared to a subquery using IN and the subquery may return man
44

55
**SHOULD apply when:** The IN subquery returns a large or variable number of rows.
66

7-
**Skip when:** The IN list is a small static set of constants.
7+
**SHOULD skip when:** The IN list is a small static set of constants.
88

99
```sql
1010
-- Original

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/left-join-to-inner.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ When a query uses LEFT JOIN but the WHERE clause rejects NULLs on the joined tab
44

55
**SHOULD apply when:** The WHERE clause rejects NULLs from the right-hand side of a LEFT JOIN (e.g., `IS NOT NULL`, equality comparisons, or any predicate that cannot be true for NULL).
66

7-
**Skip when:** NULLs from the right-hand side are intentionally preserved in the result.
7+
**SHOULD skip when:** NULLs from the right-hand side are intentionally preserved in the result.
88

99
```sql
1010
-- Original

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/not-in-to-not-exists.md

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,12 @@
11
# Rewrite: Replace NOT IN with NOT EXISTS
22

3-
When a column is filtered with `NOT IN (subquery)`, rewrite as a correlated NOT EXISTS. This avoids building a large intermediate set and sidesteps NULL semantics issues with NOT IN.
3+
When a column is filtered with `NOT IN (subquery)`, rewrite as a correlated NOT EXISTS. This avoids building a large intermediate set.
44

5-
**SHOULD apply when:** The NOT IN subquery returns many rows or MAY contain NULLs.
5+
**Semantics warning:** NOT EXISTS does not preserve NOT IN's NULL-propagation behaviour. When the subquery MAY contain NULLs, `NOT IN` returns no rows while `NOT EXISTS` returns the non-matching rows — the rewrite changes results. MUST confirm intent with the user before applying when NULLs are possible.
66

7-
**Skip when:** The exclusion list is a small static set of constants.
7+
**SHOULD apply when:** The NOT IN subquery returns many rows and the subquery column is guaranteed NOT NULL (or the user confirms the changed NULL behaviour is acceptable).
8+
9+
**SHOULD skip when:** The exclusion list is a small static set of constants.
810

911
```sql
1012
-- Original

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/or-to-in.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ Rewrite multiple OR clauses comparing the same column to different constant valu
44

55
**SHOULD apply when:** All OR comparisons target the same column using equality (`=`) with constant values.
66

7-
**Skip when:** OR clauses compare different columns or involve non-constant expressions.
7+
**SHOULD skip when:** OR clauses compare different columns or involve non-constant expressions.
88

99
```sql
1010
-- Original

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/propagate-filter.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ When a query has an equality join condition and a filter predicate on one join a
44

55
**SHOULD apply when:** The filter predicate is on a column involved in an equality join condition.
66

7-
**Skip when:** The predicate is on a non-join column.
7+
**SHOULD skip when:** The predicate is on a non-join column.
88

99
```sql
1010
-- Original

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/push-computation-to-constant.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -4,16 +4,16 @@ When a filter predicate applies invertible arithmetic to an indexed column, move
44

55
**SHOULD apply when:** All operations on the column are mathematically invertible (addition, subtraction, multiplication/division by non-zero constant).
66

7-
**Skip when:** The computation involves non-invertible functions (substring, lower/upper, trigonometric functions) or moving the computation changes query semantics (precision loss, integer-division rounding).
7+
**SHOULD skip when:** The computation involves non-invertible functions (substring, lower/upper, trigonometric functions) or moving the computation changes query semantics (precision loss, integer-division rounding).
88

99
```sql
10-
-- Original
11-
SELECT * FROM titles
12-
WHERE emp_no * 100 / 5 = 10001;
10+
-- Original (amount is NUMERIC)
11+
SELECT * FROM transactions
12+
WHERE amount * 100 / 5 = 2000.00;
1313

1414
-- Rewritten
15-
SELECT * FROM titles
16-
WHERE emp_no = 10001 * 5 / 100;
15+
SELECT * FROM transactions
16+
WHERE amount = 2000.00 * 5 / 100;
1717
```
1818

1919
```sql

0 commit comments

Comments
 (0)