Skip to content

Commit cc44474

Browse files
committed
fix(dsql): address merge-blocking review findings #1-#4
- #1: Strip surrounding quotes from all placeholders in catalog-queries so safe_query helpers (which emit their own quotes) don't double-quote. Add worked safe_query.build() example to preamble. - #2: DP threshold changed from 10 to 8 (validated: SHOW join_collapse_limit = 8 on live DSQL). Agent now instructed to SHOW the value rather than hardcoding. - #3: Remove pg_stat_user_tables.last_analyze cross-check (DSQL never populates it). Guard reltuples with GREATEST(..., 0) for the -1 sentinel on never-analyzed tables. - #4: Fix '11 generic' to '10 generic' in SKILL.md reference table.
1 parent 00884a4 commit cc44474

4 files changed

Lines changed: 35 additions & 23 deletions

File tree

plugins/databases-on-aws/skills/dsql/SKILL.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -84,7 +84,7 @@ Load these files as needed for detailed guidance:
8484
| [query-plan/catalog-queries.md](references/query-plan/catalog-queries.md) | MUST load at Workflow 9 Phase 0 | `pg_class`/`pg_stats`/`pg_indexes` SQL, correlated-predicate verification |
8585
| [query-plan/guc-experiments.md](references/query-plan/guc-experiments.md) | MUST load at Workflow 9 Phase 0 | GUC experiment procedures, 30-second skip protocol |
8686
| [query-plan/report-format.md](references/query-plan/report-format.md) | MUST load at Workflow 9 Phase 0 | Required report structure, element checklist, support request template |
87-
| [query-plan/query-rewrites-generic.md](references/query-plan/query-rewrites-generic.md) | SHOULD load at Phase 0; sub-files on-demand | Index of 11 generic rewrite patterns |
87+
| [query-plan/query-rewrites-generic.md](references/query-plan/query-rewrites-generic.md) | SHOULD load at Phase 0; sub-files on-demand | Index of 10 generic rewrite patterns |
8888
| [query-plan/query-rewrites-dsql-specific.md](references/query-plan/query-rewrites-dsql-specific.md) | SHOULD load at Phase 0; sub-files on-demand | Index of DSQL-specific rewrite patterns |
8989

9090
---

plugins/databases-on-aws/skills/dsql/references/query-plan/catalog-queries.md

Lines changed: 28 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -2,10 +2,22 @@
22

33
Exact SQL for interrogating optimizer statistics and actual cardinalities against the DSQL cluster.
44

5-
**Placeholder substitution:** All queries in this file use `'{...}'` placeholders. MUST substitute via `safe_query.build()` — see input-validation.md. Use the correct helper per position:
5+
**Placeholder substitution:** All queries in this file use `{...}` placeholders. MUST substitute via `safe_query.build()` — see input-validation.md. Use the correct helper per position:
66

77
- **Identifier positions** (FROM clause, GROUP BY, column aliases): `ident()` → emits `"value"`
8-
- **String-literal positions** (WHERE `= '{schema}'`, `IN ('{table}')`, equality comparisons against catalog columns): `allow()` or `regex()` → emits `'value'`
8+
- **String-literal positions** (WHERE `= {schema}`, `IN ({table})`, equality comparisons against catalog columns): `allow()` or `regex()` → emits `'value'`
9+
10+
Worked example:
11+
12+
```python
13+
safe_query.build(
14+
"SELECT reltuples FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace "
15+
"WHERE n.nspname = {schema} AND c.relname IN ({t1}, {t2})",
16+
schema=regex(r"^[a-z_]+$", user_schema),
17+
t1=regex(r"^[a-z_]+$", table1),
18+
t2=regex(r"^[a-z_]+$", table2),
19+
)
20+
```
921

1022
## Table of Contents
1123

@@ -33,8 +45,8 @@ SELECT
3345
relpages
3446
FROM pg_class c
3547
JOIN pg_namespace n ON n.oid = c.relnamespace
36-
WHERE n.nspname = '{schema}'
37-
AND c.relname IN ('{table1}', '{table2}', '{table3}');
48+
WHERE n.nspname = {schema}
49+
AND c.relname IN ({table1}, {table2}, {table3});
3850
```
3951

4052
Compare `reltuples` against actual `COUNT(*)`. A divergence >20% on the table-stats snapshot indicates stale `reltuples` requiring `ANALYZE`. This is distinct from the row-estimate-vs-actual error thresholds used for plan findings (see plan-interpretation.md: 2x–5x minor, 5x–50x significant, 50x+ severe).
@@ -54,9 +66,9 @@ SELECT
5466
histogram_bounds,
5567
correlation
5668
FROM pg_stats
57-
WHERE schemaname = '{schema}'
58-
AND tablename = '{table}'
59-
AND attname IN ('{col1}', '{col2}');
69+
WHERE schemaname = {schema}
70+
AND tablename = {table}
71+
AND attname IN ({col1}, {col2});
6072
```
6173

6274
**Key fields:**
@@ -79,8 +91,8 @@ SELECT
7991
indexname,
8092
indexdef
8193
FROM pg_indexes
82-
WHERE schemaname = '{schema}'
83-
AND tablename IN ('{table1}', '{table2}', '{table3}')
94+
WHERE schemaname = {schema}
95+
AND tablename IN ({table1}, {table2}, {table3})
8496
ORDER BY tablename, indexname;
8597
```
8698

@@ -123,9 +135,9 @@ SELECT
123135
c.udt_name,
124136
c.is_nullable
125137
FROM information_schema.columns c
126-
WHERE c.table_schema = '{schema}'
127-
AND c.table_name IN ('{table1}', '{table2}')
128-
AND c.column_name IN ('{col1}', '{col2}');
138+
WHERE c.table_schema = {schema}
139+
AND c.table_name IN ({table1}, {table2})
140+
AND c.column_name IN ({col1}, {col2});
129141
```
130142

131143
Cross-reference the column type against predicate literals visible in the EXPLAIN output. When the types differ, use the B-Tree Cross-Type Operator Support query below to determine whether the mismatch prevents index usage.
@@ -160,8 +172,8 @@ SELECT EXISTS (
160172
JOIN pg_type rt ON rt.oid = ao.amoprighttype
161173
-- 10003 = DSQL B-Tree OID; verify with: SELECT oid FROM pg_am WHERE amname = 'btree_index'
162174
WHERE ao.amopmethod = 10003
163-
AND lt.typname = '{predicate_type}'
164-
AND rt.typname = '{column_type}'
175+
AND lt.typname = {predicate_type}
176+
AND rt.typname = {column_type}
165177
) AS index_usable;
166178
```
167179

@@ -183,8 +195,8 @@ JOIN pg_attribute a ON a.attrelid = ix.indrelid
183195
AND a.attnum = ANY(ix.indkey)
184196
JOIN pg_type t ON t.oid = a.atttypid
185197
JOIN pg_namespace n ON n.oid = ic.relnamespace
186-
WHERE n.nspname = '{schema}'
187-
AND i.tablename IN ('{table1}', '{table2}')
198+
WHERE n.nspname = {schema}
199+
AND i.tablename IN ({table1}, {table2})
188200
ORDER BY i.tablename, i.indexname, a.attnum;
189201
```
190202

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/reltuples-estimate.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ When a query performs `COUNT(*)` on a large table, rewrite to use the `reltuples
44

55
**SHOULD apply when:** An approximate count is acceptable and the table is large enough that `COUNT(*)` is prohibitively expensive.
66

7-
**Staleness warning:** `reltuples` reflects the last `ANALYZE` or autovacuum run. MUST warn the user that the value MAY be stale on write-heavy or recently created tables. SHOULD recommend cross-checking `pg_stat_user_tables.last_analyze` when the count drives a decision.
7+
**Staleness warning:** `reltuples` reflects the last `ANALYZE` run. MUST warn the user that the value MAY be stale on write-heavy or recently created tables (DSQL does not populate `pg_stat_user_tables.last_analyze`). A value of `-1` means statistics have never been gathered — treat as "unknown" and recommend running `ANALYZE` first.
88

99
**SHOULD skip when:** The application requires an exact count.
1010

@@ -13,8 +13,8 @@ When a query performs `COUNT(*)` on a large table, rewrite to use the `reltuples
1313
SELECT COUNT(*) AS exact_count
1414
FROM big_table;
1515

16-
-- Rewritten (DSQL)
17-
SELECT reltuples::bigint AS estimated_count
16+
-- Rewritten (DSQL) — GREATEST guards against -1 (never-analyzed)
17+
SELECT GREATEST(reltuples, 0)::bigint AS estimated_count
1818
FROM pg_class
1919
WHERE oid = 'public.big_table'::regclass;
2020
```

plugins/databases-on-aws/skills/dsql/references/query-plan/query-rewrites/split-large-joins.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Rewrite: Split Large Joins for DP Join Ordering (DSQL-Specific)
22

3-
When a query joins more tables than the optimizer's DP threshold (e.g., 10 joins for Aurora DSQL), rewrite it into multiple subqueries each joining no more tables than the threshold, then join the subquery results.
3+
When a query joins more tables than the optimizer's DP threshold, rewrite it into multiple subqueries each joining no more tables than the threshold, then join the subquery results. The agent MUST run `SHOW join_collapse_limit;` on the target cluster to determine the actual threshold rather than assuming a fixed value (default is **8** on Aurora DSQL).
44

55
This allows the PostgreSQL-based DSQL engine to apply dynamic-programming (DP) join ordering within each smaller block, producing a better overall join plan than a greedy algorithm on many tables.
66

@@ -9,7 +9,7 @@ This allows the PostgreSQL-based DSQL engine to apply dynamic-programming (DP) j
99
**SHOULD skip when:** The total table count is at or below the threshold, or splitting would prevent necessary cross-block optimizations.
1010

1111
```sql
12-
-- Original (11 tables — exceeds DP threshold of 10)
12+
-- Original (11 tables — exceeds default DP threshold of 8)
1313
SELECT *
1414
FROM R1
1515
JOIN R2 ON R1.id = R2.r1_id
@@ -24,7 +24,7 @@ FROM R1
2424
JOIN R11 ON R10.id = R11.r10_id
2525
WHERE Filters;
2626

27-
-- Rewritten (DSQL) — split into two CTEs, each ≤ 10 tables
27+
-- Rewritten (DSQL) — split into two CTEs, each ≤ 8 tables
2828
WITH
2929
sub1 AS (
3030
SELECT R1.id, R6.id AS r6_id, R6.col

0 commit comments

Comments
 (0)