description: "Build with Aurora DSQL — manage schemas, execute queries, handle migrations, diagnose query plans, and develop applications with a serverless, distributed SQL database. Covers IAM auth, multi-tenant patterns, MySQL-to-DSQL migration, DDL operations, and query plan explainability. Triggers on phrases like: DSQL, Aurora DSQL, create DSQL table, DSQL schema, migrate to DSQL, distributed SQL database, serverless PostgreSQL-compatible database, DSQL query plan, DSQL EXPLAIN ANALYZE, why is my DSQL query slow, DSQL query performance, DSQL full scan, DSQL DPU, DSQL query cost, DSQL latency, optimize this query, this query is slow, explain this plan, query performance, high DPU, make this faster, why is this doing a full scan."
---

# Amazon Aurora DSQL Skill
@@ -35,7 +32,7 @@ Load these files as needed for detailed guidance:
**When:** Always load for guidance using or updating the DSQL MCP server
**Contains:** Instructions for setting up the DSQL MCP server with 2 configuration options as
-sampled in [.mcp.json](../../.mcp.json)
+sampled in [mcp/.mcp.json](mcp/.mcp.json)

1. Documentation-Tools Only
2. Database Operations (requires a cluster endpoint)
@@ -153,16 +150,18 @@ defaults that may change — when a user's decision depends on an exact limit, v
| Max indexes per table | 24 |`aurora dsql index limits`|
| Max columns per index | 8 |`aurora dsql index limits`|
| Supported column data types | See docs |`aurora dsql supported data types`|

-**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs where hitting a limit would cause failures; any time the exact number can affect user decision.
+**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs
+where hitting a limit would cause failures. No need to verify for general guidance or when
+the exact number doesn't affect the user's decision.

-**Fallback:** If `awsknowledge` is unavailable, use the defaults above and flag that limits should be verified against [DSQL documentation](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/).
+**Fallback:** If `awsknowledge` is unavailable, use the defaults above and note to the user
+that limits should be verified against [DSQL documentation](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/).

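
The Python sketch below roughly illustrates the pre-recommendation check described above. It validates one table's proposed indexes against the default limits from the table. The constants are the defaults quoted here, not values read from DSQL. `check_index_design` is a hypothetical helper, so confirm the real limits before relying on its result.

```python
# Defaults quoted in the limits table above. They may change, so confirm
# them before relying on the result for a real schema design.
DEFAULT_MAX_INDEXES_PER_TABLE = 24
DEFAULT_MAX_COLUMNS_PER_INDEX = 8


def check_index_design(
    indexes: dict[str, list[str]],
    max_indexes: int = DEFAULT_MAX_INDEXES_PER_TABLE,
    max_columns: int = DEFAULT_MAX_COLUMNS_PER_INDEX,
) -> list[str]:
    """Return a list of limit violations for one table's proposed indexes.

    `indexes` maps each (hypothetical) index name to its ordered column list.
    An empty list means the design fits within the given limits.
    """
    problems = []
    if len(indexes) > max_indexes:
        problems.append(
            f"{len(indexes)} indexes exceeds the limit of {max_indexes} per table"
        )
    for name, columns in indexes.items():
        if len(columns) > max_columns:
            problems.append(
                f"index {name} has {len(columns)} columns; limit is {max_columns}"
            )
    return problems


print(check_index_design({"idx_orders_tenant": ["tenant_id", "created_at"]}))  # []
```
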
## CLI Scripts Available
-Bash scripts in [scripts/](../../scripts/) for cluster management (create, delete, list, cluster info), psql connection, and bulk data loading from local/s3 csv/tsv/parquet files.
-See [scripts/README.md](../../scripts/README.md) for usage and hook configuration.
+Bash scripts in [scripts/](scripts/) for cluster management (create, delete, list, cluster info), psql connection, and bulk data loading from local/s3 csv/tsv/parquet files.
+See [scripts/README.md](scripts/README.md) for usage.

---

@@ -206,7 +205,7 @@ ALTER COLUMN TYPE, DROP COLUMN, DROP CONSTRAINT → Table Recreation Pattern (Wo
- MUST include tenant_id in all tables
- MUST use `CREATE INDEX ASYNC` exclusively
- MUST issue each DDL in its own transact call: `transact(["CREATE TABLE ..."])`
-- MUST serialize arrays as TEXT or JSON; cast back at query time (`string_to_array(text, ',')` or `jsonb_array_elements_text(json::jsonb)`)
+- MUST store arrays/JSON as TEXT

### Workflow 2: Safe Data Migration

@@ -220,7 +219,10 @@ ALTER COLUMN TYPE, DROP COLUMN, DROP CONSTRAINT → Table Recreation Pattern (Wo
- MUST batch updates under 3,000 rows in separate transact calls
- MUST issue each ALTER TABLE in its own transaction

-**Recovery — batch fails midway:** Rows already updated keep their new value (each batch committed independently). Resume by filtering on the unset state (`WHERE new_column IS NULL`) and continue. Re-running is safe because the filter naturally excludes completed rows.
+**Recovery — batch fails midway:** Rows already updated keep their new value (each batch committed
+in its own transaction). Resume by filtering on the unset state — e.g. add
+`WHERE new_column IS NULL` (or the sentinel value) to the next UPDATE — and continue from there.
+Re-running the entire migration is safe because the filter naturally excludes completed rows.
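
This is a minimal Python sketch of the batch-and-resume loop above. It uses the standard-library `sqlite3` module only as a local stand-in so the example runs anywhere. Against DSQL, each batch would be a separate `transact` call. The `accounts` table, its columns, and the backfill expression are hypothetical. `BATCH_SIZE` simply sits below the 3,000-row rule.

```python
import sqlite3

BATCH_SIZE = 2_999  # keep each transaction under 3,000 rows


def backfill(conn: sqlite3.Connection, batch_size: int = BATCH_SIZE) -> int:
    """Populate new_column batch by batch; safe to re-run after a failure."""
    total = 0
    while True:
        with conn:  # each batch commits in its own transaction
            cur = conn.execute(
                """
                UPDATE accounts
                -- COALESCE guarantees every touched row leaves the IS NULL
                -- filter, so the loop always terminates
                SET new_column = COALESCE(old_column, 0) * 100
                WHERE id IN (
                    SELECT id FROM accounts
                    WHERE new_column IS NULL  -- resume filter skips finished rows
                    LIMIT ?
                )
                """,
                (batch_size,),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, old_column INTEGER, new_column INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts (old_column) VALUES (?)", [(i,) for i in range(10_000)]
)
conn.commit()
print(backfill(conn))  # 10000
print(backfill(conn))  # 0: the filter excludes every completed row
```

Every batch commits before the next one starts. A crash therefore loses at most the in-flight batch, and calling `backfill` again picks up only the rows still matching `new_column IS NULL`.
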
@@ -252,7 +254,42 @@ MUST load [mysql-migrations/type-mapping.md](references/mysql-migrations/type-ma
### Workflow 8: Query Plan Explainability

-Explains why the DSQL optimizer chose a particular plan. Triggered by slow queries, high DPU, unexpected Full Scans, or plans the user doesn't understand. **REQUIRES a structured Markdown diagnostic report is the deliverable** beyond conversation — run the workflow end-to-end before answering. Use the `aurora-dsql` MCP when connected; fall back to raw `psql` with a generated IAM token (see the fallback block below) otherwise.
+Explains why the DSQL optimizer chose a particular plan. **REQUIRES a structured Markdown diagnostic report as the deliverable** — run the workflow end-to-end before answering. Use the `aurora-dsql` MCP when connected; fall back to raw `psql` with a generated IAM token (see the fallback block below) otherwise.

#### Trigger Criteria

Enter this workflow if **ANY** of these signals are present:
| Multiple database MCPs are connected and no DSQL signal in the message | Ask the user which database they mean before proceeding |
| No database MCP is connected | Inform the user that the `aurora-dsql` MCP is required and offer the psql fallback |

#### Routing (sub-path selection)

| Condition | Path |
|-----------|------|
| User provides SQL but no plan output | Full workflow: Phase 0 → 1 → 2 → 3 → 4 |
| User pastes plan output + asks to fix/optimize | Full workflow: Phase 0 → 1 (re-capture fresh plan) → 2 → 3 → 4 |
| User pastes plan output + asks what it means (educational) | Full workflow: Phase 0 → 1 (re-capture fresh plan) → 2 → 3 → 4. The report is the explanation — do not produce a shorter conversational answer instead |
| Execution time >30s detected at Phase 1 | Phase 3 skips experiments per guc-experiments.md |
| User says "reassess" or equivalent | Re-run Phase 1–2, append Addendum to existing report |
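
The routing table and the Phase 3 time gate can be sketched as a small decision function. This is illustrative Python only. The inputs and returned path labels are hypothetical names. Raising `ValueError` when there is neither SQL nor plan output is an assumption, because the table does not cover that case.

```python
SKIP_EXPERIMENTS_AFTER_S = 30.0  # the ">30s" threshold used in Phase 3


def select_path(has_sql: bool, has_plan_output: bool, reassess: bool = False) -> str:
    """Map a request onto the routing table (path labels are illustrative)."""
    if reassess:
        return "rerun-phase-1-2-and-append-addendum"
    if has_plan_output:
        # Fix requests and "what does this mean" requests both run the full
        # workflow; Phase 1 re-captures a fresh plan instead of trusting the paste.
        return "full-workflow-recapture-plan"
    if has_sql:
        return "full-workflow"
    raise ValueError("nothing to analyze: no SQL and no plan output")


def phase3_runs_experiments(execution_time_s: float) -> bool:
    """<=30s: run the GUC experiments; >30s: skip them and report the manual SQL."""
    return execution_time_s <= SKIP_EXPERIMENTS_AFTER_S


print(select_path(has_sql=True, has_plan_output=False))  # full-workflow
print(phase3_runs_experiments(42.0))                      # False
```
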

**Phase 0 — Load reference material.** Read all four before starting — each has content later phases need verbatim (node-type math, exact catalog SQL, the `>30s` skip protocol, required report elements):

@@ -263,7 +300,7 @@ Explains why the DSQL optimizer chose a particular plan. Triggered by slow queri

**Phase 1 — Capture the plan.** **ALWAYS** run `readonly_query("EXPLAIN ANALYZE VERBOSE …")` on the user's query verbatim (SELECT form) — **ALWAYS** capture a fresh plan from the cluster, even when the user describes the plan or reports an anomaly. **MAY** leverage `get_schema` or `information_schema` for schema sanity checks. When EXPLAIN errors (`relation does not exist`, `column does not exist`), **MUST** report the error verbatim — **MUST NOT** invent DSQL-specific semantics (e.g., case sensitivity, identifier quoting) as the root cause. Extract Query ID, Planning Time, Execution Time, and DPU Estimate. **SELECT** runs as-is. **UPDATE/DELETE** are rewritten to the equivalent SELECT (same join chain + WHERE) — the optimizer picks the same plan shape. **INSERT**, PL/pgSQL, DO blocks, and functions **MUST** be rejected. **MUST NOT** use `transact --allow-writes` for plan capture; it bypasses MCP safety.

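
This is a rough Python sketch of the Phase 1 statement rules. It inspects only the first keyword after any leading `--` comment lines, so it is not a SQL parser. A SELECT that calls a function is not detected. Treating every statement other than SELECT, UPDATE, and DELETE as rejected is an assumption. `plan_capture_action` and `explain_sql` are hypothetical names. Rewriting an UPDATE/DELETE into its SELECT form is still left to the agent.

```python
import re

# Leading whitespace and "--" comment lines, skipped before reading the first keyword.
_LEADING_NOISE = re.compile(r"^(?:\s*--[^\n]*\n)*\s*")


def plan_capture_action(sql: str) -> str:
    """Classify a statement per the Phase 1 rules (first keyword only)."""
    body = _LEADING_NOISE.sub("", sql, count=1)
    words = body.split(None, 1)
    keyword = words[0].upper() if words else ""
    if keyword == "SELECT":
        return "run-as-is"
    if keyword in ("UPDATE", "DELETE"):
        return "rewrite-to-select"  # same join chain + WHERE, as a SELECT
    # INSERT, DO blocks, CREATE FUNCTION/PROCEDURE, CALL, and anything unrecognized
    return "reject"


def explain_sql(select_sql: str) -> str:
    """Build the statement to pass to readonly_query for a SELECT."""
    if plan_capture_action(select_sql) != "run-as-is":
        raise ValueError("only SELECT statements are sent to EXPLAIN")
    return f"EXPLAIN ANALYZE VERBOSE {select_sql.strip()}"
```
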
-**Phase 2 — Gather evidence.** Using SQL from `catalog-queries.md`, query `pg_class`, `pg_stats`, `pg_indexes`, `COUNT(*)`, `COUNT(DISTINCT)`. Classify estimation errors per `plan-interpretation.md` (2x–5x minor, 5x–50x significant, 50x+ severe). Detect correlated predicates and data skew.
+**Phase 2 — Gather evidence.** Using SQL from `catalog-queries.md`, query `pg_class`, `pg_stats`, `pg_indexes`, `COUNT(*)`, `COUNT(DISTINCT)`. Classify estimation errors per `plan-interpretation.md` (2x–5x minor, 5x–50x significant, 50x+ severe). Detect correlated predicates and data skew. When a Full Scan appears despite an apparently usable index, check for type coercion index bypass: retrieve indexed column types, compare them against predicate literal types, and confirm with the `pg_amop` cross-type operator check in `catalog-queries.md` (background in "Type Coercion and Index Bypass" in `plan-interpretation.md`).

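
The Phase 2 estimation-error bands can be expressed as a small Python helper. This sketch makes several assumptions. The ratio is symmetric, so over- and under-estimates count the same. Zero row counts are floored to 1. A value exactly on a boundary goes to the more severe band. `accurate` is a hypothetical label for errors under 2x.

```python
def estimation_error_ratio(estimated_rows: float, actual_rows: float) -> float:
    """Symmetric misestimate factor: 10x over and 10x under both give 10.0."""
    high = max(estimated_rows, actual_rows, 1)
    low = max(min(estimated_rows, actual_rows), 1)  # floor at 1 row, avoids /0
    return high / low


def classify_estimation_error(estimated_rows: float, actual_rows: float) -> str:
    ratio = estimation_error_ratio(estimated_rows, actual_rows)
    if ratio >= 50:
        return "severe"
    if ratio >= 5:
        return "significant"
    if ratio >= 2:
        return "minor"
    return "accurate"  # under 2x: not flagged


print(classify_estimation_error(estimated_rows=120, actual_rows=9_000))  # severe
```
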
**Phase 3 — Experiment (conditional).** ≤30s: run GUC experiments per `guc-experiments.md` (default + merge-join-only) plus optional redundant-predicate test. >30s: skip experiments, include the manual GUC testing SQL verbatim in the report, and do not re-run for redundant-predicate testing. Anomalous values (impossible row counts): confirm query results are correct despite the anomalous EXPLAIN, flag as a potential DSQL bug, and produce the Support Request Template from `report-format.md`.
plugins/databases-on-aws/skills/dsql/references/query-plan/catalog-queries.md
Lines changed: 76 additions & 0 deletions
@@ -103,6 +103,82 @@ Compare against `pg_stats.n_distinct`:
- If `n_distinct` is positive: compare directly
- If `n_distinct` is negative: multiply absolute value by actual row count to get estimated distinct count

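The two rules above can be written as a short Python sketch. The zero case is an addition. In PostgreSQL's `pg_statistic.stadistinct`, which `pg_stats.n_distinct` exposes, zero means the distinct count is unknown. The helper therefore returns `None` instead of guessing. `estimated_distinct` is a hypothetical name.

```python
from typing import Optional


def estimated_distinct(n_distinct: float, actual_row_count: int) -> Optional[float]:
    """Turn pg_stats.n_distinct into an estimated number of distinct values."""
    if n_distinct > 0:
        return n_distinct  # positive: already an absolute count
    if n_distinct < 0:
        # negative: minus the fraction of rows that are distinct
        return abs(n_distinct) * actual_row_count
    return None  # zero: unknown in PostgreSQL's statistics


# Compare the result against SELECT COUNT(DISTINCT col) on the same table.
print(estimated_distinct(-0.25, 40_000))  # 10000.0
```
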
## Column Types for Predicate Columns

Retrieve the declared types for columns used in WHERE predicates and JOIN conditions, to detect type coercion index bypass (see plan-interpretation.md):

```sql
SELECT
    c.table_name,
    c.column_name,
    c.data_type,
    c.udt_name,
    c.is_nullable
FROM information_schema.columns c
WHERE c.table_schema = '{schema}'
  AND c.table_name IN ('{table1}', '{table2}')
  AND c.column_name IN ('{col1}', '{col2}');
```

Cross-reference the column type against predicate literals visible in the EXPLAIN output. When the types differ, use the B-Tree cross-type operator queries below to determine whether the mismatch prevents index usage (background in plan-interpretation.md).

## B-Tree Cross-Type Operator Support

Determine which type pairs the DSQL B-Tree access method supports for index scans. If a (predicate-type, column-type) pair has no registered operator, the index cannot be used for that comparison:

```sql
SELECT DISTINCT
    lt.typname AS left_type,
    rt.typname AS right_type
FROM pg_amop ao
JOIN pg_type lt ON lt.oid = ao.amoplefttype
JOIN pg_type rt ON rt.oid = ao.amoprighttype
WHERE ao.amopmethod = 10003
  AND ao.amoplefttype != ao.amoprighttype
ORDER BY lt.typname, rt.typname;
```

This returns only the cross-type pairs (where left and right types differ). Same-type pairs are always supported. Use this to confirm whether a suspected type mismatch actually prevents index usage: if the pair appears in the result, the index CAN be used and the issue lies elsewhere.

To check a specific pair:

```sql
SELECT EXISTS (
    SELECT 1
    FROM pg_amop ao
    JOIN pg_type lt ON lt.oid = ao.amoplefttype
    JOIN pg_type rt ON rt.oid = ao.amoprighttype
    WHERE ao.amopmethod = 10003
      AND lt.typname = '{predicate_type}'
      AND rt.typname = '{column_type}'
) AS index_usable;
```

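In a script, the same decision can be made locally from the rows returned by the cross-type query. The Python sketch below does this. `index_usable` is a hypothetical helper. The integer-family pair set is only an illustration, built from the claim in plan-interpretation.md; real input should be the query output. Pairs are checked in the (predicate type, column type) orientation used by the EXISTS query above.

```python
from itertools import permutations


def index_usable(predicate_type: str, column_type: str,
                 cross_type_pairs: set[tuple[str, str]]) -> bool:
    """Local equivalent of the EXISTS query, given rows from the cross-type query."""
    if predicate_type == column_type:
        return True  # same-type pairs are always supported
    return (predicate_type, column_type) in cross_type_pairs


# Illustrative input only: every ordered pair within the integer family.
# In practice, build this set from the (left_type, right_type) query rows.
INTEGER_FAMILY_PAIRS = set(permutations(["int2", "int4", "int8"], 2))

print(index_usable("int4", "int8", INTEGER_FAMILY_PAIRS))     # True
print(index_usable("numeric", "int4", INTEGER_FAMILY_PAIRS))  # False
```
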
## Indexed Column Types

Retrieve index definitions together with their column types to identify type coercion bypass candidates:

```sql
SELECT
    i.indexname,
    i.tablename,
    a.attname AS column_name,
    t.typname AS column_type,
    i.indexdef
FROM pg_indexes i
JOIN pg_class ic ON ic.relname = i.indexname
JOIN pg_index ix ON ix.indexrelid = ic.oid
JOIN pg_attribute a ON a.attrelid = ix.indrelid
    AND a.attnum = ANY(ix.indkey)
JOIN pg_type t ON t.oid = a.atttypid
JOIN pg_namespace n ON n.oid = ic.relnamespace
WHERE n.nspname = '{schema}'
  AND i.schemaname = '{schema}'  -- avoid matching same-named indexes in other schemas
  AND i.tablename IN ('{table1}', '{table2}')
ORDER BY i.tablename, i.indexname, a.attnum;
```

Use this when a Full Scan appears despite an apparently usable index. Compare the index column's `column_type` against the predicate literal's inferred type.

## Value Distribution Analysis

For columns with suspected data skew, retrieve the actual top-N value frequencies:
plugins/databases-on-aws/skills/dsql/references/query-plan/plan-interpretation.md
Lines changed: 57 additions & 0 deletions
@@ -183,6 +183,63 @@ Detect physically impossible row counts in DSQL plan nodes:
These anomalous values do not affect query correctness — only diagnostic output accuracy.

## Type Coercion and Index Bypass

An index may exist on a column yet not be used when the predicate value's type does not match the column's declared type and no implicit cast exists between the two types.

### Detection Pattern

Flag this condition when **all** of the following are true:

1. An index exists whose leading column matches a WHERE predicate column
2. The plan uses a Full Scan or Seq Scan on that table instead of an Index Scan
3. The predicate literal's type differs from the indexed column's declared type
4. The type pair has **no** registered cross-type B-Tree operator in `pg_amop` (see Determining Index-Compatible Type Pairs below)
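
The four conditions can be combined into a single predicate. This Python sketch assumes the evidence was already collected in Phases 1 and 2. The `ScanEvidence` fields are hypothetical names. `cross_type_pairs` stands for the rows returned by the `pg_amop` query in catalog-queries.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanEvidence:
    """Facts gathered in Phases 1-2 (field names are hypothetical)."""
    index_leading_column_in_predicate: bool  # condition 1
    scan_node: str                           # e.g. "Full Scan", "Index Scan"
    predicate_type: str                      # type of the literal in the plan
    column_type: str                         # declared type of the indexed column
    cross_type_pairs: frozenset              # (left, right) rows from pg_amop


def is_type_coercion_bypass(ev: ScanEvidence) -> bool:
    """True only when all four detection conditions hold."""
    return (
        ev.index_leading_column_in_predicate                                 # 1
        and ev.scan_node in ("Full Scan", "Seq Scan")                        # 2
        and ev.predicate_type != ev.column_type                              # 3
        and (ev.predicate_type, ev.column_type) not in ev.cross_type_pairs   # 4
    )


pairs = frozenset({("int4", "int8"), ("int8", "int4")})
print(is_type_coercion_bypass(ScanEvidence(True, "Full Scan", "numeric", "int4", pairs)))  # True
```
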

### Why It Happens

DSQL (like PostgreSQL) can only use a B-Tree index when the comparison operator's input types match the index's operator class. When a predicate supplies a value of a different type:

- If an implicit cast exists from the predicate type to the column type, the planner applies it transparently and can still use the index
- If no implicit cast exists, the planner must apply a per-row cast or comparison function that cannot use the index's ordering — resulting in a full scan

This is particularly surprising to users because the query returns correct results (the cast happens at execution time, row by row) but performance degrades dramatically on large tables.

### Determining Index-Compatible Type Pairs

Rather than relying on a static matrix, query `pg_amop` directly on the cluster to determine which cross-type comparisons the DSQL B-Tree index access method supports. See catalog-queries.md for the exact SQL.

The key insight: DSQL's B-Tree access method (amopmethod `10003`) only supports index scans when a registered operator exists for the specific (left-type, right-type) pair. If no operator is registered for the pair, the index cannot be used — regardless of whether a general-purpose implicit cast exists in `pg_cast`.

In practice, cross-type index support is limited to the integer family (smallint, integer, bigint — all combinations). All other indexed types (text, numeric, uuid, timestamp, date, boolean, etc.) require an exact type match between the predicate and the indexed column for the index to be usable.

### Quantifying Impact

When this pattern is detected:

```
Full Scan rows processed = actual_rows from Full Scan node
Index Scan rows (expected) = estimated rows matching the predicate (from pg_stats selectivity)
Scan amplification = Full Scan rows / Index Scan rows (expected)
```
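
Here is the same arithmetic in Python. It assumes the expected Index Scan rows equal the table row count multiplied by the predicate selectivity, floored at 1 row to avoid dividing by zero. The function name and parameters are hypothetical.

```python
def scan_amplification(full_scan_actual_rows: int, table_rows: int,
                       predicate_selectivity: float) -> float:
    """Full Scan rows divided by the rows an Index Scan would be expected to read."""
    expected_index_rows = max(table_rows * predicate_selectivity, 1.0)
    return full_scan_actual_rows / expected_index_rows


# A 2M-row Full Scan for a predicate expected to match about 0.01% of rows:
amp = scan_amplification(2_000_000, 2_000_000, 0.0001)
print(f"{amp:,.0f}x")  # 10,000x
```
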

### Recommendation Template

When a type coercion bypass is confirmed:

- **Explicit cast in the predicate:** Rewrite the literal to the column's declared type, not the other way round. For example, against an `integer` column, rewrite `WHERE col = 42.0` (a `numeric` literal, which forces a per-row `col::numeric` cast) as `WHERE col = 42`
- **Application-layer fix:** Ensure the application passes parameters with the correct type rather than relying on implicit conversion
- **Do NOT recommend changing the column type** to accommodate mismatched predicates — this masks the real issue and may break other queries

### Evidence Gathering

To confirm this pattern, cross-reference:

1. The column type from `pg_attribute` or `information_schema.columns` (see catalog-queries.md)
2. The index definition from `pg_indexes`
3. The predicate literal in the EXPLAIN output (visible in `Filter:` or `Index Cond:` lines)
4. The cross-type operator check against `pg_amop` (see Determining Index-Compatible Type Pairs above, and catalog-queries.md for the SQL)

## Projections and Row Width

Capture Projections lists from Storage Scan and Storage Lookup nodes:
SQL rewrites that address Aurora DSQL-specific behaviors and limitations. Apply these when the plan reveals inefficiency unique to DSQL's distributed architecture or optimizer constraints.

## Table of Contents

1. [Replace COUNT(*) with reltuples Estimate](#replace-count-with-reltuples-estimate)
2. [Split Large Joins to Enable Optimal Join Ordering](#split-large-joins-to-enable-optimal-join-ordering)

---

## Replace COUNT(*) with reltuples Estimate

When a query performs `COUNT(*)` on a large table, rewrite it to use the `reltuples` value from `pg_class` for an approximate row count. This is a common workaround when `COUNT(*)` is too slow or times out on large tables.

**When to apply:** An approximate count is acceptable and the table is large enough that `COUNT(*)` is prohibitively expensive.

**Do not apply:** The application requires an exact count.

```sql
-- Original
SELECT COUNT(*) AS exact_count
FROM big_table;

-- Rewritten (DSQL)
SELECT reltuples::bigint AS estimated_count
FROM pg_class
WHERE oid = 'public.big_table'::regclass;
```

```sql
-- Not applicable: exact count required
SELECT COUNT(*) AS exact_count
FROM big_table;
```

---

## Split Large Joins to Enable Optimal Join Ordering

If a query joins more tables than the optimizer's DP threshold (e.g., 10 joins for Aurora DSQL), rewrite it into multiple subqueries each joining no more tables than the threshold, then join the subquery results.

This allows the PostgreSQL-based DSQL engine to apply dynamic-programming (DP) join ordering within each smaller block, producing a better overall join plan than a greedy algorithm on many tables.

**When to apply:** The total number of joined tables exceeds the DP threshold (`join_collapse_limit` or `from_collapse_limit`). Partition the join into CTEs that each join no more tables than the threshold, push down relevant filters, and join the CTE results.

**Do not apply:** The total table count is at or below the threshold, or splitting would prevent necessary cross-block optimizations.
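
This is a minimal Python sketch of the partitioning step. It assumes the tables are listed in an order where neighbours share a join condition (a chain), so consecutive slices stay connected. The default `limit` of 10 follows the example above; in practice it would come from `SHOW join_collapse_limit` or `SHOW from_collapse_limit`. `split_join` is a hypothetical name. Writing the CTEs and pushing filters into them remains a manual step.

```python
def split_join(tables: list[str], limit: int = 10) -> list[list[str]]:
    """Split an ordered join chain into consecutive blocks of at most `limit` tables."""
    if limit < 2:
        raise ValueError("limit must allow at least one join per block")
    if len(tables) <= limit:
        return [list(tables)]  # at or below the threshold: leave the query alone
    blocks = [tables[i:i + limit] for i in range(0, len(tables), limit)]
    if len(blocks) > limit:
        # the outer join of the block results would itself exceed the threshold
        raise ValueError("too many blocks; split recursively instead")
    return blocks


chain = [f"t{i}" for i in range(1, 15)]  # 14 hypothetical tables
for n, block in enumerate(split_join(chain), start=1):
    print(f"block_{n}: {' JOIN '.join(block)}")
```
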