Commit 7402408

feat: add Step 9 result presentation guidelines to data-parity skill

Require that every diff result summary surfaces:
- Exact scope (tables + warehouses compared)
- Filters and time period applied (or explicitly states none)
- Key columns used and how they were confirmed
- Columns compared and excluded, with reasons (auto-timestamp, user request)
- Algorithm used

Includes example full result summary and guidance for identical results, emphasising that bare numbers without context are meaningless to the user.

1 parent 19c2376 commit 7402408

1 file changed

Lines changed: 71 additions & 1 deletion

.opencode/skills/data-parity/SKILL.md

@@ -19,7 +19,7 @@ Here's my plan:
 6. [ ] Run column-level profile (cheap — no row scan)
 7. [ ] Ask whether to proceed with row-level diff (may be expensive for large tables)
 8. [ ] Run targeted row-level diff on diverging columns only
-9. [ ] Report findings
+9. [ ] Present findings with scope, filters, time period, columns compared/excluded, and assumptions
 ```
 
 Update each item to `[x]` as you complete it. This plan should be visible before any tool is called.
@@ -317,6 +317,76 @@ The output lists which columns were auto-excluded and why.
 
 ---
 
+## Step 9: Present Findings — Always Surface Context
+
+When reporting diff results, **never present bare numbers**. Always frame the result with the full context that determines what the numbers actually mean.
+
+### Required elements in every result summary
+
+**1. Scope — what was compared**
+State exactly which tables/queries were diffed and on which warehouses:
+> "Compared `public.orders` on **postgres_prod** vs `public.orders` on **snowflake_dw**"
+
+**2. Filters and time period applied**
+If any `where_clause` or `partition_column` was used, state it explicitly:
+> "Scope limited to: `created_at >= '2024-01-01' AND created_at < '2024-04-01'` (Q1 2024 only)"
+> "Partitioned by `l_shipdate` (monthly buckets) — diff covers Jan 2023 through Mar 2024"
+
+If no filter was applied, say so:
+> "No row filter applied — full table compared"
+
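The filter rule in element 2 can be sketched as a small helper that always returns a scope sentence, including the explicit "no filter" case. This is a minimal illustration only: `describe_scope` is a hypothetical name, and the only identifiers taken from the skill text are the `where_clause` and `partition_column` parameters.

```python
from typing import Optional


def describe_scope(where_clause: Optional[str] = None,
                   partition_column: Optional[str] = None) -> str:
    """Return one sentence stating which rows were in scope.

    Hypothetical helper: it never returns an empty string, so a summary
    built from it cannot silently omit the filter information.
    """
    parts = []
    if where_clause:
        parts.append(f"Scope limited to: `{where_clause}`")
    if partition_column:
        parts.append(f"Partitioned by `{partition_column}`")
    if not parts:
        # The skill asks for an explicit statement when nothing was filtered.
        return "No row filter applied (full table compared)"
    return "; ".join(parts)


print(describe_scope())
print(describe_scope(where_clause="created_at >= '2024-01-01' AND created_at < '2024-04-01'"))
```

Returning a sentence in every branch is the point of the design: the caller has no way to produce a summary without a scope line.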
+**3. Key columns used**
+> "Key: `order_id` (confirmed unique — 150,000 distinct values = 150,000 rows)"
+
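One way to back the "confirmed unique" claim in element 3 is to compare the row count with the distinct key count before reporting. The sketch below assumes a DB-API style connection and uses an in-memory SQLite table as a stand-in for a real warehouse; `key_is_unique` and `key_line` are hypothetical helpers, not part of the skill or of `data_diff`.

```python
import sqlite3


def key_is_unique(conn: sqlite3.Connection, table: str, key: str):
    """Return (is_unique, total_rows, distinct_keys) for one candidate key.

    Identifiers are interpolated into the SQL, so pass trusted names only.
    COUNT(DISTINCT ...) skips NULLs, so a NULL key also fails the check.
    """
    total, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {key}) FROM {table}"
    ).fetchone()
    return total == distinct, total, distinct


def key_line(key: str, total: int, distinct: int) -> str:
    """Format the 'Key' line of the summary from the measured counts."""
    status = "confirmed unique" if total == distinct else "NOT unique"
    return f"Key: `{key}` ({status}: {distinct:,} distinct values, {total:,} rows)"


# Tiny in-memory stand-in for a real warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.5), (2, 3.0), (3, 7.25)])

ok, total, distinct = key_is_unique(conn, "orders", "order_id")
print(key_line("order_id", total, distinct))
```

Reporting both counts, rather than a bare "unique", lets the reader verify the claim against the row totals shown elsewhere in the summary.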
+**4. Columns included and excluded**
+List what was compared and what was skipped, and why:
+> "Compared columns: `amount`, `status`, `customer_id`"
+> "Excluded (auto-timestamp defaults): `created_at`, `updated_at`, `_loaded_at`"
+> "Excluded (user request): `internal_score`"
+
+If the user confirmed exclusions in Step 4, reference that confirmation:
+> "Excluded per your confirmation: `created_at`, `updated_at`"
+
+**5. Algorithm used**
+> "Algorithm: `hashdiff` (cross-database)"
+
+### Example full result summary
+
+```
+## Data Parity Results
+
+**Compared:** `public.orders` (postgres_prod) → `public.orders` (snowflake_dw)
+**Scope:** `created_at >= '2024-01-01' AND created_at < '2024-04-01'` (Q1 2024 only; 42,301 source rows in scope)
+**Key:** `order_id`
+**Columns compared:** `amount`, `status`, `customer_id`, `region`
+**Columns excluded:** `created_at`, `updated_at` (auto-timestamp defaults, per your confirmation)
+**Algorithm:** hashdiff
+
+### Result: ✗ DIFFER
+
+| Metric | Value |
+|--------|-------|
+| Source rows | 42,301 |
+| Target rows | 42,298 |
+| Only in source | 3 |
+| Only in target | 0 |
+| Updated rows | 47 |
+| Identical rows | 42,251 |
+
+**Findings:**
+- 3 rows exist in source but are missing in target → possible ETL load lag or dropped inserts
+- 47 rows have value differences in `amount` or `status` → check rounding or status mapping
+```
+
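Before presenting a table like the one in the example summary, it can help to check that the counts add up. The sketch below assumes every source row falls into exactly one of "only in source", "updated", or "identical" (and likewise for the target); that bucketing is an assumption about the diff output, and `DiffCounts`, `inconsistencies`, and `verdict` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiffCounts:
    source_rows: int
    target_rows: int
    only_in_source: int
    only_in_target: int
    updated: int
    identical: int


def inconsistencies(c: DiffCounts) -> List[str]:
    """List any totals that do not add up under the one-bucket-per-row assumption."""
    problems = []
    matched = c.updated + c.identical  # rows whose key exists on both sides
    if c.only_in_source + matched != c.source_rows:
        problems.append("source rows != only_in_source + updated + identical")
    if c.only_in_target + matched != c.target_rows:
        problems.append("target rows != only_in_target + updated + identical")
    return problems


def verdict(c: DiffCounts) -> str:
    """IDENTICAL only when no row is missing on either side and none differ."""
    differs = c.only_in_source or c.only_in_target or c.updated
    return "DIFFER" if differs else "IDENTICAL"


# Counts taken from the example summary in the skill text.
example = DiffCounts(source_rows=42_301, target_rows=42_298,
                     only_in_source=3, only_in_target=0,
                     updated=47, identical=42_251)
print(verdict(example), inconsistencies(example))
```

With the example's numbers both sums hold (3 + 47 + 42,251 = 42,301 and 0 + 47 + 42,251 = 42,298), so the check returns an empty list.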
+### When result is IDENTICAL — still surface the scope
+
+Even when tables match perfectly, state what was checked:
+> "✓ Tables are **identical** across 150,000 rows. Compared `amount`, `status`, `customer_id` (full table, no filter, key=`order_id`). Auto-timestamp columns `created_at`, `updated_at` were excluded."
+
+**Why this matters:** "Tables are identical" without context is meaningless — the user needs to know if you checked Q1 only, skipped 5 columns, or used a WHERE clause that covered just 1% of the data.
+
+---
+
 ## Common Mistakes
 
 **Writing manual diff SQL instead of calling data_diff**
