You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat: add Step 9 result presentation guidelines to data-parity skill
Require that every diff result summary surfaces:
- Exact scope (tables + warehouses compared)
- Filters and time period applied (or explicitly states none)
- Key columns used and how they were confirmed
- Columns compared and excluded, with reasons (auto-timestamp, user request)
- Algorithm used
Includes example full result summary and guidance for identical results —
emphasising that bare numbers without context are meaningless to the user.
When reporting diff results, **never present bare numbers**. Always frame the result with the full context that determines what the numbers actually mean.
323
+
324
+
### Required elements in every result summary
325
+
326
+
**1. Scope — what was compared**
327
+
State exactly which tables/queries were diffed and on which warehouses:
328
+
> "Compared `public.orders` on **postgres_prod** vs `public.orders` on **snowflake_dw**"
329
+
330
+
**2. Filters and time period applied**
331
+
If any `where_clause` or `partition_column` was used, state it explicitly:
332
+
> "Scope limited to: `created_at >= '2024-01-01' AND created_at < '2024-04-01'` (Q1 2024 only)"
333
+
> "Partitioned by `l_shipdate` (monthly buckets) — diff covers Jan 2023 through Mar 2024"
**Columns excluded:** `created_at`, `updated_at` (auto-timestamp defaults, per your confirmation)
363
+
**Algorithm:** hashdiff
364
+
365
+
### Result: ✗ DIFFER
366
+
367
+
| Metric | Value |
368
+
|--------|-------|
369
+
| Source rows | 42,301 |
370
+
| Target rows | 42,298 |
371
+
| Only in source | 3 |
372
+
| Only in target | 0 |
373
+
| Updated rows | 47 |
374
+
| Identical rows | 42,251 |
375
+
376
+
**Findings:**
377
+
- 3 rows exist in source but are missing in target → possible ETL delete propagation gap
378
+
- 47 rows have value differences in `amount` or `status` → check rounding or status mapping
379
+
```
380
+
381
+
### When result is IDENTICAL — still surface the scope
382
+
383
+
Even when tables match perfectly, state what was checked:
384
+
> "✓ Tables are **identical** across 150,000 rows. Compared `amount`, `status`, `customer_id` (full table, no filter, key=`order_id`). Auto-timestamp columns `created_at`, `updated_at` were excluded."
385
+
386
+
**Why this matters:** "Tables are identical" without context is meaningless — the user needs to know if you checked Q1 only, skipped 5 columns, or used a WHERE clause that covered just 1% of the data.
387
+
388
+
---
389
+
320
390
## Common Mistakes
321
391
322
392
**Writing manual diff SQL instead of calling data_diff**
0 commit comments