EP-DATACOMPARE: row-level data comparison and sync between two databases

## Summary

PgProj has rich **schema** comparison (`SchemaCompare`, `SchemaComparer`, `IdentityDiffEngine`) but no **data** comparison. SSDT/VS Data Compare compares row-level data between a source and target database, categorizes rows, and synchronizes by updating the target or emitting an editable DML script. Closing this gap brings PgProj to SSDT parity for the data-diff workflow and complements the existing schema-deploy story (reference data / seed tables can drift independently of schema).

## SSDT / SqlPackage reference

VS **Data Compare** provides:

- A **New Data Comparison wizard** to pick source/target databases and the tables/views to compare (eligible objects must have a primary key, unique constraint, or unique index).
- **Comparison-key selection** — choose which key columns identify a logical row and which columns participate in the value comparison.
- **Difference review/filtering** — rows are categorized **Different / Only-in-Source / Only-in-Target / Identical**, with drill-in to per-column diffs and per-row include/exclude toggles.
- **Synchronization** by **Update Target** (transaction-scoped, applied directly) or by **generating an editable DML script**.
- Compare **against a reference database** as the source of truth.

## Proposed PgProj approach

Add a `data-compare` capability that mirrors the existing schema-compare ergonomics and reuses deploy-planner infrastructure:

- **Eligibility + key mapping** — reuse the parsed semantic model to map SSDT comparison keys onto PG **PRIMARY KEY / UNIQUE constraint / unique index**; skip keyless tables/views (report them). Allow explicit key/column overrides per table.
- **Row diff engine** — stream rows via the existing parallel ADO.NET reader, bucket into **Different / Only-in-Source / Only-in-Target / Identical**, and surface per-column diffs. Honor identity/generated columns (excluded from value comparison by default).
- **DML synthesis** — emit a deterministic INSERT/UPDATE/DELETE script (or `INSERT ... ON CONFLICT (key) DO UPDATE` upsert) ordered by **FK topo-sort** (reuse the `PhasedDeployer` / `DeployScriptGenerator` ordering already used by the schema planner). Run destructiveness/risk classification through `RiskAnalyzer`.
- **Apply** — transaction-wrap by default (rollback on failure), reusing `DatabaseDeployer` for execution; validate against the shadow-DB harness in tests.
- **CLI** — `pgproj data-compare --source <conn> --target <conn> [--tables ...] [--output diff.json|script.sql]` to match `schema-compare`; support compare-against-reference-database and profile-driven defaults (`.pgproj` profiles / `.pgpkg`).

## Acceptance criteria

- [ ] `pgproj data-compare --source --target` enumerates eligible tables/views (PK / UNIQUE constraint / unique index) and reports skipped keyless objects.
- [ ] Diff output categorizes every compared row as Different / Only-in-Source / Only-in-Target / Identical with per-column detail, writable to `--output diff.json`.
- [ ] `--output script.sql` emits a deterministic, FK-topo-sorted INSERT/UPDATE/DELETE (or `ON CONFLICT` upsert) DML script that, applied to the target, makes its data match the source.
- [ ] Apply path (Update Target) is transaction-wrapped by default and rolls back on any failure; identity/generated columns are honored.
- [ ] Comparison key and value-column selection can be overridden per table via CLI/profile.
- [ ] Tests cover all four diff categories and a round-trip (compare → apply → re-compare yields Identical) on the shadow-DB harness.

## References

- docs/ssdt/how-to-compare-and-synchronize-the-data-of-two-databases.md (MicrosoftDocs/sql-docs)
- Priority: medium



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EP-DATACOMPARE: row-level data comparison and sync between two databases #132

Summary

SSDT / SqlPackage reference

Proposed PgProj approach

Acceptance criteria

References

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

EP-DATACOMPARE: row-level data comparison and sync between two databases #132

Description

Summary

SSDT / SqlPackage reference

Proposed PgProj approach

Acceptance criteria

References

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions