Summary
PgProj has rich schema comparison (SchemaCompare, SchemaComparer, IdentityDiffEngine) but no data comparison. SSDT/VS Data Compare compares row-level data between a source and target database, categorizes rows, and synchronizes by updating the target or emitting an editable DML script. Closing this gap brings PgProj to SSDT parity for the data-diff workflow and complements the existing schema-deploy story (reference data / seed tables can drift independently of schema).
SSDT / SqlPackage reference
VS Data Compare provides:
- A New Data Comparison wizard to pick source/target databases and the tables/views to compare (eligible objects must have a primary key, unique constraint, or unique index).
- Comparison-key selection — choose which key columns identify a logical row and which columns participate in the value comparison.
- Difference review/filtering — rows are categorized Different / Only-in-Source / Only-in-Target / Identical, with drill-in to per-column diffs and per-row include/exclude toggles.
- Synchronization by Update Target (transaction-scoped, applied directly) or by generating an editable DML script.
- Compare against a reference database as the source of truth.
Proposed PgProj approach
Add a data-compare capability that mirrors the existing schema-compare ergonomics and reuses deploy-planner infrastructure:
- Eligibility + key mapping — reuse the parsed semantic model to map SSDT comparison keys onto PG PRIMARY KEY / UNIQUE constraint / unique index; skip keyless tables/views (report them). Allow explicit key/column overrides per table.
- Row diff engine — stream rows via the existing parallel ADO.NET reader, bucket into Different / Only-in-Source / Only-in-Target / Identical, and surface per-column diffs. Honor identity/generated columns (excluded from value comparison by default).
- DML synthesis — emit a deterministic INSERT/UPDATE/DELETE script (or
INSERT ... ON CONFLICT (key) DO UPDATE upsert) ordered by FK topo-sort (reuse the PhasedDeployer / DeployScriptGenerator ordering already used by the schema planner). Run destructiveness/risk classification through RiskAnalyzer.
- Apply — transaction-wrap by default (rollback on failure), reusing
DatabaseDeployer for execution; validate against the shadow-DB harness in tests.
- CLI —
pgproj data-compare --source <conn> --target <conn> [--tables ...] [--output diff.json|script.sql] to match schema-compare; support compare-against-reference-database and profile-driven defaults (.pgproj profiles / .pgpkg).
Acceptance criteria
References
- docs/ssdt/how-to-compare-and-synchronize-the-data-of-two-databases.md (MicrosoftDocs/sql-docs)
- Priority: medium
Summary
PgProj has rich schema comparison (
SchemaCompare,SchemaComparer,IdentityDiffEngine) but no data comparison. SSDT/VS Data Compare compares row-level data between a source and target database, categorizes rows, and synchronizes by updating the target or emitting an editable DML script. Closing this gap brings PgProj to SSDT parity for the data-diff workflow and complements the existing schema-deploy story (reference data / seed tables can drift independently of schema).SSDT / SqlPackage reference
VS Data Compare provides:
Proposed PgProj approach
Add a
data-comparecapability that mirrors the existing schema-compare ergonomics and reuses deploy-planner infrastructure:INSERT ... ON CONFLICT (key) DO UPDATEupsert) ordered by FK topo-sort (reuse thePhasedDeployer/DeployScriptGeneratorordering already used by the schema planner). Run destructiveness/risk classification throughRiskAnalyzer.DatabaseDeployerfor execution; validate against the shadow-DB harness in tests.pgproj data-compare --source <conn> --target <conn> [--tables ...] [--output diff.json|script.sql]to matchschema-compare; support compare-against-reference-database and profile-driven defaults (.pgprojprofiles /.pgpkg).Acceptance criteria
pgproj data-compare --source --targetenumerates eligible tables/views (PK / UNIQUE constraint / unique index) and reports skipped keyless objects.--output diff.json.--output script.sqlemits a deterministic, FK-topo-sorted INSERT/UPDATE/DELETE (orON CONFLICTupsert) DML script that, applied to the target, makes its data match the source.References