roachtest: add pg_dump round-trip compatibility test #167543
Draft
rafiss wants to merge 7 commits intocockroachdb:masterfrom
Draft
roachtest: add pg_dump round-trip compatibility test #167543rafiss wants to merge 7 commits intocockroachdb:masterfrom
rafiss wants to merge 7 commits intocockroachdb:masterfrom
Conversation
pg_dump fails when connecting to CockroachDB because it looks up catalog
objects by {tableoid, oid} using hardcoded PostgreSQL RelationId constants
(e.g. NamespaceRelationId = 2615 for pg_namespace), but CockroachDB
returns its internal descriptor IDs as tableoid (e.g. 4294967048). The
key mismatch causes "schema with OID X does not exist" errors.
Add a `pg_dump_compatibility` session variable (default off) that, when
enabled, causes virtual pg_catalog tables to report standard PostgreSQL
tableoid values instead of CockroachDB's internal descriptor IDs. The
mapping covers the 10 pg_catalog tables that pg_dump looks up by
hardcoded tableoid: pg_class, pg_type, pg_proc, pg_operator, pg_am,
pg_collation, pg_namespace, pg_extension, pg_publication, and
pg_subscription. Non-pg_catalog virtual tables (information_schema,
crdb_internal) are unaffected.
Informs: cockroachdb#20296
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
pg_dump issues `LOCK TABLE ... IN ACCESS SHARE MODE` for all tables it discovers. CockroachDB didn't parse `LOCK TABLE` at all — `LOCK` wasn't even a keyword in the grammar. Since CockroachDB uses MVCC, lock modes like ACCESS SHARE are semantically meaningless (reads never block), so this can be safely no-oped. Add `LOCK` and `EXCLUSIVE` as unreserved keywords and implement parsing for the full `LOCK TABLE` statement with all eight PostgreSQL lock modes and the optional `NOWAIT` clause. When `pg_dump_compatibility` is true, the statement is accepted as a no-op. Otherwise, it returns a FeatureNotSupported error. Informs: cockroachdb#20296 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
When pg_dump_compatibility is enabled, remap several pg_catalog OIDs so pg_dump treats built-in objects correctly: - pg_cast: assign sequential OIDs starting from 14000 instead of FNV-32 hashes, keeping them below 16384 (FirstNormalObjectId) so pg_dump skips them as built-in - pg_am: emit PostgreSQL's btree (403) and heap (2) access methods instead of CockroachDB's prefix and inverted, preventing broken CREATE ACCESS METHOD statements - pg_class.relam: use heapAmOid (2) for tables and btreeAmOid (403) for forward indexes, preventing SET default_table_access_method output - pg_tablespace: use OID 1663 for pg_default (matching PostgreSQL v17) instead of 0, preventing spurious "Tablespace: pg_default" annotations Informs: cockroachdb#20296 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
When pg_dump_compatibility is enabled, hide CockroachDB-internal objects from pg_catalog tables to prevent pg_dump from attempting to dump them: - pg_proc: skip crdb_internal and information_schema built-in functions with OIDs below 16384, which pg_dump would collect as "user-defined" - pg_class: skip crdb_internal tables so pg_dump does not try to dump internal system tables - pg_namespace: skip the crdb_internal schema to prevent pg_dump from emitting CREATE SCHEMA crdb_internal - pg_type: skip crdb_internal composite types to avoid referencing a missing schema OID Uses schema ID comparison (catconstants.CrdbInternalID) instead of name comparison for robustness. Informs: cockroachdb#20296 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Previously, the pg_dump_compatibility session setting was a boolean. This changes it to a string enum with three values: "off" (default), "postgres", and "cockroachdb". Both "postgres" and "cockroachdb" modes enable the existing pg_catalog compatibility behavior (standard OIDs, hidden crdb_internal schema, etc.). The "postgres" mode additionally suppresses CRDB-specific storage parameters like schema_locked from SHOW CREATE TABLE output, pg_class.reloptions, and LOCALITY clauses, so that pg_dump produces valid PostgreSQL syntax. The GetStorageParams interface gains an excludeCRDBInternal parameter to support filtering schema_locked when in postgres compat mode. Informs: cockroachdb#20296 Release note (sql change): Added a `pg_dump_compatibility` session variable that improves compatibility with the pg_dump PostgreSQL tool. Set it to `postgres` to make pg_catalog report OIDs that match the hardcoded ones expected by pg_dump, and hide CockroachDB-internal objects that do not work with pg_dump, and make other fixups that allow the pg_dump output to run on non-CockroachDB servers. Set it to `cockroachdb` for the same pg_catalog fixes, but still keeping CockroachDB-specific syntax. Like other session variables,`pg_dump_compatibility` can be set in the connection string so that it takes effect as soon as pg_dump connect to CockroachDB. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
When pg_dump_compatibility is enabled, strip the database catalog from table references in pg_views and pg_matviews definitions (e.g. "mydb.public.t" becomes "public.t"). PostgreSQL does not have cross-database references, so view definitions should only use schema-qualified names. Without this fix, pg_dump emits CREATE VIEW statements that reference the source database name, which fail on restore to a differently-named database. The new viewQueryWithoutDatabasePrefix helper parses the view query and re-formats it using FmtReformatTableNames to omit the catalog from all table name nodes. Informs: cockroachdb#20296 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add a weekly roachtest that validates pg_dump round-trip compatibility with CockroachDB. The test builds pg_dump from PostgreSQL source (REL_17_4), creates a representative schema in a source database, dumps it with `pg_dump --schema-only` using the `pg_dump_compatibility` session variable, restores into a target database via psql, dumps again, and compares the two outputs. Known deviations are tracked via an expected-diff baseline file in testdata, following the same pattern as the pg_regress roachtest. The current baseline captures one known issue: pg_dump emits a view before the table it depends on, causing the view to be lost during restore. The test covers enum types, tables with various column types, constraints, secondary indexes (including partial and covering), foreign keys, sequences, views, comments, multi-schema objects, and default expressions. Resolves: cockroachdb#167442 Epic: CRDB-28751 Release note: None
Contributor
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. It looks like your PR touches SQL parser code but doesn't add or edit parser tests. Please make sure you add or edit parser tests if you edit the parser. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Member
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add a weekly roachtest that validates pg_dump round-trip compatibility
with CockroachDB. The test builds pg_dump from PostgreSQL source
(REL_17_4), creates a representative schema in a source database,
dumps it with
pg_dump --schema-onlyusing thepg_dump_compatibilitysession variable, restores into a target database via psql, dumps
again, and compares the two outputs.
Known deviations are tracked via an expected-diff baseline file in
testdata, following the same pattern as the pg_regress roachtest. The
current baseline captures one known issue: pg_dump emits a view
before the table it depends on, causing the view to be lost during
restore.
The test covers enum types, tables with various column types,
constraints, secondary indexes (including partial and covering),
foreign keys, sequences, views, comments, multi-schema objects,
and default expressions.
Resolves: #167442
Epic: CRDB-28751
Release note: None