[FLINK-34806][postgres] Expose scan.newly-added-table.enabled option #4436
Open
cansakiroglu wants to merge 2 commits into
Add the scan.newly-added-table.enabled YAML option to the Postgres Pipeline connector. The underlying SnapshotSplitAssigner.captureNewlyAddedTables() mechanism and the PostgresSourceBuilder.scanNewlyAddedTableEnabled() builder method already exist in the postgres-cdc source; this PR adds the missing YAML-side wiring.
Mirrors the same option already exposed by the MySQL Pipeline connector (MySqlDataSourceOptions.SCAN_NEWLY_ADDED_TABLE_ENABLED).
Default is false, so the change is a no-op for existing pipelines. When set to true, restoring from a savepoint will discover tables that match the source tables: pattern but were not part of the captured set at savepoint time, enabling DMS-style 'add a new table without re-snapshotting existing tables' workflows.
Signed-off-by: Mehmet Can Şakiroğlu <cansakiroglu@gmail.com>
Pull request overview
This PR exposes the scan.newly-added-table.enabled YAML option for the Postgres Pipeline connector by adding the missing ConfigOption and wiring it through PostgresDataSourceFactory into the underlying PostgresSourceBuilder.
Changes:
- Added `scan.newly-added-table.enabled` to `PostgresDataSourceOptions` (default `false`).
- Plumbed the option from `PostgresDataSourceFactory` configuration into `PostgresSourceBuilder.scanNewlyAddedTableEnabled(...)`.
- Registered the option in the factory's `optionalOptions()` set so it is accepted by config validation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .../source/PostgresDataSourceOptions.java | Defines the new scan.newly-added-table.enabled config option. |
| .../factory/PostgresDataSourceFactory.java | Reads the option from config, passes it into the source builder, and adds it to optional options. |
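To make the wiring summarised above easier to picture, here is a minimal, dependency-free sketch of the flow: read the flag from the source configuration, fall back to `false` when it is absent, and hand the value to a builder. It deliberately does not use the Flink CDC classes. Only the option key `scan.newly-added-table.enabled` and its default come from the PR; `NewlyAddedTableOptionSketch`, `SourceBuilderSketch`, `readFlag` and `configure` are hypothetical stand-ins, and the flat `Map<String, String>` view of the YAML source block is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Dependency-free stand-in for the option wiring; every name except the option key is hypothetical. */
final class NewlyAddedTableOptionSketch {

    /** The YAML key named in the PR. */
    static final String KEY = "scan.newly-added-table.enabled";

    /** Default named in the PR: false, so existing pipelines keep their behaviour. */
    static final boolean DEFAULT = false;

    /** Hypothetical stand-in for the source builder; only the flag is modelled. */
    static final class SourceBuilderSketch {
        private boolean scanNewlyAddedTableEnabled = DEFAULT;

        SourceBuilderSketch scanNewlyAddedTableEnabled(boolean enabled) {
            this.scanNewlyAddedTableEnabled = enabled;
            return this;
        }

        boolean isScanNewlyAddedTableEnabled() {
            return scanNewlyAddedTableEnabled;
        }
    }

    /** Reads the flag from a flat key/value view of the source config, falling back to the default. */
    static boolean readFlag(Map<String, String> sourceConfig) {
        String raw = sourceConfig.get(KEY);
        return raw == null ? DEFAULT : Boolean.parseBoolean(raw.trim());
    }

    /** Mirrors the factory step described in the review: read the option, pass it to the builder. */
    static SourceBuilderSketch configure(Map<String, String> sourceConfig) {
        return new SourceBuilderSketch().scanNewlyAddedTableEnabled(readFlag(sourceConfig));
    }

    public static void main(String[] args) {
        Map<String, String> existingPipeline = new HashMap<>(); // option not set
        Map<String, String> optedIn = new HashMap<>();
        optedIn.put(KEY, "true");

        System.out.println(configure(existingPipeline).isScanNewlyAddedTableEnabled()); // false
        System.out.println(configure(optedIn).isScanNewlyAddedTableEnabled());          // true
    }
}
```

The point of the sketch is the opt-in shape: a pipeline that never mentions the key ends up with the same builder state as before the change.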
JIRA
FLINK-34806
"[Feature][Postgres] Support automatically identify newly added tables"
What changes
- Adds the `scan.newly-added-table.enabled` YAML option to the Postgres Pipeline connector.
- No behaviour change unless the user opts in (`defaultValue(false)`).
Why
The MySQL Pipeline connector exposes the same option via `MySqlDataSourceOptions.SCAN_NEWLY_ADDED_TABLE_ENABLED` and reads it in `MySqlDataSourceFactory`. This PR brings the Postgres Pipeline connector to parity.
When this matters
Adding a new table to a long-running pipeline. Today the only way to capture a newly-created PG table on an already-running Pipeline job is to cancel the job and re-snapshot every captured table from scratch. With this option set to `true`, on savepoint+restore the source compares the saved snapshot's table set against PG's current table set, picks up newly-matching tables, snapshots only the new ones, and resumes the existing captured tables from their saved WAL offsets. No re-snapshot of existing tables, no source-side load spike.
Default
`false`: preserves current behaviour for existing pipelines. Opt-in only.
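The restore-time comparison described under "When this matters" can be illustrated with a small set difference. This is only a sketch of the idea, not the connector's actual code path: `NewlyAddedTableDiffSketch` and `tablesToSnapshot` are hypothetical names, the table names are made up, and treating the `tables:` pattern as a plain Java regex over `schema.table` strings is an assumption for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Illustrative model of "snapshot only the newly matching tables" on restore; all names are hypothetical. */
final class NewlyAddedTableDiffSketch {

    /**
     * Returns the tables that need a fresh snapshot after restore: those that match the
     * pattern now but were not in the captured set at savepoint time. With the flag off,
     * nothing new is picked up and only the previously captured tables continue.
     */
    static Set<String> tablesToSnapshot(
            boolean scanNewlyAddedTableEnabled,
            Set<String> capturedAtSavepoint,
            Set<String> currentTables,
            Pattern tablesPattern) {
        Set<String> toSnapshot = new TreeSet<>();
        if (!scanNewlyAddedTableEnabled) {
            return toSnapshot; // default behaviour: no discovery on restore
        }
        for (String table : currentTables) {
            boolean matches = tablesPattern.matcher(table).matches();
            if (matches && !capturedAtSavepoint.contains(table)) {
                toSnapshot.add(table); // new table: snapshot it; existing ones resume from saved offsets
            }
        }
        return toSnapshot;
    }

    public static void main(String[] args) {
        Set<String> captured = new HashSet<>(Arrays.asList("public.orders_2023"));
        Set<String> current = new HashSet<>(
                Arrays.asList("public.orders_2023", "public.orders_2024", "public.users"));
        Pattern pattern = Pattern.compile("public\\.orders_.*"); // assumed regex form of the tables pattern

        System.out.println(tablesToSnapshot(true, captured, current, pattern));  // [public.orders_2024]
        System.out.println(tablesToSnapshot(false, captured, current, pattern)); // []
    }
}
```

In this model `public.orders_2023` never appears in the result, which corresponds to the claim that already-captured tables are not re-snapshotted, and `public.users` is ignored because it does not match the pattern.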