fix(cdc): skip mysql no-pk tables in initial snapshot unless chunk key configured #642
Open
Cq-study wants to merge 2 commits into apache:master
Member
This is a behavior change; it's recommended to add a configuration instead of enabling it by default.
Proposed changes
Issue Number: close #641
Problem Summary:
In MySQL-to-Doris full-database CDC sync (`mysql-sync-database`), tables without primary keys were not filtered automatically.
When the startup mode is `initial` (incremental snapshot phase), these no-PK tables can fail split-based snapshot reading and cause repeated Flink job restarts.
This PR adds a guard in MySQL schema discovery to improve stability by default.
What is changed
In `MysqlDatabaseSync#getSchemaList`, skip tables that meet all of the following:
- the table has no primary key
- the startup mode is `initial` (or empty, treated as initial)
- no `scan.incremental.snapshot.chunk.key.column` is configured for the table (table-level mapping)
Add warning logs for skipped tables with clear guidance to configure a chunk key.
Add a fail-fast message when all matched tables are skipped.
Add tests in `MysqlDatabaseSyncTest`:
- `testSkipTableWithoutPrimaryKeyInInitialSnapshot`
- `testDoNotSkipTableWithoutPrimaryKeyWhenChunkKeyConfigured`
- `testDoNotSkipTableWithoutPrimaryKeyForNonInitialStartup`
Why
This avoids job instability under default settings in common real-world schemas where some source tables do not have primary keys, while preserving an explicit opt-in path (a configured chunk key) for no-PK tables that must be synced.
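The skip rule listed under "What is changed" can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `NoPkSnapshotFilter`, `TableInfo`, and all method names are hypothetical, the chunk key mapping is assumed to be a comma-separated list of `database.table:column` entries, and the warning is printed to stderr in place of the project's logger.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the PR's MysqlDatabaseSync code.
class NoPkSnapshotFilter {

    /** Minimal stand-in for table metadata read during schema discovery. */
    static final class TableInfo {
        final String database;
        final String name;
        final boolean hasPrimaryKey;

        TableInfo(String database, String name, boolean hasPrimaryKey) {
            this.database = database;
            this.name = name;
            this.hasPrimaryKey = hasPrimaryKey;
        }

        String qualifiedName() {
            return database + "." + name;
        }
    }

    /** Parses "db.tbl:col,db.tbl2:col2" (assumed format) into table -> column. */
    static Map<String, String> parseChunkKeys(String raw) {
        Map<String, String> result = new HashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return result;
        }
        for (String entry : raw.split(",")) {
            int idx = entry.lastIndexOf(':');
            // Ignore malformed entries that lack a table or a column part.
            if (idx > 0 && idx < entry.length() - 1) {
                result.put(entry.substring(0, idx).trim(), entry.substring(idx + 1).trim());
            }
        }
        return result;
    }

    /** An empty or missing startup mode is treated as "initial". */
    static boolean isInitial(String startupMode) {
        return startupMode == null
                || startupMode.trim().isEmpty()
                || "initial".equalsIgnoreCase(startupMode.trim());
    }

    /** A table is skipped only when all three conditions hold. */
    static boolean shouldSkip(TableInfo table, String startupMode, Map<String, String> chunkKeys) {
        return !table.hasPrimaryKey
                && isInitial(startupMode)
                && !chunkKeys.containsKey(table.qualifiedName());
    }

    /** Filters tables, warns about each skipped one, and fails fast if none remain. */
    static List<TableInfo> filter(List<TableInfo> tables, String startupMode, String rawChunkKeys) {
        Map<String, String> chunkKeys = parseChunkKeys(rawChunkKeys);
        List<TableInfo> kept = new ArrayList<>();
        for (TableInfo table : tables) {
            if (shouldSkip(table, startupMode, chunkKeys)) {
                System.err.println("WARN: skipping " + table.qualifiedName()
                        + " (no primary key); configure a chunk key column to include it");
            } else {
                kept.add(table);
            }
        }
        if (!tables.isEmpty() && kept.isEmpty()) {
            throw new IllegalStateException(
                    "All matched tables were skipped: none has a primary key or a configured chunk key");
        }
        return kept;
    }
}
```

Under these assumptions, calling `NoPkSnapshotFilter.filter(tables, "initial", "shop.events:event_id")` keeps a no-PK table `shop.events`, while the same call with an empty mapping drops it and emits a warning.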
Checklist (Required)
Further comments
This change is intentionally scoped to the MySQL CDC database sync path and does not alter sink logic or other source connectors.