fix(cdc): skip mysql no-pk tables in initial snapshot unless chunk key configured #642

Open
Cq-study wants to merge 2 commits into apache:master from Cq-study:fix/mysql-cdc-skip-no-pk-snapshot

Conversation

Cq-study commented Mar 5, 2026

Proposed changes

Issue Number: close #641

Problem Summary:

In MySQL-to-Doris full-database CDC sync (mysql-sync-database), tables without primary keys were not filtered automatically.
When startup mode is initial (incremental snapshot phase), these no-PK tables can fail split-based snapshot reading and cause repeated Flink job restarts.

This PR adds a guard in MySQL schema discovery to improve stability by default.

What is changed

  1. In MysqlDatabaseSync#getSchemaList, skip tables that meet all of the following conditions:
     • the table has no primary key
     • the startup mode is initial (or empty, which is treated as initial)
     • no chunk key is configured for that table
  2. Keep no-PK tables syncable when a chunk key is explicitly configured via:
     • scan.incremental.snapshot.chunk.key.column (table-level mapping)
  3. Add warning logs for skipped tables, with clear guidance to configure a chunk key.
  4. Add a fail-fast message when all matched tables are skipped:
     • throw an explicit exception describing why no table is left for synchronization.
  5. Add unit tests:
     • MysqlDatabaseSyncTest
       • testSkipTableWithoutPrimaryKeyInInitialSnapshot
       • testDoNotSkipTableWithoutPrimaryKeyWhenChunkKeyConfigured
       • testDoNotSkipTableWithoutPrimaryKeyForNonInitialStartup
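
For readers who want the rule in one place, the skip condition and the fail-fast check described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the class `NoPkSnapshotGuard`, the `TableInfo` holder, and the shape of the chunk-key map (table name to column name) are all hypothetical stand-ins, and the actual logic in MysqlDatabaseSync#getSchemaList may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the guard; not the connector's actual classes. */
final class NoPkSnapshotGuard {

    /** Minimal stand-in for discovered table metadata (hypothetical). */
    static final class TableInfo {
        final String name;
        final List<String> primaryKeys;

        TableInfo(String name, List<String> primaryKeys) {
            this.name = name;
            this.primaryKeys = primaryKeys;
        }
    }

    /** A missing or empty startup mode is treated as "initial". */
    static boolean isInitialMode(String startupMode) {
        return startupMode == null
                || startupMode.trim().isEmpty()
                || "initial".equalsIgnoreCase(startupMode.trim());
    }

    /** True only when all three conditions from the description hold. */
    static boolean shouldSkip(
            TableInfo table, String startupMode, Map<String, String> chunkKeys) {
        boolean noPrimaryKey = table.primaryKeys == null || table.primaryKeys.isEmpty();
        // Assumption: chunk keys are looked up by plain table name.
        String chunkKey = chunkKeys.get(table.name);
        boolean noChunkKey = chunkKey == null || chunkKey.trim().isEmpty();
        return noPrimaryKey && isInitialMode(startupMode) && noChunkKey;
    }

    /** Drops skippable tables, warns for each, and fails fast if none remain. */
    static List<TableInfo> filter(
            List<TableInfo> matched, String startupMode, Map<String, String> chunkKeys) {
        List<TableInfo> kept = new ArrayList<>();
        for (TableInfo table : matched) {
            if (shouldSkip(table, startupMode, chunkKeys)) {
                System.err.println("WARN: skipping table " + table.name
                        + " (no primary key, initial snapshot, no chunk key configured)");
            } else {
                kept.add(table);
            }
        }
        if (!matched.isEmpty() && kept.isEmpty()) {
            throw new IllegalStateException(
                    "All matched tables were skipped: none has a primary key or a chunk key");
        }
        return kept;
    }
}
```

Under these assumptions, a no-PK table is kept as soon as any one of the three conditions fails, for example when the startup mode is not initial or when a chunk key column is mapped for that table.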

Why

This avoids default job instability in common real-world schemas where some source tables do not have primary keys, while preserving an explicit opt-in path (chunk key) for required no-PK tables.

Checklist (Required)

  1. Does it affect the original behavior: Yes (only for no-PK tables in initial snapshot mode without chunk key; behavior becomes safer by default)
  2. Has unit tests been added: Yes
  3. Has document been added or modified: No
  4. Does it need to update dependencies: No
  5. Are there any changes that cannot be rolled back: No

Further comments

This change is intentionally scoped to MySQL CDC database sync path and does not alter sink logic or other source connectors.

JNSimba (Member) commented Mar 30, 2026

This is a behavior change; it's recommended to add a configuration instead of enabling it by default.
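
One possible shape for such a switch is sketched below, purely as an illustration. The option key `skip-no-pk-tables-in-snapshot` and the class `NoPkSkipOption` are hypothetical names invented for this sketch; neither exists in the connector today. The default is off, so existing jobs keep their current behavior unless the option is set explicitly.

```java
import java.util.Map;

/** Hypothetical opt-in switch for the no-PK skip behavior. */
final class NoPkSkipOption {

    /** Hypothetical option key; not an existing connector option. */
    static final String KEY = "skip-no-pk-tables-in-snapshot";

    /** Returns false unless the option is explicitly set to "true". */
    static boolean isSkipEnabled(Map<String, String> conf) {
        String raw = conf.get(KEY);
        // Boolean.parseBoolean returns false for null and for any non-"true" text.
        return raw != null && Boolean.parseBoolean(raw.trim());
    }
}
```

With a gate like this, the skip would run only when `isSkipEnabled` returns true, which keeps the change backward compatible by default.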

Development

Successfully merging this pull request may close these issues.

[Bug] mysql-sync-database should skip no-PK tables in initial snapshot mode or provide clear guidance