Problem
mysql_cdc reads each table's initial snapshot through a single goroutine iterating rows with keyset pagination. Even after #4320 adds inter-table parallelism, the snapshot of any individual table is still strictly sequential, so pipelines dominated by a single very large table see essentially no speedup.
Concretely, on a representative message-style table:
- 80M rows, observed throughput: ~30M rows/hour
- Extrapolated to a 400M-row production table: ~12 hours
- AWS DMS completes the same 400M rows in ~45 minutes
The ~16x gap vs DMS is almost entirely intra-table: DMS and Debezium's incremental snapshot both split the primary-key range of a single table across multiple workers.
Root cause
readSnapshotTable does one keyset-paginated SELECT per batch:
SELECT * FROM t ORDER BY pk LIMIT N
SELECT * FROM t WHERE (pk) > (last) ORDER BY pk LIMIT N
...
Only one goroutine ever drives that loop per table. With #4320 a different table can run on a different worker, but inside any one table the reads are strictly serial.
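For illustration, that loop has roughly the following shape (function and helper names here are hypothetical stand-ins, not the actual connector code):

// Rough shape of the current per-table snapshot loop. fetchBatch and emit
// are hypothetical stand-ins for the keyset-paginated SELECT above and
// the downstream hand-off.
func snapshotTable(ctx context.Context, tx *sql.Tx, table string) error {
    var lastKey []any // nil selects the first batch with no WHERE clause
    for {
        rows, last, err := fetchBatch(ctx, tx, table, lastKey) // one SELECT ... LIMIT N
        if err != nil {
            return err
        }
        if len(rows) == 0 {
            return nil // table exhausted
        }
        emit(rows) // hand the batch to the pipeline
        // Each batch's WHERE clause depends on the previous batch's final
        // key, so iterations form a chain and cannot overlap or fan out.
        lastKey = last
    }
}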
For workloads where 90%+ of snapshot time is spent in one table, the effective parallelism is 1 regardless of how snapshot_max_parallel_tables is set.
Desired behaviour
Operators should be able to partition a single table's primary-key space into N chunks and have the existing worker pool consume chunks in parallel. The consistency guarantee (every worker reads under one FLUSH TABLES WITH READ LOCK window at the same binlog position) must be preserved.
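For context on how that guarantee extends to multiple readers: each chunk connection just has to open its consistent-read transaction inside the same lock window. A minimal sketch using database/sql (hypothetical orchestration; cleanup of partially opened connections on error is elided):

import (
    "context"
    "database/sql"
)

// openChunkSessions opens n sessions that all share one point-in-time view.
func openChunkSessions(ctx context.Context, db *sql.DB, n int) ([]*sql.Conn, error) {
    // A control connection takes the global read lock, blocking writes.
    ctl, err := db.Conn(ctx)
    if err != nil {
        return nil, err
    }
    defer ctl.Close()
    if _, err := ctl.ExecContext(ctx, "FLUSH TABLES WITH READ LOCK"); err != nil {
        return nil, err
    }
    // Record the binlog position here while writes are blocked
    // (SHOW MASTER STATUS; SHOW BINARY LOG STATUS on newer MySQL).

    // Every chunk reader starts its transaction while the lock is still
    // held, so all sessions observe the same snapshot of the data.
    conns := make([]*sql.Conn, n)
    for i := range conns {
        c, err := db.Conn(ctx)
        if err != nil {
            return nil, err
        }
        if _, err := c.ExecContext(ctx, "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ"); err != nil {
            return nil, err
        }
        if _, err := c.ExecContext(ctx, "START TRANSACTION WITH CONSISTENT SNAPSHOT"); err != nil {
            return nil, err
        }
        conns[i] = c
    }
    // Releasing the lock is safe: open transactions keep their snapshot.
    _, err = ctl.ExecContext(ctx, "UNLOCK TABLES")
    return conns, err
}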
Target throughput: match AWS DMS on the 400M-row reference workload, i.e. ~45 minutes, with 1 hour as an acceptable ceiling. That is ~530M rows/hour, roughly 18x the observed single-stream rate, so it requires meaningful concurrency (~16 workers x a reasonable chunk count) on a single table.
Proposed solution
A new advanced field snapshot_chunks_per_table that, when set above 1:
- Under the existing consistent-snapshot window, probes MIN(pk) and MAX(pk) on the first primary-key column.
- Splits [MIN, MAX] into N half-open chunks (see the sketch after this list).
- Emits one work unit per chunk and fans them out across the worker pool controlled by snapshot_max_parallel_tables.
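The boundary computation itself is plain integer arithmetic once the bounds are known. A minimal sketch, assuming an int64 leading PK column (splitChunks and chunk are illustrative names, not the actual implementation):

// chunk is a half-open key range [lo, hi).
type chunk struct{ lo, hi int64 }

// splitChunks divides [minPK, maxPK] (as probed via SELECT MIN(pk), MAX(pk)
// under the consistent-snapshot window) into n half-open chunks.
func splitChunks(minPK, maxPK int64, n int) []chunk {
    span := maxPK - minPK + 1 // +1 so maxPK itself lands in the last chunk
    step := span / int64(n)
    if step == 0 {
        // Fewer distinct key values than requested chunks: one chunk.
        return []chunk{{minPK, maxPK + 1}}
    }
    chunks := make([]chunk, 0, n)
    lo := minPK
    for i := 0; i < n; i++ {
        hi := lo + step
        if i == n-1 {
            hi = maxPK + 1 // last chunk absorbs the rounding remainder
        }
        chunks = append(chunks, chunk{lo, hi})
        lo = hi
    }
    return chunks
}

Each chunk then runs the existing keyset-paginated loop with an extra pk >= lo AND pk < hi predicate, so an empty or sparse chunk finishes quickly.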
Scope of the first cut:
- Single-column integer PKs: fully supported.
- Composite PKs: chunk on the leading PK column; per-chunk keyset pagination continues to respect the full PK ordering (query shape sketched after this list). This is the common shape for (tenant_id, id) / (shard, ts) composites.
- Non-numeric first PK columns (VARCHAR, UUID, binary, etc.): fall back to a single whole-table read with an informational log line. Supporting these cleanly requires boundary discovery via sampling or OFFSET-based probing and is deliberately out of scope here.
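To make the composite-PK case concrete, the per-chunk batch query could take roughly this shape (illustrative only; assumes a (tenant_id, id) PK chunked on tenant_id):

// Illustrative per-chunk batch query. The chunk bounds constrain only the
// leading column, while ordering and the keyset resume tuple use the full PK.
const chunkBatchSQL = `
SELECT * FROM t
WHERE tenant_id >= ?            -- chunk lower bound (inclusive)
  AND tenant_id <  ?            -- chunk upper bound (exclusive)
  AND (tenant_id, id) > (?, ?)  -- resume point: last row of the previous batch
ORDER BY tenant_id, id
LIMIT ?`

The first batch in each chunk simply omits the row-constructor predicate.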
Skewed leading-column distributions (e.g. 90% of rows under one tenant_id) will produce uneven chunks; operators hitting that can leave snapshot_chunks_per_table at 1 and rely on table-level parallelism alone. There is no correctness impact, only uneven chunk runtimes.
PR implementing this: to be linked below.