TiCDC can spike memory and CPU when bulk-creating idle changefeeds #4831

@wlwilliamx


What did you do?

Bulk-created many TiCDC changefeeds whose tables have no row traffic.

Observed behavior:

  • Creating 800 changefeeds can OOM a 96 GB machine.
  • Creating 400 changefeeds can quickly push TiCDC memory to about 60 GB and total CPU usage to about 93%.
  • After roughly 3 minutes, memory drops back to about 17 GB.

Code inspection on upstream/master found that changefeed creation can schedule many maintainer bootstraps concurrently.

What did you expect to see?

Bulk changefeed creation should respect the configured scheduler concurrency limit and avoid launching hundreds of maintainer bootstraps at the same time.

Memory and CPU should increase gradually during creation. They should not spike high enough to OOM a machine that can run the same changefeeds in steady state.

What did you see instead?

The coordinator is created with hard-coded scheduling settings:

  • max task concurrency: 10000
  • balance interval: time.Minute

This bypasses the server scheduler config, whose default max-task-concurrency is 10.
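One way to avoid the bypass is to derive the coordinator settings from the scheduler config and fall back to defaults only when a value is unset. The sketch below is illustrative only: `SchedulerConfig`, its fields, and `coordinatorSettings` are hypothetical names, not TiCDC's actual types. The fallback values reuse the numbers quoted above (10 for concurrency, `time.Minute` for the balance interval).

```go
package main

import (
	"fmt"
	"time"
)

// SchedulerConfig is a hypothetical stand-in for the server scheduler config.
type SchedulerConfig struct {
	MaxTaskConcurrency   int
	CheckBalanceInterval time.Duration
}

const (
	defaultMaxTaskConcurrency = 10          // default quoted in this issue
	defaultBalanceInterval    = time.Minute // value currently hard-coded
)

// coordinatorSettings (hypothetical) returns the values the coordinator
// should use: configured values when set, defaults otherwise.
func coordinatorSettings(cfg *SchedulerConfig) (int, time.Duration) {
	concurrency, interval := defaultMaxTaskConcurrency, defaultBalanceInterval
	if cfg == nil {
		return concurrency, interval
	}
	if cfg.MaxTaskConcurrency > 0 {
		concurrency = cfg.MaxTaskConcurrency
	}
	if cfg.CheckBalanceInterval > 0 {
		interval = cfg.CheckBalanceInterval
	}
	return concurrency, interval
}

func main() {
	c, d := coordinatorSettings(&SchedulerConfig{MaxTaskConcurrency: 20})
	fmt.Println(c, d) // prints "20 1m0s"
}
```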

The basic scheduler uses this value as its batch size for absent changefeeds. When hundreds of changefeeds are bulk-created, many AddMaintainer operators can be issued almost at once.
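To illustrate the effect of the batch size, here is a minimal sketch of how a limit bounds the number of bootstraps started per scheduling tick. `nextBatch` is a hypothetical helper, not the basic scheduler's real code, and it assumes the scheduler tracks how many operators are still in flight.

```go
package main

import "fmt"

// nextBatch (hypothetical) returns the absent changefeeds that may be
// scheduled in this tick, given the number of bootstraps already in flight
// and the configured concurrency limit.
func nextBatch(absent []string, inFlight, maxTaskConcurrency int) []string {
	budget := maxTaskConcurrency - inFlight
	if budget <= 0 {
		return nil // limit reached; wait for in-flight operators to finish
	}
	if budget > len(absent) {
		budget = len(absent)
	}
	return absent[:budget]
}

func main() {
	absent := make([]string, 400)
	for i := range absent {
		absent[i] = fmt.Sprintf("cf-%d", i)
	}
	fmt.Println(len(nextBatch(absent, 0, 10000))) // prints 400: all at once
	fmt.Println(len(nextBatch(absent, 0, 10)))    // prints 10: bounded batch
}
```

With a limit of 10000, all 400 absent changefeeds fit into a single batch. With the configured default of 10, at most 10 bootstraps would start before earlier ones complete.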

Each maintainer bootstrap performs startup work even when its tables have no row traffic, including loading table metadata from the schema store and building schema and span info. In addition, loadAllPhysicalTablesAtTs currently loads full table metadata before applying table filters, so metadata is loaded even for tables the filter later excludes. As a result, creation-time memory and CPU scale poorly with the number of concurrently bootstrapping changefeeds.
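The filter ordering can be sketched as follows. This is not the schema store's API: `tableName`, `tableMeta`, and `loadFiltered` are hypothetical, and the sketch assumes table names can be enumerated cheaply before the full metadata is loaded.

```go
package main

import "fmt"

// tableName and tableMeta are hypothetical, simplified types.
type tableName struct{ Schema, Table string }

type tableMeta struct {
	Name    tableName
	Columns []string
}

// loadFiltered (hypothetical) applies the table filter to lightweight names
// first and loads full metadata only for tables that pass it.
func loadFiltered(
	names []tableName,
	match func(tableName) bool,
	load func(tableName) tableMeta,
) []tableMeta {
	var out []tableMeta
	for _, n := range names {
		if !match(n) {
			continue // skip before paying the metadata-loading cost
		}
		out = append(out, load(n))
	}
	return out
}

func main() {
	names := []tableName{{"app", "orders"}, {"app", "users"}, {"sys", "audit"}}
	loads := 0
	load := func(n tableName) tableMeta {
		loads++ // count how many full metadata loads happen
		return tableMeta{Name: n, Columns: []string{"id"}}
	}
	match := func(n tableName) bool { return n.Schema == "app" }
	metas := loadFiltered(names, match, load)
	fmt.Println(len(metas), loads) // prints "2 2": only matching tables loaded
}
```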

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

Not captured from the original workload.

Upstream TiKV version (execute tikv-server --version):

Not captured from the original workload.

TiCDC version (execute cdc version):

Code issue verified by inspection on upstream/master at 0a418b4132466aa084517ec7137b3d5f24013dcc.
