Fast watcher, slow processing issue

**Seeking help from k8s experts.**

I used client-go / controller-runtime to implement a controller for my CRD. I have now noticed that the controller's performance does not improve, no matter whether I add more shards to the controller or increase max-requests-inflight / max-mutating-requests-inflight on the API server.

Below is an overview of how my CR is reconciled.

1. Add a finalizer.
2. Set the CR status to pending.
3. Create another CR and wait for its status to become ready.
4. Set the CR status to running.

The average latency of the four steps above is around 1s to 5s.

I simulated the creation of 10000 CRs and found that it takes around 20s end to end for all of them to reach running.
I observed that the first reconcile (step #1) sometimes starts 8s after the CR is created on the API server side.
When I checked the API server logs, I found the following messages, which are emitted from
https://github.com/kubernetes/kubernetes/blob/release-1.28/staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher.go#L139
- around 80k "Fast watcher, slow processing. Probably caused by slow decoding, user not receiving fast, or other processing logic" incomingEvents=100 objectType="*unstructured.Unstructured" ...
- around 500 "Fast watcher, slow processing. Probably caused by slow dispatching events to watchers" outgoingEvents=100 objectType="*unstructured.Unstructured" ...

I cannot tell whether the bottleneck is on the controller side or the API server side. I tried increasing the number of controller shards, but it did not help. I also checked the CPU and memory usage of the API servers: both are around 50%, which is not very high.

Any suggestions on how to troubleshoot this further and improve the controller's performance?

The parameters I used:
1. Controller: 3 shards, with max_concurrent_reconciles set to 2000 on each shard (the load is balanced across all shards).
2. API server side: 3 API servers, with max-requests-inflight = 2000 and max-mutating-requests-inflight = 2000 on each API server.

Fast watcher, slow processing issue #3501
