fix: etcd fails due to unaccounted space usage (e.g. WAL) #115
Code Review
This pull request introduces a new EtcdVolumeStatsProvider to monitor and reconcile etcd volume usage by querying kubelet stats, allowing for more accurate tracking of disk usage beyond just the etcd database size. It includes updates to the etcd client to support filtering by ready pods, improvements to the etcd cluster reconciler to incorporate filesystem usage metrics, and necessary dependency updates. I have no feedback to provide.
Pull request overview
This PR fixes etcd volume auto-resize logic by using actual filesystem usage (via kubelet stats summary) instead of relying solely on etcd-reported DB size, accounting for extra space usage like WAL/snapshot files.
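As a rough illustration of reading filesystem usage from the kubelet stats summary, the sketch below decodes a summary payload and takes the largest `usedBytes` over a named volume in a set of pods. It is not the PR's implementation: the struct types mirror only a small subset of the Summary API JSON, and the function name `maxVolumeUsedBytes`, the volume name `data`, and the pod names are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// summary mirrors a small subset of the kubelet Summary API response.
type summary struct {
	Pods []podStats `json:"pods"`
}

type podStats struct {
	PodRef struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"podRef"`
	Volumes []volumeStats `json:"volume"`
}

type volumeStats struct {
	Name      string  `json:"name"`
	UsedBytes *uint64 `json:"usedBytes"`
}

// maxVolumeUsedBytes (hypothetical helper) returns the largest usedBytes
// reported for volumeName among the pods listed in podNames. The boolean
// is false when no matching volume reported a usage value.
func maxVolumeUsedBytes(raw []byte, podNames map[string]bool, volumeName string) (uint64, bool, error) {
	var s summary
	if err := json.Unmarshal(raw, &s); err != nil {
		return 0, false, fmt.Errorf("decode stats summary: %w", err)
	}
	var largest uint64
	found := false
	for _, p := range s.Pods {
		if !podNames[p.PodRef.Name] {
			continue // not one of the pods we care about
		}
		for _, v := range p.Volumes {
			if v.Name != volumeName || v.UsedBytes == nil {
				continue
			}
			if !found || *v.UsedBytes > largest {
				largest = *v.UsedBytes
			}
			found = true
		}
	}
	return largest, found, nil
}

func main() {
	// Assumed sample payload; real summaries contain many more fields.
	raw := []byte(`{"pods":[
		{"podRef":{"name":"etcd-0"},"volume":[{"name":"data","usedBytes":100}]},
		{"podRef":{"name":"etcd-1"},"volume":[{"name":"data","usedBytes":300}]}
	]}`)
	used, found, err := maxVolumeUsedBytes(raw, map[string]bool{"etcd-0": true, "etcd-1": true}, "data")
	fmt.Println(used, found, err)
}
```

Taking the maximum rather than an average is the conservative choice here, since a resize is needed as soon as any single member's volume is close to full.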
Changes:
- Add an etcd volume stats provider that queries kubelet `/stats/summary` and computes max etcd volume usage across pods.
- Update the etcd cluster reconciler to compute an “effective” volume usage as `max(dbSize, filesystemUsage)` and emit warnings when they diverge significantly.
- Update etcd status collection to target ready pods and parallelize member calls; adjust wiring, stubs, and tests accordingly.
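The effective-usage rule described above can be sketched as follows. This is a minimal illustration, not the reconciler's code: the function name `effectiveUsage` and the relative `divergenceThreshold` are assumptions, since the review does not say how "diverge significantly" is defined.

```go
package main

import "fmt"

// effectiveUsage (hypothetical helper) returns max(dbSize, fsUsage) and
// reports whether the two values differ by more than divergenceThreshold,
// expressed as a fraction of the larger value. The threshold semantics
// are an assumption for this sketch.
func effectiveUsage(dbSize, fsUsage int64, divergenceThreshold float64) (int64, bool) {
	larger, smaller := dbSize, fsUsage
	if fsUsage > dbSize {
		larger, smaller = fsUsage, dbSize
	}
	if larger <= 0 {
		return larger, false // nothing reported, nothing to compare
	}
	ratio := float64(larger-smaller) / float64(larger)
	return larger, ratio > divergenceThreshold
}

func main() {
	const gib = int64(1) << 30
	// etcd reports a 2 GiB database, but the volume holds 5 GiB
	// (for example because of WAL and snapshot files).
	usage, warn := effectiveUsage(2*gib, 5*gib, 0.5)
	fmt.Println(usage/gib, warn)
}
```

With these inputs the resize decision would be based on 5 GiB rather than 2 GiB, and the divergence flag would be set because the gap is 60% of the larger value.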
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `test/etcd_stubs.go` | Updates etcd client stub signature and adds a stub for the new volume stats provider. |
| `pkg/reconcilers/etcd_cluster/volume_stats/volume_stats.go` | Implements kubelet-based filesystem usage collection for etcd data volumes. |
| `pkg/reconcilers/etcd_cluster/volume_stats/volume_stats_test.go` | Adds unit tests for volume identification (`isEtcdDataVolume`). |
| `pkg/reconcilers/etcd_cluster/reconciler.go` | Integrates filesystem usage into resize decisions; adds pod listing and adjusts health check placement and args gating. |
| `pkg/reconcilers/etcd_cluster/reconciler_test.go` | Extends tests for new effective-usage and warning behaviors. |
| `pkg/reconcilers/etcd_cluster/etcd_client/etcd_client.go` | Changes `GetStatuses` API to accept ready pod names and parallelizes per-member calls. |
| `pkg/hostedcontrolplane/controller.go` | Wires the new volume stats provider into the reconciler; adjusts events RBAC verbs. |
| `go.mod` | Bumps several dependencies (k8s libs, controller-runtime, grpc, cilium). |
| `go.sum` | Updates checksums for dependency upgrades. |
| `.golangci.yaml` | Adds import aliasing for kubelet stats API package. |
Head branch was pushed to by a user without write access
ea81376 to 2e6eac4
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.
2e6eac4 to 516cda8
516cda8 to b37d3d4
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
b37d3d4 to 2a5b8ad
2a5b8ad to 3c927cd
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
3c927cd to d38cd45
d38cd45 to 1dc1b75
Pull request overview
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
382c399 to 76e651c
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.
fe0dd96 to b4747de
b4747de to daeb324
This way, resizing is based on the real volume usage.
chore: replace span/log/recorder with emit
daeb324 to 8e3d4cd