Commit 3561a7b
[CASCL-623] feat(cluster-agent): enhance DatadogPodAutoscaler metrics with dedicated metrics store (#46833)
## Scope of This PR
This PR revisits the previous attempt, #42547.
The focus here is strictly on **refactoring the existing metric generation logic**, without introducing new metrics.
A follow-up PR will build on this foundation to introduce additional DPA metrics.
## Motivation
Today, `DatadogPodAutoscaler` (DPA) resource metrics are exposed as OpenMetrics/Prometheus metrics and scraped by the `datadog_cluster_agent` check.
While functional, this approach has several drawbacks:
- All scraped metrics automatically inherit the **Cluster Agent pod context** (`pod_name`, `kube_namespace`, `kube_container_name`, etc.).
- This creates confusion around DPA metric tags, as metrics appear tied to Cluster Agent pods rather than the DPA resources themselves.
- It increases metric cardinality due to multiple Cluster Agent instances (leader, followers, rollouts) contributing additional metric contexts.
## Proposed Approach
This PR changes how DPA metrics are generated.
Instead of exposing them via OpenMetrics and relying on the `datadog_cluster_agent` check for collection, DPA metrics are now produced directly within the Cluster Agent autoscaling component — following the same pattern used for `kubernetes_state` metrics.
This provides:
- Better control over which tags are attached to DPA metrics
- Cleaner and more accurate metric context
- Reduced unnecessary metric cardinality
## Summary
### Core changes
- Refactor `ObserverFunc` signature from `(string, string)` to `(string, interface{})` to pass the actual object to observers, enabling richer metric generation
- Add new `pkg/clusteragent/autoscaling/workload/metrics` package with a `PodAutoscalerMetricsStore` that generates and periodically sends structured metrics (gauges/counts) for `DatadogPodAutoscaler` objects via `sender.Sender`
- Replace old telemetry helpers (`telemetry.go` tag-based metrics) with leader-aware metric submission; metrics are only emitted by the leader
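A minimal sketch of how the new `ObserverFunc` shape and leader-aware emission could fit together. Only the `(string, interface{})` signature comes from this PR; `metricsStore`, `OnSet`, `OnDelete`, `Flush`, and the `emit` callback are hypothetical simplifications of `PodAutoscalerMetricsStore`, which sends through `sender.Sender` rather than a callback.

```go
package main

import "sync"

// ObserverFunc follows the new shape described in this PR: observers receive
// the object itself (interface{}) rather than a second string.
type ObserverFunc func(key string, obj interface{})

// metricsStore is a hypothetical stand-in for PodAutoscalerMetricsStore.
type metricsStore struct {
	mu       sync.Mutex
	objects  map[string]interface{}
	isLeader func() bool
}

func newMetricsStore(isLeader func() bool) *metricsStore {
	return &metricsStore{objects: make(map[string]interface{}), isLeader: isLeader}
}

// OnSet keeps the latest object for a key; its signature matches ObserverFunc.
func (s *metricsStore) OnSet(key string, obj interface{}) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.objects[key] = obj
}

// OnDelete forgets a key so no further metrics are emitted for it.
func (s *metricsStore) OnDelete(key string, _ interface{}) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.objects, key)
}

// Flush is meant to be called periodically. Followers keep their state up to
// date but emit nothing; only the leader calls emit. It returns the number of
// objects emitted.
func (s *metricsStore) Flush(emit func(key string, obj interface{})) int {
	if !s.isLeader() {
		return 0
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	for key, obj := range s.objects {
		emit(key, obj) // called under the lock; emit must not call back into the store
	}
	return len(s.objects)
}
```

In this sketch, followers still record objects and only skip emission, so a follower that becomes leader already holds current state. A periodic caller, such as a `time.Ticker` loop, is assumed.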
### Action metrics consolidation
- Move horizontal/vertical scaling action metrics from event-driven `Submit*` functions in `counters.go` into the state-based `GeneratePodAutoscalerMetrics` generator
- Delete `counters.go` entirely (`SubmitReceivedRecommendationsVersion` was dead code; remaining `Submit*` functions replaced by the generator)
- Remove `sender` and `isLeader` from `horizontalController` and `verticalController` since they were only used for the now-removed `Submit*` calls
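The switch from event-driven `Submit*` calls to state-based generation can be illustrated with a small generator that reads cumulative counters and emits one sample per `status` tag. The `metric` and `actionCounters` types and the `actionMetrics` function are hypothetical; the metric names and the `status:ok`/`status:error` tags follow the metrics table in this description.

```go
package main

// metric is a hypothetical representation of one sample handed to a sender.
type metric struct {
	Name  string
	Kind  string // "gauge" or "count"
	Value float64
	Tags  []string
}

// actionCounters is a hypothetical subset of the model state holding
// cumulative action outcomes.
type actionCounters struct {
	HorizontalSuccess, HorizontalError uint64
	VerticalSuccess, VerticalError     uint64
}

// actionMetrics derives action metrics from stored counters instead of
// emitting them at the moment an action runs.
func actionMetrics(c actionCounters, baseTags []string) []metric {
	// withStatus copies baseTags so the returned slices never share storage.
	withStatus := func(status string) []string {
		tags := make([]string, 0, len(baseTags)+1)
		tags = append(tags, baseTags...)
		return append(tags, "status:"+status)
	}
	return []metric{
		{Name: "horizontal_scaling_actions", Kind: "count", Value: float64(c.HorizontalSuccess), Tags: withStatus("ok")},
		{Name: "horizontal_scaling_actions", Kind: "count", Value: float64(c.HorizontalError), Tags: withStatus("error")},
		{Name: "vertical_rollout_triggered", Kind: "count", Value: float64(c.VerticalSuccess), Tags: withStatus("ok")},
		{Name: "vertical_rollout_triggered", Kind: "count", Value: float64(c.VerticalError), Tags: withStatus("error")},
	}
}
```

Because the values live in the model, every flush can regenerate them from current state instead of depending on a controller calling a submit function when the action ran. This PR does not say whether zero-valued series are emitted; the sketch emits them.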
### Metrics emitted
| Metric | Type | Notes |
|--------|------|-------|
| `received_recommendations_version` | Gauge | Remote Config (RC) version of the last received main scaling values; only emitted when > 0 |
| `horizontal_scaling_received_replicas` | Gauge | Replicas recommended by the product recommender |
| `vertical_scaling_received_requests` | Gauge | Per-container requested resources from recommender |
| `vertical_scaling_received_limits` | Gauge | Per-container resource limits from recommender |
| `horizontal_scaling_applied_replicas` | Gauge | Replicas from the last applied horizontal action |
| `horizontal_scaling_actions` | Count | Cumulative count of horizontal scaling actions, tagged `status:ok` or `status:error` |
| `vertical_rollout_triggered` | Count | Cumulative count of vertical rollout actions, tagged `status:ok` or `status:error` |
| `autoscaler_conditions` | Gauge | 1.0/0.0 per condition type from CRD status |
| `local_fallback_enabled` | Gauge | 1.0 when horizontal active source is local fallback |
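
Several gauges in the table are pure functions of the stored state. The sketch below covers three of them: the version gauge, which is skipped while the version is 0, one `autoscaler_conditions` sample per condition type, and `local_fallback_enabled`. The `condition` and `gauge` types, the `type:` tag key, and emitting `0.0` for `local_fallback_enabled` when local fallback is not the active source are assumptions, not details confirmed by this PR.

```go
package main

// condition is a hypothetical stand-in for a CRD status condition.
type condition struct {
	Type   string
	Status bool
}

// gauge is a hypothetical gauge sample.
type gauge struct {
	Name  string
	Value float64
	Tags  []string
}

// stateGauges emits a subset of the gauges listed in the table.
func stateGauges(version uint64, conds []condition, localFallbackActive bool, tags []string) []gauge {
	var out []gauge
	// received_recommendations_version is skipped until a version was received.
	if version > 0 {
		out = append(out, gauge{Name: "received_recommendations_version", Value: float64(version), Tags: tags})
	}
	// One 1.0/0.0 sample per condition type, tagged with the type (hypothetical key).
	for _, c := range conds {
		v := 0.0
		if c.Status {
			v = 1.0
		}
		condTags := append(append([]string{}, tags...), "type:"+c.Type)
		out = append(out, gauge{Name: "autoscaler_conditions", Value: v, Tags: condTags})
	}
	// Assumption: 0.0 is emitted when local fallback is not the active source.
	fallback := 0.0
	if localFallbackActive {
		fallback = 1.0
	}
	out = append(out, gauge{Name: "local_fallback_enabled", Value: fallback, Tags: tags})
	return out
}
```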
### Model changes
- Add `mainScalingValuesVersion uint64` to `PodAutoscalerInternal` to persist the remote config version of the last received main scaling values
- Extend `UpdateFromMainValues` to accept and persist the RC version; `RemoveMainValues` resets it to 0
- Add `horizontalActionErrorCount`/`horizontalActionSuccessCount` and `verticalActionErrorCount`/`verticalActionSuccessCount` counter fields, incremented on each action outcome
- Add public getters: `MainScalingValuesVersion()`, `HorizontalActionErrorCount()`, `HorizontalActionSuccessCount()`, `VerticalActionErrorCount()`, `VerticalActionSuccessCount()`
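The model additions can be pictured with a trimmed-down struct. Field and getter names follow the list above, and `UpdateFromMainValues` is shown with a simplified signature; the `scalingValues` placeholder, the `uint64` counter type, and the hypothetical `recordHorizontalAction`/`recordVerticalAction` helpers that increment the counters are assumptions.

```go
package main

// scalingValues is a hypothetical placeholder for the main scaling values.
type scalingValues struct {
	HorizontalReplicas int32
}

// podAutoscalerInternal is a trimmed, hypothetical version of PodAutoscalerInternal.
type podAutoscalerInternal struct {
	mainScalingValues        scalingValues
	mainScalingValuesVersion uint64 // RC version of the last received main values

	horizontalActionErrorCount, horizontalActionSuccessCount uint64
	verticalActionErrorCount, verticalActionSuccessCount     uint64
}

// UpdateFromMainValues stores the values together with their RC version.
func (p *podAutoscalerInternal) UpdateFromMainValues(v scalingValues, version uint64) {
	p.mainScalingValues = v
	p.mainScalingValuesVersion = version
}

// RemoveMainValues clears the values and resets the version to 0.
func (p *podAutoscalerInternal) RemoveMainValues() {
	p.mainScalingValues = scalingValues{}
	p.mainScalingValuesVersion = 0
}

// recordHorizontalAction (hypothetical) counts one horizontal action outcome.
func (p *podAutoscalerInternal) recordHorizontalAction(succeeded bool) {
	if succeeded {
		p.horizontalActionSuccessCount++
	} else {
		p.horizontalActionErrorCount++
	}
}

// recordVerticalAction (hypothetical) counts one vertical action outcome.
func (p *podAutoscalerInternal) recordVerticalAction(succeeded bool) {
	if succeeded {
		p.verticalActionSuccessCount++
	} else {
		p.verticalActionErrorCount++
	}
}

// Read-only getters used by the metrics generator.
func (p *podAutoscalerInternal) MainScalingValuesVersion() uint64 { return p.mainScalingValuesVersion }
func (p *podAutoscalerInternal) HorizontalActionErrorCount() uint64 {
	return p.horizontalActionErrorCount
}
func (p *podAutoscalerInternal) HorizontalActionSuccessCount() uint64 {
	return p.horizontalActionSuccessCount
}
func (p *podAutoscalerInternal) VerticalActionErrorCount() uint64 { return p.verticalActionErrorCount }
func (p *podAutoscalerInternal) VerticalActionSuccessCount() uint64 {
	return p.verticalActionSuccessCount
}
```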
## Test plan
- [x] Unit tests added for `metrics/store`, `metrics/generator`, and `metrics/writer`
- [x] `generator_test.go` covers all 9 metrics including both `status:ok` and `status:error` count variants
- [x] `config_retriever_values_test.go` updated with expected `MainScalingValuesVersion` values
- [x] `controller_horizontal_test.go` updated to assert `HorizontalActionSuccessCount`/`HorizontalActionErrorCount`
- [x] Existing `workload` controller tests updated to remove now-deleted `sender`/`isLeader` fixtures
- [x] All tests in `pkg/clusteragent/autoscaling/...` pass locally
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: cedric.lamoriniere <cedric.lamoriniere@datadoghq.com>
1 parent: f6e6779
33 files changed
Lines changed: 1435 additions & 383 deletions
File tree
- .github
- cmd/cluster-agent/subcommands/start
- pkg/clusteragent
- autoscaling
- cluster
- externalmetrics
- workload
- external
- local
- metrics
- model
- provider
- metricsstore