feat(device): add cpu.meters config and switch-based CPU meter selection#2468
Conversation
|
|
|
📊 Profiling reports are ready to be viewed
💻 CPU Comparison with base Kepler💾 Memory Comparison with base Kepler (Inuse)💾 Memory Comparison with base Kepler (Alloc)⬇️ Download the Profiling artifacts from the Actions Summary page 📦 Artifact name: 🔧 Or use GitHub CLI to download artifacts: gh run download 25287442059 -n profile-artifacts-2468 |
80a3e01 to
9387d88
Compare
|
|
9387d88 to
5dd5593
Compare
|
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2468 +/- ##
==========================================
+ Coverage 91.89% 91.97% +0.08%
==========================================
Files 55 56 +1
Lines 5859 5945 +86
==========================================
+ Hits 5384 5468 +84
Misses 340 340
- Partials 135 137 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
📊 Profiling reports are ready to be viewed
💻 CPU Comparison with base Kepler💾 Memory Comparison with base Kepler (Inuse)💾 Memory Comparison with base Kepler (Alloc)⬇️ Download the Profiling artifacts from the Actions Summary page 📦 Artifact name: 🔧 Or use GitHub CLI to download artifacts: gh run download 25287694043 -n profile-artifacts-2468 |
5dd5593 to
b0c6206
Compare
|
|
|
📊 Profiling reports are ready to be viewed
💻 CPU Comparison with base Kepler💾 Memory Comparison with base Kepler (Inuse)💾 Memory Comparison with base Kepler (Alloc)⬇️ Download the Profiling artifacts from the Actions Summary page 📦 Artifact name: 🔧 Or use GitHub CLI to download artifacts: gh run download 25287772647 -n profile-artifacts-2468 |
Promote the unexported `powerMeter` interface in `internal/device/power_meter.go` to an exported `PowerMeter` requiring `Name` and `Init` (via `service.Service` and `service.Initializer`). Backends with resources can opt into cleanup by implementing `service.Shutdowner`; the run framework already type-asserts it. `device.CPUPowerMeter` embeds `PowerMeter`. The fake CPU meter and `MockCPUPowerMeter` gain a no-op `Init()` to satisfy the interface; no behaviour change. Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>
…ction `createCPUMeter` in `cmd/kepler/main.go` hardcoded the RAPL → hwmon fallback chain. This commit replaces it with a config-driven switch: - New `cfg.Cpu.Meters` config (default `["rapl", "hwmon"]`) lets operators reorder backend priority without forking. - New `device.CreateCPUMeter` walks the list, builds each backend, runs `Init()`, and returns the first that reports zones. Real failures aggregate via `errors.Join`; empty zones are a soft skip. - New `buildCPUMeter` is a switch over backend names — adding a new backend means adding one case and one constructor. - `cmd/kepler/main.go` calls `device.CreateCPUMeter` and drops the old `createCPUMeter` / `createHwmonMeter` helpers (~70 lines deleted). Two legacy keys translate to `cpu.meters` at startup with a deprecation warning, preserving operator behaviour: - `dev.fake-cpu-meter.enabled: true` → `cpu.meters: ["fake"]` - `experimental.hwmon.forceEnabled: true` → `cpu.meters: ["hwmon"]` Sample configs (`hack/config.yaml`, helm values, k8s configmap, dev compose) and `docs/user/configuration.md` updated. No behaviour change for healthy systems running default config. Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>
b0c6206 to
fd6e5d1
Compare
|
📊 Profiling reports are ready to be viewed
💻 CPU Comparison with base Kepler💾 Memory Comparison with base Kepler (Inuse)💾 Memory Comparison with base Kepler (Alloc)⬇️ Download the Profiling artifacts from the Actions Summary page 📦 Artifact name: 🔧 Or use GitHub CLI to download artifacts: gh run download 25288135534 -n profile-artifacts-2468 |
|
|
|
📊 Profiling reports are ready to be viewed
💻 CPU Comparison with base Kepler💾 Memory Comparison with base Kepler (Inuse)💾 Memory Comparison with base Kepler (Alloc)⬇️ Download the Profiling artifacts from the Actions Summary page 📦 Artifact name: 🔧 Or use GitHub CLI to download artifacts: gh run download 25288808461 -n profile-artifacts-2468 |
PowerMeter interface and CPU registrycpu.meters config and switch-based CPU meter selection
Three CI fixes for PR #2468: - `yamllint`: drop redundant double-quotes in `meters: [rapl, hwmon]` across `hack/config.yaml`, `compose/dev/kepler-dev/etc/kepler/config.yaml`, and `manifests/helm/kepler/values.yaml`. Project's `.yamllint.yaml` enforces `quoted-strings: required: only-when-needed`. - `compose-deploy` / `build-and-deploy`: the CI runs `sed -i '/fake-cpu-meter:/{n;s/enabled: false/enabled: true/}'` to flip the fake meter on for tests. The previous commit added a multi-line deprecation comment between `fake-cpu-meter:` and `enabled: false`, which broke the sed (`n` jumped to the comment instead of `enabled:`). Move the deprecation note to a trailing comment on the same line as `enabled: false` in all four affected files. - `codecov/patch`: add `internal/device/cpu_power_meter_test.go` covering `CreateCPUMeter` and `buildCPUMeter` (success, unknown name, fallthrough, empty meters). Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>
|
|
|
📊 Profiling reports are ready to be viewed
💻 CPU Comparison with base Kepler💾 Memory Comparison with base Kepler (Inuse)💾 Memory Comparison with base Kepler (Alloc)⬇️ Download the Profiling artifacts from the Actions Summary page 📦 Artifact name: 🔧 Or use GitHub CLI to download artifacts: gh run download 25291674840 -n profile-artifacts-2468 |
…eter Extends `cpu_power_meter_test.go` to exercise the previously-uncovered new lines flagged by codecov/patch: - `buildCPUMeter` "rapl" case (with `cfg.Rapl.Zones` filter logging). - `buildCPUMeter` "hwmon" case with experimental config (zones + chipRules). - `buildCPUMeter` "hwmon" case with nil experimental config. - `CreateCPUMeter` factory-error path (rapl with bogus sysfs). - `CreateCPUMeter` Init-error path (hwmon with bogus sysfs). - `CreateCPUMeter` aggregated-error path (rapl + hwmon both fail). Tests use `/nonexistent/sysfs/path` to force factory or Init failures without requiring real hardware. Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>
|
|
|
📊 Profiling reports are ready to be viewed
💻 CPU Comparison with base Kepler💾 Memory Comparison with base Kepler (Inuse)💾 Memory Comparison with base Kepler (Alloc)⬇️ Download the Profiling artifacts from the Actions Summary page 📦 Artifact name: 🔧 Or use GitHub CLI to download artifacts: gh run download 25303089672 -n profile-artifacts-2468 |
Three polish changes on top of the switch-based design:
- `ApplyCpuMeterDeprecations` rewritten as a `switch` for symmetry. The
previous `if { ...; return } if { ... }` shape had a load-bearing early
return that read as asymmetric.
- `cpu_power_meter.go` logging uses static messages with a `"meter"` attr
instead of `fmt.Sprintf`-baked dynamic strings. Mirrors the existing
convention in `gpu/registry.go` and the rest of `internal/device/`,
and gives operators a queryable `"meter"` attr that didn't exist before.
- `config_cpu_test.go` collapses `TestCpuMetersDefault` and
`TestApplyCpuMeterDeprecations` into a single `TestCpuMeters` table.
The default case is one row where `setup` is a no-op. Drops the
`io.Discard` boilerplate for `slog.DiscardHandler` (Go 1.24+).
Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>
|
|
When all configured backends in `cpu.Meters` returned empty zones, the soft-skip path discarded that signal and the function returned a misleading "cpu.meters is empty" error. Append to `errs` on the empty-zones path so the joined error reports each skipped backend by name. Handle the truly empty-config case with an early return at the top of the function. Addresses review feedback on #2468. Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>
Validate `cpu.meters` against the set of known backends in `Config.Validate` and return an explicit error listing the valid names. A typo such as `["rappl", "hwmon"]` is now caught at startup instead of silently falling through to the next backend. The `Warn` in `CreateCPUMeter` stays as defense-in-depth for callers that skip validation. Addresses review feedback on #2468. Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>
- Switch `discardLogger()` in `cpu_power_meter_test.go` to `slog.New(slog.DiscardHandler)` to match `config_cpu_test.go`. Drops the unused `io` import. - Note in `ApplyCpuMeterDeprecations` that `fake` takes precedence over `hwmon` when both legacy keys are set. Addresses review feedback on #2468. Signed-off-by: nikimanoledaki <niki.manoledaki@grafana.com>
|
|
|
📊 Profiling reports are ready to be viewed
💻 CPU Comparison with base Kepler💾 Memory Comparison with base Kepler (Inuse)💾 Memory Comparison with base Kepler (Alloc)⬇️ Download the Profiling artifacts from the Actions Summary page 📦 Artifact name: 🔧 Or use GitHub CLI to download artifacts: gh run download 25435843659 -n profile-artifacts-2468 |
vimalk78
left a comment
There was a problem hiding this comment.
minor comment on the new config name already in the proposal PR, can be addressed in later PRs.
Part of #2467.
Why
createCPUMeterincmd/kepler/main.gohardcodes the RAPL → hwmon fallback chain. Operators on platforms where neither is the right source cannot reorder backend priority without forking. This PR makes the priority list config-driven without inventing new vocabulary (no registry, no factories, no init() side effects).What
Changes
optional via service.Shutdowner.
first that yields zones. Real failures aggregate via errors.Join; empty zones are a soft skip.
Behaviour change
None for systems running the default config. Preserves the same RAPL to hwmon fallback. Operators using the legacy keys see a deprecation warning but continue to get the same selection.