docs(opslog): align --track-bucket-slo descriptions with CLI help text#50
docs(opslog): align --track-bucket-slo descriptions with CLI help text#50senolcolak wants to merge 3 commits into
Conversation
Introduce dedicated low-cardinality bucket SLI tracking for GET/LIST-style operations in the opslog producer. This adds a new TrackBucketSLO metrics option, wires it through CLI flags and environment configuration, and enables it automatically when track-everything is selected. When enabled, metrics updates now classify supported bucket operations, derive HTTP status classes, and emit dedicated Prometheus SLI request and latency metrics. Also document the new flag, environment variable, and exported metrics in the opslog README, and add tests covering SLI operation classification and metric emission. Promote github.com/prometheus/client_model to a direct dependency for the new metric assertions.
Prevent malformed opslog HTTP status values from creating unexpected Prometheus label values in bucket SLI metrics. Instead of blindly deriving the label from the first byte, status classification now returns when the status is empty or does not start with a digit. This keeps SLI metric cardinality bounded for invalid upstream values such as strings or whitespace-prefixed statuses. Add unit tests covering valid statuses and malformed inputs.
The CLI flag uses "Track" but the README used "Enable" in two places. Align to "Track" for consistency with other TRACK_* flag descriptions.
📝 WalkthroughWalkthroughThis PR adds bucket-level Service-Level Indicator (SLI) metrics for S3 GET/LIST operations to the ops-log producer. The feature includes configuration support, operation classification and HTTP status bucketing helpers, conditional Prometheus metric registration, integration into metrics collection, full CLI/environment wiring, and comprehensive documentation and tests. ChangesBucket SLO Metrics Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/producers/opslog/prometheus_sli.go (1)
28-31: ⚡ Quick win
prometheus.MustRegisterwill panic ifregisterSLIMetrics()is called more than once.The current call-site (guarded by
if metricsConfig.TrackBucketSLO) is single-call in practice, but any future test that exercisesinitPrometheusSettingswithTrackBucketSLO: truewill panic on the second invocation with"duplicate metrics collector registration attempted".Use
prometheus.Registerwith an explicit "already registered" check, or wrap withsync.Once:♻️ Proposed fix
+import "sync" +var registerSLIOnce sync.Once func registerSLIMetrics() { - prometheus.MustRegister(sliRequestsTotal) - prometheus.MustRegister(sliRequestDuration) + registerSLIOnce.Do(func() { + prometheus.MustRegister(sliRequestsTotal) + prometheus.MustRegister(sliRequestDuration) + }) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/producers/opslog/prometheus_sli.go` around lines 28 - 31, registerSLIMetrics uses prometheus.MustRegister which will panic on duplicate registration; change it to call prometheus.Register for sliRequestsTotal and sliRequestDuration and handle the returned error by ignoring prometheus.AlreadyRegisteredError (or its type) while surfacing other errors (or logging them). Update registerSLIMetrics to attempt registration for both sliRequestsTotal and sliRequestDuration, check the error from prometheus.Register, and only skip when the error indicates the metric is already registered; keep the rest of the init logic (e.g., where initPrometheusSettings uses metricsConfig.TrackBucketSLO) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/producers/opslog/sli.go`:
- Around line 27-35: statusClass currently maps any leading digit to a class,
allowing invalid HTTP classes like 0xx or 6xx–9xx; update statusClass to only
accept leading digits '1' through '5' and return "unknown" otherwise. Locate the
statusClass function and change its digit check from a generic numeric range to
explicitly allow '1' <= status[0] <= '5', keeping the existing empty and
non-digit guards, and return status[:1] + "xx" only when the leading digit is
within 1–5.
---
Nitpick comments:
In `@pkg/producers/opslog/prometheus_sli.go`:
- Around line 28-31: registerSLIMetrics uses prometheus.MustRegister which will
panic on duplicate registration; change it to call prometheus.Register for
sliRequestsTotal and sliRequestDuration and handle the returned error by
ignoring prometheus.AlreadyRegisteredError (or its type) while surfacing other
errors (or logging them). Update registerSLIMetrics to attempt registration for
both sliRequestsTotal and sliRequestDuration, check the error from
prometheus.Register, and only skip when the error indicates the metric is
already registered; keep the rest of the init logic (e.g., where
initPrometheusSettings uses metricsConfig.TrackBucketSLO) unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 076e95ce-8cdf-408e-9789-788b75179c14
📒 Files selected for processing (9)
go.modpkg/commands/producer_ops_log.gopkg/producers/opslog/README.mdpkg/producers/opslog/config.gopkg/producers/opslog/metrics.gopkg/producers/opslog/metrics_test.gopkg/producers/opslog/prometheus.gopkg/producers/opslog/prometheus_sli.gopkg/producers/opslog/sli.go
| func statusClass(status string) string { | ||
| if len(status) == 0 { | ||
| return "unknown" | ||
| } | ||
| if status[0] < '0' || status[0] > '9' { | ||
| return "unknown" | ||
| } | ||
| return status[:1] + "xx" | ||
| } |
There was a problem hiding this comment.
Restrict status classes to valid HTTP ranges (1xx–5xx)
statusClass currently turns any leading digit into a class, so malformed statuses can produce 0xx/6xx–9xx. That degrades metric quality and can skew SLO queries expecting valid HTTP classes.
💡 Proposed fix
func statusClass(status string) string {
if len(status) == 0 {
return "unknown"
}
- if status[0] < '0' || status[0] > '9' {
+ if status[0] < '1' || status[0] > '5' {
return "unknown"
}
return status[:1] + "xx"
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func statusClass(status string) string { | |
| if len(status) == 0 { | |
| return "unknown" | |
| } | |
| if status[0] < '0' || status[0] > '9' { | |
| return "unknown" | |
| } | |
| return status[:1] + "xx" | |
| } | |
| func statusClass(status string) string { | |
| if len(status) == 0 { | |
| return "unknown" | |
| } | |
| if status[0] < '1' || status[0] > '5' { | |
| return "unknown" | |
| } | |
| return status[:1] + "xx" | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/producers/opslog/sli.go` around lines 27 - 35, statusClass currently maps
any leading digit to a class, allowing invalid HTTP classes like 0xx or 6xx–9xx;
update statusClass to only accept leading digits '1' through '5' and return
"unknown" otherwise. Locate the statusClass function and change its digit check
from a generic numeric range to explicitly allow '1' <= status[0] <= '5',
keeping the existing empty and non-digit guards, and return status[:1] + "xx"
only when the leading digit is within 1–5.
Summary
--track-bucket-sloflag descriptionsTRACK_*entriesNote: This PR should be merged after PR #48 (which introduces the
--track-bucket-sloflag).Test Plan
"Track low-cardinality bucket GET/LIST SLI metrics for Prometheus SLOs"TRACK_*env vars in README use "Track" verbSummary by CodeRabbit
New Features
--track-bucket-sloCLI flag and environment variable supportDocumentation