Skip to content

docs(opslog): align --track-bucket-slo descriptions with CLI help text#50

Open
senolcolak wants to merge 3 commits into
mainfrom
docs/fix-readme-verb-consistency
Open

docs(opslog): align --track-bucket-slo descriptions with CLI help text#50
senolcolak wants to merge 3 commits into
mainfrom
docs/fix-readme-verb-consistency

Conversation

@senolcolak
Copy link
Copy Markdown
Contributor

@senolcolak senolcolak commented May 8, 2026

Summary

  • Fix verb inconsistency in README.md for --track-bucket-slo flag descriptions
  • CLI help text uses "Track" but README used "Enable" in two places (flag list and env var table)
  • Aligns both to "Track" for consistency with all other TRACK_* entries

Note: This PR should be merged after PR #48 (which introduces the --track-bucket-slo flag).

Test Plan

  • Verified CLI help string: "Track low-cardinality bucket GET/LIST SLI metrics for Prometheus SLOs"
  • Verified other TRACK_* env vars in README use "Track" verb
  • No functional code changes

Summary by CodeRabbit

  • New Features

    • Added bucket SLO (Service Level Objective) metrics tracking capability with --track-bucket-slo CLI flag and environment variable support
    • Added Prometheus metrics for bucket GET/LIST operations including request counters and latency histograms
  • Documentation

    • Updated ops-log documentation with bucket SLO tracking options and newly collected metrics details

jrse and others added 3 commits April 27, 2026 08:53
  Introduce dedicated low-cardinality bucket SLI tracking for GET/LIST-style
  operations in the opslog producer.

  This adds a new TrackBucketSLO metrics option, wires it through CLI flags and
  environment configuration, and enables it automatically when track-everything
  is selected. When enabled, metrics updates now classify supported bucket
  operations, derive HTTP status classes, and emit dedicated Prometheus SLI
  request and latency metrics.

  Also document the new flag, environment variable, and exported metrics in the
  opslog README, and add tests covering SLI operation classification and metric
  emission. Promote github.com/prometheus/client_model to a direct dependency for
  the new metric assertions.
  Prevent malformed opslog HTTP status values from creating unexpected
  Prometheus  label values in bucket SLI metrics.

  Instead of blindly deriving the label from the first byte, status
  classification now returns  when the status is empty or does not
  start with a digit. This keeps SLI metric cardinality bounded for invalid
  upstream values such as strings or whitespace-prefixed statuses.

  Add unit tests covering valid statuses and malformed inputs.
The CLI flag uses "Track" but the README used "Enable" in two places.
Align to "Track" for consistency with other TRACK_* flag descriptions.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds bucket-level Service-Level Indicator (SLI) metrics for S3 GET/LIST operations to the ops-log producer. The feature includes configuration support, operation classification and HTTP status bucketing helpers, conditional Prometheus metric registration, integration into metrics collection, full CLI/environment wiring, and comprehensive documentation and tests.

Changes

Bucket SLO Metrics Feature

Layer / File(s) Summary
Dependencies
go.mod
Promotes github.com/prometheus/client_model from indirect to direct dependency for SLI metrics.
Configuration Schema
pkg/producers/opslog/config.go
Adds TrackBucketSLO boolean field to MetricsConfig with YAML tag; enables it in ApplyShortcuts() when TrackEverything is selected.
SLI Core Logic
pkg/producers/opslog/sli.go
Defines SLOperation enum and implements classifyBucketSLOOperation (GET/LIST mapping), statusClass (HTTP status bucketing to 2xx/3xx/etc.), and observeBucketSLI (records request counts and latency to Prometheus).
Prometheus Metrics
pkg/producers/opslog/prometheus_sli.go
Defines two low-cardinality metric collectors: sliRequestsTotal counter and sliRequestDuration histogram with tenant/bucket/operation/status-class labels; provides registerSLIMetrics() registration function.
Metrics Integration
pkg/producers/opslog/metrics.go, pkg/producers/opslog/prometheus.go
Calls observeBucketSLI in Update() when TrackBucketSLO is enabled; conditionally registers SLI metrics in initPrometheusSettings().
Configuration Wiring
pkg/commands/producer_ops_log.go
Adds CLI flag --track-bucket-slo, environment variable TRACK_BUCKET_SLO, config variable opsTrackBucketSLO; wires into MetricsConfig; updates debugTrackingConfig to log and count the flag.
Documentation & Tests
pkg/producers/opslog/README.md, pkg/producers/opslog/metrics_test.go
Documents CLI flag, environment variable, and new bucket SLI metrics in README; adds unit tests for operation classification and status bucketing; adds integration test validating SLI counter increments.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through metrics bright,
Bucket SLOs now in sight!
Classification brings the way,
GET and LIST shall have their day. 📊✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The title describes documentation alignment changes, but the changeset includes substantial functional code additions (new metrics, configuration flags, test coverage) alongside documentation updates. Revise the title to reflect the primary feature: introduce bucket SLO tracking for opslog. For example: 'feat(opslog): Add low-cardinality bucket SLI metrics tracking' or similar to accurately represent the main work.
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/fix-readme-verb-consistency

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/producers/opslog/prometheus_sli.go (1)

28-31: ⚡ Quick win

prometheus.MustRegister will panic if registerSLIMetrics() is called more than once.

The current call-site (guarded by if metricsConfig.TrackBucketSLO) is single-call in practice, but any future test that exercises initPrometheusSettings with TrackBucketSLO: true will panic on the second invocation with "duplicate metrics collector registration attempted".

Use prometheus.Register with an explicit "already registered" check, or wrap with sync.Once:

♻️ Proposed fix
+import "sync"

+var registerSLIOnce sync.Once

 func registerSLIMetrics() {
-	prometheus.MustRegister(sliRequestsTotal)
-	prometheus.MustRegister(sliRequestDuration)
+	registerSLIOnce.Do(func() {
+		prometheus.MustRegister(sliRequestsTotal)
+		prometheus.MustRegister(sliRequestDuration)
+	})
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/producers/opslog/prometheus_sli.go` around lines 28 - 31,
registerSLIMetrics uses prometheus.MustRegister which will panic on duplicate
registration; change it to call prometheus.Register for sliRequestsTotal and
sliRequestDuration and handle the returned error by ignoring
prometheus.AlreadyRegisteredError (or its type) while surfacing other errors (or
logging them). Update registerSLIMetrics to attempt registration for both
sliRequestsTotal and sliRequestDuration, check the error from
prometheus.Register, and only skip when the error indicates the metric is
already registered; keep the rest of the init logic (e.g., where
initPrometheusSettings uses metricsConfig.TrackBucketSLO) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/producers/opslog/sli.go`:
- Around line 27-35: statusClass currently maps any leading digit to a class,
allowing invalid HTTP classes like 0xx or 6xx–9xx; update statusClass to only
accept leading digits '1' through '5' and return "unknown" otherwise. Locate the
statusClass function and change its digit check from a generic numeric range to
explicitly allow '1' <= status[0] <= '5', keeping the existing empty and
non-digit guards, and return status[:1] + "xx" only when the leading digit is
within 1–5.

---

Nitpick comments:
In `@pkg/producers/opslog/prometheus_sli.go`:
- Around line 28-31: registerSLIMetrics uses prometheus.MustRegister which will
panic on duplicate registration; change it to call prometheus.Register for
sliRequestsTotal and sliRequestDuration and handle the returned error by
ignoring prometheus.AlreadyRegisteredError (or its type) while surfacing other
errors (or logging them). Update registerSLIMetrics to attempt registration for
both sliRequestsTotal and sliRequestDuration, check the error from
prometheus.Register, and only skip when the error indicates the metric is
already registered; keep the rest of the init logic (e.g., where
initPrometheusSettings uses metricsConfig.TrackBucketSLO) unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 076e95ce-8cdf-408e-9789-788b75179c14

📥 Commits

Reviewing files that changed from the base of the PR and between b91778b and 6ee9f15.

📒 Files selected for processing (9)
  • go.mod
  • pkg/commands/producer_ops_log.go
  • pkg/producers/opslog/README.md
  • pkg/producers/opslog/config.go
  • pkg/producers/opslog/metrics.go
  • pkg/producers/opslog/metrics_test.go
  • pkg/producers/opslog/prometheus.go
  • pkg/producers/opslog/prometheus_sli.go
  • pkg/producers/opslog/sli.go

Comment on lines +27 to +35
func statusClass(status string) string {
if len(status) == 0 {
return "unknown"
}
if status[0] < '0' || status[0] > '9' {
return "unknown"
}
return status[:1] + "xx"
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict status classes to valid HTTP ranges (1xx5xx)

statusClass currently turns any leading digit into a class, so malformed statuses can produce 0xx/6xx9xx. That degrades metric quality and can skew SLO queries expecting valid HTTP classes.

💡 Proposed fix
 func statusClass(status string) string {
 	if len(status) == 0 {
 		return "unknown"
 	}
-	if status[0] < '0' || status[0] > '9' {
+	if status[0] < '1' || status[0] > '5' {
 		return "unknown"
 	}
 	return status[:1] + "xx"
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func statusClass(status string) string {
if len(status) == 0 {
return "unknown"
}
if status[0] < '0' || status[0] > '9' {
return "unknown"
}
return status[:1] + "xx"
}
func statusClass(status string) string {
if len(status) == 0 {
return "unknown"
}
if status[0] < '1' || status[0] > '5' {
return "unknown"
}
return status[:1] + "xx"
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/producers/opslog/sli.go` around lines 27 - 35, statusClass currently maps
any leading digit to a class, allowing invalid HTTP classes like 0xx or 6xx–9xx;
update statusClass to only accept leading digits '1' through '5' and return
"unknown" otherwise. Locate the statusClass function and change its digit check
from a generic numeric range to explicitly allow '1' <= status[0] <= '5',
keeping the existing empty and non-digit guards, and return status[:1] + "xx"
only when the leading digit is within 1–5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants