ROX-34459: fix scanner restart noise#339

Merged
johannes94 merged 2 commits into master from jmalsam/fix-scanner-restart-noise
May 12, 2026
johannes94 (Contributor) commented May 12, 2026

Changes the scanner restart alert to use changes(...[30m]) instead of increase(...[30m]).
This PR also pins the Grafana linter version, because the latest release broke the go install flow by using replace directives in its go.mod.

The restart counter increments in whole numbers, but increase() returns a float that Prometheus extrapolates to the full window. The result can be inflated, which leads to false-positive alerts for scanner restarts.

Looking at the graphs, changes() captures our intent to exclude restarts and updates from the alerting much better.

image

Full Claude explanation:

increase() in Prometheus doesn't just compute last_sample - first_sample. It looks at the
  actual time span covered by the samples within the window and then scales the result to cover
  the full requested window.

  Example with a 30m window and 60s scrape interval:

  - The first sample in the window is at t+28s, the last at t+29m31s
  - That's ~29m03s of actual sample coverage
  - The raw difference between those samples is 3 restarts
  - Prometheus scales: 3 × (30m / 29m03s) ≈ 3.1

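  That's mild. But it gets worse when the samples cover only a small part of the window, for
  example for a pod that was created shortly before the alert is evaluated. If the first or last
  sample is further than about 1.1 average scrape intervals from the window edge, Prometheus
  extrapolates by half an average scrape interval on that side, which is large relative to a
  short sample span. The exact math depends on sample timing and can push a real count of 3 to 4+.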
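
The extrapolation described above can be sketched in Python. This is a simplified model written for illustration, not Prometheus's actual code: the function name `extrapolated_increase`, the sample data, and the starting counter value of 5 are all hypothetical, and the timestamps sit on an exact 60 s grid, so the numbers differ slightly from the example above.

```python
def extrapolated_increase(samples, window_start, window_end):
    """Simplified model of how PromQL increase() extrapolates a counter.

    samples: (timestamp_seconds, value) pairs inside the window, sorted by time.
    """
    if len(samples) < 2:
        return None  # fewer than two samples gives no result

    # Raw increase between first and last sample; a drop in value is
    # treated as a counter reset, i.e. the counter restarted from zero.
    raw = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        raw += cur - prev if cur >= prev else cur

    first_t, first_v = samples[0]
    last_t = samples[-1][0]
    sampled = last_t - first_t
    avg_interval = sampled / (len(samples) - 1)
    threshold = 1.1 * avg_interval

    # Extrapolate to the window edge if the sample is close to it,
    # otherwise only by half an average sample interval.
    to_start = first_t - window_start
    to_end = window_end - last_t
    if to_start >= threshold:
        to_start = avg_interval / 2
    if to_end >= threshold:
        to_end = avg_interval / 2

    # A counter cannot be negative, so do not extrapolate back past the
    # point where the extrapolated line would reach zero.
    if raw > 0 and first_v >= 0:
        to_start = min(to_start, sampled * first_v / raw)

    return raw * (sampled + to_start + to_end) / sampled


# Hypothetical data: 30 scrapes, 60 s apart, first at t+28s, 30m window.
timestamps = [28 + 60 * k for k in range(30)]
values = [5] * 10 + [6] * 10 + [7] * 5 + [8] * 5  # 3 real restarts
dense = extrapolated_increase(list(zip(timestamps, values)), 0, 1800)
print(round(dense, 2))  # 3.1

# Hypothetical short-lived series: only 3 scrapes, late in the window.
short = extrapolated_increase([(1500, 5), (1560, 7), (1620, 8)], 0, 1800)
print(short)  # 4.5
```

In this model the densely sampled series is inflated only slightly, while the short series turns a raw increase of 3 into 4.5, because the two half-interval extrapolations add 60 s to a sample span of only 120 s.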

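  With counter resets (for example when the process or pod exposing the counter is restarted or
  replaced), each reset adds additional imprecision: Prometheus treats any drop in the counter
  as a reset to zero, so increments that happened between the last scrape before the reset and
  the reset itself are never observed.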

  This is a known Prometheus behavior: increase() on a counter that increments in whole numbers
  can return non-integer, inflated values. That's why round(increase(...)) or changes() is
  commonly used instead when you care about integer counts.
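
For comparison, changes() can be modeled in a few lines of Python. This is a hedged sketch of the documented semantics (count how often the value differs from the previous sample in the window), not Prometheus's implementation; the function and the sample series are hypothetical.

```python
def changes(values):
    """Model of PromQL changes(): number of times the value differs
    from the previous sample within the window."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur != prev)


steady = [5, 5, 6, 6, 7, 7, 8]  # one restart at a time
burst = [5, 5, 7, 8]            # two restarts between two scrapes
dropped = [5, 6, 0, 0]          # counter falls back to zero

print(changes(steady))   # 3
print(changes(burst))    # 2
print(changes(dropped))  # 2
```

The result is always an integer and is never extrapolated. The trade-off in this model is that several restarts between two scrapes count as a single change, and a counter that falls back to zero also counts as a change.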

johannes94 requested a review from a team as a code owner May 12, 2026 07:45
johannes94 requested review from GrimmiMeloni and removed request for a team May 12, 2026 07:45
johannes94 merged commit b7b4ed5 into master May 12, 2026
1 check passed
johannes94 deleted the jmalsam/fix-scanner-restart-noise branch May 12, 2026 10:33