feat: add flagger_canary_phase metric with granular phase values#1927
Open
Softer wants to merge 1 commit into
flagger_canary_status collapses the 11 canary phases into 3 values (0 running, 1 successful, 2 failed), so dashboards cannot tell WaitingPromotion, Promoting, Finalising or Succeeded apart on a Grafana state-timeline.

Add a new flagger_canary_phase gauge that exposes each phase as a unique value (0=Initializing ... 10=Terminated) via a deterministic phase-to-value map. SetStatus now also sets the new gauge, so every existing call site is covered without touching the scheduler. flagger_canary_status is left unchanged to avoid breaking existing dashboards and alerts.

The Terminating (9) phase is recorded from the finalizer and the Terminated (10) phase from the informer delete handler, so deleted canaries keep emitting a filterable value (flagger_canary_phase < 9) instead of leaving a stale series. This addresses the stale-metric problem from fluxcd#1029 without deleting metrics, which was flagged as a breaking change in fluxcd#1856.

Signed-off-by: Softer <sft.nik@gmail.com>
aryan9600
reviewed
Jun 26, 2026
aryan9600
left a comment
Member
thanks for your contribution!
// (which collapses all phases into running/successful/failed), this mapping
// keeps every phase distinct so they can be rendered on a Grafana state-timeline.
var canaryPhaseValues = map[flaggerv1.CanaryPhase]float64{
	flaggerv1.CanaryPhaseInitializing: 0,
Member
No canary will ever be set to Initializing, because unfortunately setPhaseInitializing doesn't use SetStatus to update the status.
ref:
flagger/pkg/controller/scheduler.go
Line 1016 in 83bab68
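The point of this comment can be illustrated with a simplified model. It is a sketch, not the actual scheduler code: the controller type, the advance helper and the body of setPhaseInitializing are assumptions made for illustration, and the value 1 for Initialized is an assumed number. It only shows that a code path which writes the status directly, without going through SetStatus, never produces a sample for the gauge.

```go
package main

import "fmt"

// gauge is a stand-in for the flagger_canary_phase series, keyed by canary name.
type gauge map[string]float64

// controller is a hypothetical, stripped-down model of the scheduler state.
type controller struct {
	phases map[string]string // phase stored on each canary's status
	metric gauge             // samples emitted so far
}

func newController() *controller {
	return &controller{phases: map[string]string{}, metric: gauge{}}
}

// SetStatus models the recorder path: only calls that go through it
// update the gauge.
func (c *controller) SetStatus(name string, value float64) {
	c.metric[name] = value
}

// setPhaseInitializing models a path that writes the status directly and
// never calls SetStatus (assumed body, for illustration only).
func (c *controller) setPhaseInitializing(name string) {
	c.phases[name] = "Initializing"
}

// advance models a later transition that does go through SetStatus.
func (c *controller) advance(name, phase string, value float64) {
	c.phases[name] = phase
	c.SetStatus(name, value)
}

func main() {
	c := newController()

	c.setPhaseInitializing("podinfo")
	_, ok := c.metric["podinfo"]
	fmt.Println("sample recorded while Initializing:", ok)

	c.advance("podinfo", "Initialized", 1) // 1 is an assumed value
	v, ok := c.metric["podinfo"]
	fmt.Println("sample recorded after Initialized:", ok, v)
}
```

Under this model the 0 entry in the map is unreachable unless setPhaseInitializing also records the phase, or the recording moves to a place every status update passes through.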
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1927      +/-   ##
==========================================
+ Coverage   30.00%   30.04%   +0.04%
==========================================
  Files         288      288
  Lines       18455    18474      +19
==========================================
+ Hits         5537     5551      +14
- Misses      12189    12193       +4
- Partials      729      730       +1
Motivation

flagger_canary_status collapses the 11 canary phases into only 3 values (0 running, 1 successful, 2 failed). On a Grafana state-timeline this makes it impossible to distinguish WaitingPromotion, Promoting, Finalising and Succeeded: they all map to 1.

Changing flagger_canary_status would break every existing dashboard and alert, so this PR adds a new metric instead and leaves the existing one untouched.

What this PR does

Adds a new gauge flagger_canary_phase (labels: name, namespace) that exposes each phase as a unique value via a deterministic phase-to-value map (0=Initializing ... 10=Terminated).

SetStatus now also sets the new gauge (via SetPhase), so every existing call site is covered without changing the scheduler.

flagger_canary_status is not modified.

The Terminating (9) phase is recorded from the finalizer (for revertOnDeletion: true canaries), and Terminated (10) from the informer delete handler for any deletion. A deleted canary therefore keeps emitting a filterable value, so queries can exclude removed canaries with flagger_canary_phase < 9.

This gives a non-breaking answer to the stale-metric problem in #1029: instead of deleting metrics on canary removal (flagged as a breaking change in #1856), the phase metric exposes a terminated sentinel that dashboards/alerts can filter on. It is also relevant to #1819, where distinguishing WaitingPromotion matters.
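
To make the difference between the two metrics concrete, here is a minimal standalone sketch using only the standard library. It is not the code from this PR: the recorder type is an in-memory stand-in for the Prometheus gauges, the phases are plain strings instead of flaggerv1.CanaryPhase, and the statusValue collapse is an assumed reading of the existing behaviour. Only the values 0 (Initializing), 9 (Terminating) and 10 (Terminated) are stated above; the numbering in between is an assumption.

```go
package main

import "fmt"

// CanaryPhase is declared locally so the sketch compiles without the
// flagger API package.
type CanaryPhase string

// Only 0, 9 and 10 come from the PR description; the ordering of the
// values in between is assumed for illustration.
var canaryPhaseValues = map[CanaryPhase]float64{
	"Initializing":     0,
	"Initialized":      1,
	"Waiting":          2,
	"Progressing":      3,
	"WaitingPromotion": 4,
	"Promoting":        5,
	"Finalising":       6,
	"Succeeded":        7,
	"Failed":           8,
	"Terminating":      9,
	"Terminated":       10,
}

// statusValue models the three-value collapse of flagger_canary_status
// (0 running, 1 successful, 2 failed). Which phases count as running is
// an assumption here.
func statusValue(p CanaryPhase) float64 {
	switch p {
	case "Progressing":
		return 0
	case "Failed":
		return 2
	default:
		return 1
	}
}

// recorder is an in-memory stand-in for the two gauges, keyed by
// "namespace/name".
type recorder struct {
	status map[string]float64
	phase  map[string]float64
}

func newRecorder() *recorder {
	return &recorder{status: map[string]float64{}, phase: map[string]float64{}}
}

// SetPhase records the granular value; unknown phases are ignored.
func (r *recorder) SetPhase(key string, p CanaryPhase) {
	if v, ok := canaryPhaseValues[p]; ok {
		r.phase[key] = v
	}
}

// SetStatus keeps the existing metric and also sets the new one, so
// existing call sites are covered.
func (r *recorder) SetStatus(key string, p CanaryPhase) {
	r.status[key] = statusValue(p)
	r.SetPhase(key, p)
}

func main() {
	r := newRecorder()
	key := "test/podinfo"
	for _, p := range []CanaryPhase{"WaitingPromotion", "Promoting", "Finalising", "Succeeded"} {
		r.SetStatus(key, p)
		fmt.Printf("%-16s status=%v phase=%v\n", p, r.status[key], r.phase[key])
	}
	// Deletion paths record the sentinel values directly.
	r.SetPhase(key, "Terminated")
	fmt.Println("still live (phase < 9):", r.phase[key] < 9)
}
```

In this sketch the four phases share status 1 but get distinct phase values, and a deleted canary drops out of any query that filters on flagger_canary_phase < 9.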