Skip to content

apollo_dashboard,apollo_deployments: add top-level intervalSec to Alerts; use it in alert builder#13449

Merged
Itay-Tsabary-Starkware merged 1 commit into
mainfrom
apollo_dashboard_apollo_deployments_alerts_top_level_interval_sec
Apr 14, 2026
Merged

apollo_dashboard,apollo_deployments: add top-level intervalSec to Alerts; use it in alert builder#13449
Itay-Tsabary-Starkware merged 1 commit into
mainfrom
apollo_dashboard_apollo_deployments_alerts_top_level_interval_sec

Conversation

@Itay-Tsabary-Starkware
Copy link
Copy Markdown
Contributor

@Itay-Tsabary-Starkware Itay-Tsabary-Starkware commented Mar 24, 2026

Move the evaluation interval from a per-alert field to a top-level field
on the Alerts struct. The interval is group-scoped in Grafana, so a single
value for all alerts is the correct model.

Also restructure alert_builder.py to upload/dump per rule-group instead of
per individual alert, using the top-level intervalSec as the group interval.

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com


Note

Medium Risk
Changes the alert JSON schema and the provisioning/deployment flow (grouped updates and per-group CRDs), which could alter alert evaluation cadence or cause missing/duplicated rules if grouping or intervals are misapplied.

Overview
Updates the generated dev_grafana_alerts.json schema to include top-level groups (with per-group evaluation intervals) and moves alerts into these rule groups (evaluation_rate_high/default/low), along with various severity/observer applicability tweaks driven by the regrouping.

Replaces AlertGroup with EvaluationRate in the Rust alert definitions, adds deterministic sorting, and changes Alerts serialization to emit the new groups + alerts structure.

Refactors alert_builder.py and the sequencer Grafana construct to provision alerts per rule group (one group file/CRD per group) and to set the rule-group interval once, rather than per alert.

Reviewed by Cursor Bugbot for commit 96c5d81. Bugbot is set up for automated code reviews on this repo. Configure here.

@reviewable-StarkWare
Copy link
Copy Markdown

This change is Reviewable

Copy link
Copy Markdown
Contributor Author

Itay-Tsabary-Starkware commented Mar 24, 2026

Comment thread deployments/monitoring/src/builders/alert_builder.py
@Itay-Tsabary-Starkware Itay-Tsabary-Starkware changed the base branch from main-v0.14.2 to graphite-base/13449 April 6, 2026 07:21
@Itay-Tsabary-Starkware Itay-Tsabary-Starkware force-pushed the apollo_dashboard_apollo_deployments_alerts_top_level_interval_sec branch from 3f7193d to e542be6 Compare April 6, 2026 07:21
@Itay-Tsabary-Starkware Itay-Tsabary-Starkware changed the base branch from graphite-base/13449 to main April 6, 2026 07:21
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 6, 2026

Comment thread crates/apollo_dashboard/src/alerts.rs Outdated
@Itay-Tsabary-Starkware Itay-Tsabary-Starkware force-pushed the apollo_dashboard_apollo_deployments_alerts_top_level_interval_sec branch 2 times, most recently from 29ac4d5 to 79dc8e9 Compare April 6, 2026 10:46
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 79dc8e9. Configure here.

Comment thread deployments/monitoring/src/builders/alert_builder.py
@Itay-Tsabary-Starkware Itay-Tsabary-Starkware force-pushed the apollo_dashboard_apollo_deployments_alerts_top_level_interval_sec branch from 79dc8e9 to 96c5d81 Compare April 9, 2026 08:33
Copy link
Copy Markdown
Collaborator

@matanl-starkware matanl-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matanl-starkware reviewed 19 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on Itay-Tsabary-Starkware).


crates/apollo_dashboard/resources/dev_grafana_alerts.json line 41 at r4 (raw file):

      ],
      "for": "30s",
      "intervalSec": 30,

Why is this still here?

Code quote:

"intervalSec": 30,

crates/apollo_dashboard/src/alerts.rs line 223 at r4 (raw file):

            Self::Low => "evaluation_rate_low",
        }
    }

Why is this required in addition to the serde/rename instructions above?
Consider keeping only one of them.

Code quote:

    pub(crate) fn name(&self) -> &'static str {
        match self {
            Self::High => "evaluation_rate_high",
            Self::Default => "evaluation_rate_default",
            Self::Low => "evaluation_rate_low",
        }
    }

Copy link
Copy Markdown
Collaborator

@matanl-starkware matanl-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matanl-starkware made 1 comment and resolved 1 discussion.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on Itay-Tsabary-Starkware).


crates/apollo_dashboard/resources/dev_grafana_alerts.json line 41 at r4 (raw file):

Previously, matanl-starkware (Matan Lior) wrote…

Why is this still here?

Removed in following PR 13450

@Itay-Tsabary-Starkware Itay-Tsabary-Starkware force-pushed the apollo_dashboard_apollo_deployments_alerts_top_level_interval_sec branch from 96c5d81 to 287db05 Compare April 13, 2026 15:00
@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 13, 2026

PR Summary

Medium Risk
Changes the on-disk alert JSON schema and the provisioning pipeline (Grafana upload + Kubernetes CRDs) from per-alert intervals to rule-group based intervals, which can alter alert evaluation cadence and deployment behavior if any integration expects the old format.

Overview
Refactors apollo_dashboard alert serialization to introduce explicit Grafana rule groups via a new EvaluationRate enum (default/high/low), emit a top-level groups section, and stably sort alerts by group then name for deterministic JSON output.

Updates the monitoring deployment tooling to build/upload/dump alerts as rule groups instead of individual rules: alert_builder.py now batches alerts by ruleGroup and updates each group in Grafana in one call, and the sequencer grafana.py construct now creates one SharedGrafanaAlertRuleGroup CRD per group using the group interval. The generated dev_grafana_alerts.json is reorganized accordingly (new groups plus many ruleGroup/severity/observer applicability tweaks).

Reviewed by Cursor Bugbot for commit d04135f. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown
Collaborator

@matanl-starkware matanl-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matanl-starkware reviewed 4 files and all commit messages, and resolved 1 discussion.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Itay-Tsabary-Starkware).

@Itay-Tsabary-Starkware Itay-Tsabary-Starkware force-pushed the apollo_dashboard_apollo_deployments_alerts_top_level_interval_sec branch from 287db05 to 24c5ff9 Compare April 14, 2026 06:54
…rts; use it in alert builder

Move the evaluation interval from a per-alert field to a top-level field
on the Alerts struct. The interval is group-scoped in Grafana, so a single
value for all alerts is the correct model.

Also restructure alert_builder.py to upload/dump per rule-group instead of
per individual alert, using the top-level intervalSec as the group interval.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Itay-Tsabary-Starkware Itay-Tsabary-Starkware force-pushed the apollo_dashboard_apollo_deployments_alerts_top_level_interval_sec branch from 24c5ff9 to d04135f Compare April 14, 2026 07:29
Copy link
Copy Markdown
Contributor Author

@Itay-Tsabary-Starkware Itay-Tsabary-Starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Itay-Tsabary-Starkware reviewed 21 files and all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on Itay-Tsabary-Starkware).

@Itay-Tsabary-Starkware Itay-Tsabary-Starkware added this pull request to the merge queue Apr 14, 2026
Merged via the queue into main with commit 696bda0 Apr 14, 2026
41 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 15, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants