feat: add link group correlation for multi-link monitoring#3493
feat: add link group correlation for multi-link monitoring#3493opastorello wants to merge 6 commits intobluewave-labs:developfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a174715feb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| status: true, | ||
| statusCode, | ||
| message, | ||
| severity: decision.groupCorrelation?.severity ?? "none", |
There was a problem hiding this comment.
Persist group severity for Timescale incidents
When DB_TYPE=timescaledb, this new severity value is dropped: IncidentService now sets severity on create, but TimescaleIncidentsRepository.create still inserts only the old columns and toEntity never maps a severity field, so correlated outages are stored without the intended high/critical incident severity in Timescale deployments. This makes incident severity inconsistent across supported database backends and breaks the new feature outside Mongo.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in baeb9a3: added migration 0022_add_incident_severity to ALTER TABLE incidents ADD COLUMN severity, updated TimescaleIncidentsRepository to insert, select, update, and map the field.
ajhollid
left a comment
There was a problem hiding this comment.
This is a pretty clean PR, thanks for taking the time to work on this 👍
There are a few issues to resolve, but nothing too major. Give a shout if you need clarification on anything.
| resolvedBy?: string | null; | ||
| resolvedByEmail?: string | null; | ||
| comment?: string | null; | ||
| severity?: "none" | "high" | "critical" | null; |
There was a problem hiding this comment.
I don't believe null should be allowed in the union here. The TimescaleDB implementation explicitly does not allow null, so the type shouldn't allow it either.
There was a problem hiding this comment.
Fixed in 2511ceb: removed null from the union. severity now uses IncidentSeverity ("none" | "high" | "critical") derived from IncidentSeverities as const, consistent with the TimescaleDB constraint.
| status: true, | ||
| statusCode, | ||
| message, | ||
| severity: (decision.groupCorrelation?.severity ?? "none") as "none" | "high" | "critical", |
There was a problem hiding this comment.
Having to cast values is often a sign that there's something wrong with the types. That is indeed the case here, please see my comments in the Queue file for the correct fix for this.
There was a problem hiding this comment.
Fixed in 2511ceb: the cast is gone. The incident object is now typed as Partial<Incident>, so severity: context?.groupCorrelation?.severity ?? "none" resolves cleanly to IncidentSeverity without widening.
| groupName: string; | ||
| downCount: number; | ||
| totalCount: number; | ||
| severity: "high" | "critical"; |
There was a problem hiding this comment.
This should not be a union of string, it should be a severity type
You can use the MonitorType as a guide:
export const MonitorTypes = ["http", "ping", "pagespeed", "hardware", "docker", "port", "game", "grpc", "websocket", "unknown"] as const;
export type MonitorType = (typeof MonitorTypes)[number];
There was a problem hiding this comment.
Fixed in 2511ceb: following the same pattern as MonitorType. Added IncidentSeverities = ["none", "high", "critical"] as const and IncidentSeverity = (typeof IncidentSeverities)[number] in types/incident.ts. GroupCorrelation.severity now uses Exclude<IncidentSeverity, "none">.
| disk?: boolean; | ||
| temp?: boolean; | ||
| }; | ||
| groupCorrelation?: GroupCorrelation; |
There was a problem hiding this comment.
This is not part of a decision. The MontiorActionDecision interface is only used to decide what happens when a monitor's status changes.
The groupCorrelation interface is purely metadata; it should be passed to handleIncident and handleNotification on its own. Perhaps wrapping it in something like an 'IncidentContext` interface to allow for expandability in the future 🤔
There was a problem hiding this comment.
Fixed in 2511ceb: groupCorrelation was removed from MonitorActionDecision. Added a separate IncidentContext interface (with optional groupCorrelation) that is passed as an independent parameter to both handleIncident and handleNotifications.
| const decision = this.evaluateMonitorAction(statusChangeResult); | ||
|
|
||
| // Step 5b. Evaluate group correlation if monitor belongs to a group | ||
| if (monitor.group && (decision.shouldCreateIncident || decision.shouldResolveIncident)) { |
There was a problem hiding this comment.
The groupCorrelation is only used when creating incidents, not resolving incidents. I don't believe you need decision.shouldResolveIncident in the conditional here.
As far as I can tell the result of findByGroupAndTeamId is discarded in the "resolve" branch of the decision tree.
There was a problem hiding this comment.
Fixed in 2511ceb: the conditional now only checks decision.shouldCreateIncident. The shouldResolveIncident branch was removed since incidentContext is only relevant when creating incidents.
Monitors can be assigned to a named group (e.g. branch-sp-01). When checks run, the system evaluates all monitors in the group: - Some links down: severity high - All links down: severity critical Backend: - Add findByGroupAndTeamId to monitors repository (Mongo + Timescale) - Add GroupCorrelation interface and groupCorrelation field to MonitorActionDecision - Evaluate group status in SuperSimpleQueueHelper after each check (Step 5b) - Persist severity field on Incident model and type - Enrich notification title/summary with group context and severity override Frontend: - Add group field to monitor create/edit form (all monitor types) - Add i18n keys for group label, placeholder, helper text
…itecture - Add IncidentSeverity type following as-const pattern (like MonitorType) - Extract IncidentContext interface; remove groupCorrelation from MonitorActionDecision - Pass IncidentContext as separate param to handleIncident and handleNotifications - Step 5b: evaluate group correlation only on shouldCreateIncident, not resolve - Remove null from Incident.severity type and Mongoose enum - Annotate incident object as Partial<Incident> to avoid severity type widening
2511ceb to
345a47a
Compare
…rrelation # Conflicts: # CLAUDE.md
Summary
branch-sp-01) to represent multiple internet links at the same unit/branchChanges
Backend
IMonitorsRepository+ both implementations (Mongo, Timescale): newfindByGroupAndTeamIdquerySuperSimpleQueueHelper: exportedGroupCorrelationinterface; Step 5b evaluates group state after each checkMonitorActionDecision: new optionalgroupCorrelationfieldIncidentmodel + type: newseverityfield (none | high | critical)incidentService: persists severity on incident creationnotificationMessageBuilder: overrides severity and enriches title/summary with group contextnotificationMessagetype:metadata.groupCorrelationadded to notification payloadFrontend
useMonitorForm: seedsgroupdefault from existing monitor dataValidation/monitor:groupfield added to base schemalocales/en.json: i18n keys for label, placeholder, helper textTest plan
branch-sp-01)highcritical, title prefixed with[CRITICAL] All Links Down