Skip to content

feat: add link group correlation for multi-link monitoring#3493

Open
opastorello wants to merge 6 commits intobluewave-labs:developfrom
opastorello:feat/link-group-correlation
Open

feat: add link group correlation for multi-link monitoring#3493
opastorello wants to merge 6 commits intobluewave-labs:developfrom
opastorello:feat/link-group-correlation

Conversation

@opastorello
Copy link
Copy Markdown

Summary

  • Monitors can be assigned to a named group (e.g. branch-sp-01) to represent multiple internet links at the same unit/branch
  • When checks run, the system evaluates all monitors in the group and escalates severity:
    • Some links down → severity high
    • All links down → severity critical
  • Monitors without a group are completely unaffected

Changes

Backend

  • IMonitorsRepository + both implementations (Mongo, Timescale): new findByGroupAndTeamId query
  • SuperSimpleQueueHelper: exported GroupCorrelation interface; Step 5b evaluates group state after each check
  • MonitorActionDecision: new optional groupCorrelation field
  • Incident model + type: new severity field (none | high | critical)
  • incidentService: persists severity on incident creation
  • notificationMessageBuilder: overrides severity and enriches title/summary with group context
  • notificationMessage type: metadata.groupCorrelation added to notification payload

Frontend

  • Monitor create/edit form: new Link Group text field (all monitor types)
  • useMonitorForm: seeds group default from existing monitor data
  • Validation/monitor: group field added to base schema
  • locales/en.json: i18n keys for label, placeholder, helper text

Test plan

  • Create two HTTP monitors with the same group name (e.g. branch-sp-01)
  • Bring one monitor down → incident and notification should have severity high
  • Bring both monitors down → incident and notification should have severity critical, title prefixed with [CRITICAL] All Links Down
  • Recover one monitor → correlation re-evaluates on next check
  • Monitor with no group assigned → behavior unchanged

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a174715feb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

status: true,
statusCode,
message,
severity: decision.groupCorrelation?.severity ?? "none",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Persist group severity for Timescale incidents

When DB_TYPE=timescaledb, this new severity value is dropped: IncidentService now sets severity on create, but TimescaleIncidentsRepository.create still inserts only the old columns and toEntity never maps a severity field, so correlated outages are stored without the intended high/critical incident severity in Timescale deployments. This makes incident severity inconsistent across supported database backends and breaks the new feature outside Mongo.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in baeb9a3: added migration 0022_add_incident_severity to ALTER TABLE incidents ADD COLUMN severity, updated TimescaleIncidentsRepository to insert, select, update, and map the field.

Copy link
Copy Markdown
Collaborator

@ajhollid ajhollid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty clean PR, thanks for taking the time to work on this 👍

There are a few issues to resolve, but nothing too major. Give a shout if you need clarification on anything.

Comment thread server/src/types/incident.ts Outdated
resolvedBy?: string | null;
resolvedByEmail?: string | null;
comment?: string | null;
severity?: "none" | "high" | "critical" | null;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe null should be allowed in the union here. The TimescaleDB implementation explicitly does not allow null, so the type shouldn't allow it either.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2511ceb: removed null from the union. severity now uses IncidentSeverity ("none" | "high" | "critical") derived from IncidentSeverities as const, consistent with the TimescaleDB constraint.

status: true,
statusCode,
message,
severity: (decision.groupCorrelation?.severity ?? "none") as "none" | "high" | "critical",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to cast values is often a sign that there's something wrong with the types. That is indeed the case here, please see my comments in the Queue file for the correct fix for this.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2511ceb: the cast is gone. The incident object is now typed as Partial<Incident>, so severity: context?.groupCorrelation?.severity ?? "none" resolves cleanly to IncidentSeverity without widening.

groupName: string;
downCount: number;
totalCount: number;
severity: "high" | "critical";
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be a union of string, it should be a severity type

You can use the MonitorType as a guide:

export const MonitorTypes = ["http", "ping", "pagespeed", "hardware", "docker", "port", "game", "grpc", "websocket", "unknown"] as const;
export type MonitorType = (typeof MonitorTypes)[number];

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2511ceb: following the same pattern as MonitorType. Added IncidentSeverities = ["none", "high", "critical"] as const and IncidentSeverity = (typeof IncidentSeverities)[number] in types/incident.ts. GroupCorrelation.severity now uses Exclude<IncidentSeverity, "none">.

disk?: boolean;
temp?: boolean;
};
groupCorrelation?: GroupCorrelation;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not part of a decision. The MontiorActionDecision interface is only used to decide what happens when a monitor's status changes.

The groupCorrelation interface is purely metadata; it should be passed to handleIncident and handleNotification on its own. Perhaps wrapping it in something like an 'IncidentContext` interface to allow for expandability in the future 🤔

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2511ceb: groupCorrelation was removed from MonitorActionDecision. Added a separate IncidentContext interface (with optional groupCorrelation) that is passed as an independent parameter to both handleIncident and handleNotifications.

const decision = this.evaluateMonitorAction(statusChangeResult);

// Step 5b. Evaluate group correlation if monitor belongs to a group
if (monitor.group && (decision.shouldCreateIncident || decision.shouldResolveIncident)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The groupCorrelation is only used when creating incidents, not resolving incidents. I don't believe you need decision.shouldResolveIncident in the conditional here.

As far as I can tell the result of findByGroupAndTeamId is discarded in the "resolve" branch of the decision tree.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2511ceb: the conditional now only checks decision.shouldCreateIncident. The shouldResolveIncident branch was removed since incidentContext is only relevant when creating incidents.

Monitors can be assigned to a named group (e.g. branch-sp-01).
When checks run, the system evaluates all monitors in the group:
- Some links down: severity high
- All links down: severity critical

Backend:
- Add findByGroupAndTeamId to monitors repository (Mongo + Timescale)
- Add GroupCorrelation interface and groupCorrelation field to MonitorActionDecision
- Evaluate group status in SuperSimpleQueueHelper after each check (Step 5b)
- Persist severity field on Incident model and type
- Enrich notification title/summary with group context and severity override

Frontend:
- Add group field to monitor create/edit form (all monitor types)
- Add i18n keys for group label, placeholder, helper text
…itecture

- Add IncidentSeverity type following as-const pattern (like MonitorType)
- Extract IncidentContext interface; remove groupCorrelation from MonitorActionDecision
- Pass IncidentContext as separate param to handleIncident and handleNotifications
- Step 5b: evaluate group correlation only on shouldCreateIncident, not resolve
- Remove null from Incident.severity type and Mongoose enum
- Annotate incident object as Partial<Incident> to avoid severity type widening
@opastorello opastorello force-pushed the feat/link-group-correlation branch from 2511ceb to 345a47a Compare April 12, 2026 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants