feat(alertrouter): B.3b — system-alert-router (15/15 ACs, 100%)#424

Merged
remyluslosius merged 1 commit into main from feat/slice-b-b3b-alert-router on May 29, 2026

@remyluslosius
Contributor

Summary

Slice B.3b: the alert router. It subscribes to the event bus (B.3a),
translates events to typed Alerts, dedupes per
(alert_type, host_id, rule_id) within a configurable TTL (default
60 min), and dispatches matching alerts to registered channels.
Concrete channel implementations (Slack, email, webhook) live in
subpackages so the core router has no external SDK dependencies.
This PR ships only the Channel interface and a test fake.

Stack

Branched off feat/slice-b-b3a-event-bus. Includes two upstream
commits that PR #423 also carries — a gofmt cleanup and a
spec-coverage fix linking AC-12 to C-01. When #423 merges, this PR
rebases cleanly with the alertrouter commit on top.

Spec

app/specs/system/alert-router.spec.yaml (status: approved)
15 ACs across 9 constraints, all at 100% coverage.

  • AC-01/02: AlertType + Severity closed enums (5 values each)
  • AC-03/04/05: Event-to-alert translation
    (HeartbeatPulse{Reachable=false} → host_unreachable High;
    recovery → host_recovered Info; DriftDetected{major} → drift_major
    High; minor → drift_minor Medium; improvement → Info)
  • AC-06/07: Dedup gate (skip within TTL; pass after TTL)
  • AC-08/09: Tag-filter routing (specific filter; empty Tags = wildcard)
  • AC-10: Channel failure isolation (per-channel + aggregate counters)
  • AC-11/12: Router.Start subscribes to both EventKinds; Stop drains
    in-flight Channel.Send with 10s timeout
  • AC-13: Source-inspection — core package imports no external
    notification SDKs (slack-go, sendgrid, mailgun, twilio, PagerDuty,
    opsgenie, gomail, etc.)
  • AC-14: Metrics (ReceivedCount, RoutedCount, DedupedCount,
    ChannelFailureCount)
  • AC-15: ValidateDedupTTL range check [60s, 24h] with typed sentinel
    ErrDedupTTLOutOfRange

Files

| Path | Purpose |
| --- | --- |
| app/specs/system/alert-router.spec.yaml | The spec |
| app/internal/alertrouter/doc.go | Architectural choices |
| app/internal/alertrouter/types.go | AlertType + Severity enums, Alert, Channel, ChannelRegistration, ValidateDedupTTL |
| app/internal/alertrouter/dedup.go | DedupGate with TTL + injectable clock |
| app/internal/alertrouter/router.go | Router with Start/Stop, event translation, per-channel dispatch |
| app/internal/alertrouter/metrics.go | Counters + JSON snapshot |
| app/internal/alertrouter/{router,types,source}_test.go | 15/15 ACs |

Verification

go vet ./...                    clean
golangci-lint                   clean
govulncheck                     clean
go test -race -count=1 ./...    PASS (1.10s)
specter parse                   PASS (system-alert-router@1.0.0)
specter check                   PASS (32 specs, 0 errors)
specter coverage                system-alert-router 15/15 (100%)

Test plan

  • CI: Go CI gates pass (vet, lint, vuln, test-race, specter sync)
  • Verify spec parses + checks cleanly in CI's specter run
  • Confirm system-alert-router reaches 100% in specter coverage

@remyluslosius remyluslosius enabled auto-merge (squash) May 29, 2026 05:32
@remyluslosius remyluslosius force-pushed the feat/slice-b-b3b-alert-router branch 5 times, most recently from 860932b to 2412b01 Compare May 29, 2026 13:06
…Cs, 100%)

Continues Slice B.3 (orchestration plumbing). The alert router is the
bridge between OpenWatch's event bus (B.3a) and external notification
channels — Slack, email, webhook, PagerDuty. It subscribes to bus
events at boot, translates each into a typed Alert, applies a dedup
gate per (alert_type, host_id, rule_id) tuple, and dispatches matching
alerts to registered Channels.

Concrete channel implementations live in subpackages so the core
router has no external SDK dependencies. This PR ships the interface +
a test fake; Slack, email, webhook implementations land in follow-ups.

Spec
  New: app/specs/system/alert-router.spec.yaml (status: approved).
  15 ACs across 9 constraints.

internal/alertrouter package
  doc.go      Architectural choices: bus subscription on Start, closed
              AlertType + Severity enums, in-memory dedup with TTL,
              channel registration with tag-filter routing, per-
              channel goroutine with failure isolation, Stop drains
              with 10s timeout.

  types.go    AlertType closed enum (HostUnreachable, HostRecovered,
                DriftMajor, DriftMinor, DriftImprovement).
              Severity closed enum (Critical, High, Medium, Low,
                Info) + SeverityOrder rank map.
              Alert struct with Type, Severity, HostID, RuleID, Tags.
              Channel interface (Name + Send).
              ChannelRegistration with Tags filter; empty Tags =
                wildcard.
              ValidateDedupTTL enforces [60s, 24h] range with typed
              ErrDedupTTLOutOfRange.
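
To make the tag-filter rule concrete, here is a small sketch of the match. The PR states only that empty Tags acts as a wildcard; treating a non-empty filter as "matches when the alert carries at least one of the listed tags" is an assumption, as are the ChannelName field and the Matches method name.

```go
package main

import "fmt"

// ChannelRegistration pairs a channel with a tag filter.
// ChannelName is a hypothetical stand-in for the registered Channel.
type ChannelRegistration struct {
	ChannelName string
	Tags        []string
}

// Matches reports whether an alert with the given tags should be routed
// to this registration. Empty Tags is a wildcard. For a non-empty filter
// this sketch assumes any-of semantics.
func (r ChannelRegistration) Matches(alertTags []string) bool {
	if len(r.Tags) == 0 {
		return true
	}
	want := make(map[string]struct{}, len(r.Tags))
	for _, t := range r.Tags {
		want[t] = struct{}{}
	}
	for _, t := range alertTags {
		if _, ok := want[t]; ok {
			return true
		}
	}
	return false
}

func main() {
	regs := []ChannelRegistration{
		{ChannelName: "prod-only", Tags: []string{"prod"}},
		{ChannelName: "catch-all"},
	}
	for _, r := range regs {
		fmt.Println(r.ChannelName, r.Matches([]string{"staging"}))
	}
}
```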

  dedup.go    DedupGate keyed by Alert.DedupKey(); in-memory map with
              opportunistic reap on every ShouldSkip call to keep
              size bounded under churn. Injectable now() for testing.

  router.go   Router with Start (subscribe to HeartbeatPulse +
                DriftDetected) / Stop (unsubscribe + drain
                in-flight Channel.Send with 10s timeout).
              Per-channel goroutine dispatch; one channel's error or
                panic does NOT block delivery to other channels.
              Event translation: HeartbeatPulse{Reachable=false}
                → host_unreachable (High); recovery → host_recovered
                (Info); DriftDetected{major} → drift_major (High);
                minor → drift_minor (Medium); improvement → Info.
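
The translation table above can be read as a single type switch. The sketch below is illustrative only: the event field names (including the WasReachable flag used to detect recovery and the Kind string on DriftDetected), the severity string values, and the "drift_improvement" value are assumptions that the PR text does not confirm.

```go
package main

import "fmt"

type AlertType string
type Severity string

const (
	HostUnreachable  AlertType = "host_unreachable"
	HostRecovered    AlertType = "host_recovered"
	DriftMajor       AlertType = "drift_major"
	DriftMinor       AlertType = "drift_minor"
	DriftImprovement AlertType = "drift_improvement" // string value assumed

	High   Severity = "high" // severity string values assumed
	Medium Severity = "medium"
	Info   Severity = "info"
)

type Alert struct {
	Type     AlertType
	Severity Severity
	HostID   string
	RuleID   string
}

// HeartbeatPulse and DriftDetected mimic the bus events; fields are assumed.
type HeartbeatPulse struct {
	HostID       string
	Reachable    bool
	WasReachable bool // hypothetical: previous state, used to detect recovery
}

type DriftDetected struct {
	HostID string
	RuleID string
	Kind   string // "major", "minor", or "improvement" (assumed encoding)
}

// translate maps an event to an alert. The bool is false when the event
// produces no alert (for example a steady-state healthy heartbeat).
func translate(ev any) (Alert, bool) {
	switch e := ev.(type) {
	case HeartbeatPulse:
		if !e.Reachable {
			return Alert{Type: HostUnreachable, Severity: High, HostID: e.HostID}, true
		}
		if !e.WasReachable {
			return Alert{Type: HostRecovered, Severity: Info, HostID: e.HostID}, true
		}
	case DriftDetected:
		switch e.Kind {
		case "major":
			return Alert{Type: DriftMajor, Severity: High, HostID: e.HostID, RuleID: e.RuleID}, true
		case "minor":
			return Alert{Type: DriftMinor, Severity: Medium, HostID: e.HostID, RuleID: e.RuleID}, true
		case "improvement":
			return Alert{Type: DriftImprovement, Severity: Info, HostID: e.HostID, RuleID: e.RuleID}, true
		}
	}
	return Alert{}, false
}

func main() {
	events := []any{
		HeartbeatPulse{HostID: "h1", Reachable: false, WasReachable: true},
		HeartbeatPulse{HostID: "h1", Reachable: true, WasReachable: false},
		DriftDetected{HostID: "h2", RuleID: "r9", Kind: "minor"},
	}
	for _, ev := range events {
		a, ok := translate(ev)
		fmt.Printf("%+v %v\n", a, ok)
	}
}
```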

  metrics.go  ReceivedCount + RoutedCount + DedupedCount +
                ChannelFailureCount with JSON-friendly Snapshot.

Tests (15/15 ACs, all under -race)
  types_test.go    AC-01 enum closure (5 alert types).
                   AC-02 severity enum + SeverityOrder ranking.
                   AC-15 ValidateDedupTTL range check (boundary
                     cases + typed error sentinel).

  router_test.go   AC-03/04/05 event-to-alert translation.
                   AC-06 dedup skip within TTL (Channel.Send NOT
                     called for the skipped alert).
                   AC-07 dedup pass after TTL (fake clock on gate).
                   AC-08 tag-filter rejects non-match.
                   AC-09 wildcard channel (empty Tags) receives
                     every alert.
                   AC-10 channel error doesn't block other channels
                     (per-channel + aggregate failure counters).
                   AC-11 Start subscribes to BOTH event kinds.
                   AC-12 Stop drains slow sends + post-Stop publishes
                     ignored.
                   AC-14 all four metric counters increment under
                     compound scenarios.

  source_test.go   AC-13 internal/alertrouter (core, not subpackages)
                     imports no external notification SDKs
                     (slack-go, sendgrid, mailgun, twilio,
                     PagerDuty SDK, opsgenie, gomail, etc.).
                     AST-based import scan.
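
For readers unfamiliar with the AST-based approach, the core of such a scan can be sketched with go/parser in ImportsOnly mode. The banned-prefix list below is illustrative (the PR names the SDKs but not their import paths), and bannedImports is a hypothetical helper. A real test would walk the package directory; this sketch scans an in-memory source string to stay self-contained.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// bannedPrefixes is an illustrative list; the PR does not give exact paths.
var bannedPrefixes = []string{
	"github.com/slack-go/slack",
	"github.com/sendgrid/",
	"github.com/mailgun/",
	"github.com/twilio/",
	"github.com/PagerDuty/",
	"github.com/opsgenie/",
	"gopkg.in/gomail",
}

// bannedImports parses only the import section of src and returns every
// import path that starts with a banned prefix.
func bannedImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, imp := range f.Imports {
		// Path.Value is a quoted string literal, so unquote it first.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, p := range bannedPrefixes {
			if strings.HasPrefix(path, p) {
				hits = append(hits, path)
				break
			}
		}
	}
	return hits, nil
}

func main() {
	src := "package alertrouter\n\nimport (\n\t\"fmt\"\n\t\"github.com/slack-go/slack\"\n)\n"
	hits, err := bannedImports("router.go", src)
	fmt.Println(hits, err)
}
```

Scanning parsed imports rather than grepping source text avoids false positives from comments and string literals that happen to mention an SDK path.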

Verification
  go vet ./...            clean
  golangci-lint           clean
  govulncheck             clean
  go test -race -count=1  PASS (1.10s)
  specter parse           PASS (system-alert-router@1.0.0)
  specter check           PASS (32 specs)
  specter coverage        system-alert-router 15/15 (100%)

Spec: app/specs/system/alert-router.spec.yaml
@remyluslosius remyluslosius force-pushed the feat/slice-b-b3b-alert-router branch from 2412b01 to 16f2dde Compare May 29, 2026 13:15
@remyluslosius remyluslosius merged commit d4fdd92 into main May 29, 2026
14 checks passed