feat(alertrouter): B.3b — system-alert-router (15/15 ACs, 100%)#424
Merged
Continues Slice B.3 (orchestration plumbing). The alert router is the
bridge between OpenWatch's event bus (B.3a) and external notification
channels — Slack, email, webhook, PagerDuty. It subscribes to bus
events at boot, translates each into a typed Alert, applies a dedup
gate per (alert_type, host_id, rule_id) tuple, and dispatches matching
alerts to registered Channels.
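The dedup gate described above can be sketched roughly as follows. This is a minimal illustration, not the PR's dedup.go: the dedupKey field names, the fakeClock helper, and the choice not to refresh the timestamp on a skipped duplicate are all assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// defaultTTL matches the 60min default mentioned in the PR summary.
const defaultTTL = 60 * time.Minute

// dedupKey mirrors the (alert_type, host_id, rule_id) tuple.
// Field names are assumptions made for this sketch.
type dedupKey struct {
	AlertType string
	HostID    string
	RuleID    string
}

// DedupGate remembers when each key was last let through.
type DedupGate struct {
	mu   sync.Mutex
	ttl  time.Duration
	now  func() time.Time // injectable clock so tests can control time
	seen map[dedupKey]time.Time
}

func NewDedupGate(ttl time.Duration, now func() time.Time) *DedupGate {
	return &DedupGate{ttl: ttl, now: now, seen: make(map[dedupKey]time.Time)}
}

// ShouldSkip reports whether key was already let through within the TTL.
// Expired entries are reaped on every call so the map stays bounded.
func (g *DedupGate) ShouldSkip(key dedupKey) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	now := g.now()
	for k, t := range g.seen {
		if now.Sub(t) >= g.ttl {
			delete(g.seen, k)
		}
	}
	if _, ok := g.seen[key]; ok {
		return true
	}
	g.seen[key] = now
	return false
}

// fakeClock is a hypothetical test helper standing in for the injectable now().
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clock := &fakeClock{}
	gate := NewDedupGate(defaultTTL, clock.Now)
	key := dedupKey{AlertType: "host_unreachable", HostID: "host-1"}
	fmt.Println(gate.ShouldSkip(key)) // false: first occurrence passes
	fmt.Println(gate.ShouldSkip(key)) // true: duplicate within the TTL
	clock.Advance(2 * defaultTTL)
	fmt.Println(gate.ShouldSkip(key)) // false: entry expired and was reaped
}
```

Reaping on every call costs a full map scan, which is likely acceptable only while the number of live keys stays small.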
Concrete channel implementations live in subpackages so the core
router has no external SDK dependencies. This PR ships the interface +
a test fake; Slack, email, webhook implementations land in follow-ups.
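As a rough picture of what "interface + a test fake" could look like, the sketch below pairs a Channel interface with a recording fake and a tag-filter check. The Send signature, the Matches helper, and the "any shared tag matches" rule are assumptions; the source only states that Channel exposes Name and Send and that empty Tags act as a wildcard.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Alert carries the fields listed for types.go; the concrete Go types are assumed.
type Alert struct {
	Type, Severity, HostID, RuleID string
	Tags                           []string
}

// Channel is the delivery interface (Name + Send). The Send signature is an assumption.
type Channel interface {
	Name() string
	Send(ctx context.Context, a Alert) error
}

// ChannelRegistration binds a channel to a tag filter.
type ChannelRegistration struct {
	Channel Channel
	Tags    []string
}

// Matches treats empty Tags as a wildcard. For non-empty Tags this sketch
// assumes one shared tag is enough; the real rule may differ.
func (r ChannelRegistration) Matches(a Alert) bool {
	if len(r.Tags) == 0 {
		return true
	}
	for _, want := range r.Tags {
		for _, got := range a.Tags {
			if want == got {
				return true
			}
		}
	}
	return false
}

// FakeChannel records every alert it is asked to send.
type FakeChannel struct {
	name string
	mu   sync.Mutex
	sent []Alert
}

func (f *FakeChannel) Name() string { return f.name }

func (f *FakeChannel) Send(_ context.Context, a Alert) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.sent = append(f.sent, a)
	return nil
}

func main() {
	regs := []ChannelRegistration{
		{Channel: &FakeChannel{name: "all"}},                         // wildcard
		{Channel: &FakeChannel{name: "prod"}, Tags: []string{"prod"}}, // filtered
	}
	a := Alert{Type: "drift_major", Severity: "high", HostID: "host-1", Tags: []string{"staging"}}
	for _, r := range regs {
		matched := r.Matches(a)
		if matched {
			_ = r.Channel.Send(context.Background(), a)
		}
		fmt.Println(r.Channel.Name(), matched) // "all true", then "prod false"
	}
}
```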
Spec
New: app/specs/system/alert-router.spec.yaml (status: approved).
15 ACs across 9 constraints.
internal/alertrouter package
doc.go Architectural choices: bus subscription on Start, closed
AlertType + Severity enums, in-memory dedup with TTL,
channel registration with tag-filter routing, per-
channel goroutine with failure isolation, Stop drains
with 10s timeout.
types.go AlertType closed enum (HostUnreachable, HostRecovered,
DriftMajor, DriftMinor, DriftImprovement).
Severity closed enum (Critical, High, Medium, Low,
Info) + SeverityOrder rank map.
Alert struct with Type, Severity, HostID, RuleID, Tags.
Channel interface (Name + Send).
ChannelRegistration with Tags filter; empty Tags =
wildcard.
ValidateDedupTTL enforces [60s, 24h] range with typed
ErrDedupTTLOutOfRange.
dedup.go DedupGate keyed by Alert.DedupKey(); in-memory map with
opportunistic reap on every ShouldSkip call to keep
size bounded under churn. Injectable now() for testing.
router.go Router with Start (subscribe to HeartbeatPulse +
DriftDetected) / Stop (unsubscribe + drain
in-flight Channel.Send with 10s timeout).
Per-channel goroutine dispatch; one channel's error or
panic does NOT block delivery to other channels.
Event translation: HeartbeatPulse{Reachable=false}
→ host_unreachable (High); recovery → host_recovered
(Info); DriftDetected{major} → drift_major (High);
minor → drift_minor (Medium); improvement → Info.
metrics.go ReceivedCount + RoutedCount + DedupedCount +
ChannelFailureCount with JSON-friendly Snapshot.
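The failure-isolation and drain behaviour listed for router.go might be approximated as below. This is a sketch under assumptions: dispatcher, sendFunc, and drainTimeout are hypothetical names, and a plain function stands in for Channel.Send.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// drainTimeout mirrors the 10s Stop timeout from the description.
const drainTimeout = 10 * time.Second

// sendFunc stands in for Channel.Send in this sketch.
type sendFunc func(alert string) error

type dispatcher struct {
	wg       sync.WaitGroup
	failures atomic.Int64 // counts send errors and recovered panics
}

// dispatch runs every send in its own goroutine. An error or panic in one
// send is counted but never prevents the others from running.
func (d *dispatcher) dispatch(alert string, sends []sendFunc) {
	for _, send := range sends {
		d.wg.Add(1)
		go func(send sendFunc) {
			defer d.wg.Done()
			defer func() {
				if r := recover(); r != nil {
					d.failures.Add(1)
				}
			}()
			if err := send(alert); err != nil {
				d.failures.Add(1)
			}
		}(send)
	}
}

// drain waits for in-flight sends, giving up after timeout.
// It returns true when every send finished in time.
func (d *dispatcher) drain(timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		d.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	var delivered atomic.Int64
	d := &dispatcher{}
	d.dispatch("host_unreachable", []sendFunc{
		func(string) error { return errors.New("channel down") },
		func(string) error { panic("buggy channel") },
		func(string) error { delivered.Add(1); return nil },
	})
	// prints "true 2 1": drained in time, two failures, one delivery
	fmt.Println(d.drain(drainTimeout), d.failures.Load(), delivered.Load())
}
```

If drain times out, the helper goroutine keeps waiting until the slow sends return, so a production version would probably also cancel a context passed to each send.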
Tests (15/15 ACs, all under -race)
types_test.go AC-01 enum closure (5 alert types).
AC-02 severity enum + SeverityOrder ranking.
AC-15 ValidateDedupTTL range check (boundary
cases + typed error sentinel).
router_test.go AC-03/04/05 event-to-alert translation.
AC-06 dedup skip within TTL (Channel.Send NOT
called for the skipped alert).
AC-07 dedup pass after TTL (fake clock on gate).
AC-08 tag-filter rejects non-match.
AC-09 wildcard channel (empty Tags) receives
every alert.
AC-10 channel error doesn't block other channels
(per-channel + aggregate failure counters).
AC-11 Start subscribes to BOTH event kinds.
AC-12 Stop drains slow sends + post-Stop publishes
ignored.
AC-14 all four metric counters increment under
compound scenarios.
source_test.go AC-13 internal/alertrouter (core, not subpackages)
imports no external notification SDKs
(slack-go, sendgrid, mailgun, twilio,
PagerDuty SDK, opsgenie, gomail, etc.).
AST-based import scan.
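An AST-based import scan like the one AC-13 describes can be built on go/parser in ImportsOnly mode. The sketch below is illustrative only: forbiddenImports and the prefix list are hypothetical, and the source is passed as a string here, whereas a real test would walk the package's files.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbiddenPrefixes is an illustrative subset, not the PR's actual list.
var forbiddenPrefixes = []string{
	"github.com/slack-go/slack",
	"github.com/sendgrid/",
	"github.com/PagerDuty/",
}

// forbiddenImports parses only the import declarations of src and returns
// every import path that starts with a forbidden prefix.
func forbiddenImports(filename, src string) ([]string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, spec := range f.Imports {
		path, err := strconv.Unquote(spec.Path.Value) // strip the surrounding quotes
		if err != nil {
			return nil, err
		}
		for _, prefix := range forbiddenPrefixes {
			if strings.HasPrefix(path, prefix) {
				bad = append(bad, path)
				break
			}
		}
	}
	return bad, nil
}

func main() {
	src := "package alertrouter\n\nimport (\n\t\"context\"\n\t\"github.com/slack-go/slack\"\n)\n"
	bad, err := forbiddenImports("router.go", src)
	fmt.Println(bad, err) // prints "[github.com/slack-go/slack] <nil>"
}
```

Parsing imports rather than grepping the text avoids false positives from comments and string literals that merely mention an SDK name.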
Verification
go vet ./... clean
golangci-lint clean
govulncheck clean
go test -race -count=1 PASS (1.10s)
specter parse PASS (system-alert-router@1.0.0)
specter check PASS (32 specs)
specter coverage system-alert-router 15/15 (100%)
Spec: app/specs/system/alert-router.spec.yaml
Summary
Slice B.3b — the alert router. Subscribes to the event bus (B.3a),
translates events to typed Alerts, dedupes per
(alert_type, host_id, rule_id) within a configurable TTL (default 60min),
and dispatches matching alerts to registered channels. Concrete channel
implementations (Slack, email, webhook) live in subpackages so the
core router has no external SDK dependencies — this PR ships the
interface + the test fake.
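Because the TTL is configurable, the commit message's ValidateDedupTTL range check ([60s, 24h], with ErrDedupTTLOutOfRange) matters at configuration time. A minimal sketch, assuming inclusive bounds and a sentinel wrapped with %w; the constant names are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Bounds and default taken from the PR text; the constant names are assumptions.
const (
	MinDedupTTL     = 60 * time.Second
	MaxDedupTTL     = 24 * time.Hour
	DefaultDedupTTL = 60 * time.Minute
)

// ErrDedupTTLOutOfRange is the sentinel callers can match with errors.Is.
var ErrDedupTTLOutOfRange = errors.New("dedup TTL out of range")

// ValidateDedupTTL accepts any TTL in the inclusive range [60s, 24h].
func ValidateDedupTTL(ttl time.Duration) error {
	if ttl < MinDedupTTL || ttl > MaxDedupTTL {
		return fmt.Errorf("%w: got %s, want between %s and %s",
			ErrDedupTTLOutOfRange, ttl, MinDedupTTL, MaxDedupTTL)
	}
	return nil
}

func main() {
	candidates := []time.Duration{
		DefaultDedupTTL,  // valid
		MinDedupTTL,      // valid: lower bound is inclusive
		59 * time.Second, // rejected: below the minimum
		25 * time.Hour,   // rejected: above the maximum
	}
	for _, ttl := range candidates {
		err := ValidateDedupTTL(ttl)
		fmt.Println(ttl, "rejected:", errors.Is(err, ErrDedupTTLOutOfRange))
	}
}
```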
Stack
Branched off feat/slice-b-b3a-event-bus. Includes two upstream
commits that PR #423 also carries: a gofmt cleanup and a
spec-coverage fix linking AC-12 to C-01. When #423 merges, this PR
rebases cleanly with the alertrouter commit on top.
Spec
app/specs/system/alert-router.spec.yaml (status: approved)
15 ACs across 9 constraints, all at 100% coverage.
Event translation
(HeartbeatPulse{reachable=false} → host_unreachable High;
recovery → host_recovered Info; DriftDetected{major} → High;
minor → Medium; improvement → Info)
Stop unsubscribes and drains
in-flight Channel.Send with 10s timeout
Core package imports no external
notification SDKs (slack-go, sendgrid, mailgun, twilio, PagerDuty,
opsgenie, gomail, etc.)
Metric counters (ReceivedCount, RoutedCount, DedupedCount,
ChannelFailureCount)
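The event-to-alert mapping quoted above could be expressed as two small translation functions. This is a sketch under assumptions: the event struct fields (notably Recovered and Kind), the string values of the enums, and the "drift_improvement" name are guesses by analogy with the names given in the commit message.

```go
package main

import "fmt"

type AlertType string
type Severity string

const (
	HostUnreachable  AlertType = "host_unreachable"
	HostRecovered    AlertType = "host_recovered"
	DriftMajor       AlertType = "drift_major"
	DriftMinor       AlertType = "drift_minor"
	DriftImprovement AlertType = "drift_improvement" // assumed by analogy
)

const (
	High   Severity = "high"
	Medium Severity = "medium"
	Info   Severity = "info"
)

type Alert struct {
	Type     AlertType
	Severity Severity
	HostID   string
	RuleID   string
}

// HeartbeatPulse and DriftDetected are simplified stand-ins for the bus events.
type HeartbeatPulse struct {
	HostID    string
	Reachable bool
	Recovered bool // hypothetical: set when the host just became reachable again
}

type DriftDetected struct {
	HostID, RuleID string
	Kind           string // assumed values: "major", "minor", "improvement"
}

// translateHeartbeat returns false when the pulse needs no alert.
func translateHeartbeat(p HeartbeatPulse) (Alert, bool) {
	switch {
	case !p.Reachable:
		return Alert{Type: HostUnreachable, Severity: High, HostID: p.HostID}, true
	case p.Recovered:
		return Alert{Type: HostRecovered, Severity: Info, HostID: p.HostID}, true
	default:
		return Alert{}, false // steady-state pulse (assumption: no alert)
	}
}

// translateDrift returns false for drift kinds this sketch does not know.
func translateDrift(d DriftDetected) (Alert, bool) {
	a := Alert{HostID: d.HostID, RuleID: d.RuleID}
	switch d.Kind {
	case "major":
		a.Type, a.Severity = DriftMajor, High
	case "minor":
		a.Type, a.Severity = DriftMinor, Medium
	case "improvement":
		a.Type, a.Severity = DriftImprovement, Info
	default:
		return Alert{}, false
	}
	return a, true
}

func main() {
	a, _ := translateHeartbeat(HeartbeatPulse{HostID: "host-1"})
	fmt.Println(a.Type, a.Severity) // prints "host_unreachable high"
	b, _ := translateDrift(DriftDetected{HostID: "host-1", RuleID: "r-7", Kind: "minor"})
	fmt.Println(b.Type, b.Severity) // prints "drift_minor medium"
}
```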
Files
app/specs/system/alert-router.spec.yaml
app/internal/alertrouter/doc.go
app/internal/alertrouter/types.go
app/internal/alertrouter/dedup.go
app/internal/alertrouter/router.go
app/internal/alertrouter/metrics.go
app/internal/alertrouter/{router,types,source}_test.go
Verification
Test plan
system-alert-router reaches 100% in specter coverage