feat(daemon): wire Slice B into cmd/openwatch serve (eventbus + alertrouter + liveness loop) #430
Merged
…router + liveness loop)

Lands the system-daemon-orchestration spec (v1.0.0) and its implementation: eventbus + alertrouter (with stdout channel) + the liveness probe loop are now instantiated and started during openwatch serve boot, in an order that guarantees the alert router subscribes BEFORE any producer publishes.

Spec additions

New: system-daemon-orchestration v1.0.0 (7 ACs over 5 constraints). Scope deliberately trimmed from the draft pulled out of #429; this PR delivers what's actually wired and tested. Worker subcommand, scheduler dispatcher tick, drift detector loop, and live Kensa binding remain follow-ups with their own specs.

Amend: system-liveness-loop 1.0.0 -> 1.1.0
- C-10/AC-16/AC-17 Service.Run(ctx) blocking loop walks the active host inventory at the configured interval.
- C-11/AC-18 Run skips hosts whose host_backoff_state.suppress_until is in the future.
- AC-19 Run returns within 2s of ctx cancel; in-flight probes finish naturally.
- C-12/AC-20/AC-21/AC-22 publishes typed HeartbeatPulse to the eventbus on every state transition; no publish on steady-state.
- C-13/AC-23 NewService(pool, emit, bus): nil bus is valid; audit emission still fires.

Amend: system-drift-detector 1.0.0 -> 1.1.0
- C-09/AC-15/AC-16/AC-17 DetectForScan publishes DriftDetected to the eventbus on every non-stable detection; per-severity counts match the audit detail produced from the same Report.
- C-10/AC-18 NewService(pool, emit, thresholds, bus): nil bus is valid.

Implementation

internal/eventbus, internal/alertrouter: unchanged.

internal/alertrouter/channels/stdout: NEW subpackage.
- Channel.Send writes via slog.InfoContext at INFO with structured attributes (alert_type, severity, host_id, etc.).
- Operators see fired alerts in `journalctl -u openwatch -g alertrouter.alert.sent`.
- Zero external SDK deps; preserves the system-alert-router AC-13 "no external SDKs in core" invariant for the boot-default channel.

internal/liveness
- NewService gained *eventbus.Bus parameter (v1.1.0 C-13).
- Service.Run(ctx) NEW: blocking loop calling tick() at the configured interval; tick walks active hosts and calls ProbeHost. WithInterval seam for tests.
- publishHeartbeat fires on state transition (the same trigger as the audit emit).
- listProbeTargets SQL LEFT-JOINs host_backoff_state and excludes hosts whose suppress_until is in the future.

internal/drift
- NewService gained *eventbus.Bus parameter (v1.1.0 C-10).
- publishDrift fires alongside emitDriftDetected when Kind is non-stable.

cmd/openwatch/main.go
- cmdServe instantiates: bus -> alertrouter (with stdout channel registered, wildcard Tags) -> Start -> liveSvc -> go liveSvc.Run -> server.Run.
- Shutdown order is reverse: router.Stop after srv.Run returns; bus.Shutdown + audit.Shutdown via defer in reverse order.
- Source-inspection tests (cmd/openwatch/source_test.go) assert the textual boot-sequence ordering in main.go matches the spec; eyeballing the file proves correctness, no runtime wrapper refactor needed in v1.0.0.

Tests (5 specs at 100%, all under -race)

  system-daemon-orchestration 7/7
  system-liveness-loop 23/23 (was 15/15; +8 v1.1.0 ACs)
  system-drift-detector 18/18 (was 14/14; +4 v1.1.0 ACs)

All AC-16/17/18/19 (liveness Run) tested with Postgres + a recording probe that counts per-tick invocations. AC-20/21/22 (HeartbeatPulse) tested with an in-memory bus + subscriber that asserts presence/absence and field values. AC-15/16/17 (DriftDetected) mirror the same pattern.

Verification

  go vet ./... clean
  go test -race -count=1 ./... PASS (full tree, ~4 min with Postgres)
  specter parse PASS (40 specs)
  specter check PASS
  specter sync PASS (all 40 specs at threshold)

Not in this PR (each its own follow-up that takes a future spec as its contract):
- openwatch worker subcommand (needs system-worker-subcommand spec)
- scheduler dispatcher tick inside serve (needs DEK accessor + policy.Schedules loader)
- Drift detector tick (worker calls DetectForScan inline)
- Live Kensa binding (system-kensa-executor v2 AC-18)
- Slack / email / webhook channel implementations
Why
The Slice B packages have been libraries with no runtime wiring since they were merged. `openwatch serve` boots HTTP and the existing in-process diagnostics worker, and nothing else. This PR turns three of those libraries into actual runtime subsystems: the eventbus, the alert router (with its new stdout channel), and the liveness probe loop.
After this PR, `openwatch serve` runs the alert router and the liveness probe loop. Booting against a fleet with hosts produces a periodic liveness sweep; state transitions are published to the bus, and the alert router fans them out to the stdout channel (visible via `journalctl`).
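To illustrate why the boot order matters (the router must subscribe before any producer publishes), here is a minimal sketch with a toy synchronous bus. `Bus`, `bootAndCollect`, and the event string are hypothetical and are not the `internal/eventbus` API; the sketch only assumes a bus that does not replay events to late subscribers.

```go
package main

import (
	"fmt"
	"sync"
)

// Bus is a hypothetical stand-in for an event bus: synchronous fan-out with
// no buffering, so events published before Subscribe are never delivered.
type Bus struct {
	mu   sync.Mutex
	subs []func(event string)
}

// Subscribe registers a handler for all future events.
func (b *Bus) Subscribe(h func(event string)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs = append(b.subs, h)
}

// Publish delivers the event to the handlers registered so far.
func (b *Bus) Publish(event string) {
	b.mu.Lock()
	subs := append([]func(string){}, b.subs...)
	b.mu.Unlock()
	for _, h := range subs {
		h(event)
	}
}

// bootAndCollect mimics a boot sequence: the router subscribes either before
// or after a producer publishes, and the function returns what the router saw.
func bootAndCollect(subscribeFirst bool) []string {
	bus := &Bus{}
	var seen []string
	router := func(e string) { seen = append(seen, e) }

	if subscribeFirst {
		bus.Subscribe(router)
	}
	bus.Publish("heartbeat:host-1:down") // producer, e.g. a liveness loop
	if !subscribeFirst {
		bus.Subscribe(router) // too late: the first event is already gone
	}
	return seen
}

func main() {
	fmt.Println(len(bootAndCollect(true)))  // prints 1
	fmt.Println(len(bootAndCollect(false))) // prints 0
}
```

With this kind of bus, starting the producer goroutine only after the router has subscribed is what keeps the first transition from being dropped.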
Spec changes
NEW: `system-daemon-orchestration` v1.0.0 (7 ACs)
The trimmed version of the spec pulled out of #429. Scope: what's actually wired and tested in this PR.
Amended: `system-liveness-loop` v1.0.0 → v1.1.0 (+8 ACs)
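The v1.1.0 liveness behaviour described in the commit message (a blocking `Run(ctx)` loop, publishing only on state transitions, and a nil bus being valid) can be sketched roughly as follows. Every type and field here is a hypothetical stand-in, not the `internal/liveness` code; in particular, the sketch assumes the first observation of a host counts as a transition, which the source does not state.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Publisher is a hypothetical minimal view of the event bus.
type Publisher interface {
	Publish(hostID, state string)
}

// Service is an illustrative stand-in for a liveness service.
type Service struct {
	interval time.Duration
	bus      Publisher // may be nil
	hosts    []string
	probe    func(hostID string) string
	last     map[string]string // last observed state per host
}

// observe records a probe result and publishes only on a state transition.
// It reports whether a transition occurred.
func (s *Service) observe(hostID, state string) bool {
	if s.last[hostID] == state {
		return false // steady state: nothing to publish
	}
	s.last[hostID] = state
	if s.bus != nil { // a nil bus is valid; other side effects would still run
		s.bus.Publish(hostID, state)
	}
	return true
}

// tick probes every host once.
func (s *Service) tick() {
	for _, h := range s.hosts {
		s.observe(h, s.probe(h))
	}
}

// Run blocks, ticking at the configured interval until ctx is cancelled.
// A tick already in progress finishes before cancellation is observed.
func (s *Service) Run(ctx context.Context) error {
	t := time.NewTicker(s.interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
			s.tick()
		}
	}
}

// recorder is a test double that remembers published events.
type recorder struct{ events []string }

func (r *recorder) Publish(hostID, state string) {
	r.events = append(r.events, hostID+"="+state)
}

func main() {
	rec := &recorder{}
	s := &Service{
		interval: 10 * time.Millisecond,
		bus:      rec,
		hosts:    []string{"h1"},
		probe:    func(string) string { return "up" },
		last:     map[string]string{},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 55*time.Millisecond)
	defer cancel()
	err := s.Run(ctx) // several ticks run, but the state changes only once
	fmt.Println(rec.events, err)
}
```

Keeping the transition check in one small method (`observe` here) makes the "no publish on steady-state" rule testable without a ticker or a database.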
Amended: `system-drift-detector` v1.0.0 → v1.1.0 (+4 ACs)
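The drift amendment in the commit message requires that the published `DriftDetected` event and the audit detail carry the same per-severity counts, and that a nil bus is valid. One way to get that property is to compute the counts once and hand the same value to both sinks. The sketch below assumes hypothetical `Report`, `Finding`, and `Bus` shapes and a made-up non-stable kind `"regressed"`; only the `"stable"` distinction comes from the source.

```go
package main

import "fmt"

// Finding and Report are hypothetical shapes for illustration only.
type Finding struct{ Severity string }

type Report struct {
	Kind     string // "stable" or some non-stable kind
	Findings []Finding
}

// DriftDetected is a hypothetical event payload.
type DriftDetected struct {
	Kind   string
	Counts map[string]int
}

// Bus is a toy bus that records what was published.
type Bus struct{ events []DriftDetected }

func (b *Bus) Publish(e DriftDetected) { b.events = append(b.events, e) }

// Service holds an optional bus and a toy audit log.
type Service struct {
	bus   *Bus // nil is valid
	audit []map[string]int
}

// severityCounts derives per-severity counts from a Report.
func severityCounts(r Report) map[string]int {
	counts := make(map[string]int)
	for _, f := range r.Findings {
		counts[f.Severity]++
	}
	return counts
}

// detect emits the audit detail and, when a bus is present, the event.
// Both are built from the same counts, so they cannot disagree.
func (s *Service) detect(r Report) {
	if r.Kind == "stable" {
		return // stable detections emit nothing
	}
	counts := severityCounts(r)
	s.audit = append(s.audit, counts) // audit emission fires with or without a bus
	if s.bus != nil {
		s.bus.Publish(DriftDetected{Kind: r.Kind, Counts: counts})
	}
}

func main() {
	drifted := Report{Kind: "regressed", Findings: []Finding{
		{Severity: "high"}, {Severity: "high"}, {Severity: "low"},
	}}

	bus := &Bus{}
	withBus := &Service{bus: bus}
	withBus.detect(Report{Kind: "stable"})
	withBus.detect(drifted)
	fmt.Println(len(bus.events), bus.events[0].Counts["high"]) // prints 1 2

	noBus := &Service{}
	noBus.detect(drifted)
	fmt.Println(len(noBus.audit)) // prints 1
}
```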
Code
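As a rough picture of the stdout channel described in the commit message (`Channel.Send` writing via `slog.InfoContext` with structured attributes), here is a sketch using only the standard library. The `Alert` struct, the `StdoutChannel` type, and the use of `alertrouter.alert.sent` as the log message are assumptions made for illustration; only the attribute names and the `journalctl -g` pattern come from the source.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
)

// Alert is a hypothetical alert payload.
type Alert struct {
	Type     string
	Severity string
	HostID   string
}

// StdoutChannel logs alerts through a slog.Logger (illustrative only).
type StdoutChannel struct{ log *slog.Logger }

// Send writes one INFO record per alert with structured attributes.
func (c StdoutChannel) Send(ctx context.Context, a Alert) error {
	c.log.InfoContext(ctx, "alertrouter.alert.sent",
		slog.String("alert_type", a.Type),
		slog.String("severity", a.Severity),
		slog.String("host_id", a.HostID),
	)
	return nil
}

// render sends one alert through a text handler backed by a buffer and
// returns the resulting log line, so the output can be inspected.
func render(a Alert) string {
	var buf bytes.Buffer
	ch := StdoutChannel{log: slog.New(slog.NewTextHandler(&buf, nil))}
	_ = ch.Send(context.Background(), a)
	return buf.String()
}

func main() {
	fmt.Print(render(Alert{Type: "heartbeat", Severity: "high", HostID: "host-1"}))
}
```

In a real channel the logger would presumably write to standard output so that the service manager's journal captures it; the buffer here exists only to make the sketch self-checking.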
Verification
```
go vet ./... clean
go test -race -count=1 ./... PASS (full tree, ~4 min with Postgres)
specter parse PASS (40 specs)
specter check PASS
specter sync PASS (all 40 specs at threshold)
```
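The commit message mentions source-inspection tests that assert the textual boot-sequence ordering in `main.go`. A minimal sketch of that idea, assuming a hypothetical helper `inOrder` and a made-up source excerpt (a real test would read the file, for example with `os.ReadFile`):

```go
package main

import (
	"fmt"
	"strings"
)

// inOrder reports whether every marker occurs in src, each one after the
// end of the previous match.
func inOrder(src string, markers ...string) bool {
	pos := 0
	for _, m := range markers {
		i := strings.Index(src[pos:], m)
		if i < 0 {
			return false
		}
		pos += i + len(m)
	}
	return true
}

func main() {
	// Hypothetical excerpt; identifiers are illustrative, not the real main.go.
	src := `
bus := eventbus.New()
router.Start(ctx)
go liveSvc.Run(ctx)
srv.Run(ctx)
`
	fmt.Println(inOrder(src, "router.Start(", "liveSvc.Run(", "srv.Run(")) // prints true
	fmt.Println(inOrder(src, "liveSvc.Run(", "router.Start("))             // prints false
}
```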
Per-spec coverage:
Not in this PR
Each becomes its own follow-up that takes a future spec as its contract:
Test plan