feat(daemon): wire Slice B into cmd/openwatch serve (eventbus + alertrouter + liveness loop)#430

Merged
remyluslosius merged 1 commit into main from feat/daemon-orchestration
May 30, 2026

@remyluslosius
Contributor

Why

The Slice B packages have been libraries with no runtime wiring since they were merged. `openwatch serve` boots HTTP and the existing in-process diagnostics worker — nothing else. This PR turns three of those libraries into actual runtime subsystems:

  • `internal/eventbus` — in-process pub/sub
  • `internal/alertrouter` — alert dispatcher with a new stdout channel
  • `internal/liveness` — probe loop, now publishing `HeartbeatPulse` to the bus

After this PR, `openwatch serve` runs the alert router and the liveness probe loop. When booted against a fleet with registered hosts, it runs a periodic liveness sweep; state transitions are published to the bus, and the alert router fans them out to the stdout channel (visible via `journalctl`).
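
To illustrate the flow described above (producer publishes, a wildcard subscriber receives), here is a minimal sketch of an in-process pub/sub. The `Bus`, `Event`, `Subscribe`, and `Publish` names are assumptions made for this example and are not taken from `internal/eventbus`; the real bus may well be asynchronous and typed differently.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical envelope; the real eventbus types may differ.
type Event struct {
	Topic   string
	Payload string
}

// Bus is a minimal synchronous in-process pub/sub, for illustration only.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]func(Event)
}

func NewBus() *Bus { return &Bus{subs: make(map[string][]func(Event))} }

// Subscribe registers fn for a topic; the topic "*" receives every event.
func (b *Bus) Subscribe(topic string, fn func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], fn)
}

// Publish calls the topic's subscribers, then the wildcard subscribers.
func (b *Bus) Publish(ev Event) {
	b.mu.RLock()
	handlers := append([]func(Event){}, b.subs[ev.Topic]...)
	handlers = append(handlers, b.subs["*"]...)
	b.mu.RUnlock()
	for _, h := range handlers {
		h(ev)
	}
}

// deliver wires one wildcard subscriber (standing in for the alert router)
// and returns what it received for the given events.
func deliver(events ...Event) []string {
	bus := NewBus()
	var got []string
	bus.Subscribe("*", func(ev Event) { got = append(got, ev.Topic+":"+ev.Payload) })
	for _, ev := range events {
		bus.Publish(ev)
	}
	return got
}

func main() {
	for _, line := range deliver(Event{Topic: "HeartbeatPulse", Payload: "host-1"}) {
		fmt.Println(line)
	}
}
```

The subscriber is registered before anything is published, which is the same ordering concern the boot sequence in this PR addresses.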

Spec changes

NEW: `system-daemon-orchestration` v1.0.0 (7 ACs)

The trimmed version of the spec pulled out of #429. Scope: what's actually wired and tested in this PR.

Amended: `system-liveness-loop` v1.0.0 → v1.1.0 (+8 ACs)

  • `C-10` / AC-16/17 — `Service.Run(ctx)` blocking loop walks active hosts
  • `C-11` / AC-18 — skips hosts whose `host_backoff_state.suppress_until` is in the future
  • AC-19 — returns within 2s of ctx cancel
  • `C-12` / AC-20/21/22 — publishes `HeartbeatPulse` on state transition; no publish on steady state
  • `C-13` / AC-23 — `NewService(pool, emit, bus)`; nil bus is valid
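
The constraints above could be sketched roughly as follows: a blocking loop driven by a ticker, a return on context cancellation, and a publish only when a host's state changes. The `Service` fields, the `State` type, and the choice to treat the first observation of a host as "not a transition" are all assumptions of this sketch, not the actual `internal/liveness` code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type State string

// Service is a hypothetical stand-in for the liveness service.
type Service struct {
	interval time.Duration
	hosts    []string
	probe    func(host string) State
	publish  func(host string, from, to State) // nil models "no bus"
	last     map[string]State
}

// tick probes every host once and publishes only on a state change.
func (s *Service) tick() {
	for _, h := range s.hosts {
		cur := s.probe(h)
		prev, seen := s.last[h]
		s.last[h] = cur
		// Assumption: the first observation is not a transition.
		if seen && prev != cur && s.publish != nil {
			s.publish(h, prev, cur)
		}
	}
}

// Run blocks until ctx is cancelled, calling tick at each interval.
func (s *Service) Run(ctx context.Context) error {
	t := time.NewTicker(s.interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
			s.tick() // an in-flight tick finishes before ctx is re-checked
		}
	}
}

// countTransitions feeds a scripted state sequence through tick and
// returns how many publishes happened.
func countTransitions(states []State) int {
	i, n := 0, 0
	s := &Service{
		hosts:   []string{"host-1"},
		probe:   func(string) State { st := states[i]; i++; return st },
		publish: func(string, State, State) { n++ },
		last:    map[string]State{},
	}
	for range states {
		s.tick()
	}
	return n
}

func main() {
	fmt.Println(countTransitions([]State{"up", "up", "down", "down", "up"}))

	// A nil publish function is valid: the loop still runs and probes.
	s := &Service{
		interval: 10 * time.Millisecond,
		hosts:    []string{"host-1"},
		probe:    func(string) State { return "up" },
		last:     map[string]State{},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	fmt.Println(errors.Is(s.Run(ctx), context.DeadlineExceeded))
}
```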

Amended: `system-drift-detector` v1.0.0 → v1.1.0 (+4 ACs)

  • `C-09` / AC-15/16/17 — publishes `DriftDetected` on non-stable detection; per-severity counts match the audit detail
  • `C-10` / AC-18 — `NewService(pool, emit, thresholds, bus)`; nil bus is valid
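
One way to keep the event's per-severity counts in step with the audit detail is to compute the counts once from the report and hand the same value to both sinks. The sketch below assumes hypothetical `Report`, `Finding`, and `DriftDetected` shapes and a `"stable"` kind string; only the behaviour (publish on non-stable, nil bus allowed) comes from the spec amendments listed above.

```go
package main

import "fmt"

// Finding and Report are hypothetical shapes for this sketch.
type Finding struct{ Severity string }

type Report struct {
	Kind     string
	Findings []Finding
}

// DriftDetected is a hypothetical event payload.
type DriftDetected struct {
	Kind   string
	Counts map[string]int
}

// severityCounts tallies findings by severity.
func severityCounts(r Report) map[string]int {
	c := make(map[string]int)
	for _, f := range r.Findings {
		c[f.Severity]++
	}
	return c
}

// detect emits the audit detail and, if a publisher is present, the event.
// Both are built from the same counts, so they cannot disagree.
func detect(r Report, audit func(map[string]int), publish func(DriftDetected)) {
	if r.Kind == "stable" {
		return // nothing is emitted or published for stable results
	}
	counts := severityCounts(r)
	audit(counts)
	if publish != nil { // a nil publisher models a nil bus
		publish(DriftDetected{Kind: r.Kind, Counts: counts})
	}
}

// demo returns the "high" count seen by the audit sink and by the event.
func demo(kind string) (audited, published int) {
	r := Report{Kind: kind, Findings: []Finding{{"high"}, {"high"}, {"low"}}}
	detect(r,
		func(c map[string]int) { audited = c["high"] },
		func(e DriftDetected) { published = e.Counts["high"] })
	return
}

func main() {
	fmt.Println(demo("regressed"))
	fmt.Println(demo("stable"))
}
```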

Code

| Path | Change |
|---|---|
| `internal/alertrouter/channels/stdout/` | NEW: slog-based channel; zero external deps |
| `internal/liveness/service.go` | `NewService` gains `*Bus`; new `Service.Run` blocking loop; `publishHeartbeat` on transition; `listProbeTargets` SQL respects backoff |
| `internal/drift/service.go` | `NewService` gains `*Bus`; `publishDrift` on non-stable kind |
| `cmd/openwatch/main.go` | `cmdServe` now boots bus → router → register stdout (wildcard) → router.Start → liveness.NewService → `go liveSvc.Run` → server.Run. Reverse-order shutdown via deferred bus.Shutdown + audit.Shutdown + post-server router.Stop |
| `cmd/openwatch/source_test.go` | NEW: source-inspection tests for the daemon-orchestration ACs |
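
The boot and shutdown ordering in the `cmd/openwatch/main.go` row relies on Go running deferred calls in last-in, first-out order. The sketch below records step names only; the step labels are illustrative, and the assumption that audit is started before the bus is inferred from the stated defer order rather than confirmed by the source.

```go
package main

import "fmt"

// bootAndShutdown records the order in which lifecycle steps would run.
func bootAndShutdown() []string {
	var steps []string
	record := func(s string) { steps = append(steps, s) }

	func() {
		record("audit.start") // assumed to precede the bus
		defer record("audit.Shutdown")

		record("bus.new")
		defer record("bus.Shutdown") // deferred later, so it runs first

		record("router.Start") // subscriber is up before any producer
		record("go liveSvc.Run")
		record("server.Run") // blocks until the server exits
		record("router.Stop") // explicit stop after the server returns
	}()
	return steps
}

// indexOf returns the position of s in steps, or -1 if absent.
func indexOf(steps []string, s string) int {
	for i, v := range steps {
		if v == s {
			return i
		}
	}
	return -1
}

func main() {
	for _, s := range bootAndShutdown() {
		fmt.Println(s)
	}
}
```

Because defers unwind in reverse, the bus shuts down before audit without any extra bookkeeping, while `router.Stop` is called explicitly because it has to happen before either deferred shutdown.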

Verification

```
go vet ./... clean
go test -race -count=1 ./... PASS (full tree, ~4 min with Postgres)
specter parse PASS (40 specs)
specter check PASS
specter sync PASS (all 40 specs at threshold)
```

Per-spec coverage:

  • `system-daemon-orchestration` 7/7 (NEW)
  • `system-liveness-loop` 23/23 (was 15/15 — +8 v1.1.0 ACs)
  • `system-drift-detector` 18/18 (was 14/14 — +4 v1.1.0 ACs)

Not in this PR

Each becomes its own follow-up that takes a future spec as its contract:

  • `openwatch worker` subcommand (needs `system-worker-subcommand` spec)
  • Scheduler dispatcher tick inside serve (needs DEK accessor + `policy.Schedules` loader)
  • Drift detector loop (worker calls DetectForScan inline post-scan)
  • Live Kensa binding (`system-kensa-executor` v2 AC-18)
  • Slack / email / webhook channel implementations

Test plan

  • CI: `Quality + security gates` pass (vet, lint, vuln, test-race, specter sync)
  • Spot-check: run `openwatch serve` against a populated DB and observe `journalctl -u openwatch -g alertrouter.alert.sent`. The output will be empty until an alert fires, which requires either a liveness transition on a seeded host or the worker firing scans; neither is exercised in this PR, and the channel itself is verified by unit test.

@remyluslosius remyluslosius enabled auto-merge (squash) May 30, 2026 15:45
…router + liveness loop)

Lands the system-daemon-orchestration spec (v1.0.0) and its
implementation: eventbus + alertrouter (with stdout channel) + the
liveness probe loop are now instantiated and started during
openwatch serve boot, in an order that guarantees the alert router
subscribes BEFORE any producer publishes.

Spec additions
  New: system-daemon-orchestration v1.0.0 (7 ACs over 5 constraints).
  Scope deliberately trimmed from the draft pulled out of #429 —
  this PR delivers what's actually wired and tested. Worker
  subcommand, scheduler dispatcher tick, drift detector loop, and
  live Kensa binding remain follow-ups with their own specs.

  Amend: system-liveness-loop 1.0.0 -> 1.1.0
    - C-10/AC-16/AC-17 Service.Run(ctx) blocking loop walks the
      active host inventory at the configured interval.
    - C-11/AC-18 Run skips hosts whose host_backoff_state
      .suppress_until is in the future.
    - AC-19 Run returns within 2s of ctx cancel; in-flight probes
      finish naturally.
    - C-12/AC-20/AC-21/AC-22 publishes typed HeartbeatPulse to the
      eventbus on every state transition; no publish on steady-state.
    - C-13/AC-23 NewService(pool, emit, bus) — nil bus is valid;
      audit emission still fires.

  Amend: system-drift-detector 1.0.0 -> 1.1.0
    - C-09/AC-15/AC-16/AC-17 DetectForScan publishes DriftDetected
      to the eventbus on every non-stable detection; per-severity
      counts match the audit detail produced from the same Report.
    - C-10/AC-18 NewService(pool, emit, thresholds, bus) — nil bus
      is valid.

Implementation
  internal/eventbus, internal/alertrouter — unchanged.

  internal/alertrouter/channels/stdout — NEW subpackage.
    - Channel.Send writes via slog.InfoContext at INFO with
      structured attributes (alert_type, severity, host_id, etc.).
    - Operators see fired alerts in
      `journalctl -u openwatch -g alertrouter.alert.sent`.
    - Zero external SDK deps — preserves system-alert-router AC-13
      "no external SDKs in core" invariant for the boot-default
      channel.

  internal/liveness
    - NewService gained *eventbus.Bus parameter (v1.1.0 C-13).
    - Service.Run(ctx) NEW — blocking loop calling tick() at the
      configured interval; tick walks active hosts and calls
      ProbeHost. WithInterval seam for tests.
    - publishHeartbeat fires on state transition (the same trigger
      as the audit emit).
    - listProbeTargets SQL LEFT-JOINs host_backoff_state and
      excludes hosts whose suppress_until is in the future.
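
The backoff filter might look something like the query below, paired with an in-memory equivalent that is easier to reason about. Only `host_backoff_state.suppress_until` comes from the commit; the `hosts` table, its `id` and `active` columns, the `host_id` join key, and the `Host` struct are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// listProbeTargetsSQL is a guess at the query shape, not the real SQL.
// The LEFT JOIN keeps hosts that have no backoff row at all.
const listProbeTargetsSQL = `
SELECT h.id
FROM hosts h
LEFT JOIN host_backoff_state b ON b.host_id = h.id
WHERE h.active
  AND (b.suppress_until IS NULL OR b.suppress_until <= now())`

// Host mirrors one joined row; a nil SuppressUntil means no backoff row.
type Host struct {
	ID            string
	SuppressUntil *time.Time
}

// probeTargets applies the same predicate as the WHERE clause in memory.
func probeTargets(hosts []Host, now time.Time) []string {
	var out []string
	for _, h := range hosts {
		if h.SuppressUntil != nil && h.SuppressUntil.After(now) {
			continue // still suppressed, skip this tick
		}
		out = append(out, h.ID)
	}
	return out
}

// demoTargets runs the filter over three hosts with a fixed clock.
func demoTargets() []string {
	now := time.Date(2026, 5, 30, 12, 0, 0, 0, time.UTC)
	future := now.Add(time.Hour)
	past := now.Add(-time.Hour)
	return probeTargets([]Host{
		{ID: "a"},                         // no backoff row
		{ID: "b", SuppressUntil: &future}, // suppressed
		{ID: "c", SuppressUntil: &past},   // backoff expired
	}, now)
}

func main() {
	fmt.Println(listProbeTargetsSQL)
	fmt.Println(demoTargets())
}
```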

  internal/drift
    - NewService gained *eventbus.Bus parameter (v1.1.0 C-10).
    - publishDrift fires alongside emitDriftDetected when Kind is
      non-stable.

  cmd/openwatch/main.go
    - cmdServe instantiates: bus -> alertrouter (with stdout channel
      registered, wildcard Tags) -> Start -> liveSvc -> go liveSvc.Run
      -> server.Run.
    - Shutdown order is reverse: router.Stop after srv.Run returns;
      bus.Shutdown + audit.Shutdown via defer in reverse order.
    - Source-inspection tests (cmd/openwatch/source_test.go) assert
      that the textual boot-sequence ordering in main.go matches the
      spec. Because the ordering is checked against the source text,
      no runtime wrapper refactor is needed in v1.0.0.
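
A source-inspection check of this kind can be as small as confirming that certain call sites appear in the file in a given order. The helper below is a sketch, not the contents of source_test.go; the `sampleMain` text and the `eventbus.New`, `alertrouter.New`, and `stdout.New` names in it are hypothetical stand-ins for whatever main.go really contains.

```go
package main

import (
	"fmt"
	"strings"
)

// inOrder reports whether every marker occurs in src, each one after
// the end of the previous match.
func inOrder(src string, markers ...string) bool {
	pos := 0
	for _, m := range markers {
		i := strings.Index(src[pos:], m)
		if i < 0 {
			return false
		}
		pos += i + len(m)
	}
	return true
}

// sampleMain is an invented excerpt used only to exercise inOrder.
const sampleMain = `
bus := eventbus.New()
router := alertrouter.New(bus)
router.Register(stdout.New())
router.Start(ctx)
liveSvc := liveness.NewService(pool, emit, bus)
go liveSvc.Run(ctx)
server.Run(ctx)
`

func main() {
	ok := inOrder(sampleMain,
		"router.Start", "liveness.NewService", "go liveSvc.Run", "server.Run")
	fmt.Println(ok)
}
```

A real test would read main.go from disk and fail with the first missing or out-of-order marker. A textual check like this verifies the order of statements in the file, not the order of execution at runtime.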

Tests (5 specs at 100%, all under -race)
  system-daemon-orchestration  7/7
  system-liveness-loop         23/23 (was 15/15 — +8 v1.1.0 ACs)
  system-drift-detector        18/18 (was 14/14 — +4 v1.1.0 ACs)
  AC-16/17/18/19 (liveness Run) are tested with Postgres + a
  recording probe that counts per-tick invocations.
  AC-20/21/22 (HeartbeatPulse) tested with an in-memory bus +
  subscriber that asserts presence/absence and field values.
  AC-15/16/17 (DriftDetected) mirror the same pattern.

Verification
  go vet ./...                  clean
  go test -race -count=1 ./...  PASS (full tree, ~4 min with Postgres)
  specter parse                 PASS (40 specs)
  specter check                 PASS
  specter sync                  PASS (all 40 specs at threshold)

Not in this PR (each its own follow-up that takes a future spec as
its contract):
  - openwatch worker subcommand (needs system-worker-subcommand spec)
  - scheduler dispatcher tick inside serve (needs DEK accessor +
    policy.Schedules loader)
  - Drift detector tick (worker calls DetectForScan inline)
  - Live Kensa binding (system-kensa-executor v2 AC-18)
  - Slack / email / webhook channel implementations
@remyluslosius remyluslosius force-pushed the feat/daemon-orchestration branch from 33d781f to 8578d7a Compare May 30, 2026 15:47
@remyluslosius remyluslosius merged commit 744c702 into main May 30, 2026
14 checks passed