Commit bdcfb10

feat(slice-bc): framework-at-query-time + runtime wiring specs (+code corrections) (#429)
Five new specs, two v2 amendments, two API addendums, and the matching code corrections to make scheduler + kensa-executor materially implement v2.0.0 in main. The runtime wiring itself (openwatch worker subcommand, daemon orchestration, live Kensa binding) and findings UI are spec'd but not implemented in this PR; those land in follow-up PRs that take these specs as their contract.

Architectural correction

Kensa's API is Kensa.Scan(ctx, host, rules, opts...): the caller passes the rules; framework identity is per-rule metadata in rule.References, not a scan input. OpenWatch's v1 spec wrongly modeled scans as framework-scoped:

- scheduler.JobPayload carried a FrameworkID HMAC-bound field.
- executor.Run took a framework parameter.
- Fleet/host APIs had no framework slicing surface.

v2 corrects this: jobs are per-host (no framework), executor runs the full applicable rule corpus, framework is a query-time projection on host_rule_state.framework_refs JSONB.

New specs (3)

system-daemon-orchestration
  cmd/openwatch serve / migrate / worker / check-config boot+shutdown ordering; 13 ACs; EmitFunc DI invariant; alertrouter MUST subscribe before bus has publishers.

system-worker-subcommand
  SKIP-LOCKED claim, HMAC verify, transient-vs-permanent retry policy, advisory-lock per-host concurrency guard, graceful drain on SIGTERM; 13 ACs; references system-job-queue for retry/dead-letter semantics.

frontend-findings-ui
  First frontend spec; route /hosts/{id} with three regions (identity+liveness, compliance, recent transactions), optional ?framework= URL param, role-gated, axe-core clean; 15 ACs; framework-of-implementation agnostic.

Spec amendments (2)

system-scheduler 1.0.0 -> 2.0.0
- C-11 HMAC inputs: (host_id, policy_version, enqueued_at).
- C-12 NEW: NewService has no defaultFramework param; JobPayload has no FrameworkID field. Source-inspection enforces.
- AC-06 reworded; AC-16 NEW (source-inspection).

system-kensa-executor 1.0.0 -> 2.0.0
- C-12 Run signature: Run(ctx, hostID, policyVersion).
- C-13 Production scanFunc binds pkg/kensa.Default; unwiredScanFunc remains test-only //nolint:unused.
- C-14 Result.Outcomes covers every rule Kensa returned; RuleOutcome.FrameworkRefs flattened from rule References.
- AC-01 reworded; AC-06 drops framework_unsupported reason; AC-17/18 NEW (source-inspection).

API spec addendums (2)

api-fleet-observability 1.0.0 -> 1.1.0
- C-07/08 NEW: optional ?framework= filter on every endpoint; exact-key match on framework_refs JSONB; absent / empty value is unfiltered (identical to v1.0.0).
- AC-14/15/16/17 NEW (framework filter); AC-18 = renumbered v1.0.0 AC-14 (POST returns 405).

api-hosts 1.1.0 -> 1.2.0
- C-07/08 NEW: ?framework= filters compliance_summary; liveness unaffected.
- AC-17/18 NEW (framework filter; CIS-only host returns zeros when STIG requested).

Code corrections (signature changes, no behavior drift on unfiltered paths)

internal/scheduler
- JobPayload drops FrameworkID; Encode layout updated. v1 payloads now fail Verify (intended migration behavior: pre-v2 queued jobs are dead-lettered).
- NewService drops defaultFramework parameter.
- Dispatch body map drops framework_id key.
- source_test.go NEW: AC-16 source-inspection.

internal/kensa
- Result drops FrameworkID (per-rule FrameworkRefs stays).
- ScanFunc, Run, emitStarted/Completed/Failure, reportHostKeyUnknown/EvidenceOversize/KensaError all drop framework param.
- unwiredScanFunc annotated //nolint:unused per AC-18; the production binding lands with the worker subcommand PR.
- source_test.go: AC-17/18 NEW (Run / ScanFunc signature inspection).

internal/fleetrollup
- options.go NEW: WithFramework functional option.
- service.go: 4 methods now accept optional opts; SQL uses "$N::text IS NULL OR framework_refs ? $N" for unbranched filtering. Empty string -> NULL -> unfiltered.

api/openapi.yaml
- All 5 fleet endpoints + GET /hosts/{id} accept ?framework=.
- Re-ran make generate-api to refresh server.gen.go.

internal/server
- fleet_handlers.go: 4 handlers thread Framework param into fleetrollup via frameworkOpts() helper.
- fleet_helpers.go: frameworkOpts() converts *string to []fleetrollup.Option.
- hosts_handlers.go: GetHostByID now takes GetHostByIDParams; threads framework into loadHostComplianceSummary.
- hosts_enrichment.go: loadHostComplianceSummary takes framework; SQL filters via JSONB ?-operator.

Tests

All affected specs at 100% AC coverage:
- api-fleet-observability 18/18 (was 14/14)
- api-hosts 18/18 (was 16/16)
- system-scheduler 16/16 (was 15/15)
- system-kensa-executor 18/18 (was 16/16)
- system-fleet-rollup 13/13 (unchanged, already 100%)

Existing test files updated for the new signatures; 4 new tests added for fleet AC-14/15/16/17 + 2 for hosts AC-17/18.

Verification

- go vet ./... clean
- go test -race -count=1 PASS (full tree, with Postgres, 4 minutes)
- specter parse PASS (44 specs total, 3 new)
- specter check PASS
- specter coverage: all touched specs at 100%

Not in this PR (specs deliberately reference but implement nothing):
- openwatch worker subcommand binary
- Slice B package wiring into cmd/openwatch serve (scheduler tick, liveness loop, alertrouter)
- Live Kensa binding (pkg/kensa.Default-backed ScanFunc)
- Frontend code (only the spec is in this PR)

Specs touched

- system-scheduler v1.0.0 -> v2.0.0
- system-kensa-executor v1.0.0 -> v2.0.0
- api-fleet-observability v1.0.0 -> v1.1.0
- api-hosts v1.1.0 -> v1.2.0
- system-daemon-orchestration NEW
- system-worker-subcommand NEW
- frontend-findings-ui NEW
1 parent dc8e42f commit bdcfb10

24 files changed

Lines changed: 1200 additions & 493 deletions

.secrets.baseline

Lines changed: 197 additions & 190 deletions
Large diffs are not rendered by default.

app/api/openapi.yaml

Lines changed: 22 additions & 1 deletion
@@ -690,6 +690,10 @@ paths:
       x-required-permission: host:read
       parameters:
         - {name: id, in: path, required: true, schema: {type: string, format: uuid}}
+        - name: framework
+          in: query
+          description: Filter compliance_summary to host_rule_state rows whose framework_refs contains this key (v1.2.0). Liveness is unaffected.
+          schema: {type: string}
       responses:
         '200':
           description: Host detail (includes liveness + compliance_summary)
@@ -797,11 +801,16 @@ paths:
         '400':
           $ref: '#/components/responses/BadRequest'
 
-  # Spec: app/specs/api/fleet-observability.spec.yaml
+  # Spec: app/specs/api/fleet-observability.spec.yaml (v1.1.0)
   /api/v1/fleet/score:
     get:
       operationId: getFleetScore
       summary: Fleet-wide compliance score (passing/total)
+      parameters:
+        - name: framework
+          in: query
+          description: Filter to host_rule_state rows whose framework_refs contains this key (v1.1.0).
+          schema: {type: string}
       responses:
         '200':
           description: Fleet score
@@ -845,6 +854,10 @@ paths:
       operationId: getFleetTopFailingRules
       summary: Rules with the most failing hosts (descending)
       parameters:
+        - name: framework
+          in: query
+          description: Filter to host_rule_state rows whose framework_refs contains this key (v1.1.0).
+          schema: {type: string}
         - name: limit
           in: query
           schema: {type: integer, minimum: 1, maximum: 1000, default: 50}
@@ -871,6 +884,10 @@ paths:
       operationId: getFleetTopFailingHosts
       summary: Hosts with the most failing rules (descending)
       parameters:
+        - name: framework
+          in: query
+          description: Filter to host_rule_state rows whose framework_refs contains this key (v1.1.0).
+          schema: {type: string}
         - name: limit
           in: query
           schema: {type: integer, minimum: 1, maximum: 1000, default: 50}
@@ -897,6 +914,10 @@ paths:
       operationId: getFleetRecentChanges
       summary: Recent transactions (state changes), newest first
       parameters:
+        - name: framework
+          in: query
+          description: Filter to transactions whose framework_refs contains this key (v1.1.0).
+          schema: {type: string}
         - name: since
           in: query
           description: Filter to transactions strictly newer than this RFC3339 timestamp
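
From a client's point of view, the new parameter is just an optional query string on the existing endpoints. As a rough illustration, the sketch below builds a request URL for GET /api/v1/fleet/score with and without the filter. The base URL `https://openwatch.example` and the helper name `fleetScoreURL` are assumptions made for the example; only the path and the `framework` parameter name come from the OpenAPI diff.

```go
package main

import (
	"fmt"
	"net/url"
)

// fleetScoreURL builds the URL for GET /api/v1/fleet/score. An empty
// framework omits the query parameter, which the spec treats as unfiltered.
// (Hypothetical helper; not part of the OpenWatch codebase.)
func fleetScoreURL(base, framework string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/fleet/score"
	q := url.Values{}
	if framework != "" {
		q.Set("framework", framework)
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	for _, fw := range []string{"", "cis_rhel9_v2.0.0"} {
		s, err := fleetScoreURL("https://openwatch.example", fw)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```

Omitting the parameter client-side is not strictly required, since an empty value is also specified as unfiltered, but it keeps unfiltered requests byte-identical to v1.0.0 requests.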
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+package fleetrollup
+
+// Option is a functional option for fleetrollup query methods.
+// Spec api-fleet-observability v1.1.0 C-07/C-08, api-hosts v1.2.0 C-07/C-08.
+type Option func(*queryOpts)
+
+// queryOpts is the internal options bag.
+type queryOpts struct {
+	// framework, when non-empty, filters results to rows whose
+	// framework_refs JSONB contains the given key. Empty string =
+	// unfiltered (legacy v1.0.0 behavior).
+	framework string
+}
+
+// WithFramework filters fleet aggregations and per-host queries to
+// rows whose framework_refs JSONB contains the given key (e.g.,
+// "cis_rhel9_v2.0.0"). Empty string is a no-op (unfiltered).
+//
+// The match is exact-key on the top-level JSONB object; no fuzzy
+// matching, no case folding.
+func WithFramework(framework string) Option {
+	return func(o *queryOpts) {
+		o.framework = framework
+	}
+}
+
+// applyOpts collects opts into a queryOpts bag.
+func applyOpts(opts []Option) queryOpts {
+	var o queryOpts
+	for _, fn := range opts {
+		fn(&o)
+	}
+	return o
+}
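
To make the call-site behavior of these options concrete, here is a small self-contained sketch. It copies `Option`, `queryOpts`, `WithFramework`, and `applyOpts` from the file above into package main so it can run on its own; `describe` is a hypothetical stand-in for a Service method and does not exist in the package.

```go
package main

import "fmt"

// The next four declarations mirror options.go from the diff.
type Option func(*queryOpts)

type queryOpts struct {
	framework string
}

func WithFramework(framework string) Option {
	return func(o *queryOpts) {
		o.framework = framework
	}
}

func applyOpts(opts []Option) queryOpts {
	var o queryOpts
	for _, fn := range opts {
		fn(&o)
	}
	return o
}

// describe is a hypothetical stand-in for a query method that accepts
// variadic options; it only reports which filter would be applied.
func describe(opts ...Option) string {
	o := applyOpts(opts)
	if o.framework == "" {
		return "unfiltered"
	}
	return "framework=" + o.framework
}

func main() {
	fmt.Println(describe())                                   // unfiltered
	fmt.Println(describe(WithFramework("")))                  // unfiltered
	fmt.Println(describe(WithFramework("cis_rhel9_v2.0.0")))  // framework=cis_rhel9_v2.0.0
	fmt.Println(describe(WithFramework("a"), WithFramework("b"))) // framework=b
}
```

Existing callers that pass no options keep compiling and keep the v1.0.0 behavior, which is the main reason to prefer a variadic option over a new positional parameter. When the same option is given twice, the last one wins because each closure writes the same field.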

app/internal/fleetrollup/service.go

Lines changed: 44 additions & 11 deletions
@@ -29,14 +29,23 @@ func NewService(pool *pgxpool.Pool) *Service {
 //
 // On an empty fleet, returns Score{0, 0} with nil error (NOT
 // pgx.ErrNoRows). Spec AC-01 / AC-02 / AC-03.
-func (s *Service) FleetComplianceScore(ctx context.Context) (Score, error) {
+//
+// WithFramework filters to rows whose framework_refs JSONB contains
+// the given key (api-fleet-observability v1.1.0 AC-14).
+func (s *Service) FleetComplianceScore(ctx context.Context, opts ...Option) (Score, error) {
+	o := applyOpts(opts)
+
+	// Single SQL covers both the unfiltered (framework="") and
+	// framework-filtered paths via NULLIF + ? operator. PostgreSQL
+	// short-circuits to TRUE when $1 is NULL (no framework given).
 	const q = `
 		SELECT
 			COUNT(*) FILTER (WHERE current_status = 'pass') AS passing,
 			COUNT(*) FILTER (WHERE current_status IN ('pass','fail')) AS evaluations
-		FROM host_rule_state`
+		FROM host_rule_state
+		WHERE ($1::text IS NULL OR framework_refs ? $1)`
 	var passing, evaluations int64
-	if err := s.pool.QueryRow(ctx, q).Scan(&passing, &evaluations); err != nil {
+	if err := s.pool.QueryRow(ctx, q, nullableFramework(o.framework)).Scan(&passing, &evaluations); err != nil {
 		if errors.Is(err, pgx.ErrNoRows) {
 			// Filtered COUNT never returns NoRows but defend anyway.
 			return Score{}, nil
@@ -52,6 +61,16 @@ func (s *Service) FleetComplianceScore(ctx context.Context) (Score, error) {
 	}, nil
 }
 
+// nullableFramework returns nil for the empty string (so the query's
+// "$1::text IS NULL OR …" short-circuits to TRUE = unfiltered) or the
+// string otherwise. Keeps the SQL constant across both code paths.
+func nullableFramework(framework string) any {
+	if framework == "" {
+		return nil
+	}
+	return framework
+}
+
 // FleetLiveness returns host counts by reachability status. The four
 // buckets sum to the count of active (deleted_at IS NULL) hosts. Hosts
 // that have a row in `hosts` but no row in `host_liveness` are counted
@@ -79,19 +98,24 @@ func (s *Service) FleetLiveness(ctx context.Context) (LivenessRollup, error) {
 // descending order. limit is coerced to [0, MaxLimit]. A coerced
 // limit of 0 returns an empty slice with nil error (no query
 // executed). Spec AC-05 / AC-06 / AC-10.
-func (s *Service) TopFailingRules(ctx context.Context, limit int) ([]RuleFailureRollup, error) {
+//
+// WithFramework filters to rows whose framework_refs JSONB contains
+// the given key (api-fleet-observability v1.1.0 AC-15).
+func (s *Service) TopFailingRules(ctx context.Context, limit int, opts ...Option) ([]RuleFailureRollup, error) {
 	n := clampLimit(limit)
 	if n == 0 {
 		return []RuleFailureRollup{}, nil
 	}
+	o := applyOpts(opts)
 	const q = `
 		SELECT rule_id, COUNT(*)::BIGINT AS failing_host_count
 		FROM host_rule_state
 		WHERE current_status = 'fail'
+		  AND ($2::text IS NULL OR framework_refs ? $2)
 		GROUP BY rule_id
 		ORDER BY failing_host_count DESC, rule_id ASC
 		LIMIT $1`
-	rows, err := s.pool.Query(ctx, q, n)
+	rows, err := s.pool.Query(ctx, q, n, nullableFramework(o.framework))
 	if err != nil {
 		return nil, fmt.Errorf("fleetrollup: TopFailingRules: %w", err)
 	}
@@ -113,19 +137,24 @@ func (s *Service) TopFailingRules(ctx context.Context, limit int) ([]RuleFailure
 
 // TopFailingHosts returns the hosts with the most failing rules, in
 // descending order. limit is coerced to [0, MaxLimit]. Spec AC-07 / AC-10.
-func (s *Service) TopFailingHosts(ctx context.Context, limit int) ([]HostFailureRollup, error) {
+//
+// WithFramework filters to rows whose framework_refs JSONB contains
+// the given key (api-fleet-observability v1.1.0).
+func (s *Service) TopFailingHosts(ctx context.Context, limit int, opts ...Option) ([]HostFailureRollup, error) {
 	n := clampLimit(limit)
 	if n == 0 {
 		return []HostFailureRollup{}, nil
 	}
+	o := applyOpts(opts)
 	const q = `
 		SELECT host_id, COUNT(*)::BIGINT AS failing_rule_count
 		FROM host_rule_state
 		WHERE current_status = 'fail'
+		  AND ($2::text IS NULL OR framework_refs ? $2)
 		GROUP BY host_id
 		ORDER BY failing_rule_count DESC, host_id ASC
 		LIMIT $1`
-	rows, err := s.pool.Query(ctx, q, n)
+	rows, err := s.pool.Query(ctx, q, n, nullableFramework(o.framework))
 	if err != nil {
 		return nil, fmt.Errorf("fleetrollup: TopFailingHosts: %w", err)
 	}
@@ -149,25 +178,29 @@ func (s *Service) TopFailingHosts(ctx context.Context, limit int) ([]HostFailure
 // occurred_at DESC. since filters to rows strictly newer than the
 // given timestamp. Pass time.Time{} (the zero value) to disable the
 // cursor. limit is coerced to [0, MaxLimit]. Spec AC-08 / AC-10.
-func (s *Service) RecentChanges(ctx context.Context, since time.Time, limit int) ([]TransactionRollup, error) {
+//
+// WithFramework filters to transactions whose framework_refs JSONB
+// contains the given key (api-fleet-observability v1.1.0 AC-16).
+func (s *Service) RecentChanges(ctx context.Context, since time.Time, limit int, opts ...Option) ([]TransactionRollup, error) {
 	n := clampLimit(limit)
 	if n == 0 {
 		return []TransactionRollup{}, nil
 	}
+	o := applyOpts(opts)
 	// The "$2::timestamptz IS NULL" idiom lets us encode "no cursor"
-	// without branching the SQL. Pass NULL for since to skip the
-	// filter, otherwise apply the strict >.
+	// without branching the SQL. Same trick for framework via $3::text.
 	const q = `
 		SELECT id, host_id, rule_id, status, COALESCE(severity, ''), change_kind, occurred_at
 		FROM transactions
 		WHERE ($2::timestamptz IS NULL OR occurred_at > $2)
+		  AND ($3::text IS NULL OR framework_refs ? $3)
 		ORDER BY occurred_at DESC
 		LIMIT $1`
 	var sinceParam any
 	if !since.IsZero() {
 		sinceParam = since
 	}
-	rows, err := s.pool.Query(ctx, q, n, sinceParam, )
+	rows, err := s.pool.Query(ctx, q, n, sinceParam, nullableFramework(o.framework))
 	if err != nil {
 		return nil, fmt.Errorf("fleetrollup: RecentChanges: %w", err)
 	}
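
The filter semantics used throughout service.go can be checked without a database. The sketch below emulates the predicate "$N::text IS NULL OR framework_refs ? $N" over in-memory rows and reproduces the behavior the specs describe: empty string means unfiltered, matching is exact and case-sensitive, and a CIS-only host yields zeros when another framework is requested. The `row` type, the `keep` and `score` helpers, the shape of the map values, and the key "stig_rhel9" are assumptions for illustration; only `nullableFramework` and the pass/fail counting rule are taken from the diff.

```go
package main

import "fmt"

// row is a hypothetical in-memory stand-in for one host_rule_state row.
type row struct {
	status string
	// frameworkRefs models the top-level keys of the framework_refs JSONB
	// object; the value type is an assumption.
	frameworkRefs map[string]string
}

// nullableFramework mirrors the helper in the diff: "" becomes nil (SQL NULL).
func nullableFramework(framework string) any {
	if framework == "" {
		return nil
	}
	return framework
}

// keep emulates "$1::text IS NULL OR framework_refs ? $1" for a single row.
func keep(r row, arg any) bool {
	if arg == nil {
		return true // NULL parameter: predicate is TRUE for every row
	}
	_, ok := r.frameworkRefs[arg.(string)] // exact top-level key match
	return ok
}

// score counts passing rows and pass-or-fail rows among the kept rows,
// following the two FILTER clauses in FleetComplianceScore.
func score(rows []row, framework string) (passing, evaluations int) {
	arg := nullableFramework(framework)
	for _, r := range rows {
		if !keep(r, arg) {
			continue
		}
		switch r.status {
		case "pass":
			passing++
			evaluations++
		case "fail":
			evaluations++
		}
	}
	return passing, evaluations
}

func main() {
	cis := map[string]string{"cis_rhel9_v2.0.0": "1.1.1"}
	rows := []row{
		{status: "pass", frameworkRefs: cis},
		{status: "fail", frameworkRefs: cis},
		{status: "error", frameworkRefs: cis},
	}
	for _, fw := range []string{"", "cis_rhel9_v2.0.0", "CIS_RHEL9_V2.0.0", "stig_rhel9"} {
		p, n := score(rows, fw)
		fmt.Printf("framework=%q passing=%d evaluations=%d\n", fw, p, n)
	}
}
```

Mapping the empty string to nil on the Go side is what lets a single constant SQL string serve both paths; the query text itself never needs to branch.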
