Commit cc52996

Merge pull request #159 from kuudori/HYPERFLEET-930
HYPERFLEET-930 - chore: update claude.md context
2 parents c67d056 + c1791af commit cc52996

2 files changed

Lines changed: 172 additions & 177 deletions

File tree

AGENTS.md

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
1+
# CLAUDE.md
2+
3+
## Project Identity
4+
5+
HyperFleet Sentinel is a **Kubernetes resource watcher** that polls the HyperFleet API for cluster/nodepool updates, makes orchestration decisions via CEL-based decision logic, and publishes CloudEvents to message brokers. Stateless, horizontally scalable via label-based sharding, delegates all state persistence to the API.
6+
7+
- **Language**: Go 1.25 (see `go.mod`)
8+
- **Messaging**: Broker abstraction (RabbitMQ, GCP Pub/Sub, Stub)
9+
- **API Client**: Generated from [hyperfleet-api-spec](https://github.com/openshift-hyperfleet/hyperfleet-api-spec) — see [openapi/README.md](openapi/README.md)
10+
- **Deployment**: Helm chart in `charts/`
11+
12+
Sentinel is one component in the HyperFleet control plane:
13+
- **API** — persists cluster/nodepool state (source of truth)
14+
- **Sentinel** — watches API, decides when resources need reconciliation, publishes events
15+
- **Adapters** — consume events, execute provisioning/deprovisioning, report back to API
16+
- **Broker** (RabbitMQ or Pub/Sub) — decouples Sentinel from adapters
17+
18+
## Critical First Steps
19+
20+
**Generated OpenAPI client is NOT committed to git.** Before any build, test, or development task:
21+
22+
```bash
make generate # Extracts OpenAPI spec from hyperfleet-api-spec module and generates Go client
```
25+
26+
Setup sequence for a fresh clone:
27+
1. `make generate` — generate OpenAPI client in `pkg/api/openapi/`
28+
2. `make download` — fetch Go dependencies
29+
3. `make build` — build `bin/sentinel` binary
30+
4. `make test` — verify unit tests pass
31+
32+
## Verification
33+
34+
| Command | What it does |
|---|---|
| `make verify` | go vet + format check (fast) |
| `make lint` | golangci-lint (comprehensive) |
| `make test` | all tests (`./...`), writes `coverage.out` profile |
| `make test-unit` | unit tests only (specific `internal/` and `pkg/` packages) |
| `make test-integration` | integration tests with testcontainers (Docker required) |
| `make test-coverage` | runs `make test` then opens HTML coverage report |
| `make test-helm` | Helm chart lint + template validation (10 scenarios) |
| `make test-all` | test + test-integration + test-helm + lint |
44+
45+
Quick feedback: `make verify && make test-unit`. Full pre-push: `make test-all`.
46+
47+
**PR pre-flight order:**
48+
1. `make generate`
49+
2. `make fmt`
50+
3. `make lint`
51+
4. `make test-unit`
52+
5. `make test-integration` — if broker/API changes
53+
6. `make test-helm` — if chart changes
54+
7. Update CHANGELOG.md if the change is user-visible
55+
56+
## Source of Truth
57+
58+
| Topic | Where to look |
|---|---|
| Configuration reference | [docs/config.md](docs/config.md) |
| Metrics definitions | [docs/metrics.md](docs/metrics.md), `internal/metrics/` |
| Local/GKE deployment | [docs/running-sentinel.md](docs/running-sentinel.md) |
| Multi-instance sharding | [docs/multi-instance-deployment.md](docs/multi-instance-deployment.md) |
| Alerts and runbooks | [docs/alerts.md](docs/alerts.md), [docs/runbook.md](docs/runbook.md) |
| Helm values | [charts/values.yaml](charts/values.yaml) |
| Contributing and setup | [CONTRIBUTING.md](CONTRIBUTING.md) |
| OpenAPI client generation | [openapi/README.md](openapi/README.md) |
| Example configs | `configs/dev-example.yaml`, `configs/rabbitmq-example.yaml`, `configs/gcp-pubsub-example.yaml` |
| Broker configuration | `broker.yaml` (loaded by hyperfleet-broker; override path via `BROKER_CONFIG_FILE` env var) |
| CloudEvents / CEL payloads | `internal/payload/` |
| Resource profiling | [docs/resource-profiling.md](docs/resource-profiling.md) |
72+
73+
## Architecture Context
74+
75+
Sentinel's job: **decide when**, not **execute how**. It can be killed and restarted at any time without data loss — this is what makes label-based sharding safe. The `message_decision` config uses CEL expressions to decide when to publish — see `DefaultMessageDecision()` in `internal/config/config.go` for default expressions.
76+
77+
### Key Internal Patterns
78+
- **Config validation fails fast**: `Validate()` returns an error at startup, and `LoadConfig()` propagates it to `main`, which exits non-zero
79+
- **Context propagation**: `context.Context` is threaded through all calls with correlation keys (OpID, TraceID, SpanID, DecisionReason)
80+
- **Health probes**: `/healthz` (liveness: stale poll detection), `/readyz` (readiness: broker + first successful poll)
81+
82+
## Code Conventions
83+
84+
### Commit Messages
85+
Format: `HYPERFLEET-### - type: description`
86+
87+
Example:
88+
```
HYPERFLEET-427 - feat: add standard metrics labels

Adds resource_type and resource_selector labels to all
Prometheus metrics for consistent querying.

Co-Authored-By: Claude <noreply@anthropic.com>
```
96+
Co-Authored-By trailer required on all Claude-assisted commits.
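
The format above can be checked mechanically. The Go sketch below is only an approximation of such a check: the regular expression, the assumption that `type` is a single lowercase word, and the function names are all illustrative, and the real `hyperfleet-commitlint` hook may apply different rules.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subjectPattern encodes "HYPERFLEET-### - type: description".
// Assumption: "type" is one lowercase word such as feat, fix, or chore.
var subjectPattern = regexp.MustCompile(`^HYPERFLEET-\d+ - [a-z]+: \S.*$`)

// validSubject checks only the first line of a commit message.
func validSubject(msg string) bool {
	subject, _, _ := strings.Cut(msg, "\n")
	return subjectPattern.MatchString(subject)
}

// hasCoAuthorTrailer reports whether any line starts with the
// Co-Authored-By trailer.
func hasCoAuthorTrailer(msg string) bool {
	for _, line := range strings.Split(msg, "\n") {
		if strings.HasPrefix(line, "Co-Authored-By: ") {
			return true
		}
	}
	return false
}

func main() {
	msg := "HYPERFLEET-427 - feat: add standard metrics labels\n\n" +
		"Adds resource_type and resource_selector labels.\n\n" +
		"Co-Authored-By: Claude <noreply@anthropic.com>"
	fmt.Println("subject ok:", validSubject(msg))
	fmt.Println("trailer present:", hasCoAuthorTrailer(msg))
}
```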
97+
98+
### Configuration
99+
- Config struct in `internal/config/config.go` — YAML struct tags, validation via `Validate()`
100+
- All durations use `time.Duration` with YAML `duration` format (e.g., `5s`, `30m`)
101+
- Config precedence (highest wins): CLI flags > env vars (`HYPERFLEET_*`) > YAML file > defaults
102+
- Broker credentials handled separately via `broker.yaml` (or `BROKER_CONFIG_FILE` env var)
103+
104+
### CLI Commands
105+
- `sentinel serve --config config.yaml` — run the service
106+
- `sentinel config-dump --config config.yaml` — print merged config (debug precedence issues)
107+
- `sentinel version` — print version, commit, build date
108+
- Run `sentinel serve --help` for full flag list
109+
110+
### Error Handling
111+
- Log at boundaries (main service loop), not deep in call stack
112+
113+
### Logging
114+
- Custom structured logger in `pkg/logger/` — stdlib only, no external deps
115+
- Interface: `logger.HyperFleetLogger` with `Info()`, `Error()`, `Warn()`, `Debug()`, `V(level)` (verbosity), `Extra()`
116+
- Create via `logger.NewHyperFleetLogger()` — uses global config
117+
- Chaining: `logger.Extra("key", val).Extra("key2", val2).Info("msg")`
118+
- **IMPORTANT: always use `pkg/logger`, never `log/slog` directly**
119+
120+
### CloudEvents Payloads
121+
`message_data` config uses CEL expressions, not static values:
122+
```yaml
message_data:
  id: resource.id
  kind: resource.kind
  href: resource.href
```
128+
CEL context:
129+
- `resource` — cluster/nodepool object from API (id, kind, href, generation, status, labels, etc.)
130+
- `reason` — decision reason string from engine (e.g., `"message decision matched"`, `"message decision result is false"`)
131+
- `condition("Type")` — custom function to look up resource status condition by type name
132+
- `now` — current timestamp
133+
- `timestamp()`, `duration()` — standard CEL time functions
134+
135+
### Testing
136+
- Table-driven tests with plain `if` assertions — no testify
137+
- Mocking via simple interface implementations (e.g., MockPublisher), no gomock
138+
- Unit tests live alongside code: `foo_test.go` next to `foo.go`
139+
- Integration tests in `test/integration/` with `//go:build integration` tag
140+
- Prometheus metrics verified with `prometheus/testutil`
141+
- Run single test: `go test -run TestDecisionEngine ./internal/engine/...`
142+
143+
## Git Workflow
144+
145+
- Branch from `main`, PR back to `main`
146+
- Branch naming: `HYPERFLEET-###-short-description`
147+
- Pre-commit hooks: run `make install-hooks` to install — enforces commit message format (`hyperfleet-commitlint`), Go formatting, linting, and vet
148+
149+
## Project Boundaries
150+
151+
**DO NOT**:
152+
- Add business logic to Sentinel — orchestration decisions only, execution belongs in adapters
153+
- Store state in Sentinel — it is stateless, API is source of truth
154+
- Hardcode the resource polling interval — always use `poll_interval` from config for the main sentinel loop; adding a second resource polling loop bypasses the single-ticker backpressure model
155+
156+
**DO**:
157+
- Update `hyperfleet-api-spec` version in `go.mod` and run `make generate` when API spec changes
158+
- New exported functions require unit tests; new broker/API interactions require integration tests
159+
- Add metrics when adding observable behavior — see [docs/metrics.md](docs/metrics.md) for conventions
160+
- Convention: `message_data` should include `id`, `kind`, `href` fields (not enforced by validation, but expected by downstream adapters) — see `configs/dev-example.yaml`
161+
- Use broker abstraction (`hyperfleet-broker`) — never import RabbitMQ/Pub/Sub clients directly
162+
163+
## Gotchas
164+
165+
- **`make generate` is mandatory** — build and tests fail without it; generated code is gitignored
166+
- **`pkg/api/openapi/` is read-only** — never hand-edit, always regenerate
167+
- **Broker config comes from `broker.yaml`** (or `BROKER_CONFIG_FILE` env var), not sentinel YAML config — handled by hyperfleet-broker library
168+
- **CEL expressions in `message_data` are compiled at startup** — syntax errors fail fast, but semantic errors (wrong field names on resource) surface at evaluation time
169+
- **Metrics labels must include `resource_type` and `resource_selector`** — see [docs/metrics.md](docs/metrics.md) for naming conventions
170+
- **Metrics use `sync.Once` registration** — call `ResetSentinelMetrics()` in tests to avoid duplicate registration panics
171+
- **No testify** — project uses plain Go assertions and table-driven tests; don't introduce testify

CLAUDE.md

Lines changed: 1 addition & 177 deletions
@@ -1,177 +1 @@
1-
# CLAUDE.md
2-
3-
## Project Identity
4-
5-
HyperFleet Sentinel is a **Kubernetes resource watcher** that polls the HyperFleet API for cluster/nodepool updates, makes orchestration decisions based on max age intervals, and publishes CloudEvents to message brokers. It is stateless, horizontally scalable via label-based sharding, and delegates all state persistence to the API.
6-
7-
- **Language**: Go 1.25+
8-
- **Messaging**: Broker abstraction supporting RabbitMQ, GCP Pub/Sub, and Stub implementations
9-
- **API Client**: Generated from the [hyperfleet-api-spec](https://github.com/openshift-hyperfleet/hyperfleet-api-spec) Go module — see [openapi/README.md](openapi/README.md)
10-
- **Deployment**: Helm chart with PodMonitoring (GKE) and ServiceMonitor (Prometheus Operator)
11-
12-
## Critical First Steps
13-
14-
**Generated OpenAPI client is NOT committed to git.** Before any build, test, or development task:
15-
16-
```bash
make generate # Extracts OpenAPI spec from hyperfleet-api-spec module and generates Go client
```
19-
20-
Setup sequence for a fresh clone:
21-
1. `make generate` — generate OpenAPI client in `pkg/api/openapi/`
22-
2. `make download` — fetch Go dependencies
23-
3. `make build` — build `bin/sentinel` binary
24-
4. `make test` — verify unit tests pass
25-
26-
## Verification Commands
27-
28-
| Command | What it does |
|---|---|
| `make verify` | go vet + format check (fast) |
| `make lint` | golangci-lint (comprehensive) |
| `make test` | unit tests only (no external deps) |
| `make test-integration` | integration tests with testcontainers (RabbitMQ, Pub/Sub) |
| `make test-helm` | Helm chart lint and validation |
| `make test-all` | lint + unit + integration + helm tests |
36-
37-
Use `make verify && make test` for fast local feedback. Use `make test-all` before pushing.
38-
39-
## Code Conventions
40-
41-
### Commit Messages
42-
Format: `HYPERFLEET-### - type: description`
43-
44-
Example:
45-
```
HYPERFLEET-427 - feat: add standard metrics labels

Adds resource_type and resource_selector labels to all
Prometheus metrics for consistent querying.

Co-Authored-By: Claude <noreply@anthropic.com>
```
53-
54-
### Import Ordering
55-
1. Standard library
56-
2. External packages (`github.com/google/cel-go`, `github.com/prometheus/client_golang`)
57-
3. HyperFleet packages (`github.com/openshift-hyperfleet/hyperfleet-broker`, etc.)
58-
4. Internal packages (`github.com/openshift-hyperfleet/hyperfleet-sentinel/internal/...`)
59-
60-
### Configuration
61-
- Config lives in `internal/config/config.go` — struct tags for YAML, validation via `Validate()`
62-
- All durations use `time.Duration` with YAML `duration` format (e.g., `5s`, `30m`)
63-
- Environment variables override YAML only for broker credentials (via hyperfleet-broker library)
64-
- Config validation fails fast at startup — never run with invalid config
65-
66-
### Error Handling
67-
- Errors propagate with context: `fmt.Errorf("failed to poll API: %w", err)`
68-
- Log errors at the boundary (main service loop), not deep in call stack
69-
- Use structured logging: `logger.Error("msg", "key", value, "error", err)`
70-
71-
### Metrics
72-
- All metrics defined in `pkg/metrics/metrics.go` — use Prometheus client conventions
73-
- Standard labels on all metrics: `resource_type`, `resource_selector`
74-
- Counter: `_total` suffix (e.g., `hyperfleet_sentinel_events_published_total`)
75-
- Gauge: no suffix (e.g., `hyperfleet_sentinel_pending_resources`)
76-
- Histogram: `_seconds` suffix (e.g., `hyperfleet_sentinel_poll_duration_seconds`)
77-
78-
### Testing
79-
- Unit tests: mock external dependencies (API client, broker), fast, deterministic
80-
- Integration tests: testcontainers for real RabbitMQ/Pub/Sub, slower, covers end-to-end flows
81-
- Test file naming: `*_test.go` alongside implementation
82-
- Integration tests: `test/integration/*_test.go` with build tag `//go:build integration`
83-
84-
### CloudEvents Structure
85-
Events use CEL expressions from `message_data` config to build payloads:
86-
```yaml
message_data:
  id: resource.id # CEL expressions, not static values
  kind: resource.kind
  href: resource.href
  generation: resource.generation
```
93-
94-
CEL context includes:
95-
- `resource` — the cluster/nodepool object from API
96-
- `reason` — decision string ("not_reconciled", "reconciled_stale", "reconciled_fresh")
97-
98-
## Project Boundaries
99-
100-
**DO NOT**:
101-
- Modify generated code in `pkg/api/openapi/` — regenerate via `make generate` instead
102-
- Add dependencies without checking licenses (`go-licenses` reports in CI)
103-
- Commit broker credentials or GCP service account keys
104-
- Add business logic to Sentinel — orchestration decisions only, execution belongs in adapters
105-
- Store state in Sentinel — it is stateless, API is the source of truth
106-
- Poll faster than API can handle — respect backpressure and rate limits
107-
108-
**DO**:
109-
- Update `hyperfleet-api-spec` version in `go.mod` and run `make generate` when the API spec changes
110-
- Add tests for new features (unit + integration if broker/API interaction)
111-
- Update Prometheus metrics when adding observable behaviors
112-
- Update CHANGELOG.md for user-visible changes
113-
- Follow the ObjectReference pattern for CloudEvents payloads (id, kind, href)
114-
- Use broker abstraction (`hyperfleet-broker`) — never import RabbitMQ/Pub/Sub clients directly
115-
116-
## Architecture Context
117-
118-
Sentinel is one component in the HyperFleet control plane:
119-
- **API** persists cluster/nodepool state (source of truth)
120-
- **Sentinel** watches API, decides when resources need reconciliation, publishes events
121-
- **Adapters** consume events, execute provisioning/deprovisioning, report status back to API
122-
- **Broker** (RabbitMQ or Pub/Sub) decouples Sentinel from adapters
123-
124-
Sentinel's job: **decide when**, not **execute how**. Max age intervals define "when":
125-
- `max_age_not_reconciled`: poll frequently for unstable resources
126-
- `max_age_reconciled`: poll infrequently for stable resources
127-
128-
## Local Development
129-
130-
```bash
# 1. Start HyperFleet API (see hyperfleet-api repo) and RabbitMQ
#    (15672 is published so the management UI below is reachable)
docker run -d -p 5672:5672 -p 15672:15672 rabbitmq:3-management

# 2. Configure (see configs/dev-example.yaml and broker.yaml for templates)
# 3. Run Sentinel
./bin/sentinel serve --config config.yaml

# Watch events at http://localhost:15672 (guest/guest)
```
140-
141-
For detailed local/GKE deployment, see [docs/running-sentinel.md](docs/running-sentinel.md).
142-
143-
## Helm Chart
144-
145-
Chart lives in `charts/` with values for:
146-
- Multiple Sentinel instances with different `resource_selector` (sharding)
147-
- Monitoring: PodMonitoring (GKE/GMP) or ServiceMonitor (Prometheus Operator)
148-
- Broker config via ConfigMap (type, topic) + Secret (credentials)
149-
150-
Example: deploy 2 Sentinels watching different shards:
151-
```bash
helm install sentinel-shard-1 ./charts \
  --set config.resourceSelector[0].label=shard \
  --set config.resourceSelector[0].value=1 \
  --set broker.topic=hyperfleet-prod-clusters

helm install sentinel-shard-2 ./charts \
  --set config.resourceSelector[0].label=shard \
  --set config.resourceSelector[0].value=2 \
  --set broker.topic=hyperfleet-prod-clusters
```
162-
163-
Both read from the same API and publish to the same topic, but watch different label-filtered subsets.
164-
165-
## Validation Checklist
166-
167-
Before submitting a PR:
168-
1. `make generate` — ensure OpenAPI client is current
169-
2. `make fmt` — format code
170-
3. `make verify` — vet and format check
171-
4. `make lint` — pass golangci-lint
172-
5. `make test` — pass unit tests
173-
6. `make test-integration` — pass integration tests (if broker/API changes)
174-
7. `make test-helm` — validate Helm chart
175-
8. Update CHANGELOG.md for user-visible changes
176-
9. Add metrics if new observable behavior
177-
10. Commit message follows `HYPERFLEET-### - type: description` format
1+
@AGENTS.md
