| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +## Project Identity |
| 4 | + |
| 5 | +HyperFleet Sentinel is a **Kubernetes resource watcher** that polls the HyperFleet API for cluster/nodepool updates, makes orchestration decisions via CEL-based decision logic, and publishes CloudEvents to message brokers. It is stateless, scales horizontally via label-based sharding, and delegates all state persistence to the API. |
| 6 | + |
| 7 | +- **Language**: Go 1.25 (see `go.mod`) |
| 8 | +- **Messaging**: Broker abstraction (RabbitMQ, GCP Pub/Sub, Stub) |
| 9 | +- **API Client**: Generated from [hyperfleet-api-spec](https://github.com/openshift-hyperfleet/hyperfleet-api-spec) — see [openapi/README.md](openapi/README.md) |
| 10 | +- **Deployment**: Helm chart in `charts/` |
| 11 | + |
| 12 | +Sentinel is one component in the HyperFleet control plane: |
| 13 | +- **API** — persists cluster/nodepool state (source of truth) |
| 14 | +- **Sentinel** — watches API, decides when resources need reconciliation, publishes events |
| 15 | +- **Adapters** — consume events, execute provisioning/deprovisioning, report back to API |
| 16 | +- **Broker** (RabbitMQ or Pub/Sub) — decouples Sentinel from adapters |
| 17 | + |
| 18 | +## Critical First Steps |
| 19 | + |
| 20 | +**Generated OpenAPI client is NOT committed to git.** Before any build, test, or development task: |
| 21 | + |
| 22 | +```bash |
| 23 | +make generate # Extracts OpenAPI spec from hyperfleet-api-spec module and generates Go client |
| 24 | +``` |
| 25 | + |
| 26 | +Setup sequence for a fresh clone: |
| 27 | +1. `make generate` — generate OpenAPI client in `pkg/api/openapi/` |
| 28 | +2. `make download` — fetch Go dependencies |
| 29 | +3. `make build` — build `bin/sentinel` binary |
| 30 | +4. `make test` — verify unit tests pass |
| 31 | + |
| 32 | +## Verification |
| 33 | + |
| 34 | +| Command | What it does | |
| 35 | +|---|---| |
| 36 | +| `make verify` | go vet + format check (fast) | |
| 37 | +| `make lint` | golangci-lint (comprehensive) | |
| 38 | +| `make test` | all tests (`./...`), writes `coverage.out` profile | |
| 39 | +| `make test-unit` | unit tests only — specific internal/ and pkg/ packages | |
| 40 | +| `make test-integration` | integration tests with testcontainers (Docker required) | |
| 41 | +| `make test-coverage` | runs `make test` then opens HTML coverage report | |
| 42 | +| `make test-helm` | Helm chart lint + template validation (10 scenarios) | |
| 43 | +| `make test-all` | test + test-integration + test-helm + lint | |
| 44 | + |
| 45 | +Quick feedback: `make verify && make test-unit`. Full pre-push: `make test-all`. |
| 46 | + |
| 47 | +**PR pre-flight order:** |
| 48 | +1. `make generate` |
| 49 | +2. `make fmt` |
| 50 | +3. `make lint` |
| 51 | +4. `make test-unit` |
| 52 | +5. `make test-integration` — if broker/API changes |
| 53 | +6. `make test-helm` — if chart changes |
| 54 | +7. Update CHANGELOG.md if the change is user-visible |
| 55 | + |
| 56 | +## Source of Truth |
| 57 | + |
| 58 | +| Topic | Where to look | |
| 59 | +|---|---| |
| 60 | +| Configuration reference | [docs/config.md](docs/config.md) | |
| 61 | +| Metrics definitions | [docs/metrics.md](docs/metrics.md), `internal/metrics/` | |
| 62 | +| Local/GKE deployment | [docs/running-sentinel.md](docs/running-sentinel.md) | |
| 63 | +| Multi-instance sharding | [docs/multi-instance-deployment.md](docs/multi-instance-deployment.md) | |
| 64 | +| Alerts and runbooks | [docs/alerts.md](docs/alerts.md), [docs/runbook.md](docs/runbook.md) | |
| 65 | +| Helm values | [charts/values.yaml](charts/values.yaml) | |
| 66 | +| Contributing and setup | [CONTRIBUTING.md](CONTRIBUTING.md) | |
| 67 | +| OpenAPI client generation | [openapi/README.md](openapi/README.md) | |
| 68 | +| Example configs | `configs/dev-example.yaml`, `configs/rabbitmq-example.yaml`, `configs/gcp-pubsub-example.yaml` | |
| 69 | +| Broker configuration | `broker.yaml` (loaded by hyperfleet-broker; override path via `BROKER_CONFIG_FILE` env var) | |
| 70 | +| CloudEvents / CEL payloads | `internal/payload/` | |
| 71 | +| Resource profiling | [docs/resource-profiling.md](docs/resource-profiling.md) | |
| 72 | + |
| 73 | +## Architecture Context |
| 74 | + |
| 75 | +Sentinel's job: **decide when**, not **execute how**. It can be killed and restarted at any time without data loss — this is what makes label-based sharding safe. The `message_decision` config uses CEL expressions to decide when to publish — see `DefaultMessageDecision()` in `internal/config/config.go` for default expressions. |
| 76 | + |
| 77 | +### Key Internal Patterns |
| 78 | +- **Config validation fails fast** — `Validate()` returns an error at startup; `LoadConfig()` propagates it to `main`, which exits non-zero |
| 79 | +- **Context propagation** — `context.Context` threaded through all calls with correlation keys (OpID, TraceID, SpanID, DecisionReason) |
| 80 | +- **Health probes** — `/healthz` (liveness: stale poll detection), `/readyz` (readiness: broker + first successful poll) |
| 81 | + |
| 82 | +## Code Conventions |
| 83 | + |
| 84 | +### Commit Messages |
| 85 | +Format: `HYPERFLEET-### - type: description` |
| 86 | + |
| 87 | +Example: |
| 88 | +``` |
| 89 | +HYPERFLEET-427 - feat: add standard metrics labels |
| 90 | + |
| 91 | +Adds resource_type and resource_selector labels to all |
| 92 | +Prometheus metrics for consistent querying. |
| 93 | + |
| 94 | +Co-Authored-By: Claude <noreply@anthropic.com> |
| 95 | +``` |
| 96 | +The `Co-Authored-By` trailer is required on all Claude-assisted commits. |
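
As a rough illustration of the format above, the following sketch checks a commit message locally. It is not the `hyperfleet-commitlint` implementation: the regular expression, the `checkCommitMessage` helper, and the rule that any lowercase word is a valid type are all assumptions, and the trailer check presumes the commit is Claude-assisted.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subjectPattern approximates `HYPERFLEET-### - type: description`.
// The accepted type list is not stated in this document, so any lowercase
// word is allowed here (an assumption).
var subjectPattern = regexp.MustCompile(`^HYPERFLEET-[0-9]+ - [a-z]+: \S.*$`)

// coAuthorTrailer is the trailer shown in the example commit message.
const coAuthorTrailer = "Co-Authored-By: Claude <noreply@anthropic.com>"

// checkCommitMessage returns the problems found in msg; an empty result
// means the message passes both checks.
func checkCommitMessage(msg string) []string {
	var problems []string
	lines := strings.Split(strings.TrimSpace(msg), "\n")

	// The first line is the subject.
	if !subjectPattern.MatchString(lines[0]) {
		problems = append(problems, "subject does not match HYPERFLEET-### - type: description")
	}

	// The trailer may appear on any later line.
	hasTrailer := false
	for _, l := range lines[1:] {
		if strings.TrimSpace(l) == coAuthorTrailer {
			hasTrailer = true
		}
	}
	if !hasTrailer {
		problems = append(problems, "missing Co-Authored-By trailer")
	}
	return problems
}

func main() {
	msg := "HYPERFLEET-427 - feat: add standard metrics labels\n\n" +
		"Adds resource_type and resource_selector labels to all\n" +
		"Prometheus metrics for consistent querying.\n\n" +
		coAuthorTrailer
	fmt.Println("problems in example message:", len(checkCommitMessage(msg)))
	fmt.Println("problems in bad message:", checkCommitMessage("fix stuff"))
}
```
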
| 97 | + |
| 98 | +### Configuration |
| 99 | +- Config struct in `internal/config/config.go` — YAML struct tags, validation via `Validate()` |
| 100 | +- All durations use `time.Duration` with YAML `duration` format (e.g., `5s`, `30m`) |
| 101 | +- Config precedence (highest wins): CLI flags > env vars (`HYPERFLEET_*`) > YAML file > defaults |
| 102 | +- Broker credentials handled separately via `broker.yaml` (or `BROKER_CONFIG_FILE` env var) |
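
The precedence rule above amounts to "first layer that sets the key wins". The sketch below shows that idea with a hypothetical `resolve` helper over plain maps; it is not how `internal/config` is implemented, and the values are made up. To inspect the real merged result, `sentinel config-dump` remains the tool to use.

```go
package main

import "fmt"

// source holds the values from one configuration layer.
// A missing key means the layer does not set that option.
type source map[string]string

// resolve returns the value for key from the first layer that sets it.
// Callers pass layers highest precedence first.
func resolve(key string, layers ...source) (string, bool) {
	for _, layer := range layers {
		if v, ok := layer[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	defaults := source{"poll_interval": "5s"}
	yamlFile := source{"poll_interval": "30s"}
	envVars := source{"poll_interval": "10s"} // would come from a HYPERFLEET_* variable
	cliFlags := source{}                      // no flag given, so the env var layer wins

	// Order matches the documented precedence: flags > env > YAML > defaults.
	v, ok := resolve("poll_interval", cliFlags, envVars, yamlFile, defaults)
	fmt.Println(v, ok) // prints "10s true"
}
```
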
| 103 | + |
| 104 | +### CLI Commands |
| 105 | +- `sentinel serve --config config.yaml` — run the service |
| 106 | +- `sentinel config-dump --config config.yaml` — print merged config (debug precedence issues) |
| 107 | +- `sentinel version` — print version, commit, build date |
| 108 | +- Run `sentinel serve --help` for full flag list |
| 109 | + |
| 110 | +### Error Handling |
| 111 | +- Log at boundaries (main service loop), not deep in the call stack |
| 112 | + |
| 113 | +### Logging |
| 114 | +- Custom structured logger in `pkg/logger/` — stdlib only, no external deps |
| 115 | +- Interface: `logger.HyperFleetLogger` with `Info()`, `Error()`, `Warn()`, `Debug()`, `V(level)` (verbosity), `Extra()` |
| 116 | +- Create via `logger.NewHyperFleetLogger()` — uses global config |
| 117 | +- Chaining: `logger.Extra("key", val).Extra("key2", val2).Info("msg")` |
| 118 | +- **IMPORTANT: always use `pkg/logger`, never `log/slog` directly** |
| 119 | + |
| 120 | +### CloudEvents Payloads |
| 121 | +`message_data` config uses CEL expressions, not static values: |
| 122 | +```yaml |
| 123 | +message_data: |
| 124 | + id: resource.id |
| 125 | + kind: resource.kind |
| 126 | + href: resource.href |
| 127 | +``` |
| 128 | +CEL context: |
| 129 | +- `resource` — cluster/nodepool object from API (id, kind, href, generation, status, labels, etc.) |
| 130 | +- `reason` — decision reason string from engine (e.g., `"message decision matched"`, `"message decision result is false"`) |
| 131 | +- `condition("Type")` — custom function to look up resource status condition by type name |
| 132 | +- `now` — current timestamp |
| 133 | +- `timestamp()`, `duration()` — standard CEL time functions |
| 134 | + |
| 135 | +### Testing |
| 136 | +- Table-driven tests with plain `if` assertions — no testify |
| 137 | +- Mocking via simple interface implementations (e.g., MockPublisher), no gomock |
| 138 | +- Unit tests live alongside code: `foo_test.go` next to `foo.go` |
| 139 | +- Integration tests in `test/integration/` with `//go:build integration` tag |
| 140 | +- Prometheus metrics verified with `prometheus/testutil` |
| 141 | +- Run single test: `go test -run TestDecisionEngine ./internal/engine/...` |
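
The testing conventions above (table-driven cases, plain `if` checks, hand-written mocks) can be sketched as follows. The `Publisher` interface, the `publishIfNeeded` function, and the shape of `MockPublisher` are hypothetical stand-ins, not the project's real types. The sketch is written as a runnable program; in a real `foo_test.go` the loop would sit in a `func TestXxx(t *testing.T)` and report failures with `t.Errorf`.

```go
package main

import (
	"errors"
	"fmt"
)

// Publisher is a hypothetical interface the code under test depends on.
type Publisher interface {
	Publish(topic string, payload []byte) error
}

// MockPublisher is a hand-written mock: it counts calls and returns Err.
type MockPublisher struct {
	Calls int
	Err   error
}

func (m *MockPublisher) Publish(_ string, _ []byte) error {
	m.Calls++
	return m.Err
}

// publishIfNeeded is a toy function under test: it publishes only when
// shouldPublish is true and returns whatever the publisher returns.
func publishIfNeeded(p Publisher, shouldPublish bool) error {
	if !shouldPublish {
		return nil
	}
	return p.Publish("clusters", []byte(`{}`))
}

// runCases mirrors the body of a table-driven test and returns a
// description of every failed check.
func runCases() []string {
	errBroker := errors.New("broker down")
	cases := []struct {
		name          string
		shouldPublish bool
		publishErr    error
		wantCalls     int
		wantErr       error
	}{
		{"skips when decision is false", false, nil, 0, nil},
		{"publishes when decision is true", true, nil, 1, nil},
		{"propagates publish error", true, errBroker, 1, errBroker},
	}

	var failures []string
	for _, tc := range cases {
		mock := &MockPublisher{Err: tc.publishErr}
		err := publishIfNeeded(mock, tc.shouldPublish)
		// Plain if assertions, no assertion library.
		if !errors.Is(err, tc.wantErr) {
			failures = append(failures, fmt.Sprintf("%s: got error %v, want %v", tc.name, err, tc.wantErr))
		}
		if mock.Calls != tc.wantCalls {
			failures = append(failures, fmt.Sprintf("%s: got %d calls, want %d", tc.name, mock.Calls, tc.wantCalls))
		}
	}
	return failures
}

func main() {
	fmt.Println("failed checks:", len(runCases()))
}
```
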
| 142 | + |
| 143 | +## Git Workflow |
| 144 | + |
| 145 | +- Branch from `main`, PR back to `main` |
| 146 | +- Branch naming: `HYPERFLEET-###-short-description` |
| 147 | +- Pre-commit hooks: run `make install-hooks` to install them; they enforce commit message format (`hyperfleet-commitlint`), Go formatting, linting, and vet |
| 148 | + |
| 149 | +## Project Boundaries |
| 150 | + |
| 151 | +**DO NOT**: |
| 152 | +- Add business logic to Sentinel — orchestration decisions only, execution belongs in adapters |
| 153 | +- Store state in Sentinel — it is stateless, API is source of truth |
| 154 | +- Hardcode the resource polling interval — always use `poll_interval` from config for the main sentinel loop; adding a second resource polling loop bypasses the single-ticker backpressure model |
| 155 | + |
| 156 | +**DO**: |
| 157 | +- Update `hyperfleet-api-spec` version in `go.mod` and run `make generate` when API spec changes |
| 158 | +- Add unit tests for new exported functions and integration tests for new broker/API interactions |
| 159 | +- Add metrics when adding observable behavior — see [docs/metrics.md](docs/metrics.md) for conventions |
| 160 | +- Include `id`, `kind`, and `href` fields in `message_data` (not enforced by validation, but expected by downstream adapters) — see `configs/dev-example.yaml` |
| 161 | +- Use broker abstraction (`hyperfleet-broker`) — never import RabbitMQ/Pub/Sub clients directly |
| 162 | + |
| 163 | +## Gotchas |
| 164 | + |
| 165 | +- **`make generate` is mandatory** — build and tests fail without it; generated code is gitignored |
| 166 | +- **`pkg/api/openapi/` is read-only** — never hand-edit, always regenerate |
| 167 | +- **Broker config comes from `broker.yaml`** (or `BROKER_CONFIG_FILE` env var), not sentinel YAML config — handled by hyperfleet-broker library |
| 168 | +- **CEL expressions in `message_data` are compiled at startup** — syntax errors fail fast, but semantic errors (wrong field names on resource) surface at evaluation time |
| 169 | +- **Metrics labels must include `resource_type` and `resource_selector`** — see [docs/metrics.md](docs/metrics.md) for naming conventions |
| 170 | +- **Metrics use `sync.Once` registration** — call `ResetSentinelMetrics()` in tests to avoid duplicate registration panics |
| 171 | +- **No testify** — project uses plain Go assertions and table-driven tests; don't introduce testify |