From c1791aff9bdb3627b44f99b8945babfb294c350b Mon Sep 17 00:00:00 2001
From: Dmitrii Andreev
Date: Thu, 21 May 2026 14:01:23 -0500
Subject: [PATCH] HYPERFLEET-930 - chore: update claude.md context

---
 AGENTS.md | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++
 CLAUDE.md | 178 +-----------------------------------------------------
 2 files changed, 172 insertions(+), 177 deletions(-)
 create mode 100644 AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..bfb3831
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,171 @@
+# CLAUDE.md
+
+## Project Identity
+
+HyperFleet Sentinel is a **Kubernetes resource watcher** that polls the HyperFleet API for cluster/nodepool updates, makes orchestration decisions via CEL-based decision logic, and publishes CloudEvents to message brokers. Stateless, horizontally scalable via label-based sharding, delegates all state persistence to the API.
+
+- **Language**: Go 1.25 (see `go.mod`)
+- **Messaging**: Broker abstraction (RabbitMQ, GCP Pub/Sub, Stub)
+- **API Client**: Generated from [hyperfleet-api-spec](https://github.com/openshift-hyperfleet/hyperfleet-api-spec) — see [openapi/README.md](openapi/README.md)
+- **Deployment**: Helm chart in `charts/`
+
+Sentinel is one component in the HyperFleet control plane:
+- **API** — persists cluster/nodepool state (source of truth)
+- **Sentinel** — watches API, decides when resources need reconciliation, publishes events
+- **Adapters** — consume events, execute provisioning/deprovisioning, report back to API
+- **Broker** (RabbitMQ or Pub/Sub) — decouples Sentinel from adapters
+
+## Critical First Steps
+
+**Generated OpenAPI client is NOT committed to git.** Before any build, test, or development task:
+
+```bash
+make generate # Extracts OpenAPI spec from hyperfleet-api-spec module and generates Go client
+```
+
+Setup sequence for a fresh clone:
+1. `make generate` — generate OpenAPI client in `pkg/api/openapi/`
+2. `make download` — fetch Go dependencies
+3. `make build` — build `bin/sentinel` binary
+4. `make test` — verify unit tests pass
+
+## Verification
+
+| Command | What it does |
+|---|---|
+| `make verify` | go vet + format check (fast) |
+| `make lint` | golangci-lint (comprehensive) |
+| `make test` | all tests (`./...`), writes `coverage.out` profile |
+| `make test-unit` | unit tests only — specific internal/ and pkg/ packages |
+| `make test-integration` | integration tests with testcontainers (Docker required) |
+| `make test-coverage` | runs `make test` then opens HTML coverage report |
+| `make test-helm` | Helm chart lint + template validation (10 scenarios) |
+| `make test-all` | test + test-integration + test-helm + lint |
+
+Quick feedback: `make verify && make test-unit`. Full pre-push: `make test-all`.
+
+**PR pre-flight order:**
+1. `make generate`
+2. `make fmt`
+3. `make lint`
+4. `make test-unit`
+5. `make test-integration` — if broker/API changes
+6. `make test-helm` — if chart changes
+7. Update CHANGELOG.md if the change is user-visible
+
+## Source of Truth
+
+| Topic | Where to look |
+|---|---|
+| Configuration reference | [docs/config.md](docs/config.md) |
+| Metrics definitions | [docs/metrics.md](docs/metrics.md), `internal/metrics/` |
+| Local/GKE deployment | [docs/running-sentinel.md](docs/running-sentinel.md) |
+| Multi-instance sharding | [docs/multi-instance-deployment.md](docs/multi-instance-deployment.md) |
+| Alerts and runbooks | [docs/alerts.md](docs/alerts.md), [docs/runbook.md](docs/runbook.md) |
+| Helm values | [charts/values.yaml](charts/values.yaml) |
+| Contributing and setup | [CONTRIBUTING.md](CONTRIBUTING.md) |
+| OpenAPI client generation | [openapi/README.md](openapi/README.md) |
+| Example configs | `configs/dev-example.yaml`, `configs/rabbitmq-example.yaml`, `configs/gcp-pubsub-example.yaml` |
+| Broker configuration | `broker.yaml` (loaded by hyperfleet-broker; override path via `BROKER_CONFIG_FILE` env var) |
+| CloudEvents / CEL payloads | `internal/payload/` |
+| Resource profiling | [docs/resource-profiling.md](docs/resource-profiling.md) |
+
+## Architecture Context
+
+Sentinel's job: **decide when**, not **execute how**. It can be killed and restarted at any time without data loss — this is what makes label-based sharding safe. The `message_decision` config uses CEL expressions to decide when to publish — see `DefaultMessageDecision()` in `internal/config/config.go` for default expressions.
+
+### Key Internal Patterns
+- **Config validation fails fast** — `Validate()` returns error at startup, `LoadConfig()` propagates to main which exits non-zero
+- **Context propagation** — `context.Context` threaded through all calls with correlation keys (OpID, TraceID, SpanID, DecisionReason)
+- **Health probes** — `/healthz` (liveness: stale poll detection), `/readyz` (readiness: broker + first successful poll)
+
+## Code Conventions
+
+### Commit Messages
+Format: `HYPERFLEET-### - type: description`
+
+Example:
+```
+HYPERFLEET-427 - feat: add standard metrics labels
+
+Adds resource_type and resource_selector labels to all
+Prometheus metrics for consistent querying.
+
+Co-Authored-By: Claude
+```
+Co-Authored-By trailer required on all Claude-assisted commits.
+
+### Configuration
+- Config struct in `internal/config/config.go` — YAML struct tags, validation via `Validate()`
+- All durations use `time.Duration` with YAML `duration` format (e.g., `5s`, `30m`)
+- Config precedence (highest wins): CLI flags > env vars (`HYPERFLEET_*`) > YAML file > defaults
+- Broker credentials handled separately via `broker.yaml` (or `BROKER_CONFIG_FILE` env var)
+
+### CLI Commands
+- `sentinel serve --config config.yaml` — run the service
+- `sentinel config-dump --config config.yaml` — print merged config (debug precedence issues)
+- `sentinel version` — print version, commit, build date
+- Run `sentinel serve --help` for full flag list
+
+### Error Handling
+- Log at boundaries (main service loop), not deep in call stack
+
+### Logging
+- Custom structured logger in `pkg/logger/` — stdlib only, no external deps
+- Interface: `logger.HyperFleetLogger` with `Info()`, `Error()`, `Warn()`, `Debug()`, `V(level)` (verbosity), `Extra()`
+- Create via `logger.NewHyperFleetLogger()` — uses global config
+- Chaining: `logger.Extra("key", val).Extra("key2", val2).Info("msg")`
+- **IMPORTANT: always use `pkg/logger`, never `log/slog` directly**
+
+### CloudEvents Payloads
+`message_data` config uses CEL expressions, not static values:
+```yaml
+message_data:
+  id: resource.id
+  kind: resource.kind
+  href: resource.href
+```
+CEL context:
+- `resource` — cluster/nodepool object from API (id, kind, href, generation, status, labels, etc.)
+- `reason` — decision reason string from engine (e.g., `"message decision matched"`, `"message decision result is false"`)
+- `condition("Type")` — custom function to look up resource status condition by type name
+- `now` — current timestamp
+- `timestamp()`, `duration()` — standard CEL time functions
+
+### Testing
+- Table-driven tests with plain `if` assertions — no testify
+- Mocking via simple interface implementations (e.g., MockPublisher), no gomock
+- Unit tests live alongside code: `foo_test.go` next to `foo.go`
+- Integration tests in `test/integration/` with `//go:build integration` tag
+- Prometheus metrics verified with `prometheus/testutil`
+- Run single test: `go test -run TestDecisionEngine ./internal/engine/...`
+
+## Git Workflow
+
+- Branch from `main`, PR back to `main`
+- Branch naming: `HYPERFLEET-###-short-description`
+- Pre-commit hooks: run `make install-hooks` to install — enforces commit message format (`hyperfleet-commitlint`), Go formatting, linting, and vet
+
+## Project Boundaries
+
+**DO NOT**:
+- Add business logic to Sentinel — orchestration decisions only, execution belongs in adapters
+- Store state in Sentinel — it is stateless, API is source of truth
+- Hardcode the resource polling interval — always use `poll_interval` from config for the main sentinel loop; adding a second resource polling loop bypasses the single-ticker backpressure model
+
+**DO**:
+- Update `hyperfleet-api-spec` version in `go.mod` and run `make generate` when API spec changes
+- New exported functions require unit tests; new broker/API interactions require integration tests
+- Add metrics when adding observable behavior — see [docs/metrics.md](docs/metrics.md) for conventions
+- Convention: `message_data` should include `id`, `kind`, `href` fields (not enforced by validation, but expected by downstream adapters) — see `configs/dev-example.yaml`
+- Use broker abstraction (`hyperfleet-broker`) — never import RabbitMQ/Pub/Sub clients directly
+
+## Gotchas
+
+- **`make generate` is mandatory** — build and tests fail without it; generated code is gitignored
+- **`pkg/api/openapi/` is read-only** — never hand-edit, always regenerate
+- **Broker config comes from `broker.yaml`** (or `BROKER_CONFIG_FILE` env var), not sentinel YAML config — handled by hyperfleet-broker library
+- **CEL expressions in `message_data` are compiled at startup** — syntax errors fail fast, but semantic errors (wrong field names on resource) surface at evaluation time
+- **Metrics labels must include `resource_type` and `resource_selector`** — see [docs/metrics.md](docs/metrics.md) for naming conventions
+- **Metrics use `sync.Once` registration** — call `ResetSentinelMetrics()` in tests to avoid duplicate registration panics
+- **No testify** — project uses plain Go assertions and table-driven tests; don't introduce testify
diff --git a/CLAUDE.md b/CLAUDE.md
index b436fc5..43c994c 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,177 +1 @@
-# CLAUDE.md
-
-## Project Identity
-
-HyperFleet Sentinel is a **Kubernetes resource watcher** that polls the HyperFleet API for cluster/nodepool updates, makes orchestration decisions based on max age intervals, and publishes CloudEvents to message brokers. It is stateless, horizontally scalable via label-based sharding, and delegates all state persistence to the API.
-
-- **Language**: Go 1.25+
-- **Messaging**: Broker abstraction supporting RabbitMQ, GCP Pub/Sub, and Stub implementations
-- **API Client**: Generated from the [hyperfleet-api-spec](https://github.com/openshift-hyperfleet/hyperfleet-api-spec) Go module — see [openapi/README.md](openapi/README.md)
-- **Deployment**: Helm chart with PodMonitoring (GKE) and ServiceMonitor (Prometheus Operator)
-
-## Critical First Steps
-
-**Generated OpenAPI client is NOT committed to git.** Before any build, test, or development task:
-
-```bash
-make generate # Extracts OpenAPI spec from hyperfleet-api-spec module and generates Go client
-```
-
-Setup sequence for a fresh clone:
-1. `make generate` — generate OpenAPI client in `pkg/api/openapi/`
-2. `make download` — fetch Go dependencies
-3. `make build` — build `bin/sentinel` binary
-4. `make test` — verify unit tests pass
-
-## Verification Commands
-
-| Command | What it does |
-|---|---|
-| `make verify` | go vet + format check (fast) |
-| `make lint` | golangci-lint (comprehensive) |
-| `make test` | unit tests only (no external deps) |
-| `make test-integration` | integration tests with testcontainers (RabbitMQ, Pub/Sub) |
-| `make test-helm` | Helm chart lint and validation |
-| `make test-all` | lint + unit + integration + helm tests |
-
-Use `make verify && make test` for fast local feedback. Use `make test-all` before pushing.
-
-## Code Conventions
-
-### Commit Messages
-Format: `HYPERFLEET-### - type: description`
-
-Example:
-```
-HYPERFLEET-427 - feat: add standard metrics labels
-
-Adds resource_type and resource_selector labels to all
-Prometheus metrics for consistent querying.
-
-Co-Authored-By: Claude
-```
-
-### Import Ordering
-1. Standard library
-2. External packages (`github.com/google/cel-go`, `github.com/prometheus/client_golang`)
-3. HyperFleet packages (`github.com/openshift-hyperfleet/hyperfleet-broker`, etc.)
-4. Internal packages (`github.com/openshift-hyperfleet/hyperfleet-sentinel/internal/...`)
-
-### Configuration
-- Config lives in `internal/config/config.go` — struct tags for YAML, validation via `Validate()`
-- All durations use `time.Duration` with YAML `duration` format (e.g., `5s`, `30m`)
-- Environment variables override YAML only for broker credentials (via hyperfleet-broker library)
-- Config validation fails fast at startup — never run with invalid config
-
-### Error Handling
-- Errors propagate with context: `fmt.Errorf("failed to poll API: %w", err)`
-- Log errors at the boundary (main service loop), not deep in call stack
-- Use structured logging: `logger.Error("msg", "key", value, "error", err)`
-
-### Metrics
-- All metrics defined in `pkg/metrics/metrics.go` — use Prometheus client conventions
-- Standard labels on all metrics: `resource_type`, `resource_selector`
-- Counter: `_total` suffix (e.g., `hyperfleet_sentinel_events_published_total`)
-- Gauge: no suffix (e.g., `hyperfleet_sentinel_pending_resources`)
-- Histogram: `_seconds` suffix (e.g., `hyperfleet_sentinel_poll_duration_seconds`)
-
-### Testing
-- Unit tests: mock external dependencies (API client, broker), fast, deterministic
-- Integration tests: testcontainers for real RabbitMQ/Pub/Sub, slower, covers end-to-end flows
-- Test file naming: `*_test.go` alongside implementation
-- Integration tests: `test/integration/*_test.go` with build tag `//go:build integration`
-
-### CloudEvents Structure
-Events use CEL expressions from `message_data` config to build payloads:
-```yaml
-message_data:
-  id: resource.id # CEL expressions, not static values
-  kind: resource.kind
-  href: resource.href
-  generation: resource.generation
-```
-
-CEL context includes:
-- `resource` — the cluster/nodepool object from API
-- `reason` — decision string ("not_reconciled", "reconciled_stale", "reconciled_fresh")
-
-## Project Boundaries
-
-**DO NOT**:
-- Modify generated code in `pkg/api/openapi/` — regenerate via `make generate` instead
-- Add dependencies without checking licenses (`go-licenses` reports in CI)
-- Commit broker credentials or GCP service account keys
-- Add business logic to Sentinel — orchestration decisions only, execution belongs in adapters
-- Store state in Sentinel — it is stateless, API is the source of truth
-- Poll faster than API can handle — respect backpressure and rate limits
-
-**DO**:
-- Update `hyperfleet-api-spec` version in `go.mod` and run `make generate` when the API spec changes
-- Add tests for new features (unit + integration if broker/API interaction)
-- Update Prometheus metrics when adding observable behaviors
-- Update CHANGELOG.md for user-visible changes
-- Follow the ObjectReference pattern for CloudEvents payloads (id, kind, href)
-- Use broker abstraction (`hyperfleet-broker`) — never import RabbitMQ/Pub/Sub clients directly
-
-## Architecture Context
-
-Sentinel is one component in the HyperFleet control plane:
-- **API** persists cluster/nodepool state (source of truth)
-- **Sentinel** watches API, decides when resources need reconciliation, publishes events
-- **Adapters** consume events, execute provisioning/deprovisioning, report status back to API
-- **Broker** (RabbitMQ or Pub/Sub) decouples Sentinel from adapters
-
-Sentinel's job: **decide when**, not **execute how**. Max age intervals define "when":
-- `max_age_not_reconciled`: poll frequently for unstable resources
-- `max_age_reconciled`: poll infrequently for stable resources
-
-## Local Development
-
-```bash
-# 1. Start HyperFleet API (see hyperfleet-api repo) and RabbitMQ
-docker run -d -p 5672:5672 rabbitmq:3-management
-
-# 2. Configure (see configs/dev-example.yaml and broker.yaml for templates)
-# 3. Run Sentinel
-./bin/sentinel serve --config config.yaml
-
-# Watch events at http://localhost:15672 (guest/guest)
-```
-
-For detailed local/GKE deployment, see [docs/running-sentinel.md](docs/running-sentinel.md).
-
-## Helm Chart
-
-Chart lives in `charts/` with values for:
-- Multiple Sentinel instances with different `resource_selector` (sharding)
-- Monitoring: PodMonitoring (GKE/GMP) or ServiceMonitor (Prometheus Operator)
-- Broker config via ConfigMap (type, topic) + Secret (credentials)
-
-Example: deploy 2 Sentinels watching different shards:
-```bash
-helm install sentinel-shard-1 ./charts \
-  --set config.resourceSelector[0].label=shard \
-  --set config.resourceSelector[0].value=1 \
-  --set broker.topic=hyperfleet-prod-clusters
-
-helm install sentinel-shard-2 ./charts \
-  --set config.resourceSelector[0].label=shard \
-  --set config.resourceSelector[0].value=2 \
-  --set broker.topic=hyperfleet-prod-clusters
-```
-
-Both read from the same API and publish to the same topic, but watch different label-filtered subsets.
-
-## Validation Checklist
-
-Before submitting a PR:
-1. `make generate` — ensure OpenAPI client is current
-2. `make fmt` — format code
-3. `make verify` — vet and format check
-4. `make lint` — pass golangci-lint
-5. `make test` — pass unit tests
-6. `make test-integration` — pass integration tests (if broker/API changes)
-7. `make test-helm` — validate Helm chart
-8. Update CHANGELOG.md for user-visible changes
-9. Add metrics if new observable behavior
-10. Commit message follows `HYPERFLEET-### - type: description` format
+@AGENTS.md