|
1 | | -# HyperFleet Infrastructure — Agent Guide |
| 1 | +# AGENTS.md |
2 | 2 |
|
3 | | -IaC repo for HyperFleet dev environments. Terraform provisions GCP resources (GKE, Pub/Sub, VPC). Helm deploys components to Kubernetes. No application code. |
| 3 | +## What this repo is |
4 | 4 |
|
| 5 | +Pure infrastructure-as-code. No application code, no compiled artifacts. Provisions HyperFleet dev environments using **Makefile + Helmfile + Terraform**. |
| 6 | + |
| 7 | +`make help` is the canonical entry point — all developer operations go through it. |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Validation / CI commands |
| 12 | + |
| 13 | +```bash |
| 14 | +make ci-validate # validate terraform + lint helm + lint shellcheck |
| 15 | +make ci-dry-run # ci-validate + validate maestro |
| 16 | +``` |
| 17 | + |
| 18 | +Run `ci-validate` before proposing changes. `ci-dry-run` is the full pre-merge check. |
| 19 | + |
| 20 | +Individual checks: |
| 21 | +```bash |
| 22 | +make validate-terraform # terraform init (no backend) + fmt check + validate |
| 23 | +make lint-helm # helm lint all charts under helm/*/ |
| 24 | +make lint-shellcheck # shellcheck all *.sh |
| 25 | +make validate-maestro # renders maestro chart to /dev/null |
| 26 | +``` |
| 27 | + |
 | 28 | +Template (dry-run render) one Helmfile environment per invocation; repeat for each of the four environments (`gcp`, `kind`, `e2e-gcp`, `e2e-kind`):
| 29 | +```bash |
| 30 | +# environment specific |
| 31 | +HELMFILE_ENV=<env> make template-helmfile |
| 32 | +# example: |
| 33 | +HELMFILE_ENV=gcp make template-helmfile |
| 34 | +``` |
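The per-environment command above can be looped over all four environments. This is a minimal sketch, assuming the environment names listed in this guide (`gcp`, `kind`, `e2e-gcp`, `e2e-kind`). The `template_all_envs` function and its optional runner argument are hypothetical helpers, not targets defined in this repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render every Helmfile environment in turn.
# The optional first argument replaces `make`, which keeps the loop testable.
template_all_envs() {
  local runner="${1:-make}" env
  for env in gcp kind e2e-gcp e2e-kind; do
    # Stop at the first environment that fails to render.
    HELMFILE_ENV="$env" "$runner" template-helmfile || return 1
  done
}
```

Calling `template_all_envs` with no argument would run `make template-helmfile` once per environment and stop at the first failure.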
| 35 | + |
| 36 | +--- |
| 37 | + |
| 38 | +## Terraform formatting |
| 39 | + |
| 40 | +Terraform lock file (`terraform/.terraform.lock.hcl`) is **gitignored** — do not commit it. |
| 41 | + |
| 42 | +Format check runs from `terraform/`: |
| 43 | +```bash |
| 44 | +terraform fmt -check -recursive -diff |
| 45 | +terraform fmt -recursive # auto-fix |
| 46 | +``` |
| 47 | + |
| 48 | +Pinned version: `terraform 1.13.1` (asdf, `.tool-versions`). Providers: `hashicorp/google 5.45.2`, `hashicorp/google-beta 5.45.2`, `hashicorp/local 2.9.0`. |
| 49 | + |
| 50 | +--- |
| 51 | + |
| 52 | +## Generated values — must exist before helmfile deploy |
| 53 | + |
| 54 | +**Do not edit files in `generated-values-from-terraform/` or `generated-values-rabbitmq/` — both directories are auto-generated and gitignored.** |
| 55 | + |
| 56 | +| Env | How values are generated | Required before | |
| 57 | +|-----|--------------------------|-----------------| |
| 58 | +| `gcp` | `make install-terraform` (Terraform writes via `local_file`) | `make install-hyperfleet` | |
| 59 | +| `kind` | `make generate-rabbitmq-values` (shell script) | `make install-hyperfleet` | |
| 60 | +| `e2e-gcp` / `e2e-kind` | Not needed — broker configs hardcoded in helmfile | — | |
| 61 | + |
| 62 | +Helmfile will fail silently or render incorrectly if these files are missing. |
| 63 | + |
| 64 | +`make clean-generated` removes both directories. |
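Because missing files may not produce an error, a preflight check before deploying can help. The sketch below is an assumption-laden illustration: the function names are hypothetical, the mapping of `gcp` to `generated-values-from-terraform/` and `kind` to `generated-values-rabbitmq/` is inferred from the table above, and the directories are assumed to sit under the directory passed as the second argument.

```bash
#!/usr/bin/env bash
# Assumed mapping from HELMFILE_ENV to the generated directory it needs.
required_generated_dir() {
  case "$1" in
    gcp)  echo "generated-values-from-terraform" ;;
    kind) echo "generated-values-rabbitmq" ;;
    *)    echo "" ;;   # e2e-* environments need no generated files
  esac
}

# Return 0 when the needed directory exists and is non-empty, or when no
# directory is needed. $2 is the directory to look in; "." is an assumption.
preflight_generated_values() {
  local dir base="${2:-.}"
  dir="$(required_generated_dir "$1")"
  [ -z "$dir" ] && return 0
  if [ -d "$base/$dir" ] && [ -n "$(ls -A "$base/$dir")" ]; then
    return 0
  fi
  echo "missing generated values in $dir/ for HELMFILE_ENV=$1" >&2
  return 1
}
```

A call such as `preflight_generated_values "$HELMFILE_ENV" && make install-hyperfleet` would then refuse to deploy when the expected directory is absent or empty.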
| 65 | + |
| 66 | +--- |
| 67 | + |
| 68 | +## Environment variable loading |
| 69 | + |
| 70 | +The Makefile selects the env file based on `HELMFILE_ENV`: |
| 71 | +- contains `gcp` → sources `env.gcp` |
| 72 | +- does not contain `gcp` → sources `env.kind` (so `kind`, `e2e-kind`, etc.) |
| 73 | + |
 | 74 | +All variables in those files use `?=` (assign only if not already set), so **any variable can be overridden on the CLI**; when you do, the env file value is ignored:
| 75 | + |
| 76 | +```bash |
| 77 | +HELMFILE_ENV=kind NAMESPACE=my-namespace REGISTRY=quay.io make install-hyperfleet |
5 | 78 | ``` |
6 | | -Terraform → GKE cluster + Pub/Sub |
7 | | - ↓ |
8 | | -scripts/tf-helm-values.sh → generates broker config YAML |
9 | | - ↓ |
10 | | -Helm charts (via helm-git plugin) → deploy to Kubernetes |
11 | | - ├── API |
12 | | - ├── Sentinels (clusters, nodepools) |
13 | | - ├── Adapters (1, 2, 3) |
14 | | - └── Maestro (server + agent, separate namespace) |
| 79 | + |
 | 80 | +For persistent personal overrides, export the variables in your shell environment before invoking `make`; for one-off overrides, pass them on the CLI as shown above.
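The override behavior described above follows from how Make's `?=` works: a value already present in the environment wins, and the default applies only when the variable is unset. The sketch below shows the closest shell analogue, `${VAR=default}`, using the kind defaults from the table in this section. The `apply_kind_defaults` function is hypothetical and only illustrates the semantics; it is not how the Makefile loads `env.kind`.

```bash
#!/usr/bin/env bash
# Shell analogue of Make's `?=`: assign only when the variable is unset.
# The no-colon form ${VAR=default} leaves an empty-but-set variable alone,
# which matches `?=` more closely than ${VAR:=default}.
apply_kind_defaults() {
  : "${NAMESPACE=hyperfleet-local}"
  : "${REGISTRY=localhost}"
  : "${BROKER_TYPE=rabbitmq}"
}

# Demo: one variable is set by the caller, the others fall back to defaults.
unset NAMESPACE REGISTRY BROKER_TYPE
NAMESPACE=my-namespace
apply_kind_defaults
echo "$NAMESPACE $REGISTRY $BROKER_TYPE"   # prints: my-namespace localhost rabbitmq
```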
| 81 | + |
| 82 | +Key variables: |
| 83 | + |
| 84 | +| Variable | GCP default | kind default | Notes | |
| 85 | +|----------|------------|--------------|-------| |
| 86 | +| `HELMFILE_ENV` | `gcp` | `kind` | Also `e2e-gcp`, `e2e-kind` | |
| 87 | +| `NAMESPACE` | `hyperfleet` | `hyperfleet-local` | e2e envs use `hyperfleet-e2e[-$USER]` | |
| 88 | +| `REGISTRY` | `registry.ci.openshift.org` | `localhost` | | |
| 89 | +| `TF_ENV` | `dev` | N/A | Selects `envs/gke/<TF_ENV>.tfvars` | |
| 90 | +| `BROKER_TYPE` | `googlepubsub` | `rabbitmq` | | |
| 91 | +| `API_IMAGE_TAG` | `latest` | `local` | | |
| 92 | +| `IMAGE_PULL_POLICY` | `Always` | `IfNotPresent` | | |
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +## Terraform per-developer setup (GCP, one-time) |
| 97 | + |
| 98 | +```bash |
| 99 | +cd terraform |
| 100 | +cp envs/gke/dev.tfvars.example envs/gke/dev-<username>.tfvars |
| 101 | +cp envs/gke/dev.tfbackend.example envs/gke/dev-<username>.tfbackend |
| 102 | +# Set developer_name = "<username>" in tfvars |
| 103 | +# Set prefix = "terraform/state/dev-<username>" in tfbackend |
15 | 104 | ``` |
16 | 105 |
|
17 | | -## Verification |
| 106 | +These files are gitignored — never commit personal tfvars/tfbackend. Remote state uses GCS bucket `hyperfleet-terraform-state`. |
18 | 107 |
|
19 | | -Run `make help` for all targets. Use these for validation: |
| 108 | +--- |
20 | 109 |
|
21 | | -| Target | What it does | Needs cluster? | |
22 | | -|--------|-------------|----------------| |
23 | | -| `make validate-terraform` | `terraform init -backend=false`, `fmt -check`, `validate` | No | |
24 | | -| `make lint-helm` | `helm lint` all charts | No | |
25 | | -| `make lint-shellcheck` | shellcheck all `.sh` files (requires `shellcheck`; skips silently outside CI if missing) | No | |
26 | | -| `make ci-validate` | All three above combined | No | |
27 | | -| `make validate-helm-charts` | `helm template` render all charts (current `BROKER_TYPE` only) | No | |
28 | | -| `make ci-dry-run` | `ci-validate` + `validate-helm-charts` for both broker types | No | |
29 | | -| `make install-api DRY_RUN=true` | Helm dry-run a single component against live cluster | Yes | |
| 110 | +## Helm charts and dependencies |
30 | 111 |
|
31 | | -**IMPORTANT:** Do NOT use `make install-all DRY_RUN=true` for validation — it runs Terraform first and fails without backend access. Use `make ci-dry-run` for offline validation. |
| 112 | +Two local charts under `helm/`: |
| 113 | +- `helm/maestro/` — umbrella chart; dependencies pulled from `github.com/openshift-online/maestro` via `helm-git` plugin at `ref=main` |
| 114 | +- `helm/rabbitmq/` — dev-only, NOT production-ready (no StatefulSet, hardcoded `guest/guest`) |
32 | 115 |
|
33 | | -**Pre-commit order:** |
| 116 | +`helm/maestro/charts/` is gitignored; `Chart.lock` is committed. The `install-maestro` target runs `helm dependency update` automatically. |
34 | 117 |
|
| 118 | +**Required Helm plugins** (not standard): |
35 | 119 | ```bash |
36 | | -cd terraform && terraform fmt -recursive # auto-fix terraform formatting |
37 | | -make ci-dry-run # full offline validation |
| 120 | +helm plugin install https://github.com/aslafy-z/helm-git |
| 121 | +helm plugin install https://github.com/databus23/helm-diff --verify=false |
38 | 122 | ``` |
39 | 123 |
|
40 | | -## Source of Truth |
| 124 | +--- |
| 125 | + |
| 126 | +## Helmfile environments |
| 127 | + |
| 128 | +Four environments, two broker backends: |
41 | 129 |
|
42 | | -| Topic | File | |
43 | | -|-------|------| |
44 | | -| All make targets and variables | `Makefile` (run `make help`) | |
45 | | -| Terraform variables and defaults | `terraform/variables.tf` | |
46 | | -| Terraform architecture and setup | `terraform/README.md` | |
47 | | -| Development setup and workflow | `CONTRIBUTING.md` | |
48 | | -| Commit message format | `CONTRIBUTING.md` → "Commit Standards" | |
49 | | -| Helm values generation logic | `scripts/tf-helm-values.sh` | |
50 | | -| Chart dependencies and sources | `helm/*/Chart.yaml` | |
51 | | -| Terraform version pin | `.tool-versions` | |
52 | | -| Repo structure and quick start | `README.md` | |
| 130 | +| `HELMFILE_ENV` | Backend | Notes | |
| 131 | +|----------------|---------|-------| |
| 132 | +| `gcp` | Google Pub/Sub | Requires Terraform-generated values | |
| 133 | +| `kind` | RabbitMQ | Requires script-generated values | |
| 134 | +| `e2e-gcp` | Google Pub/Sub | Hardcoded configs, uses `$NAMESPACE` | |
| 135 | +| `e2e-kind` | RabbitMQ | Hardcoded configs, uses `$NAMESPACE` | |
53 | 136 |
|
54 | | -## Two Deployment Paths |
| 137 | +Helmfile uses Go template syntax (`.gotmpl` extension) throughout. |
55 | 138 |
|
56 | | -1. **GCP + Google Pub/Sub** (default): `make install-all` — runs Terraform, configures kubectl, generates Pub/Sub broker config, deploys everything. |
57 | | -2. **RabbitMQ** (any Kubernetes): `make install-all-rabbitmq` — no Terraform, deploys RabbitMQ manifest, generates RabbitMQ broker config, deploys everything. |
| 139 | +--- |
58 | 140 |
|
59 | | -## Key Makefile Defaults |
| 141 | +## Sibling repos |
60 | 142 |
|
61 | | -Most commonly overridden variables with actual defaults. Run `make help` for the complete list: |
| 143 | +Helm charts for `hyperfleet-api`, `hyperfleet-sentinel`, and `hyperfleet-adapter` live in their respective sibling repos and are pulled at deploy time via `helm-git`. The `CHART_ORG` and `API_CHART_REF` variables control which org/ref is used. |
62 | 144 |
|
63 | | -| Variable | Default | |
64 | | -|----------|---------| |
65 | | -| `NAMESPACE` | `hyperfleet` | |
66 | | -| `MAESTRO_NS` | `maestro` | |
67 | | -| `BROKER_TYPE` | `googlepubsub` | |
68 | | -| `REGISTRY` | `registry.ci.openshift.org` | |
69 | | -| `*_IMAGE_TAG` | `latest` | |
70 | | -| `*_CHART_REF` | `main` (API, Sentinel, Adapter only — Maestro has no `CHART_REF` variable) | |
71 | | -| `CHART_ORG` | `openshift-hyperfleet` | |
72 | | -| `GCP_PROJECT_ID` | `hcm-hyperfleet` | |
73 | | -| `TF_ENV` | `dev` | |
| 145 | +For kind image builds, `PROJECTS_DIR` must point to the parent directory containing those repos (default: `~/openshift-hyperfleet`). |
74 | 146 |
|
75 | | -To pin Maestro's chart version, edit `helm/maestro/Chart.yaml` directly. |
| 147 | +--- |
76 | 148 |
|
77 | | -## Conventions |
| 149 | +## No CI workflows in this repo |
78 | 150 |
|
79 | | -### Makefile |
80 | | -- Prerequisite checks use `check-*` naming prefix |
| 151 | +There is no `.github/workflows/`. CI is managed by **Prow** (OpenShift CI). The `ci-validate`, `ci-dry-run`, `ci-test`, and `ci-cleanup` Make targets are designed to be called by Prow. PR approval is enforced via `OWNERS`. |
81 | 152 |
|
82 | | -### Helm Charts |
83 | | -- Charts are umbrella charts — actual chart source lives in component repos, pulled via helm-git plugin |
84 | | -- `set-chart-ref` macro in Makefile rewrites `Chart.yaml` repository URL and `?ref=` during install and validation targets — do not manually edit the `?ref=` parameter in api, sentinel, or adapter charts |
85 | | -- Maestro chart ref is NOT managed by `set-chart-ref` or `CHART_ORG` — its chart source is hardcoded to `openshift-online/maestro` |
86 | | -- `CHART_ORG` and `*_CHART_REF` control which GitHub org/ref api/sentinel/adapter charts are pulled from |
| 153 | +--- |
87 | 154 |
|
88 | | -### Terraform |
89 | | -- All variables need `description` in `variables.tf` |
90 | | -- All outputs need `description` in `outputs.tf` |
91 | | -- Backend config files: `dev-*.tfvars`, `dev-*.tfbackend`, `dev.tfvars`, and `dev.tfbackend` in `terraform/envs/gke/` are gitignored. Exception: `dev-prow.*` files are committed (shared CI cluster config) |
92 | | -- Copy from `.example` files for personal configs |
| 155 | +## Common gotchas |
93 | 156 |
|
94 | | -## Boundaries |
| 157 | +**`generate-rabbitmq-values` only works for `HELMFILE_ENV=kind`** |
| 158 | +Running it for any other env silently no-ops. E2E envs (`e2e-kind`, `e2e-gcp`) have broker configs hardcoded in helmfile and need no generated files. |
95 | 159 |
|
96 | | -**IMPORTANT: Do NOT** |
97 | | -- Edit files in `generated-values-from-terraform/` — they are created by `scripts/tf-helm-values.sh` and overwritten on each run |
98 | | -- Commit personal `dev-*.tfvars` or `dev-*.tfbackend` files (gitignored, but `dev-prow.*` is an exception) |
99 | | -- Hardcode GCP project IDs — use `GCP_PROJECT_ID` variable |
100 | | -- Add Makefile targets without `## Description` comment (powers `make help`) and `.PHONY` declaration |
101 | | -- Add external tool dependencies without a matching `check-*` prerequisite target |
102 | | -- Use `make install-all` for offline validation (use `make ci-dry-run`) |
| 160 | +**`check-kubectl-context` enforces context shape, not just env** |
| 161 | +`make install-hyperfleet` (and most helmfile targets) run `check-kubectl-context`, which hard-fails if your current kubectl context doesn't contain `gke_` (for gcp envs) or `kind-` (for kind envs). Switching `HELMFILE_ENV` without switching your kubeconfig context will fail immediately. |
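The rule above can be approximated in a few lines of shell. This is a sketch, not the Makefile's actual implementation: `context_matches_env` is a hypothetical name, and it assumes that "gcp envs" means any `HELMFILE_ENV` containing `gcp`, with everything else treated as a kind env.

```bash
#!/usr/bin/env bash
# Hypothetical check: does the kubectl context fit the Helmfile environment?
# $1 = HELMFILE_ENV, $2 = kubectl context name.
context_matches_env() {
  local helmfile_env="$1" context="$2" required
  case "$helmfile_env" in
    *gcp*) required="gke_" ;;    # GKE contexts contain gke_
    *)     required="kind-" ;;   # kind contexts contain kind-
  esac
  case "$context" in
    *"$required"*) return 0 ;;
    *)             return 1 ;;
  esac
}
```

In practice the second argument would come from `kubectl config current-context`, for example `context_matches_env "$HELMFILE_ENV" "$(kubectl config current-context)"`.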
103 | 162 |
|
104 | | -## Gotchas |
| 163 | +**`install-maestro` installs the AppliedManifestWorks CRD manually** |
| 164 | +The upstream Maestro Helm chart CRD install is broken. `install-maestro` works around this by applying the CRD directly from `open-cluster-management-io/api` before the chart, and sets `--set agent.installWorkCRDs=false`. Do not remove or reorder these steps. |
105 | 165 |
|
106 | | -1. **helm-git plugin required**: Helm charts pull from GitHub repos via `git+https://` URLs. Without the helm-git plugin, `helm dependency update` fails silently or with cryptic errors. Check with `make check-helm`. |
| 166 | +**Terraform state lock is always disabled** |
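 | 167 | +`make install-terraform` and `make destroy-terraform` both pass `-lock=false`, so concurrent runs are not protected from each other. If a previous apply left `terraform/errored.tfstate` (currently present in this repo), resolve it before re-running: Terraform writes that file when it cannot persist state to the backend, so the remote state may be stale until the file is reconciled (for example with `terraform state push`).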
107 | 168 |
|
108 | | -2. **`set-chart-ref` modifies Chart.yaml in-place**: The Makefile's `set-chart-ref` macro rewrites the `?ref=` parameter and org in `helm/{api,sentinel-*,adapter*}/Chart.yaml` during install and validation targets (`install-*`, `validate-helm-charts`, `ci-dry-run`). These changes show up as dirty in `git status`. This is by design — chart refs are pinned at runtime, not at commit time. Maestro charts are not affected. |
| 169 | +**`validate-terraform` uses no backend** |
 | 170 | +`make validate-terraform` runs `terraform init -backend=false`. It checks configuration syntax and internal consistency only; it does not test connectivity to GCS or check that provider credentials work.
109 | 171 |
|
110 | | -3. **Maestro uses a separate namespace**: Maestro deploys to `$(MAESTRO_NS)` (default: `maestro`), not `$(NAMESPACE)`. Both namespaces are created automatically by `check-namespace`/`check-maestro-namespace`. |
| 172 | +**`shellcheck` is silently skipped locally but required in CI** |
| 173 | +`make lint-shellcheck` skips without error if `shellcheck` is not installed. In CI (`$CI` set), it hard-fails instead. Install it locally to catch issues before push: `brew install shellcheck`. |
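The skip-locally, fail-in-CI behavior described above amounts to a three-way decision. The sketch below illustrates it under stated assumptions: `shellcheck_action` is a hypothetical helper, and treating any non-empty `$CI` value as "in CI" is an assumption about how the Makefile tests that variable.

```bash
#!/usr/bin/env bash
# Hypothetical decision helper. Prints one of: run, skip, fail.
# $1 = value of $CI (empty when running locally)
# $2 = tool to look for (defaults to shellcheck)
shellcheck_action() {
  local ci="$1" tool="${2:-shellcheck}"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "run"     # tool is installed: lint normally
  elif [ -n "$ci" ]; then
    echo "fail"    # missing tool in CI is an error
  else
    echo "skip"    # missing tool locally is tolerated
  fi
}
```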
111 | 174 |
|
112 | | -4. **`DRY_RUN` only affects Helm**: The `DRY_RUN=true` flag adds `--dry-run` to Helm commands only. Terraform targets (`install-terraform`, `get-credentials`) ignore it. Aggregate targets like `install-all` still run Terraform even with `DRY_RUN=true`. |
| 175 | +**`helm/maestro/charts/` is gitignored; `Chart.lock` is not** |
| 176 | +Running `helm dependency update helm/maestro` is required before any Maestro install. The `install-maestro` target does this automatically, but running `helm install` or `helm template` on the chart directly will fail if `charts/` is absent. |
113 | 177 |
|
114 | | -5. **Adapter config files are `--set-file` not `--values`**: Adapter install targets use `--set-file` for `adapter-config.yaml` and `adapter-task-config.yaml`. These are loaded as string values, not merged as Helm values. |
| 178 | +**E2E environments share env files with their base environments** |
| 179 | +`HELMFILE_ENV=e2e-kind` sources `env.kind`, and `HELMFILE_ENV=e2e-gcp` sources `env.gcp` — no separate `env.e2e-*` files exist. The Makefile uses substring matching (`findstring gcp`) to choose between the two env files. The distinction between base and e2e environments is in Helmfile only (different adapter configs, hardcoded broker settings). |
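The substring rule described above can be mirrored in shell to see which env file a given `HELMFILE_ENV` resolves to. This is an illustrative sketch; `select_env_file` is a hypothetical function, and the real selection happens inside the Makefile via `findstring gcp`.

```bash
#!/usr/bin/env bash
# Hypothetical mirror of the Makefile's `findstring gcp` selection.
select_env_file() {
  case "$1" in
    *gcp*) echo "env.gcp" ;;    # any name containing "gcp"
    *)     echo "env.kind" ;;   # everything else
  esac
}

for env in gcp kind e2e-gcp e2e-kind; do
  printf '%s -> %s\n' "$env" "$(select_env_file "$env")"
done
# prints:
# gcp -> env.gcp
# kind -> env.kind
# e2e-gcp -> env.gcp
# e2e-kind -> env.kind
```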
| 180 | + |
| 181 | +--- |
| 182 | + |
| 183 | +## Commit message format |
| 184 | + |
| 185 | +``` |
| 186 | +HYPERFLEET-XXX - <type>: <subject> |
| 187 | +``` |
115 | 188 |
|
116 | | -6. **Generated values are conditional**: Install targets only pass `--values $(GENERATED_DIR)/file.yaml` if the file exists (`$(wildcard ...)`). If you skip `tf-helm-values`, components install without broker config. |
| 189 | +Types: `feat`, `fix`, `docs`, `refactor`, `chore`, `test`. No semver releases — infra changes deploy from `main`. |
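The format above can be checked mechanically before committing. The sketch below is one possible check, not a hook shipped with this repo: `is_valid_commit_subject` is a hypothetical name, and it assumes that `XXX` stands for a numeric ticket ID.

```bash
#!/usr/bin/env bash
# Hypothetical validator for "HYPERFLEET-XXX - <type>: <subject>".
# Assumes XXX is one or more digits; the types come from the list above.
is_valid_commit_subject() {
  printf '%s\n' "$1" |
    grep -Eq '^HYPERFLEET-[0-9]+ - (feat|fix|docs|refactor|chore|test): .+'
}
```

For example, `is_valid_commit_subject "HYPERFLEET-123 - docs: rewrite AGENTS.md"` succeeds, while a subject with no ticket prefix or an unlisted type fails.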