| 1 | +# AGENTS.md |
| 2 | + |
| 3 | +This document provides guidance for AI coding agents working with the cluster-ingress-operator repository. |
| 4 | + |
| 5 | +## 1. SYSTEM CONTEXT & TOPOLOGY |
| 6 | +* **Domain:** North-South network traffic management (Layer 7 & Layer 4). |
| 7 | +* **Prerequisites:** CNI (Cluster Network Operator), Pod IP routing, internal CoreDNS. |
| 8 | +* **Dual Architecture Mode:** |
| 9 | + 1. **Legacy/Primary:** HAProxy orchestration via `openshift3/ose-haproxy-router`. |
| 10 | + 2. **Modern/GWAPI:** Gateway API via OpenShift Service Mesh (OSSM / Istio / Envoy). |
| 11 | + |
| 12 | +## 2. STRICT NAMESPACE BOUNDARIES |
| 13 | +* **`openshift-ingress-operator`**: Execution environment for operator logic, control loops, and metrics endpoints. **DO NOT** deploy data-plane operands here. |
| 14 | +* **`openshift-ingress`**: Execution environment for all data-plane workloads. HAProxy pods, `haproxy.cfg` ConfigMaps, TLS secrets, Envoy proxies, and Services **MUST** be strictly confined to this namespace. |
| 15 | + |
| 16 | +## 3. CUSTOM RESOURCE DEFINITION (CRD) MATRIX |
| 17 | + |
| 18 | +### Core Configuration |
| 19 | +* **`Ingress`** (`config.openshift.io/v1`, Cluster-scoped): Defines cluster-wide routing defaults and toggles. |
| 20 | +* **`IngressController`** (`operator.openshift.io/v1`, Namespace-scoped): Primary interface for legacy HAProxy. Controls replicas, deployment affinity, and endpoint publishing. |
| 21 | +* **`DNSRecord`** (`ingress.operator.openshift.io/v1`, Namespace-scoped): Internal abstraction for cloud DNS manipulation. |
| 22 | + |
| 23 | +### Gateway API (GWAPI) |
| 24 | +* **`GatewayClass`**: Trigger resource. A `GatewayClass` whose `spec.controllerName` is `openshift.io/gateway-controller/v1` initiates OSSM deployment. |
| 25 | +* **`Gateway`**: Defines external proxy instances / ports. Mapped to Envoy proxies. |
| 26 | +* **`HTTPRoute`**: Advanced routing, traffic weighting, and header matching to backend services. |
| 27 | + |
| 28 | +## 4. DIRECTORY TOPOGRAPHY |
| 29 | +* `pkg/operator/controller/ingress/`: Legacy core. Reconciles `IngressController` and manages HAProxy deployment, load balancer services, and status loops. |
| 30 | +* `pkg/operator/controller/dns/`: Fulfills `DNSRecord` resources. Interfaces with AWS/GCP/Azure SDKs. |
| 31 | +* `pkg/operator/controller/gatewayapi/`: GWAPI meta-controller. Manages CRD lifecycles and launches dependent watchers. |
| 32 | +* `pkg/operator/controller/gatewayclass/`: Provisions OSSM and generates the `ServiceMeshControlPlane`. |
| 33 | +* `pkg/operator/controller/certificate/`: Manages automated certificate rotation and expiry. |
| 34 | + |
| 35 | +## 5. TACTICAL DIRECTIVES & CONSTRAINTS |
| 36 | +* **RULE 1: No Silent Failures.** If a cloud API times out or a version conflict occurs, set the expected error condition in `.status.conditions` (`expectedCondition` in `status.go`) and mark the operator as `Degraded`. |
| 37 | +* **RULE 2: OSSM Meta-Management for GWAPI.** The operator does not write Envoy configs directly. It manages the OpenShift Service Mesh Operator by generating a `ServiceMeshControlPlane` (SMCP) resource. |
| 38 | +* **RULE 3: Decouple DNS for GWAPI.** Directly generate `DNSRecord` objects for GWAPI endpoints; do not tie external endpoints for Envoy to the legacy HAProxy `IngressController`. |
| 39 | +* **RULE 4: Finalizer Management.** Explicitly handle the removal of finalizers (e.g., `ingress.openshift.io/operator`) during deletion workflows in `load_balancer_service.go` to prevent infinite `Terminating` states. |
| 40 | +* **RULE 5: Cloud Credential Operator (CCO) Awareness.** Gracefully degrade and report HTTP 403 Forbidden errors if assumed roles lack permissions in "manual mode" where dynamic IAM credentials are disabled. |
| 41 | +* **RULE 6: Immutable HAProxy Template.** Do not inject arbitrary, unsupported HAProxy directives into the base router template. |
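Rule 4 can be illustrated with a minimal sketch that uses only the standard library. The `object` type and the `removeFinalizer` helper are hypothetical stand-ins, not code from `load_balancer_service.go`, and the finalizer string is simply the example value quoted in the rule. A real controller would operate on Kubernetes object metadata and persist the change through its client.

```go
package main

import "fmt"

// object is a hypothetical stand-in for Kubernetes object metadata.
type object struct {
	Name       string
	Finalizers []string
}

// removeFinalizer returns a copy of finalizers without name and reports
// whether anything was removed, so the caller knows if an update is needed.
func removeFinalizer(finalizers []string, name string) ([]string, bool) {
	kept := make([]string, 0, len(finalizers))
	removed := false
	for _, f := range finalizers {
		if f == name {
			removed = true
			continue
		}
		kept = append(kept, f)
	}
	return kept, removed
}

func main() {
	svc := object{
		Name:       "router-default",
		Finalizers: []string{"example.com/other", "ingress.openshift.io/operator"},
	}
	updated, changed := removeFinalizer(svc.Finalizers, "ingress.openshift.io/operator")
	if changed {
		// In the operator, this is where the object would be updated so
		// that deletion can proceed instead of hanging in Terminating.
		svc.Finalizers = updated
	}
	fmt.Println(svc.Finalizers, changed)
}
```

Returning the `changed` flag lets the caller skip the update call when the finalizer is already gone, which keeps the deletion path idempotent.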
| 42 | + |
| 43 | +## 6. EXPLICIT NON-GOALS |
| 44 | +* Managing underlying cloud infrastructure subnets or legacy security groups. |
| 45 | +* Fixing application-level protocol downgrade failures (e.g., HTTP/2 to WebSocket). |
| 46 | +* Generating default TLS certificates for Envoy proxies. |
| 47 | + |
| 48 | +## Project Structure and Repository Layout |
| 49 | + |
| 50 | +``` |
| 51 | +cluster-ingress-operator/ |
| 52 | +├── cmd/ingress-operator/ # Main entry point |
| 53 | +│ ├── main.go # CLI commands (start, render) |
| 54 | +│ ├── render.go # Manifest rendering command |
| 55 | +│ └── start.go # Operator startup and metrics registration |
| 56 | +├── pkg/ |
| 57 | +│ ├── operator/ |
| 58 | +│ │ ├── operator.go # Controller registration |
| 59 | +│ │ ├── controller/ # Reconciliation controllers (sub-packages per controller) |
| 60 | +│ │ ├── client/ # Kubernetes client setup with custom schemes |
| 61 | +│ │ └── config/ # Operator configuration structure |
| 62 | +│ ├── dns/ # DNS provider implementations |
| 63 | +│ │ ├── aws/ # AWS Route 53 |
| 64 | +│ │ ├── azure/ # Azure DNS |
| 65 | +│ │ ├── gcp/ # Google Cloud DNS |
| 66 | +│ │ ├── ibm/ # IBM Cloud DNS (public and private) |
| 67 | +│ │ └── split/ # Meta-provider routing between public/private |
| 68 | +│ └── manifests/ # Kubernetes object manifests used by controllers |
| 69 | +├── manifests/ # CVO manifests (CRDs, RBAC, monitoring) — instantiated by CVO, not used by operator directly |
| 70 | +├── test/ |
| 71 | +│ └── e2e/ # End-to-end integration tests |
| 72 | +├── hack/ # Development and CI scripts |
| 73 | +├── Makefile # Build automation |
| 74 | +└── HACKING.md # Developer documentation |
| 75 | +``` |
| 76 | + |
| 77 | +`pkg/manifests/` contains asset loading utilities (`manifests.go`) that bind Go templates to Kubernetes objects for controller use. |
| 78 | + |
| 79 | +## Feature Development |
| 80 | + |
| 81 | +### Adding a New Controller |
| 82 | + |
| 83 | +Controllers follow this pattern: |
| 84 | + |
| 85 | +1. Create a package in `pkg/operator/controller/<name>/` |
| 86 | +2. Define a `reconciler` struct. The fields below are an example; include only the fields your controller needs: |
| 87 | + ```go |
| 88 | + type reconciler struct { |
| 89 | + client client.Client |
| 90 | + recorder record.EventRecorder |
| 91 | + cache cache.Cache |
| 92 | + operatorNamespace string |
| 93 | + operandNamespace string |
| 94 | + } |
| 95 | + ``` |
| 96 | +3. Implement a `New()` factory function that creates the controller and registers its watches |
| 97 | +4. Implement `Reconcile()` method with idempotent `ensure*()` functions. Controllers delegate logic to `ensure<Resource>` methods that handle creation/update of specific resources (e.g., `ensureIngressController`, `ensureIngressDeleted`). |
| 98 | +5. Register the controller in `pkg/operator/operator.go`. Metrics (if any) are registered in `cmd/ingress-operator/start.go`. |
| 99 | + |
| 100 | +See `pkg/operator/controller/ingress/controller_test.go` as a reference for controller test patterns. |
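The idempotent `ensure*()` pattern from step 4 can be sketched without any Kubernetes dependencies. Everything in this block (`configMap`, `store`, `ensureConfigMap`, `equalData`) is a hypothetical stand-in: an in-memory map plays the role of the API server, whereas the real controllers would go through the controller-runtime client.

```go
package main

import "fmt"

// configMap is a hypothetical, simplified stand-in for a Kubernetes object.
type configMap struct {
	Name string
	Data map[string]string
}

// store is a hypothetical in-memory stand-in for the API server.
type store map[string]configMap

// ensureConfigMap creates or updates the desired object and reports whether
// it changed anything. Calling it again with the same input is a no-op.
func ensureConfigMap(s store, desired configMap) (changed bool) {
	current, exists := s[desired.Name]
	if exists && equalData(current.Data, desired.Data) {
		return false
	}
	s[desired.Name] = desired
	return true
}

// equalData compares two string maps key by key.
func equalData(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

func main() {
	s := store{}
	desired := configMap{Name: "example", Data: map[string]string{"key": "value"}}
	fmt.Println(ensureConfigMap(s, desired)) // first call creates the object
	fmt.Println(ensureConfigMap(s, desired)) // second call changes nothing
}
```

Because the function compares current and desired state before writing, `Reconcile()` can call it on every pass without generating spurious updates.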
| 101 | + |
| 102 | +### Existing Controllers |
| 103 | + |
| 104 | +Located in `pkg/operator/controller/`: |
| 105 | + |
| 106 | +| Controller | Purpose | |
| 107 | +|------------|---------| |
| 108 | +| `ingress` | Main controller for IngressController resources | |
| 109 | +| `canary` | Health check canary for ingress controllers | |
| 110 | +| `certificate` | TLS certificate management | |
| 111 | +| `certificate-publisher` | Publishes router certs to openshift-config-managed | |
| 112 | +| `clientca-configmap` | Syncs client CA configmaps between namespaces | |
| 113 | +| `configurable-route` | Manages custom route configuration | |
| 114 | +| `crl` | Certificate Revocation List management (deprecated since 4.14, pending removal — NE-2491) | |
| 115 | +| `dns` | DNS record management | |
| 116 | +| `gatewayapi` | Gateway API CRD management | |
| 117 | +| `gatewayclass` | Istio/OSSM installation for Gateway API | |
| 118 | +| `gateway-labeler` | Labels Gateway resources | |
| 119 | +| `gateway-service-dns` | DNS for Gateway services | |
| 120 | +| `ingressclass` | IngressClass resource management | |
| 121 | +| `monitoring-dashboard` | Monitoring dashboard creation | |
| 122 | +| `route-metrics` | Route metrics collection | |
| 123 | +| `status` | ClusterOperator status management | |
| 124 | +| `sync-http-error-code-configmap` | HTTP error code page sync | |
| 125 | + |
| 126 | +### DNS Providers |
| 127 | + |
| 128 | +Located in `pkg/dns/`: |
| 129 | + |
| 130 | +| Provider | Description | |
| 131 | +|----------|-------------| |
| 132 | +| `aws` | AWS Route 53 DNS | |
| 133 | +| `azure` | Azure DNS (with workload identity support) | |
| 134 | +| `gcp` | Google Cloud DNS | |
| 135 | +| `ibm` | IBM Cloud DNS (public CIS and private DNS Services) | |
| 136 | +| `split` | Meta-provider routing between public/private providers | |
| 137 | +| `(fake)` | No-op provider for testing (defined in `pkg/dns/dns.go`) | |
| 138 | + |
| 139 | +DNS providers implement the `dns.Provider` interface: |
| 140 | +- `Ensure(record, zone)` - Create or update DNS record |
| 141 | +- `Delete(record, zone)` - Remove DNS record |
| 142 | +- `Replace(record, zone)` - Replace existing record |
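As a rough illustration of the three methods listed above, the following sketch implements them against an in-memory map. The `record` and `zone` types are hypothetical stand-ins for the arguments the real interface takes, the `error` return values are an assumption of this sketch, and `memoryProvider` is not the fake provider from `pkg/dns/dns.go`.

```go
package main

import "fmt"

// record and zone are hypothetical stand-ins for the DNS record and DNS zone
// types that the real interface takes.
type record struct {
	Name    string
	Targets []string
}

type zone struct {
	ID string
}

// provider mirrors the three methods listed above. Returning an error from
// each method is an assumption of this sketch.
type provider interface {
	Ensure(r *record, z zone) error
	Delete(r *record, z zone) error
	Replace(r *record, z zone) error
}

// memoryProvider keeps records in a map keyed by zone ID and record name.
type memoryProvider struct {
	records map[string][]string
}

func key(r *record, z zone) string { return z.ID + "/" + r.Name }

// Ensure creates the record or overwrites it if it already exists.
func (p *memoryProvider) Ensure(r *record, z zone) error {
	p.records[key(r, z)] = r.Targets
	return nil
}

// Delete removes the record; deleting a missing record is not an error.
func (p *memoryProvider) Delete(r *record, z zone) error {
	delete(p.records, key(r, z))
	return nil
}

// Replace overwrites an existing record. Failing when the record is missing
// is a choice made for this sketch only.
func (p *memoryProvider) Replace(r *record, z zone) error {
	if _, ok := p.records[key(r, z)]; !ok {
		return fmt.Errorf("record %q not found in zone %q", r.Name, z.ID)
	}
	p.records[key(r, z)] = r.Targets
	return nil
}

// Compile-time check that memoryProvider satisfies the interface.
var _ provider = &memoryProvider{}

func main() {
	p := &memoryProvider{records: map[string][]string{}}
	z := zone{ID: "example-zone"}
	r := &record{Name: "*.apps.example.com.", Targets: []string{"192.0.2.10"}}
	fmt.Println(p.Ensure(r, z), p.Replace(r, z), p.Delete(r, z))
}
```

Keeping the provider behind a small interface is what allows the `split` meta-provider and the no-op test provider to be swapped in without changes to the DNS controller.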
| 143 | + |
| 144 | +## Building |
| 145 | + |
| 146 | +```bash |
| 147 | +make build # Build the operator binary (depends on generate) |
| 148 | +``` |
| 149 | + |
| 150 | +- Uses vendored dependencies (`-mod=vendor`) |
| 151 | +- Requires `CGO_ENABLED=1` |
| 152 | + |
| 153 | +## Running |
| 154 | + |
| 155 | +### Prerequisites |
| 156 | + |
| 157 | +- An OpenShift cluster |
| 158 | +- Admin-scoped `KUBECONFIG` |
| 159 | + |
| 160 | +### Local Execution |
| 161 | + |
| 162 | +```bash |
| 163 | +make run-local # Run operator locally |
| 164 | +ENABLE_CANARY=true make run-local # With canary enabled |
| 165 | +``` |
| 166 | + |
| 167 | +### Remote Deployment |
| 168 | + |
| 169 | +See [HACKING.md](HACKING.md) for: |
| 170 | +- Building and deploying to cluster (`make release-local`) |
| 171 | +- Remote builds on cluster (`make buildconfig`, `make cluster-build`) |
| 172 | + |
| 173 | +## Tests |
| 174 | + |
| 175 | +### Running Tests |
| 176 | + |
| 177 | +```bash |
| 178 | +make test # Run unit tests |
| 179 | +make test-e2e # Run all e2e tests |
| 180 | +make test-e2e TEST="^TestRouter$" # Run specific test |
| 181 | +make test-e2e-list # List available tests |
| 182 | +make gatewayapi-conformance # Gateway API conformance tests |
| 183 | +``` |
| 184 | + |
| 185 | +### Test Framework |
| 186 | + |
| 187 | +- Standard Go testing package |
| 188 | +- `github.com/stretchr/testify/assert` for assertions (e.g., `pkg/dns/aws/dns_test.go`) |
| 189 | +- `google/go-cmp` for deep comparisons |
| 190 | + |
| 191 | +### Test Patterns |
| 192 | + |
| 193 | +- **Table-driven tests**: Use for testing multiple scenarios |
| 194 | +- **Subtests**: Use `t.Run()` with descriptive names for nested tests |
| 195 | +- **Test naming conventions**: |
| 196 | + - `Test_foo` — general test for function `foo` |
| 197 | + - `TestFooFunctionality` — test for specific functionality in `foo` |
| 198 | + |
| 199 | +### Test Organization |
| 200 | + |
| 201 | +- **Unit tests**: Alongside source files as `*_test.go` |
| 202 | +- **E2E tests**: In `test/e2e/` with build tag `// +build e2e` |
| 203 | +- **Parallel tests** (~90): Run concurrently, independent of each other |
| 204 | +- **Serial tests** (~50): Run sequentially, modify cluster-wide resources |
| 205 | + |
| 206 | +### Assertions |
| 207 | + |
| 208 | +```go |
| 209 | +// Use testify/assert for assertions |
| 210 | +assert.NoError(t, err, "failed to create resource") |
| 211 | + |
| 212 | +// Use google/go-cmp for deep comparisons |
| 213 | +if diff := cmp.Diff(expected, actual); diff != "" { |
| 214 | + t.Errorf("mismatch (-want +got):\n%s", diff) |
| 215 | +} |
| 216 | +``` |
| 217 | + |
| 218 | +### Test Helpers |
| 219 | + |
| 220 | +**E2E Utilities** (`test/e2e/util_test.go`): |
| 221 | + |
| 222 | +| Helper | Purpose | |
| 223 | +|--------|---------| |
| 224 | +| `buildEchoPod()` | Creates socat-based echo server pod | |
| 225 | +| `buildCurlPod()` | Creates curl pod for HTTP testing | |
| 226 | +| `waitForHTTPClientCondition()` | Polls HTTP endpoint with retry | |
| 227 | + |
| 228 | +**Operator Test Helpers** (`test/e2e/operator_test.go`): |
| 229 | + |
| 230 | +| Helper | Purpose | |
| 231 | +|--------|---------| |
| 232 | +| `waitForIngressControllerCondition()` | Poll for expected conditions | |
| 233 | +| `waitForDeploymentComplete()` | Wait for deployment rollout | |
| 234 | +| `waitForAvailableReplicas()` | Wait for replica count | |
| 235 | +| `waitForClusterOperatorConditions()` | Poll ClusterOperator status | |
| 236 | +| `deleteIngressController()` | Clean up with timeout | |
| 237 | + |
| 238 | +## Linting |
| 239 | + |
| 240 | +```bash |
| 241 | +make verify |
| 242 | +``` |
| 243 | + |
| 244 | +Runs verification scripts: |
| 245 | +- `hack/verify-gofmt.sh` - gofmt |
| 246 | +- `hack/verify-generated-crd.sh` - Verifies CRDs under `manifests/` match CRDs under `vendor/` |
| 247 | +- `hack/verify-profile-manifests.sh` - Verifies profile-specific manifests (e.g., `02-deployment.yaml` for `ibm-cloud-managed`) are up to date |
| 248 | +- `hack/verify-deps.sh` - Verifies `go mod` vendoring is up to date (`go mod vendor` / `go mod tidy`) |
| 249 | + |
| 250 | +## Additional Makefile Targets |
| 251 | + |
| 252 | +| Target | Description | |
| 253 | +|--------|-------------| |
| 254 | +| `make generate` | Update embedded manifests (operator namespace, ingresscontrollers CRD) used by `ingress-operator render` | |
| 255 | +| `make crd` | Generate CRD YAML files | |
| 256 | +| `make release-local` | Build image and deployment manifests | |
| 257 | +| `make uninstall` | Remove operator from cluster | |
| 258 | +| `make buildconfig` | Create OpenShift BuildConfig for remote builds | |
| 259 | +| `make cluster-build` | Trigger remote cluster build | |
| 260 | +| `make clean` | Remove binaries and generated files | |
| 261 | + |
| 262 | +## Dependencies |
| 263 | + |
| 264 | +Dependencies are vendored. After modifying `go.mod`: |
| 265 | + |
| 266 | +```bash |
| 267 | +go mod tidy |
| 268 | +go mod vendor |
| 269 | +``` |
| 270 | + |
| 271 | +Key dependencies: |
| 272 | +- `sigs.k8s.io/controller-runtime` |
| 273 | +- `k8s.io/client-go` |
| 274 | +- `sigs.k8s.io/gateway-api` |
| 275 | +- `github.com/openshift/api` |
| 276 | + |
| 277 | +## Coding Style |
| 278 | + |
| 279 | +### Go Version |
| 280 | + |
| 281 | +Go version is specified in `go.mod`. |
| 282 | + |
| 283 | +### Code Organization |
| 284 | + |
| 285 | +- Controllers in `pkg/operator/controller/<name>/` |
| 286 | +- Each controller has `controller.go`, functional files (e.g., `deployment.go`), and corresponding `*_test.go` files |
| 287 | + |
| 288 | +### Namespace Constants |
| 289 | + |
| 290 | +Defined in `pkg/operator/controller/`: |
| 291 | + |
| 292 | +```go |
| 293 | +DefaultOperatorNamespace = "openshift-ingress-operator" |
| 294 | +DefaultOperandNamespace = "openshift-ingress" |
| 295 | +DefaultCanaryNamespace = "openshift-ingress-canary" |
| 296 | +GlobalMachineSpecifiedConfigNamespace = "openshift-config-managed" |
| 297 | +GlobalUserSpecifiedConfigNamespace = "openshift-config" |
| 298 | +``` |
| 299 | + |
| 300 | +### Naming Functions |
| 301 | + |
| 302 | +Defined in `pkg/operator/controller/names.go`: |
| 303 | + |
| 304 | +| Function | Returns | |
| 305 | +|----------|---------| |
| 306 | +| `RouterDeploymentName(ic)` | `router-<name>` in openshift-ingress | |
| 307 | +| `LoadBalancerServiceName(ic)` | `router-<name>` service | |
| 308 | +| `NodePortServiceName(ic)` | `router-nodeport-<name>` service | |
| 309 | +| `IngressClassName(name)` | `openshift-<name>` IngressClass | |
| 310 | +| `CanaryDaemonSetName()` | Canary daemonset name | |
| 311 | +| `ClientCAConfigMapName(ic)` | `router-client-ca-<name>` | |
| 312 | +| `CRLConfigMapName(ic)` | `router-client-ca-crl-<name>` (deprecated — see crl controller) | |
| 313 | + |
| 314 | +### Important Annotations and Labels |
| 315 | + |
| 316 | +Defined as constants in `pkg/operator/controller/names.go`: |
| 317 | + |
| 318 | +| Constant | Value | Purpose | |
| 319 | +|----------|-------|---------| |
| 320 | +| `IngressOperatorOwnedAnnotation` | `ingress.operator.openshift.io/owned` | Marks a resource as owned by the ingress operator (used on subscriptions) | |
| 321 | +| `ControllerDeploymentLabel` | `ingresscontroller.operator.openshift.io/deployment-ingresscontroller` | Identifies a deployment as an ingress controller; value is the IC name | |
| 322 | +| `ControllerDeploymentHashLabel` | `ingresscontroller.operator.openshift.io/hash` | Identifies an ingress controller deployment's generation (used for affinity/anti-affinity) | |
| 323 | +| `CanaryDaemonSetLabel` | `ingresscanary.operator.openshift.io/daemonset-ingresscanary` | Identifies a daemonset as an ingress canary daemonset; value is the owning canary controller name | |
| 324 | + |
| 325 | +### Feature Gates |
| 326 | + |
| 327 | +Controllers check these feature gates (from `github.com/openshift/api/features`): |
| 328 | +- `features.FeatureGateGatewayAPI` — Gateway API support |
| 329 | +- `features.FeatureGateGatewayAPIController` — Gateway API controller |
| 330 | +- `features.FeatureGateAzureWorkloadIdentity` — Azure workload identity |
| 331 | +- `features.FeatureGateIngressControllerDynamicConfigurationManager` — Dynamic configuration management |
| 332 | +- `features.FeatureGateRouteExternalCertificate` — External route certificates (being removed) |
| 333 | + |
| 334 | +### Error Handling |
| 335 | + |
| 336 | +- Return errors with context: `fmt.Errorf("failed to create deployment: %w", err)` |
| 337 | +- Use `%w` for wrapped error values to allow `errors.Is`/`errors.As` unwrapping |
| 338 | +- Aggregate errors when multiple operations can fail independently |
| 339 | +- Use `k8s.io/apimachinery/pkg/util/errors` for error aggregation |
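The wrapping and aggregation guidance above can be sketched with the standard library alone. This example uses `errors.Join` (Go 1.20+) as a stand-in for the `k8s.io/apimachinery/pkg/util/errors` aggregation helper so that it runs without external modules. The operations and the `errNotFound` sentinel are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error used only for this sketch.
var errNotFound = errors.New("not found")

// createDeployment is a hypothetical operation that always fails.
func createDeployment() error {
	// %w keeps errNotFound reachable through errors.Is and errors.As.
	return fmt.Errorf("failed to create deployment: %w", errNotFound)
}

// createService is a hypothetical operation that always fails.
func createService() error {
	return errors.New("failed to create service: quota exceeded")
}

// ensureAll runs independent operations and aggregates their errors instead
// of stopping at the first failure.
func ensureAll() error {
	var errs []error
	if err := createDeployment(); err != nil {
		errs = append(errs, err)
	}
	if err := createService(); err != nil {
		errs = append(errs, err)
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

// isNotFound reports whether err wraps the sentinel anywhere in its chain.
func isNotFound(err error) bool {
	return errors.Is(err, errNotFound)
}

func main() {
	err := ensureAll()
	fmt.Println(err)
	fmt.Println(isNotFound(err))
}
```

Aggregating rather than returning early means a single reconcile pass reports every failing resource, which makes status conditions and logs more useful.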
| 340 | + |
| 341 | +### Logging |
| 342 | + |
| 343 | +- Use structured logging via `go-logr/logr` |
| 344 | +- Include relevant context (namespace, name, resource type) |
| 345 | +- Follow [Kubernetes logging conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md#message-style-guidelines) |
| 346 | + |
| 347 | +### Formatting |
| 348 | + |
| 349 | +Go formatting is enforced: |
| 350 | + |
| 351 | +```bash |
| 352 | +hack/verify-gofmt.sh |
| 353 | +``` |
| 354 | + |
| 355 | +## Distribution Methods |
| 356 | + |
| 357 | +### Container Images |
| 358 | + |
| 359 | +| Dockerfile | Description | |
| 360 | +|------------|-------------| |
| 361 | +| `Dockerfile` | Default build | |
| 362 | +| `Dockerfile.rhel7` | RHEL 7 variant (outdated, may be removed) | |
| 363 | +| `Dockerfile.ubi` | UBI (Universal Base Image) variant | |
| 364 | + |
| 365 | +### Deployment |
| 366 | + |
| 367 | +- Deployed as part of OpenShift installation |
| 368 | +- Runs in `openshift-ingress-operator` namespace |
| 369 | +- Managed by Cluster Version Operator (CVO) |
| 370 | + |
| 371 | +## Contribution Conventions |
| 372 | + |
| 373 | +- Commit messages should reference the Jira ticket: `NE-XXXX: description` |
| 374 | +- PRs should have logical, atomic commits |
| 375 | +- Test coverage is expected for new features and bug fixes |