Commit 1aec6b8

Merge pull request #1341 from Thealisyed/addingagentmd
NE-2390: Adding AGENTS.md file
2 parents f40071e + 28f66b0 commit 1aec6b8

1 file changed

Lines changed: 375 additions & 0 deletions

AGENTS.md

@@ -0,0 +1,375 @@
1+
# AGENTS.md
2+
3+
This document provides guidance for AI coding agents working with the cluster-ingress-operator repository.
4+
5+
## 1. SYSTEM CONTEXT & TOPOLOGY
6+
* **Domain:** North-South network traffic management (Layer 7 & Layer 4).
7+
* **Prerequisites:** CNI (Cluster Network Operator), Pod IP routing, internal CoreDNS.
8+
* **Dual Architecture Mode:**
9+
    1. **Legacy/Primary:** HAProxy orchestration via `openshift3/ose-haproxy-router`.
    2. **Modern/GWAPI:** Gateway API via OpenShift Service Mesh (OSSM / Istio / Envoy).
11+
12+
## 2. STRICT NAMESPACE BOUNDARIES
13+
* **`openshift-ingress-operator`**: Execution environment for operator logic, control loops, and metrics endpoints. **DO NOT** deploy data-plane operands here.
14+
* **`openshift-ingress`**: Execution environment for all data-plane workloads. HAProxy pods, `haproxy.cfg` ConfigMaps, TLS secrets, Envoy proxies, and Services **MUST** be strictly confined to this namespace.
15+
16+
## 3. CUSTOM RESOURCE DEFINITION (CRD) MATRIX
17+
18+
### Core Configuration
19+
* **`Ingress`** (`config.openshift.io/v1`, Cluster-scoped): Defines cluster-wide routing defaults and toggles.
20+
* **`IngressController`** (`operator.openshift.io/v1`, Namespace-scoped): Primary interface for legacy HAProxy. Controls replicas, deployment affinity, and endpoint publishing.
21+
* **`DNSRecord`** (`ingress.operator.openshift.io/v1`, Namespace-scoped): Internal abstraction for cloud DNS manipulation.
22+
23+
### Gateway API (GWAPI)
24+
* **`GatewayClass`**: Trigger resource. Detecting `openshift.io/gateway-controller/v1` initiates OSSM deployment.
25+
* **`Gateway`**: Defines external proxy instances / ports. Mapped to Envoy proxies.
26+
* **`HTTPRoute`**: Advanced routing, traffic weighting, and header matching to backend services.
27+
28+
## 4. DIRECTORY TOPOGRAPHY
29+
* `pkg/operator/controller/ingress/`: Legacy core. Reconciles `IngressController` and manages HAProxy deployment, load balancer services, and status loops.
30+
* `pkg/operator/controller/dns/`: Fulfills `DNSRecord` resources. Interfaces with AWS/GCP/Azure SDKs.
31+
* `pkg/operator/controller/gatewayapi/`: GWAPI meta-controller. Manages CRD lifecycles and launches dependent watchers.
32+
* `pkg/operator/controller/gatewayclass/`: Provisions OSSM and generates the `ServiceMeshControlPlane`.
33+
* `pkg/operator/controller/certificate/`: Manages automated certificate rotation and expiry.
34+
35+
## 5. TACTICAL DIRECTIVES & CONSTRAINTS
36+
* **RULE 1: No Silent Failures.** If a cloud API times out or a version conflict occurs, set the expected error condition in `.status.conditions` (`expectedCondition` in `status.go`) and mark the operator as `Degraded`.
37+
* **RULE 2: OSSM Meta-Management for GWAPI.** The operator does not write Envoy configs directly. It manages the OpenShift Service Mesh Operator by generating a `ServiceMeshControlPlane` (SMCP) resource.
38+
* **RULE 3: Decouple DNS for GWAPI.** Directly generate `DNSRecord` objects for GWAPI endpoints; do not tie external endpoints for Envoy to the legacy HAProxy `IngressController`.
39+
* **RULE 4: Finalizer Management.** Explicitly handle the removal of finalizers (e.g., `ingress.openshift.io/operator`) during deletion workflows in `load_balancer_service.go` to prevent infinite `Terminating` states.
40+
* **RULE 5: Cloud Credential Operator (CCO) Awareness.** Gracefully degrade and report HTTP 403 Forbidden errors if assumed roles lack permissions in "manual mode" where dynamic IAM credentials are disabled.
41+
* **RULE 6: Immutable HAProxy Template.** Do not inject arbitrary, unsupported HAProxy directives into the base router template.
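
As an illustration of Rule 4, the following is a minimal, standard-library-only sketch of idempotent finalizer removal. The helper `removeFinalizer` is hypothetical and is not taken from `load_balancer_service.go`; a real implementation would read and write the finalizer list on the object's metadata through the Kubernetes client.

```go
package main

import "fmt"

// operatorFinalizer is the finalizer named in Rule 4.
const operatorFinalizer = "ingress.openshift.io/operator"

// removeFinalizer is a hypothetical helper for this sketch. It returns a copy
// of finalizers without name and reports whether anything was removed.
func removeFinalizer(finalizers []string, name string) ([]string, bool) {
	kept := make([]string, 0, len(finalizers))
	removed := false
	for _, f := range finalizers {
		if f == name {
			removed = true
			continue
		}
		kept = append(kept, f)
	}
	return kept, removed
}

func main() {
	finalizers := []string{"example.com/other", operatorFinalizer}

	finalizers, changed := removeFinalizer(finalizers, operatorFinalizer)
	fmt.Println(finalizers, changed)

	// A second call changes nothing, so the deletion path can be retried safely.
	finalizers, changed = removeFinalizer(finalizers, operatorFinalizer)
	fmt.Println(finalizers, changed)
}
```

Returning a `changed` flag lets the caller skip the update call when the finalizer is already gone, which avoids needless writes during repeated reconciles.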
42+
43+
## 6. EXPLICIT NON-GOALS
44+
* Managing underlying cloud infrastructure subnets or legacy security groups.
45+
* Fixing application-level protocol downgrade failures (e.g., HTTP/2 to WebSocket).
46+
* Generating default TLS certificates for Envoy proxies.
47+
48+
## Project Structure and Repository Layout
49+
50+
```
cluster-ingress-operator/
├── cmd/ingress-operator/     # Main entry point
│   ├── main.go               # CLI commands (start, render)
│   ├── render.go             # Manifest rendering command
│   └── start.go              # Operator startup and metrics registration
├── pkg/
│   ├── operator/
│   │   ├── operator.go       # Controller registration
│   │   ├── controller/       # Reconciliation controllers (sub-packages per controller)
│   │   ├── client/           # Kubernetes client setup with custom schemes
│   │   └── config/           # Operator configuration structure
│   ├── dns/                  # DNS provider implementations
│   │   ├── aws/              # AWS Route 53
│   │   ├── azure/            # Azure DNS
│   │   ├── gcp/              # Google Cloud DNS
│   │   ├── ibm/              # IBM Cloud DNS (public and private)
│   │   └── split/            # Meta-provider routing between public/private
│   └── manifests/            # Kubernetes object manifests used by controllers
├── manifests/                # CVO manifests (CRDs, RBAC, monitoring); instantiated by CVO, not used by operator directly
├── test/
│   └── e2e/                  # End-to-end integration tests
├── hack/                     # Development and CI scripts
├── Makefile                  # Build automation
└── HACKING.md                # Developer documentation
```
76+
77+
`pkg/manifests/` contains asset loading utilities (`manifests.go`) that bind Go templates to Kubernetes objects for controller use.
78+
79+
## Feature Development
80+
81+
### Adding a New Controller
82+
83+
Controllers follow this pattern:
84+
85+
1. Create a package in `pkg/operator/controller/<name>/`
86+
2. Define a `reconciler` struct (for example — not all controllers need all fields; customize to required fields):
87+
   ```go
   type reconciler struct {
   	client            client.Client        // sigs.k8s.io/controller-runtime/pkg/client
   	recorder          record.EventRecorder // k8s.io/client-go/tools/record
   	cache             cache.Cache          // sigs.k8s.io/controller-runtime/pkg/cache
   	operatorNamespace string
   	operandNamespace  string
   }
   ```
96+
3. Implement a `New()` factory function that creates the controller and registers its watches
97+
4. Implement `Reconcile()` method with idempotent `ensure*()` functions. Controllers delegate logic to `ensure<Resource>` methods that handle creation/update of specific resources (e.g., `ensureIngressController`, `ensureIngressDeleted`).
98+
5. Register the controller in `pkg/operator/operator.go`. Metrics (if any) are registered in `cmd/ingress-operator/start.go`.
99+
100+
See `pkg/operator/controller/ingress/controller_test.go` as a reference for controller test patterns.
101+
102+
### Existing Controllers
103+
104+
Located in `pkg/operator/controller/`:
105+
106+
| Controller | Purpose |
|------------|---------|
| `ingress` | Main controller for IngressController resources |
| `canary` | Health check canary for ingress controllers |
| `certificate` | TLS certificate management |
| `certificate-publisher` | Publishes router certs to openshift-config-managed |
| `clientca-configmap` | Syncs client CA configmaps between namespaces |
| `configurable-route` | Manages custom route configuration |
| `crl` | Certificate Revocation List management (deprecated since 4.14, pending removal, NE-2491) |
| `dns` | DNS record management |
| `gatewayapi` | Gateway API CRD management |
| `gatewayclass` | Istio/OSSM installation for Gateway API |
| `gateway-labeler` | Labels Gateway resources |
| `gateway-service-dns` | DNS for Gateway services |
| `ingressclass` | IngressClass resource management |
| `monitoring-dashboard` | Monitoring dashboard creation |
| `route-metrics` | Route metrics collection |
| `status` | ClusterOperator status management |
| `sync-http-error-code-configmap` | HTTP error code page sync |
125+
126+
### DNS Providers
127+
128+
Located in `pkg/dns/`:
129+
130+
| Provider | Description |
|----------|-------------|
| `aws` | AWS Route 53 DNS |
| `azure` | Azure DNS (with workload identity support) |
| `gcp` | Google Cloud DNS |
| `ibm` | IBM Cloud DNS (public CIS and private DNS Services) |
| `split` | Meta-provider routing between public/private providers |
| `(fake)` | No-op provider for testing (defined in `pkg/dns/dns.go`) |
138+
139+
DNS providers implement the `dns.Provider` interface:
140+
- `Ensure(record, zone)` - Create or update DNS record
141+
- `Delete(record, zone)` - Remove DNS record
142+
- `Replace(record, zone)` - Replace existing record
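
To make the interface shape concrete, here is a hedged, standard-library-only sketch of an in-memory provider. `Record`, `Zone`, `memoryProvider`, and the exact method signatures are assumptions made for illustration; the real `dns.Provider` in `pkg/dns/dns.go` operates on the operator's own record and zone API types.

```go
package main

import (
	"errors"
	"fmt"
)

// Record and Zone are hypothetical stand-ins for the real record and zone types.
type Record struct {
	Name    string
	Targets []string
}

type Zone struct{ ID string }

// Provider mirrors the three operations listed above; signatures are assumed.
type Provider interface {
	Ensure(record Record, zone Zone) error
	Delete(record Record, zone Zone) error
	Replace(record Record, zone Zone) error
}

// memoryProvider keeps records in a map instead of calling a cloud DNS API.
type memoryProvider struct {
	records map[string][]string
}

// Compile-time check that memoryProvider satisfies Provider.
var _ Provider = (*memoryProvider)(nil)

func newMemoryProvider() *memoryProvider {
	return &memoryProvider{records: map[string][]string{}}
}

func recordKey(record Record, zone Zone) string {
	return zone.ID + "/" + record.Name
}

// Ensure creates or updates the record.
func (p *memoryProvider) Ensure(record Record, zone Zone) error {
	if record.Name == "" {
		return errors.New("record name must not be empty")
	}
	p.records[recordKey(record, zone)] = append([]string(nil), record.Targets...)
	return nil
}

// Delete removes the record; deleting a missing record is not an error.
func (p *memoryProvider) Delete(record Record, zone Zone) error {
	delete(p.records, recordKey(record, zone))
	return nil
}

// Replace overwrites the record, which for this map is the same as Ensure.
func (p *memoryProvider) Replace(record Record, zone Zone) error {
	return p.Ensure(record, zone)
}

func main() {
	p := newMemoryProvider()
	rec := Record{Name: "*.apps.example.com", Targets: []string{"192.0.2.10"}}
	zone := Zone{ID: "zone-a"}
	if err := p.Ensure(rec, zone); err != nil {
		fmt.Println("ensure failed:", err)
		return
	}
	fmt.Println(p.records[recordKey(rec, zone)])
}
```

A provider like this is only useful in tests; it plays a role similar to the no-op `(fake)` provider in the table, with the addition that it remembers what was written.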
143+
144+
## Building
145+
146+
```bash
make build    # Build the operator binary (depends on generate)
```
149+
150+
- Uses vendored dependencies (`-mod=vendor`)
151+
- Requires `CGO_ENABLED=1`
152+
153+
## Running
154+
155+
### Prerequisites
156+
157+
- An OpenShift cluster
158+
- Admin-scoped `KUBECONFIG`
159+
160+
### Local Execution
161+
162+
```bash
make run-local                       # Run operator locally
ENABLE_CANARY=true make run-local    # With canary enabled
```
166+
167+
### Remote Deployment
168+
169+
See [HACKING.md](HACKING.md) for:
170+
- Building and deploying to cluster (`make release-local`)
171+
- Remote builds on cluster (`make buildconfig`, `make cluster-build`)
172+
173+
## Tests
174+
175+
### Running Tests
176+
177+
```bash
make test                            # Run unit tests
make test-e2e                        # Run all e2e tests
make test-e2e TEST="^TestRouter$"    # Run specific test
make test-e2e-list                   # List available tests
make gatewayapi-conformance          # Gateway API conformance tests
```
184+
185+
### Test Framework
186+
187+
- Standard Go testing package
188+
- `github.com/stretchr/testify/assert` for assertions (e.g., `pkg/dns/aws/dns_test.go`)
189+
- `google/go-cmp` for deep comparisons
190+
191+
### Test Patterns
192+
193+
- **Table-driven tests**: Use for testing multiple scenarios
194+
- **Subtests**: Use `t.Run()` with descriptive names for nested tests
195+
- **Test naming conventions**:
196+
  - `Test_foo`: general test for function `foo`
  - `TestFooFunctionality`: test for specific functionality in `foo`
198+
199+
### Test Organization
200+
201+
- **Unit tests**: Alongside source files as `*_test.go`
202+
- **E2E tests**: In `test/e2e/` with build tag `// +build e2e`
203+
- **Parallel tests** (~90): Run concurrently, independent of each other
204+
- **Serial tests** (~50): Run sequentially, modify cluster-wide resources
205+
206+
### Assertions
207+
208+
```go
// Use testify/assert for assertions
assert.NoError(t, err, "failed to create resource")

// Use google/go-cmp for deep comparisons
if diff := cmp.Diff(expected, actual); diff != "" {
	t.Errorf("mismatch (-want +got):\n%s", diff)
}
```
217+
218+
### Test Helpers
219+
220+
**E2E Utilities** (`test/e2e/util_test.go`):
221+
222+
| Helper | Purpose |
|--------|---------|
| `buildEchoPod()` | Creates socat-based echo server pod |
| `buildCurlPod()` | Creates curl pod for HTTP testing |
| `waitForHTTPClientCondition()` | Polls HTTP endpoint with retry |
227+
228+
**Operator Test Helpers** (`test/e2e/operator_test.go`):
229+
230+
| Helper | Purpose |
|--------|---------|
| `waitForIngressControllerCondition()` | Poll for expected conditions |
| `waitForDeploymentComplete()` | Wait for deployment rollout |
| `waitForAvailableReplicas()` | Wait for replica count |
| `waitForClusterOperatorConditions()` | Poll ClusterOperator status |
| `deleteIngressController()` | Clean up with timeout |
237+
238+
## Linting
239+
240+
```bash
make verify
```
243+
244+
Runs verification scripts:
245+
- `hack/verify-gofmt.sh` - gofmt
246+
- `hack/verify-generated-crd.sh` - Verifies CRDs under `manifests/` match CRDs under `vendor/`
247+
- `hack/verify-profile-manifests.sh` - Verifies profile-specific manifests (e.g., `02-deployment.yaml` for `ibm-cloud-managed`) are up to date
248+
- `hack/verify-deps.sh` - Verifies `go mod` vendoring is up to date (`go mod vendor` / `go mod tidy`)
249+
250+
## Additional Makefile Targets
251+
252+
| Target | Description |
|--------|-------------|
| `make generate` | Update embedded manifests (operator namespace, ingresscontrollers CRD) used by `ingress-operator render` |
| `make crd` | Generate CRD YAML files |
| `make release-local` | Build image and deployment manifests |
| `make uninstall` | Remove operator from cluster |
| `make buildconfig` | Create OpenShift BuildConfig for remote builds |
| `make cluster-build` | Trigger remote cluster build |
| `make clean` | Remove binaries and generated files |
261+
262+
## Dependencies
263+
264+
Dependencies are vendored. After modifying `go.mod`:
265+
266+
```bash
go mod tidy
go mod vendor
```
270+
271+
Key dependencies:
272+
- `sigs.k8s.io/controller-runtime`
273+
- `k8s.io/client-go`
274+
- `sigs.k8s.io/gateway-api`
275+
- `github.com/openshift/api`
276+
277+
## Coding Style
278+
279+
### Go Version
280+
281+
Go version is specified in `go.mod`.
282+
283+
### Code Organization
284+
285+
- Controllers in `pkg/operator/controller/<name>/`
286+
- Each controller has `controller.go`, functional files (e.g., `deployment.go`), and corresponding `*_test.go` files
287+
288+
### Namespace Constants
289+
290+
Defined in `pkg/operator/controller/`:
291+
292+
```go
const (
	DefaultOperatorNamespace              = "openshift-ingress-operator"
	DefaultOperandNamespace               = "openshift-ingress"
	DefaultCanaryNamespace                = "openshift-ingress-canary"
	GlobalMachineSpecifiedConfigNamespace = "openshift-config-managed"
	GlobalUserSpecifiedConfigNamespace    = "openshift-config"
)
```
299+
300+
### Naming Functions
301+
302+
Defined in `pkg/operator/controller/names.go`:
303+
304+
| Function | Returns |
|----------|---------|
| `RouterDeploymentName(ic)` | `router-<name>` in openshift-ingress |
| `LoadBalancerServiceName(ic)` | `router-<name>` service |
| `NodePortServiceName(ic)` | `router-nodeport-<name>` service |
| `IngressClassName(name)` | `openshift-<name>` IngressClass |
| `CanaryDaemonSetName()` | Canary daemonset name |
| `ClientCAConfigMapName(ic)` | `router-client-ca-<name>` |
| `CRLConfigMapName(ic)` | `router-client-ca-crl-<name>` (deprecated; see crl controller) |
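
The naming scheme in the table can be illustrated with a few string helpers. This is a simplified sketch: the lower-case function names and the plain `string` parameter are assumptions for illustration, whereas the functions in `names.go` take an IngressController object and may return a namespaced name, not a bare string.

```go
package main

import "fmt"

// The helpers below are hypothetical simplifications of the functions in the
// table; they reproduce only the documented name patterns.

func routerDeploymentName(icName string) string {
	return "router-" + icName
}

func nodePortServiceName(icName string) string {
	return "router-nodeport-" + icName
}

func ingressClassName(icName string) string {
	return "openshift-" + icName
}

func clientCAConfigMapName(icName string) string {
	return "router-client-ca-" + icName
}

func main() {
	const ic = "default"
	fmt.Println(routerDeploymentName(ic))
	fmt.Println(nodePortServiceName(ic))
	fmt.Println(ingressClassName(ic))
	fmt.Println(clientCAConfigMapName(ic))
}
```

Deriving every operand name from the IngressController name through a single helper per resource keeps lookups and creations consistent across controllers.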
313+
314+
### Important Annotations and Labels
315+
316+
Defined as constants in `pkg/operator/controller/names.go`:
317+
318+
| Constant | Value | Purpose |
|----------|-------|---------|
| `IngressOperatorOwnedAnnotation` | `ingress.operator.openshift.io/owned` | Marks a resource as owned by the ingress operator (used on subscriptions) |
| `ControllerDeploymentLabel` | `ingresscontroller.operator.openshift.io/deployment-ingresscontroller` | Identifies a deployment as an ingress controller; value is the IC name |
| `ControllerDeploymentHashLabel` | `ingresscontroller.operator.openshift.io/hash` | Identifies an ingress controller deployment's generation (used for affinity/anti-affinity) |
| `CanaryDaemonSetLabel` | `ingresscanary.operator.openshift.io/daemonset-ingresscanary` | Identifies a daemonset as an ingress canary daemonset; value is the owning canary controller name |
324+
325+
### Feature Gates
326+
327+
Controllers check these feature gates (from `github.com/openshift/api/features`):
328+
- `features.FeatureGateGatewayAPI` — Gateway API support
329+
- `features.FeatureGateGatewayAPIController` — Gateway API controller
330+
- `features.FeatureGateAzureWorkloadIdentity` — Azure workload identity
331+
- `features.FeatureGateIngressControllerDynamicConfigurationManager` — Dynamic configuration management
332+
- `features.FeatureGateRouteExternalCertificate` — External route certificates (being removed)
333+
334+
### Error Handling
335+
336+
- Return errors with context: `fmt.Errorf("failed to create deployment: %w", err)`
337+
- Use `%w` to wrap errors so that callers can still match them with `errors.Is`/`errors.As`
338+
- Aggregate errors when multiple operations can fail independently
339+
- Use `k8s.io/apimachinery/pkg/util/errors` for error aggregation
340+
341+
### Logging
342+
343+
- Use structured logging via `go-logr/logr`
344+
- Include relevant context (namespace, name, resource type)
345+
- Follow [Kubernetes logging conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md#message-style-guidelines)
346+
347+
### Formatting
348+
349+
Go formatting is enforced:
350+
351+
```bash
hack/verify-gofmt.sh
```
354+
355+
## Distribution Methods
356+
357+
### Container Images
358+
359+
| Dockerfile | Description |
|------------|-------------|
| `Dockerfile` | Default build |
| `Dockerfile.rhel7` | RHEL 7 variant (outdated, may be removed) |
| `Dockerfile.ubi` | UBI (Universal Base Image) variant |
364+
365+
### Deployment
366+
367+
- Deployed as part of OpenShift installation
368+
- Runs in `openshift-ingress-operator` namespace
369+
- Managed by Cluster Version Operator (CVO)
370+
371+
## Contribution Conventions
372+
373+
- Commit messages should reference the Jira ticket: `NE-XXXX: description`
374+
- PRs should have logical, atomic commits
375+
- Test coverage is expected for new features and bug fixes
