Add e2e tests for manual Gateway API service creation#1413
Conversation
Validate that users can pre-create a Kubernetes Service before a Gateway and that Istio adopts it without modification. This enables use cases like externalTrafficPolicy: Local and ClusterIP service types that are not safely configurable via Istio's ConfigMap patching mechanism. Tests cover LoadBalancer and ClusterIP service types, negative port regression guards (omitted ports are never added by Istio), listener addition immutability, duplicate service prevention, and managed label toggle behavior. Includes .GCC/ directory with the architectural plan, decision records, and agent reasoning logs used during development. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an ADR, test plan, and work log, plus new E2E tests and helpers to validate Istio’s manual service creation workflow for Gateway API. Registers two subtests: a pre-created LoadBalancer (with externalTrafficPolicy=Local) and a pre-created ClusterIP. Tests assert Istio does not mutate Service fields (type, externalTrafficPolicy, selector, ports) or create duplicate Services across Gateway creation, listener updates, and a managed-label toggle (LoadBalancer only). Adds ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
test/e2e/util_gatewayapi_test.go (1)
1569-1579: Use the operator’s canonical gateway label contract here.Line 1570, Line 1571, and Line 1577 re-encode the service adoption contract with string literals even though
pkg/operator/controller/names.go:82-91already defines the label keys, and the suite already derives the managed value fromoperatorcontroller.OpenShiftGatewayClassControllerNameintest/e2e/gateway_api_test.go:1274-1275. Reusing those avoids helper drift if the controller naming changes.♻️ Suggested refactor
func createGatewayService( t *testing.T, gatewayName string, gatewayClass string, namespace string, svctype corev1.ServiceType, ports []corev1.ServicePort, ) (*corev1.Service, error) { t.Helper() + managedLabelValue := strings.ReplaceAll(operatorcontroller.OpenShiftGatewayClassControllerName, "/", "-") svcDefinition := &corev1.Service{ ObjectMeta: metav1.ObjectMeta{ Name: fmt.Sprintf("%s-%s", gatewayName, gatewayClass), Namespace: namespace, Labels: map[string]string{ - "gateway.networking.k8s.io/gateway-name": gatewayName, - "gateway.istio.io/managed": "openshift.io-gateway-controller-v1", + operatorcontroller.GatewayNameLabelKey: gatewayName, + operatorcontroller.ManagedByIstioLabelKey: managedLabelValue, }, }, Spec: corev1.ServiceSpec{ Type: svctype, Ports: ports, Selector: map[string]string{ - "gateway.networking.k8s.io/gateway-name": gatewayName, + operatorcontroller.GatewayNameLabelKey: gatewayName, }, }, }As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/util_gatewayapi_test.go` around lines 1569 - 1579, Replace the hard-coded gateway label keys and managed value in the Service metadata/selector in util_gatewayapi_test.go with the operator's canonical constants from pkg/operator/controller/names.go and the existing controller name constant; specifically, stop using the string literals "gateway.networking.k8s.io/gateway-name" and "gateway.istio.io/managed" and instead reference the label key constants defined in that names.go file, and use operatorcontroller.OpenShiftGatewayClassControllerName for the managed label value so the service labels and selector remain consistent with the operator contract.test/e2e/gateway_api_test.go (1)
1596-1640: Re-check single-service ownership after the later reconciliations.After Line 1597, Line 1620, Line 1630, and Line 1761, the tests only
Getthe original Service again. They would still pass if Istio left that object untouched but created a second adopted Service during the listener update or managed-label toggle. Re-running the existing label-basedList+Len == 1assertion after each trigger would close that gap.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 1760-1774
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/gateway_api_test.go` around lines 1596 - 1640, After each reconciliation trigger (after the Step 7 re-fetch, after removing the managed label, after re-adding the managed label, and after the final re-fetch) re-run the label-based kclient.List and assert exactly one Service exists to prevent a second adopted Service from being created; specifically, call kclient.List(ctx, &svcList, client.InNamespace(svc.Namespace), client.MatchingLabels(expectedSelector)) and use require.Len(t, svcList.Items, 1, "...") (referencing kclient, svc, svcList and expectedSelector from the diff) immediately after the existing Get/Update/Time.Sleep blocks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/gateway_api_test.go`:
- Around line 1524-1529: The Gateway listener with Name "https" is invalid
because it sets Protocol: "TLS" without a TLS block; either change that
listener's Protocol from "TLS" to "HTTP" if you only need an extra non-TLS
listener, or add a minimal TLS config (populate the TLS: field with required
fields such as Mode and a CertificateRef) to make it a valid HTTPS/TLS listener.
Update the same change in the other test function mentioned
(testGatewayAPIManualClusterIPService) so both listeners are valid Gateway API
listeners.
---
Nitpick comments:
In `@test/e2e/gateway_api_test.go`:
- Around line 1596-1640: After each reconciliation trigger (after the Step 7
re-fetch, after removing the managed label, after re-adding the managed label,
and after the final re-fetch) re-run the label-based kclient.List and assert
exactly one Service exists to prevent a second adopted Service from being
created; specifically, call kclient.List(ctx, &svcList,
client.InNamespace(svc.Namespace), client.MatchingLabels(expectedSelector)) and
use require.Len(t, svcList.Items, 1, "...") (referencing kclient, svc, svcList
and expectedSelector from the diff) immediately after the existing
Get/Update/Time.Sleep blocks.
In `@test/e2e/util_gatewayapi_test.go`:
- Around line 1569-1579: Replace the hard-coded gateway label keys and managed
value in the Service metadata/selector in util_gatewayapi_test.go with the
operator's canonical constants from pkg/operator/controller/names.go and the
existing controller name constant; specifically, stop using the string literals
"gateway.networking.k8s.io/gateway-name" and "gateway.istio.io/managed" and
instead reference the label key constants defined in that names.go file, and use
operatorcontroller.OpenShiftGatewayClassControllerName for the managed label
value so the service labels and selector remain consistent with the operator
contract.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 0bab70a4-c921-411d-98c1-1b06a0d26fff
📒 Files selected for processing (5)
.GCC/branches/manual-service-creation/decisions/001-manual-service-immutability.md.GCC/branches/manual-service-creation/log.md.GCC/branches/manual-service-creation/plan.mdtest/e2e/gateway_api_test.gotest/e2e/util_gatewayapi_test.go
Gateway API v1 requires a TLS configuration block for listeners with Protocol "TLS" or "HTTPS". Since the 8443 listener exists solely as a negative port regression guard (verifying Istio does not add omitted ports), using HTTP protocol is sufficient and avoids needing TLS certificate setup that is orthogonal to what is being tested. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/e2e/gateway_api_test.go (1)
1479-1642: Consider collapsing these into one helper/table-driven flow.These two tests share most of their setup and assertions, and the checks are already drifting. A shared helper for “create manual service → create gateway → mutate gateway → assert immutability” would make future invariant changes much easier to keep aligned.
Also applies to: 1648-1775
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/gateway_api_test.go` around lines 1479 - 1642, The test testGatewayAPIManualLBService and the similar test at 1648-1775 duplicate setup and assertions; extract a reusable helper (e.g., runManualLBImmutabilityTest) that encapsulates "createGatewayClass/createGatewayService → create Gateway → assertGatewaySuccessful → verify service immutability → updateGatewaySpecWithRetry → re-verify → toggle managed label" and then convert the two tests into a table-driven call into that helper with parameters for gatewayName, initial service ports, and listeners to add; reference createGatewayService, createGatewayClass, assertGatewaySuccessful, updateGatewaySpecWithRetry, and the service label logic when moving assertions into the helper so all checks (service type, ExternalTrafficPolicy, selector, ports, duplicate service list, and label toggle) are performed consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/gateway_api_test.go`:
- Around line 1564-1580: Post-mutation assertions are too weak: after each
mutation re-list services via kclient.List filtering by
client.MatchingLabels{"gateway.networking.k8s.io/gateway-name": gatewayName} and
client.InNamespace(operatorcontroller.DefaultOperandNamespace) (same as the
earlier list) and assert services.Items length remains 1, then fetch the Service
by svcKey (or from services.Items[0]) and compare its Spec.Ports against the
canonical port contract returned by defaultManualServicePorts() (compare port
numbers, names and targetPort where applicable), ignoring any NodePort values
when the Service type is LoadBalancer; apply this re-list+compare pattern
anywhere similar (e.g., around svcKey checks at the ranges you noted) to ensure
no duplicate Services or port rewrites slip through.
- Around line 1596-1597: Replace the fixed time.Sleep(10 * time.Second) calls in
gateway_api_test.go with condition-based polling: create a small helper (e.g.,
waitForGatewayStable or eventuallyAssertNoChange) that repeatedly checks the
observable invariant you care about (resource status, Istio configuration, or
the specific assertion currently checked after the sleep) and returns success
once the check passes, using a bounded timeout (e.g., wait.PollImmediate or
require.Eventually) and failing if the timeout elapses; then swap the four
occurrences of time.Sleep(10 * time.Second) with calls to that helper so tests
wait deterministically for reconciliation rather than sleeping.
---
Nitpick comments:
In `@test/e2e/gateway_api_test.go`:
- Around line 1479-1642: The test testGatewayAPIManualLBService and the similar
test at 1648-1775 duplicate setup and assertions; extract a reusable helper
(e.g., runManualLBImmutabilityTest) that encapsulates
"createGatewayClass/createGatewayService → create Gateway →
assertGatewaySuccessful → verify service immutability →
updateGatewaySpecWithRetry → re-verify → toggle managed label" and then convert
the two tests into a table-driven call into that helper with parameters for
gatewayName, initial service ports, and listeners to add; reference
createGatewayService, createGatewayClass, assertGatewaySuccessful,
updateGatewaySpecWithRetry, and the service label logic when moving assertions
into the helper so all checks (service type, ExternalTrafficPolicy, selector,
ports, duplicate service list, and label toggle) are performed consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 812cd200-f32f-435b-a853-ed98de22dd1e
📒 Files selected for processing (1)
test/e2e/gateway_api_test.go
Re-list services by gateway-name label after adding a listener and after the managed label toggle to verify Istio did not create a duplicate service in response to these mutations. Previously, the duplicate service check was only done once before mutations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
test/e2e/gateway_api_test.go (2)
1564-1570:⚠️ Potential issue | 🟠 MajorThese assertions still don't prove the manual Service stayed intact.
len==2plus “not 8443/8080” still passes if Istio rewrites the surviving80/15021ports (or their names/targetPorts) to a different 2-port shape, and re-fetching by name also misses delete/recreate with the same name. Compare againstdefaultManualServicePorts()after each mutation and assert the original Service UID stays unchanged.Also applies to: 1603-1609, 1645-1648, 1746-1752, 1785-1790
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/gateway_api_test.go` around lines 1564 - 1570, Current checks only verify count and absence of port 8443 but don't ensure the original Service shape or identity remained; update the test to (1) capture the original Service UID and ports (svc.ObjectMeta.UID and svc.Spec.Ports) before mutation, (2) after each mutation re-fetch the Service by name and assert the UID is unchanged and that the re-fetched svc.Spec.Ports exactly equals defaultManualServicePorts(), and (3) replace the len/NotEqual loops in the blocks around svc.Spec.Ports with these UID and full-port-equality assertions (use defaultManualServicePorts() and svc.ObjectMeta.UID as the identifying symbols).
1596-1597:⚠️ Potential issue | 🟠 MajorReplace the fixed sleeps with invariant-based polling.
These four sleeps add 40s of deterministic latency and still don't prove reconciliation happened. Poll until the single-Service / unchanged-spec assertions pass, then fail on timeout.
Also applies to: 1629-1630, 1639-1639, 1778-1779
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/gateway_api_test.go` around lines 1596 - 1597, Replace the fixed time.Sleep(10 * time.Second) calls in the test (the "Step 7: Wait briefly for any Istio reconciliation" block and the other occurrences around the single-Service / unchanged-spec assertions) with invariant-based polling: repeatedly re-fetch the Service/Resource and run the same assertions (the "single-Service" and "unchanged-spec" checks) inside a retry loop (e.g., wait.PollImmediate or a for loop with time.Sleep small intervals) until they pass or a configurable timeout elapses, and fail the test if the timeout is reached; apply the same pattern to the other instances referenced in the comment so the test waits for the assertion to become true instead of sleeping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/gateway_api_test.go`:
- Around line 1625-1639: After deleting the "gateway.istio.io/managed" label
from svc, fetch the Service (use kclient.Get with svcKey into svc) and assert
that svc.Labels does not contain "gateway.istio.io/managed" and that immutable
parts of the Service contract (e.g., svc.Spec.ClusterIP, svc.Spec.Ports,
svc.Spec.Type or other fields your tests rely on) are unchanged before you
re-add the label; if any assertion fails, call t.Fatalf with a clear message so
the test fails early instead of masking changes when you later re-add the label
via kclient.Update.
---
Duplicate comments:
In `@test/e2e/gateway_api_test.go`:
- Around line 1564-1570: Current checks only verify count and absence of port
8443 but don't ensure the original Service shape or identity remained; update
the test to (1) capture the original Service UID and ports (svc.ObjectMeta.UID
and svc.Spec.Ports) before mutation, (2) after each mutation re-fetch the
Service by name and assert the UID is unchanged and that the re-fetched
svc.Spec.Ports exactly equals defaultManualServicePorts(), and (3) replace the
len/NotEqual loops in the blocks around svc.Spec.Ports with these UID and
full-port-equality assertions (use defaultManualServicePorts() and
svc.ObjectMeta.UID as the identifying symbols).
- Around line 1596-1597: Replace the fixed time.Sleep(10 * time.Second) calls in
the test (the "Step 7: Wait briefly for any Istio reconciliation" block and the
other occurrences around the single-Service / unchanged-spec assertions) with
invariant-based polling: repeatedly re-fetch the Service/Resource and run the
same assertions (the "single-Service" and "unchanged-spec" checks) inside a
retry loop (e.g., wait.PollImmediate or a for loop with time.Sleep small
intervals) until they pass or a configurable timeout elapses, and fail the test
if the timeout is reached; apply the same pattern to the other instances
referenced in the comment so the test waits for the assertion to become true
instead of sleeping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 4b839882-3a24-4d4c-bfcd-dfcd290086c9
📒 Files selected for processing (1)
test/e2e/gateway_api_test.go
After removing the gateway.istio.io/managed label, verify the label is actually absent and assert that service type, externalTrafficPolicy, selector, and port count remain unchanged before re-adding the label. This prevents masking any Istio mutations that occur during the label-absent window. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@coderabbitai do you have any other critical or major finding? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me inspect the actual code changes to give you an accurate assessment. 🏁 Script executed: Length of output: 42261
🔴 Critical1. 2. Missing 🟠 Major3. 4. 5. 🟡 Minor (worth noting)6. Silent spec mismatch in |
Wrap the managed label removal and re-addition in retry loops (Get-mutate-Update with PollUntilContextTimeout) to handle 409 Conflict errors from concurrent reconciliation, matching the updateGatewaySpecWithRetry pattern used elsewhere. Also add missing selector assertion in the ClusterIP test after listener addition, and fix the AlreadyExists warning log to show both requested and existing service values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@coderabbitai do you have any other critical or major finding? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me do a deeper pass on the code to check for anything missed. 🏁 Script executed: Length of output: 31076 🏁 Script executed: Length of output: 32279
🟠 New Major7. if err := wait.PollUntilContextTimeout(context.TODO(), 2*time.Second, timeout, true, func(ctx context.Context) (bool, error) {If the API server is temporarily unreachable and the loop keeps retrying non- 🟡 New Notable (worth awareness, not blocking)8. Hardcoded Gateway names with no 9. Service and Gateway created directly in the live operator namespace To summarize: items 1–6 from my previous message remain the more impactful findings. Of the three new ones, #7 ( |
|
@coderabbitai but where the previous concerns fixed? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me check the current state of the code to answer that. 🏁 Script executed: Length of output: 14523
✅ Confirmed Fixed
🟡 Acknowledged via comment, not code-guarded
❌ Not Fixed
❓ Unclear (further inspection needed)
Bottom line: Of the 9 findings, only 1 was fully fixed in code ( |
|
@coderabbitai commenting on your findings:
|
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 265189
Looking at the current code, the
This is exactly the right approach for negative reconciliation too. The key insight is: use Istio's own status transitions as a synchronization barrier. When .GCC/ directory Understood, will wait for the follow-up. Agreed it should be removed.
You're right — I apologize for the incorrect previous assertion. It is called at Line 1747. My original concern about whether a ClusterIP-backed Gateway ever reaches
You're correct — |
|
@coderabbitai so any concerning finding on my PR, being major or critical? I dont care right now about minors and nitpicks |
|
Tip For best results, initiate chat on the files or code changes.
🟠
|
|
@coderabbitai I can confirm that Istio just sets Programmed=True in case there is a port alignment between Gateway listeners and the service, and in fact we verify it by later adding a non aligned port and checking it goes to False, and then adding the missing port and the Gateway status moving to true |
|
Tip For best results, initiate chat on the files or code changes.
With that confirmed, there are no remaining major or critical concerns in this PR. The status of all findings:
The PR looks good to merge once your team reviews the |
|
@coderabbitai based on this PR test, and given we want to claim a supported workflow for users, write a comment on how a user willing to customize their Gateway API service would follow the same approach:
Show the manifest for service+gateway, explain the caveats (eg.: need of sync between listeners and services ports and what happens when they are out of sync). Consider that this will be later added to user docs on how to pre provision a manual service for a gateway |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me pull the exact label keys, naming conventions, and port details from the code before writing the manifests. 🏁 Script executed: Length of output: 16200
Pre-provisioning a Kubernetes Service for Gateway API (Istio/OSSM)By default, Istio automatically creates and manages the Kubernetes
Naming and labeling contractThe Service must follow a strict naming and labeling convention so that Istio can associate it with the Gateway:
Port alignment requirement
In addition to your application ports, the Service must always include the Istio health-check/status port:
Scenario 1: LoadBalancer Service with
|
| Behavior | Guaranteed? |
|---|---|
Istio will not change spec.type |
✅ Yes |
Istio will not change externalTrafficPolicy |
✅ Yes |
Istio will not change spec.selector |
✅ Yes |
Istio will not add or remove ports from spec.ports |
✅ Yes |
| Istio will not create a duplicate Service | ✅ Yes |
| Istio will automatically add a missing listener port to the Service | ❌ No — user must add it |
Removing/re-adding the gateway.istio.io/managed label re-enables Istio management |
❌ No — label toggling has no effect; the Service remains user-owned |
Out-of-sync ports: what to expect
If you add a new listener to the Gateway but forget to add the corresponding port to the Service:
- Istio detects the mismatch.
- The Gateway's
Programmedcondition transitions toFalse. - Traffic to the new listener is not served.
- The Service is not modified by Istio.
To recover: add the missing port to the Service. Istio will reconcile and set Programmed=True again.
|
cc @rhamini3 |
|
/cc |
|
/assign |
|
/retest-required |
|
@rikatz: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
externalTrafficPolicy: Local) and ClusterIPgateway.istio.io/managedlabel toggle does not grant Istio permission to modify the serviceNew helpers (
util_gatewayapi_test.go)createGatewayService()— creates a service following the<gatewayName>-<gatewayClassName>naming contract with all required labels, selector, and portsdefaultManualServicePorts()— returns standard status-port (15021) + http (80) portsTest scenarios (
gateway_api_test.go)testGatewayAPIManualLBService: Pre-creates a LoadBalancer service withexternalTrafficPolicy: Local, creates a Gateway with listeners on ports 80 + 8443 (service omits 8443), asserts service immutability, adds a third listener (8080), asserts no mutation, toggles the managed label and asserts no mutation.testGatewayAPIManualClusterIPService: Same pattern with ClusterIP type — pre-creates service, asserts immutability, negative port guard, listener addition does not modify service.About the
.GCC/directoryThe
.GCC/branches/manual-service-creation/directory contains architectural artifacts produced during development:plan.md— Full implementation blueprint including test design, documentation guidance for tech writers (with security warnings about RBAC boundaries), and a 100-point verification rubriclog.md— Agent reasoning traces from architect, engineer, QA (scored 97/100), and security reviewdecisions/001-manual-service-immutability.md— ADR documenting the key design decision that Istio does not modify pre-created servicesThese artifacts provide traceability for the design decisions and can be removed if the team prefers not to keep them in-tree.
Test plan
testGatewayAPIManualLBServicepasses on a platform with managed DNStestGatewayAPIManualClusterIPServicepasses (no DNS requirement)go build -tags e2e ./test/e2e/passes)🤖 Generated with Claude Code