Skip to content

test: E2E test suite for ExternalProvider/ExternalModel CRDs#187

Open
yossiovadia wants to merge 2 commits into
opendatahub-io:mainfrom
yossiovadia:feat/e2e-new-crds
Open

test: E2E test suite for ExternalProvider/ExternalModel CRDs#187
yossiovadia wants to merge 2 commits into
opendatahub-io:mainfrom
yossiovadia:feat/e2e-new-crds

Conversation

@yossiovadia
Copy link
Copy Markdown
Contributor

@yossiovadia yossiovadia commented Apr 27, 2026

Summary

Automated E2E test suite validating the full ExternalProvider/ExternalModel pipeline end-to-end. Tests the complete stack: CRD creation → controller reconciliation → BBR plugin resolution → api-translation → llm-katan simulator response.

Depends on #182, #183, #184 (ExternalProvider CRD types, reconcilers, and BBR plugin integration).

Test coverage

38 test cases across 7 categories, all 5 providers:

Category Tests Providers Status
Smoke (HTTP 200) 5 openai, anthropic, azure, bedrock, vertex 5/5 pass
OpenAI format validation 5 all 5/5 pass
Tool calling 5 all 5/5 pass
Multimodal (images) 5 all 4/5 — anthropic fails (#185)
JSON mode 5 all 4/5 — anthropic fails (#186)
System prompts + multi-turn 10 all 10/10 pass
Error handling (malformed, empty, 404) 3 openai 3/3 pass

Known failures (pre-existing bugs, not regressions)

Both bugs exist on main and are unrelated to the ExternalProvider work. Discovered via these tests running on a local Kind deployment with llm-katan v0.15.0 simulator.

How to run

Requires a cluster with new CRDs + controller + updated BBR plugin deployed (e.g., via local-deploy.sh):

# Against maas-experimental cluster
kubectl config use-context kind-maas-experimental
go test ./test/e2e-new-crds/ -v -ginkgo.v -count=1 -timeout=10m

# Run only tier1 (smoke tests)
go test ./test/e2e-new-crds/ -v -ginkgo.v -count=1 -ginkgo.label-filter="tier1"

# Run only tool calling
go test ./test/e2e-new-crds/ -v -ginkgo.v -count=1 -ginkgo.label-filter="tool-calling"

Test plan

  • 36/38 pass on local Kind deployment (2 known failures filed as issues)
  • Does not modify existing test/e2e/ tests (CI/CD unaffected)
  • Self-contained — creates own namespace, resources, curl pod; cleans up on exit

Summary by CodeRabbit

  • Tests
    • Added comprehensive end-to-end test suite to validate provider integrations and gateway functionality across multiple request types and scenarios.

Automated E2E tests validating the full new-CRD pipeline:
ExternalProvider + ExternalModel → controller creates networking
resources → BBR plugin resolves → api-translation → inference response.

38 test cases across 7 categories, 5 providers:

- tier1: HTTP 200 smoke + OpenAI format validation (10 tests)
- tier2: tool calling (5), multimodal (5), JSON mode (5),
  system prompts (5), multi-turn (5), error handling (3)

Known failures (pre-existing bugs on main, not regressions):
- Anthropic multimodal: translator drops image metadata (opendatahub-io#185)
- Anthropic JSON mode: translator strips response_format (opendatahub-io#186)

Tests run against local Kind deployment with llm-katan simulator.
Requires: new CRDs + controller + updated BBR plugin deployed.
@openshift-ci openshift-ci Bot requested review from nerdalert and noyitz April 27, 2026 17:54
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 27, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yossiovadia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 27, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@yossiovadia has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 49 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ff7f8136-6fbb-4fd5-bc78-885caab431a4

📥 Commits

Reviewing files that changed from the base of the PR and between cbb0957 and 030d6d4.

📒 Files selected for processing (1)
  • test/e2e-new-crds/e2e_test.go
📝 Walkthrough

Walkthrough

Two new E2E test files are added to the test suite. The first file (e2e_suite_test.go) provides test infrastructure: reading environment variables for namespace and gateway configuration, initializing a Kubernetes client from kubeconfig, creating a test namespace, deploying a curl pod, provisioning ExternalProvider and ExternalModel resources, polling for HTTPRoute creation, and cleanup operations. The second file (e2e_test.go) defines provider configurations and implements test cases validating gateway responses across multiple request types (basic, tools, image, JSON mode, system prompt, multi-turn, raw body), parsing and asserting response structure conformance to OpenAI-style JSON schemas and error handling behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding an E2E test suite for the ExternalProvider/ExternalModel CRDs, which is the core of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (5)
test/e2e-new-crds/e2e_suite_test.go (2)

162-199: Cache the *rest.Config instead of reloading kubeconfig on every exec.

execInPod is called inside gomega.Eventually polling loops across ~38 specs, so this re-parses kubeconfig hundreds of times and can also drift from the config used to build kubeClient in BeforeSuite (e.g. if KUBECONFIG mutates between calls). Hoist the loaded config into a package-level var alongside kubeClient.

Sketch
 var (
 	kubeClient  kubernetes.Interface
+	restConfig  *rest.Config
 	...
 )
@@ BeforeSuite
-	config, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(...).ClientConfig()
+	restConfig, err = clientcmd.NewNonInteractiveDeferredLoadingClientConfig(...).ClientConfig()
 	...
-	kubeClient, err = kubernetes.NewForConfig(config)
+	kubeClient, err = kubernetes.NewForConfig(restConfig)
@@ execInPod
-	config, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
-		clientcmd.NewDefaultClientConfigLoadingRules(), nil,
-	).ClientConfig()
-	if err != nil {
-		return "", err
-	}
 	...
-	executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
+	executor, err := remotecommand.NewSPDYExecutor(restConfig, "POST", req.URL())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-new-crds/e2e_suite_test.go` around lines 162 - 199, The execInPod
function currently reloads kubeconfig on every call; instead, create a
package-level *rest.Config variable (e.g., restConfig) and initialize it once in
the same setup that creates kubeClient (BeforeSuite), then modify execInPod to
use that cached restConfig rather than calling
clientcmd.NewNonInteractiveDeferredLoadingClientConfig; update execInPod to
reference restConfig when calling remotecommand.NewSPDYExecutor and add a clear
error if restConfig is nil to help catch mis-initialization.

123-129: Use apierrors.IsAlreadyExists instead of strings.Contains(err.Error(), "already exists").

Substring matching on error text is locale/version-fragile and will silently swallow unrelated errors that happen to contain the phrase. The k8s API returns a typed StatusError for this case.

Proposed fix
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
 ...
 func createNamespace(name string) {
 	ns := &corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: name}}
 	_, err := kubeClient.CoreV1().Namespaces().Create(context.TODO(), ns, metav1.CreateOptions{})
-	if err != nil && !strings.Contains(err.Error(), "already exists") {
+	if err != nil && !apierrors.IsAlreadyExists(err) {
 		gomega.Expect(err).NotTo(gomega.HaveOccurred())
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-new-crds/e2e_suite_test.go` around lines 123 - 129, Replace the
fragile substring check in createNamespace by using the Kubernetes API error
helper: when kubeClient.CoreV1().Namespaces().Create(...) returns an error,
import "k8s.io/apimachinery/pkg/api/errors" and call
apierrors.IsAlreadyExists(err) to detect the already-exists case; only call
gomega.Expect(err).NotTo(gomega.HaveOccurred()) for errors that are not
apierrors.IsAlreadyExists(err). Ensure the import is added and the logic
references createNamespace and the Create call.
test/e2e-new-crds/e2e_test.go (3)

209-217: SplitN(..., 2) misparses responses with interim status lines.

curl -i prints every response section, including HTTP/1.1 100 Continue\r\n\r\n interim responses, followed by the real HTTP/1.1 200 OK\r\n...\r\n\r\nbody. The first \r\n\r\n split point lands after the 100 Continue, so parts[1] becomes another header block plus body and json.Unmarshal fails. It's also possible to get gateway-injected 100 Continue from Envoy/Istio under load.

Safer: split on the last \r\n\r\n, or scan for the final HTTP/ status line.

Proposed fix
-	parts := strings.SplitN(resp, "\r\n\r\n", 2)
-	if len(parts) < 2 {
+	idx := strings.LastIndex(resp, "\r\n\r\n")
+	if idx < 0 {
 		return nil, fmt.Errorf("no body separator found")
 	}
 	var result map[string]any
-	err := json.Unmarshal([]byte(strings.TrimSpace(parts[1])), &result)
+	err := json.Unmarshal([]byte(strings.TrimSpace(resp[idx+4:])), &result)
 	return result, err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-new-crds/e2e_test.go` around lines 209 - 217, The parseResponseBody
function misparses when interim HTTP status blocks (e.g., "HTTP/1.1 100
Continue") precede the real response; change parseResponseBody to locate the
final header/body separator instead of using strings.SplitN(...,2) — e.g., find
the last occurrence of "\r\n\r\n" (or scan for the last "HTTP/" status line and
take the subsequent separator) and unmarshal the substring after that separator;
update references in parseResponseBody to use that final-split logic so
json.Unmarshal receives only the real response body.

91-105: Don't drop the json.Marshal error.

If marshalling the body fails (e.g. someone later passes a value that isn't JSON-serializable), bodyBytes is nil and curl is invoked with -d "", producing a confusing 400 that looks like a server bug. Fail the test at the source.

Proposed fix
-func buildCurlCommand(modelName string, body map[string]any) []string {
-	bodyBytes, _ := json.Marshal(body)
+func buildCurlCommand(modelName string, body map[string]any) []string {
+	bodyBytes, err := json.Marshal(body)
+	gomega.Expect(err).NotTo(gomega.HaveOccurred(), "failed to marshal request body")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-new-crds/e2e_test.go` around lines 91 - 105, buildCurlCommand
currently drops the error from json.Marshal causing nil body to become an empty
-d and a misleading 400; change buildCurlCommand(modelName string, body
map[string]any) to return ([]string, error), check the json.Marshal error (i.e.
handle bodyBytes, err := json.Marshal(body)), and return nil, err on failure;
update all callers to handle the error (in tests call t.Fatalf or fail the test
when buildCurlCommand returns an error) so marshalling problems fail fast with a
clear message instead of sending an empty payload.

195-207: Remove unused filterProviders function.

Ripgrep found zero callers for this function across the entire codebase. Since it's test-only dead code with no impact on security or runtime behavior, drop it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-new-crds/e2e_test.go` around lines 195 - 207, Remove the unused test
helper function filterProviders: delete the entire function definition
(filterProviders(providers []Provider, excludeTypes ...string) []Provider) and
any references to it (there are none), and run the tests/build to verify nothing
else depends on the Provider signature; this is dead test-only code so simply
removing the function from e2e_test.go is sufficient.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e-new-crds/e2e_suite_test.go`:
- Around line 22-27: Replace the public sslip.io default in the constant
defaultSimulatorEndpoint with an in-cluster service hostname (e.g.,
"llm-katan.llm-katan.svc.cluster.local") so tests default to cluster-local
traffic; ensure any code that reads E2E_SIMULATOR_ENDPOINT treats it as an
opt-in override (fall back to defaultSimulatorEndpoint only when the env var is
set/empty as intended) so credentials and request bodies are not sent to an
external IP by default.

In `@test/e2e-new-crds/e2e_test.go`:
- Around line 24-30: Add a one-line explanatory comment immediately above the
simulatorKeys map to document that these hardcoded values are intentional
defaults from the llm-katan simulator; specifically insert the comment "// These
match the DEFAULT_API_KEYS in llm-katan's config.py." before the simulatorKeys
variable declaration to prevent accidental replacement with real keys (refer to
the simulatorKeys map symbol).
- Around line 285-305: The tests perform unchecked type assertions (e.g.,
result["choices"].([]any), choice := choices[0].(map[string]any), tc :=
toolCalls[0].(map[string]any), fn := tc["function"].(map[string]any)) which will
panic on unexpected payloads; update each of these sites in the spec around
parseResponseBody handling to use the comma-ok form (e.g., v, ok := ...;
gomega.Expect(ok).To(gomega.BeTrue(), "descriptive message")) or replace them
with Gomega matchers like HaveKeyWithValue/MatchKeys so failures produce
readable expectations rather than panics—apply the same change for every
unchecked assertion occurrences referenced in the comment (lines handling
choices, message/tool_calls, tc, fn and the other mentioned assertion sites).

---

Nitpick comments:
In `@test/e2e-new-crds/e2e_suite_test.go`:
- Around line 162-199: The execInPod function currently reloads kubeconfig on
every call; instead, create a package-level *rest.Config variable (e.g.,
restConfig) and initialize it once in the same setup that creates kubeClient
(BeforeSuite), then modify execInPod to use that cached restConfig rather than
calling clientcmd.NewNonInteractiveDeferredLoadingClientConfig; update execInPod
to reference restConfig when calling remotecommand.NewSPDYExecutor and add a
clear error if restConfig is nil to help catch mis-initialization.
- Around line 123-129: Replace the fragile substring check in createNamespace by
using the Kubernetes API error helper: when
kubeClient.CoreV1().Namespaces().Create(...) returns an error, import
"k8s.io/apimachinery/pkg/api/errors" and call apierrors.IsAlreadyExists(err) to
detect the already-exists case; only call
gomega.Expect(err).NotTo(gomega.HaveOccurred()) for errors that are not
apierrors.IsAlreadyExists(err). Ensure the import is added and the logic
references createNamespace and the Create call.

In `@test/e2e-new-crds/e2e_test.go`:
- Around line 209-217: The parseResponseBody function misparses when interim
HTTP status blocks (e.g., "HTTP/1.1 100 Continue") precede the real response;
change parseResponseBody to locate the final header/body separator instead of
using strings.SplitN(...,2) — e.g., find the last occurrence of "\r\n\r\n" (or
scan for the last "HTTP/" status line and take the subsequent separator) and
unmarshal the substring after that separator; update references in
parseResponseBody to use that final-split logic so json.Unmarshal receives only
the real response body.
- Around line 91-105: buildCurlCommand currently drops the error from
json.Marshal causing nil body to become an empty -d and a misleading 400; change
buildCurlCommand(modelName string, body map[string]any) to return ([]string,
error), check the json.Marshal error (i.e. handle bodyBytes, err :=
json.Marshal(body)), and return nil, err on failure; update all callers to
handle the error (in tests call t.Fatalf or fail the test when buildCurlCommand
returns an error) so marshalling problems fail fast with a clear message instead
of sending an empty payload.
- Around line 195-207: Remove the unused test helper function filterProviders:
delete the entire function definition (filterProviders(providers []Provider,
excludeTypes ...string) []Provider) and any references to it (there are none),
and run the tests/build to verify nothing else depends on the Provider
signature; this is dead test-only code so simply removing the function from
e2e_test.go is sufficient.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b903da1e-c9ad-47a5-8ef1-0a69e22e7e3b

📥 Commits

Reviewing files that changed from the base of the PR and between d338dc1 and cbb0957.

📒 Files selected for processing (2)
  • test/e2e-new-crds/e2e_suite_test.go
  • test/e2e-new-crds/e2e_test.go

Comment on lines +22 to +27
const (
defaultNs = "e2e-new-crds"
defaultSimulatorEndpoint = "3-13-21-181.sslip.io"
defaultGatewayName = "maas-default-gateway"
defaultGatewayNamespace = "istio-system"
)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What public host is 3.13.21.181 / 3-13-21-181.sslip.io and is sslip.io a wildcard DNS resolver?

💡 Result:

3.13.21.181 / 3-13-21-181.sslip.io resolves to the IP 3.13.21.181 using sslip.io’s IP-embedded DNS mapping format. Yes—sslip.io is a wildcard-style DNS resolver (often described as “wildcard DNS”) in the sense that any hostname of the form .sslip.io returns that embedded IP as an A/AAAA answer, for both local and public IPs (with a configurable “public” behavior in its server implementation).

Citations:


Move simulatorEndpoint default from external sslip.io endpoint to in-cluster service.

3-13-21-181.sslip.io resolves to 3.13.21.181 (a public IP). Any developer or CI pipeline that forgets to set E2E_SIMULATOR_ENDPOINT will send request bodies and the Authorization header (carrying llm-katan-*-key) to an IP outside your control—CWE-668 (Exposure of Resource to Wrong Sphere). Even if the keys are simulator-only, this is a hidden external dependency that fails silently instead of loudly.

Default to an in-cluster service (e.g., llm-katan.llm-katan.svc.cluster.local) and treat the env var as opt-in for off-cluster test runs. This prevents accidental credential exfiltration and makes missing endpoints fail explicitly.

Change
-	defaultSimulatorEndpoint = "3-13-21-181.sslip.io"
+	// In-cluster llm-katan service; override with E2E_SIMULATOR_ENDPOINT for off-cluster runs.
+	defaultSimulatorEndpoint = "llm-katan.llm-katan.svc.cluster.local"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const (
defaultNs = "e2e-new-crds"
defaultSimulatorEndpoint = "3-13-21-181.sslip.io"
defaultGatewayName = "maas-default-gateway"
defaultGatewayNamespace = "istio-system"
)
const (
defaultNs = "e2e-new-crds"
// In-cluster llm-katan service; override with E2E_SIMULATOR_ENDPOINT for off-cluster runs.
defaultSimulatorEndpoint = "llm-katan.llm-katan.svc.cluster.local"
defaultGatewayName = "maas-default-gateway"
defaultGatewayNamespace = "istio-system"
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-new-crds/e2e_suite_test.go` around lines 22 - 27, Replace the public
sslip.io default in the constant defaultSimulatorEndpoint with an in-cluster
service hostname (e.g., "llm-katan.llm-katan.svc.cluster.local") so tests
default to cluster-local traffic; ensure any code that reads
E2E_SIMULATOR_ENDPOINT treats it as an opt-in override (fall back to
defaultSimulatorEndpoint only when the env var is set/empty as intended) so
credentials and request bodies are not sent to an external IP by default.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sslip.io endpoint is the llm-katan simulator running on AWS — it's external by design (simulates a real cloud provider). Using an in-cluster service would defeat the purpose of testing external model routing through Istio ServiceEntry + TLS origination. The endpoint is configurable via E2E_SIMULATOR_ENDPOINT env var.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossiovadia, understood — the external endpoint is architecturally necessary to exercise the full Istio ServiceEntry + TLS origination path, and replacing it with an in-cluster service would short-circuit exactly what the test is validating. With simulator-only keys and the E2E_SIMULATOR_ENDPOINT override already in place, the concern doesn't apply here. Thanks for the context.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

Comment on lines +24 to +30
var simulatorKeys = map[string]string{
"openai": "llm-katan-openai-key",
"anthropic": "llm-katan-anthropic-key",
"azure-openai": "llm-katan-azure-key",
"vertex-openai": "llm-katan-vertexai-key",
"bedrock-openai": "llm-katan-bedrock-key",
}
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm these strings only appear as llm-katan simulator defaults and aren't referenced as real secrets elsewhere.
rg -nP -C2 'llm-katan-(openai|anthropic|azure|vertexai|bedrock)-key'

Repository: opendatahub-io/ai-gateway-payload-processing

Length of output: 2314


Add explanatory comment pinning these keys to llm-katan simulator defaults.

The hardcoded keys match llm-katan v0.15.0 simulator defaults (confirmed in tools/simulator/README.md). However, unlike the sibling file test/e2e/e2e_test.go, this file lacks a comment documenting the source. Add:

// These match the DEFAULT_API_KEYS in llm-katan's config.py.

before the simulatorKeys map. Without this anchor, future maintainers may replace them with real keys for "convenience", creating CWE-798 exposure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-new-crds/e2e_test.go` around lines 24 - 30, Add a one-line
explanatory comment immediately above the simulatorKeys map to document that
these hardcoded values are intentional defaults from the llm-katan simulator;
specifically insert the comment "// These match the DEFAULT_API_KEYS in
llm-katan's config.py." before the simulatorKeys variable declaration to prevent
accidental replacement with real keys (refer to the simulatorKeys map symbol).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — there's already a comment on the existing E2E tests' key map (test/e2e/e2e_test.go:24). I'll add the same here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossiovadia, sounds good! For consistency, make sure the added comment explicitly ties the keys to the simulator defaults to prevent CWE-798 exposure, e.g.:

// These match the DEFAULT_API_KEYS in llm-katan's config.py (simulator test defaults — not real credentials).
var simulatorKeys = map[string]string{

Once it's committed, feel free to drop a note here and I'll mark this resolved.

Comment on lines +285 to +305
result, err := parseResponseBody(resp)
gomega.Expect(err).NotTo(gomega.HaveOccurred())

choices, ok := result["choices"].([]any)
gomega.Expect(ok).To(gomega.BeTrue())
gomega.Expect(len(choices)).To(gomega.BeNumerically(">", 0))

choice := choices[0].(map[string]any)
gomega.Expect(choice["finish_reason"]).To(gomega.Equal("tool_calls"),
fmt.Sprintf("Expected finish_reason=tool_calls for %s", p.ProviderType))

msg := choice["message"].(map[string]any)
toolCalls, ok := msg["tool_calls"].([]any)
gomega.Expect(ok).To(gomega.BeTrue(), "Expected tool_calls array in message")
gomega.Expect(len(toolCalls)).To(gomega.BeNumerically(">", 0))

tc := toolCalls[0].(map[string]any)
gomega.Expect(tc).To(gomega.HaveKey("id"))
fn := tc["function"].(map[string]any)
gomega.Expect(fn["name"]).To(gomega.Equal("get_weather"))
})
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Unchecked type assertions panic instead of producing legible test failures.

Several spots in this file (lines 292, 296, 301, 303 here; also 331-332, 363-364, 411, 413-415 in later specs) do x := y.(map[string]any) / y.([]any) without the , ok form. If the gateway or upstream returns an unexpected payload (an error JSON, or a string for content instead of an object), the goroutine panics and you get a stack trace instead of "Expected ... for provider X".

Either guard each assertion with , ok and gomega.Expect(ok)..., or let Gomega type-coerce via gomega.HaveKeyWithValue / gomega.MatchKeys. Pattern to apply throughout:

Example (apply to every `.(...)` site, including 331-415)
-				choice := choices[0].(map[string]any)
+				choice, ok := choices[0].(map[string]any)
+				gomega.Expect(ok).To(gomega.BeTrue(), "choices[0] is not an object")
 				gomega.Expect(choice["finish_reason"]).To(gomega.Equal("tool_calls"), ...)
-				msg := choice["message"].(map[string]any)
+				msg, ok := choice["message"].(map[string]any)
+				gomega.Expect(ok).To(gomega.BeTrue(), "choice.message is not an object")
 				toolCalls, ok := msg["tool_calls"].([]any)
 				...
-				tc := toolCalls[0].(map[string]any)
+				tc, ok := toolCalls[0].(map[string]any)
+				gomega.Expect(ok).To(gomega.BeTrue(), "tool_calls[0] is not an object")
 				gomega.Expect(tc).To(gomega.HaveKey("id"))
-				fn := tc["function"].(map[string]any)
+				fn, ok := tc["function"].(map[string]any)
+				gomega.Expect(ok).To(gomega.BeTrue(), "tool_calls[0].function is not an object")
 				gomega.Expect(fn["name"]).To(gomega.Equal("get_weather"))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
result, err := parseResponseBody(resp)
gomega.Expect(err).NotTo(gomega.HaveOccurred())
choices, ok := result["choices"].([]any)
gomega.Expect(ok).To(gomega.BeTrue())
gomega.Expect(len(choices)).To(gomega.BeNumerically(">", 0))
choice := choices[0].(map[string]any)
gomega.Expect(choice["finish_reason"]).To(gomega.Equal("tool_calls"),
fmt.Sprintf("Expected finish_reason=tool_calls for %s", p.ProviderType))
msg := choice["message"].(map[string]any)
toolCalls, ok := msg["tool_calls"].([]any)
gomega.Expect(ok).To(gomega.BeTrue(), "Expected tool_calls array in message")
gomega.Expect(len(toolCalls)).To(gomega.BeNumerically(">", 0))
tc := toolCalls[0].(map[string]any)
gomega.Expect(tc).To(gomega.HaveKey("id"))
fn := tc["function"].(map[string]any)
gomega.Expect(fn["name"]).To(gomega.Equal("get_weather"))
})
result, err := parseResponseBody(resp)
gomega.Expect(err).NotTo(gomega.HaveOccurred())
choices, ok := result["choices"].([]any)
gomega.Expect(ok).To(gomega.BeTrue())
gomega.Expect(len(choices)).To(gomega.BeNumerically(">", 0))
choice, ok := choices[0].(map[string]any)
gomega.Expect(ok).To(gomega.BeTrue(), "choices[0] is not an object")
gomega.Expect(choice["finish_reason"]).To(gomega.Equal("tool_calls"),
fmt.Sprintf("Expected finish_reason=tool_calls for %s", p.ProviderType))
msg, ok := choice["message"].(map[string]any)
gomega.Expect(ok).To(gomega.BeTrue(), "choice.message is not an object")
toolCalls, ok := msg["tool_calls"].([]any)
gomega.Expect(ok).To(gomega.BeTrue(), "Expected tool_calls array in message")
gomega.Expect(len(toolCalls)).To(gomega.BeNumerically(">", 0))
tc, ok := toolCalls[0].(map[string]any)
gomega.Expect(ok).To(gomega.BeTrue(), "tool_calls[0] is not an object")
gomega.Expect(tc).To(gomega.HaveKey("id"))
fn, ok := tc["function"].(map[string]any)
gomega.Expect(ok).To(gomega.BeTrue(), "tool_calls[0].function is not an object")
gomega.Expect(fn["name"]).To(gomega.Equal("get_weather"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-new-crds/e2e_test.go` around lines 285 - 305, The tests perform
unchecked type assertions (e.g., result["choices"].([]any), choice :=
choices[0].(map[string]any), tc := toolCalls[0].(map[string]any), fn :=
tc["function"].(map[string]any)) which will panic on unexpected payloads; update
each of these sites in the spec around parseResponseBody handling to use the
comma-ok form (e.g., v, ok := ...; gomega.Expect(ok).To(gomega.BeTrue(),
"descriptive message")) or replace them with Gomega matchers like
HaveKeyWithValue/MatchKeys so failures produce readable expectations rather than
panics—apply the same change for every unchecked assertion occurrences
referenced in the comment (lines handling choices, message/tool_calls, tc, fn
and the other mentioned assertion sites).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. These follow the same pattern as the existing E2E tests (test/e2e/e2e_test.go:200). In practice, a panic in a test produces a clear stack trace that points directly to the assertion. But I can wrap them in gomega Expect for cleaner output — will address.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossiovadia, agreed — the stack trace is usually sufficient for debugging, but one extra concern in Ginkgo: if the goroutine panic isn't recovered cleanly (e.g., in a parallel go test -p N run), it can abort the entire suite process rather than just failing the single It node, swallowing results from other specs. The gomega.Expect(ok) form lets Ginkgo's panic-recovery machinery report it as a normal spec failure and continue. The committable suggestion above covers all the affected lines — feel free to use it as-is when you get to it.

@yossiovadia
Copy link
Copy Markdown
Contributor Author

Update: The 2 Anthropic failures discovered by this test suite have been fixed:

Once those merge, this suite should go 38/38 green.

@yossiovadia
Copy link
Copy Markdown
Contributor Author

/cc @nirrozenbaum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant