Skip to content

test: E2E tests for multi-provider traffic splitting#214

Open
yossiovadia wants to merge 15 commits into
opendatahub-io:mainfrom
yossiovadia:feat/e2e-multi-provider-tests
Open

test: E2E tests for multi-provider traffic splitting#214
yossiovadia wants to merge 15 commits into
opendatahub-io:mainfrom
yossiovadia:feat/e2e-multi-provider-tests

Conversation

@yossiovadia
Copy link
Copy Markdown
Contributor

Summary

Adds multi-provider traffic splitting tests to the new-CRD E2E suite. Creates an ExternalModel with 80/20 weighted OpenAI/Anthropic providers and verifies traffic distribution, model field rewrite, and response format.

Depends on #213 (multi-provider weights) and #187 (new-CRD E2E suite).

Test cases

Test What it verifies
HTTP 200 Multi-provider model returns valid responses
Traffic distribution 50 requests split ~80/20 between OpenAI and Anthropic
Model field rewrite Response model is targetModel, not client-facing name

Verified

Implements inference.opendatahub.io/v1alpha1 API group with two CRDs
per Nir's ExternalProvider proposal:

- ExternalProvider: provider endpoint, credentials, and provider-specific
  config for per-provider settings (Vertex AI project/location/etc.)
- ExternalModel: client-facing model name with provider ref, targetModel,
  apiFormat. MaxItems=1 on externalProviderRefs for Phase 1.

Config field uses map[string]string rather than map[string]any:
- Typed validation in the CRD schema (no x-kubernetes-preserve-unknown-fields)
- Clean JSON round-trip (map[string]any converts numbers to float64)
- Covers all known use cases (project, location, endpoint are strings)
- Can be promoted to a typed struct if nested config is needed later

Adds Makefile targets: generate, manifests, verify-codegen.
- NameReference: add RFC 1123 pattern (lowercase alphanumeric + hyphens)
- Endpoint: require at least one dot to enforce FQDN (reject single-label)
- Bump controller-gen v0.16.4 → v0.20.1 (aligns with controller-runtime v0.23.3)
- Add regex validation tests for both patterns
Address Nir's review:
- apiFormat: changed from optional to required per API evolution rule
  (required → optional is non-breaking, opposite is breaking)
- Phase status: clarified that Pending/Ready/Failed reflect reconciliation
  state (missing provider, missing secret), not runtime request health
…binary

Adds the control-plane controller that watches the new CRDs and creates
Istio networking resources:

ExternalProvider reconciler:
- Creates shared Service, ServiceEntry, DestinationRule per provider
- Validates referenced Secret exists (Phase=Failed/SecretNotFound if missing)
- Idempotent writes, OwnerReference GC, self-healing via Owns(&Service{})

ExternalModel reconciler:
- Creates HTTPRoute per model, backend ref to provider's shared Service
- Cross-watches ExternalProviders — endpoint changes propagate automatically
- Self-healing via Owns(&HTTPRoute{})

Controller binary (cmd/controller):
- --gateway-name/--gateway-namespace flags (same pattern as MaaS controller)
- Health probes, leader election, SecurityContext hardening

Tests: 9 unit + 6 envtest (externalprovider), 3 unit + 5 envtest (externalmodel)
…-first, configurable timeout

1. Istio self-healing: add Watches() for ServiceEntry and DestinationRule
   with label predicate + EnqueueRequestForOwner, so external deletion
   triggers re-reconciliation (same as Service via Owns)

2. ExternalModel checks provider Phase: if provider is not Ready, model
   sets Phase=Failed instead of creating an HTTPRoute to an unconfigured
   endpoint

3. Validate-before-create: ExternalProvider reconciler validates secret
   exists before creating networking resources. Failed validation =
   no resources created, Phase=Failed

4. Configurable HTTPRoute timeout: --default-route-timeout flag (default
   300s) on controller binary, passed through to buildHTTPRoute

5. Leader election comment in sample deployment YAML
Replace maas.opendatahub.io ExternalModel watcher with two
inference.opendatahub.io watchers:
- ExternalProvider reconciler → providerInfoStore (endpoint, creds, config)
- ExternalModel reconciler → resolves provider ref, writes to modelInfoStore

Provider config (map[string]string) flows through to CycleState via
ProviderConfigKey for downstream plugins (api-translation).

ProcessRequest unchanged — same CycleState keys, same downstream behavior.

Breaking change: BBR no longer resolves maas.opendatahub.io ExternalModel CRs.
RBAC updated to inference.opendatahub.io in Helm templates.
1. Cross-watch: ExternalModel plugin reconciler now Watches() ExternalProvider
   changes via mapProviderToModels. When a provider is updated (e.g., credential
   rotation), all referencing models are re-reconciled so modelInfoStore stays
   current. Without this, stale credentials persist in the data plane until
   the ExternalModel CR is touched.

2. Propagation test: verifies that updating provider credentials in providerStore
   and re-reconciling the model picks up the new values in modelStore.
- Model reconciler: delete stale modelStore entry when externalProviderRefs
  is empty, malformed, or missing provider ref name
- Provider reconciler: delete stale providerStore entry when required fields
  become empty (valid→invalid spec transition)
- Log List errors in mapProviderToModels instead of silently returning nil
The api-translation plugin now prefers the apiFormat field (from
ExternalModel.spec.externalProviderRefs[].apiFormat) over the provider
type (from ExternalProvider.spec.provider) when selecting the request/
response translator.

This enables explicit API format control — e.g., a Bedrock provider
can serve both OpenAI-compatible and Anthropic-format models by
specifying different apiFormat values per model.

Flow: ExternalModel CR → plugin reconciler extracts apiFormat →
CycleState (APIFormatKey) → api-translation resolves translator.
Falls back to ProviderKey if APIFormatKey is not set.
Enables an ExternalModel to reference multiple ExternalProviders with
relative weights for traffic splitting.

CRD:
- Remove MaxItems=1 on externalProviderRefs
- Add Weight field (*int32, default=1, minimum=0)

Controller (ExternalModel reconciler):
- Resolves all provider refs, creates HTTPRoute with per-provider rules
- Rule 1: path-based match (Kuadrant compatibility)
- Per-provider rules: X-Selected-Provider header match with provider-
  specific backend ref and Host header for TLS SNI

BBR plugin (model-provider-resolver):
- Weighted random selection across provider refs (precomputed totalWeight)
- Sets X-Selected-Provider header for deterministic Envoy routing
- Rewrites body model field to selected provider's targetModel
- Model name validation: checks CR name (client-facing) not targetModel

Example:
  externalProviderRefs:
    - ref: {name: openai-provider}
      targetModel: gpt-4o
      apiFormat: openai
      weight: 80
    - ref: {name: bedrock-provider}
      targetModel: gpt-4o-bedrock
      apiFormat: bedrock-openai
      weight: 20

E2E verified: 14/6 OpenAI/Anthropic distribution over 20 requests
(configured 80/20).
Adds 3 test cases to the new-CRD E2E suite for multi-provider:
- HTTP 200 for multi-provider model
- Traffic distribution: verifies both providers receive traffic with
  correct weight ratio (80/20 OpenAI/Anthropic over 50 requests)
- Model field rewrite: verifies body.model is rewritten to targetModel

Also adds apiFormat to single-provider test setup (now required per
Nir's review on PR opendatahub-io#182).

Total: 41 tests (38 single-provider + 3 multi-provider).
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yossiovadia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Warning

Rate limit exceeded

@yossiovadia has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 39 minutes and 13 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 471b0a6e-13a1-4ff0-aa05-0eee5bc4e214

📥 Commits

Reviewing files that changed from the base of the PR and between f83425e and 2a218b4.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum, !**/*.sum
📒 Files selected for processing (44)
  • .gitignore
  • Dockerfile.controller
  • Makefile
  • api/inference/v1alpha1/externalmodel_types.go
  • api/inference/v1alpha1/externalprovider_types.go
  • api/inference/v1alpha1/groupversion_info.go
  • api/inference/v1alpha1/types_test.go
  • api/inference/v1alpha1/zz_generated.deepcopy.go
  • cmd/controller/main.go
  • config/crd/bases/inference.opendatahub.io_externalmodels.yaml
  • config/crd/bases/inference.opendatahub.io_externalproviders.yaml
  • deploy/controller/deployment.yaml
  • deploy/controller/samples/external-model-katan.yaml
  • deploy/controller/samples/external-provider-katan.yaml
  • deploy/payload-processing/templates/rbac.yaml
  • docs/plans/2026-04-27-split-external-provider-prs.md
  • go.mod
  • pkg/controller/common/constants.go
  • pkg/controller/externalmodel/reconciler.go
  • pkg/controller/externalmodel/reconciler_test.go
  • pkg/controller/externalmodel/resources.go
  • pkg/controller/externalmodel/resources_test.go
  • pkg/controller/externalmodel/testdata/gateway-api-crds/httproute-crd.yaml
  • pkg/controller/externalprovider/reconciler.go
  • pkg/controller/externalprovider/reconciler_test.go
  • pkg/controller/externalprovider/resources.go
  • pkg/controller/externalprovider/resources_test.go
  • pkg/controller/externalprovider/testdata/istio-crds/destinationrule-crd.yaml
  • pkg/controller/externalprovider/testdata/istio-crds/serviceentry-crd.yaml
  • pkg/plugins/api-translation/plugin.go
  • pkg/plugins/api-translation/plugin_test.go
  • pkg/plugins/common/state/state-keys.go
  • pkg/plugins/model-provider-resolver/external_model_reconciler.go
  • pkg/plugins/model-provider-resolver/external_model_reconciler_test.go
  • pkg/plugins/model-provider-resolver/external_provider_reconciler.go
  • pkg/plugins/model-provider-resolver/external_provider_reconciler_test.go
  • pkg/plugins/model-provider-resolver/model_store.go
  • pkg/plugins/model-provider-resolver/model_store_test.go
  • pkg/plugins/model-provider-resolver/plugin.go
  • pkg/plugins/model-provider-resolver/plugin_test.go
  • pkg/plugins/model-provider-resolver/provider_store.go
  • pkg/plugins/model-provider-resolver/provider_store_test.go
  • test/e2e-new-crds/e2e_suite_test.go
  • test/e2e-new-crds/e2e_test.go

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 39 minutes and 13 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 29, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@yossiovadia
Copy link
Copy Markdown
Contributor Author

Note on file count: This PR is stacked on top of #213#212#184#183#182. GitHub shows the full diff from main, not just this PR's delta.

The actual changes in this PR are 3 files (multi-provider test cases + apiFormat fix in e2e-new-crds suite). Once the dependency chain merges, GitHub will automatically recalculate the diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant