Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions test/e2e-integration/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
__pycache__/
121 changes: 121 additions & 0 deletions test/e2e-integration/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
# E2E Integration Tests

End-to-end integration tests for ExternalProvider/ExternalModel CRDs (`inference.opendatahub.io/v1alpha1`).

These tests send **real HTTP requests** through the full gateway stack:
Envoy → Kuadrant/Authorino → BBR (payload-processing) → external model endpoint (llm-katan).

No mocks, no unit test fakes — every test validates live cluster behavior.

## What's Covered

| Category | File | Tests | Status |
|----------|------|-------|--------|
| **Reconciler: resource creation** | `test_reconciler.py` | Provider creates Service, ServiceEntry, DestinationRule; Model creates HTTPRoute; ownership, labels, gateway targeting | Pass |
| **Reconciler: negative cases** | `test_reconciler.py` | Model with non-existent provider ref; Provider with missing Secret | Pass |
| **Reconciler: multiple providers** | `test_reconciler.py` | OpenAI, Anthropic, Vertex providers/models all reconcile to Ready | Pass |
| **Auth: negative** | `test_auth.py` | No auth → 401; invalid bearer → 401; fake API key → 401; random auth → 401 | Pass (requires gateway-default-auth) |
| **Auth: positive** | `test_auth.py` | Valid API key → 200; response has choices, model field, non-empty content | Pass (requires MaaSModelRef + AuthPolicy + Subscription) |
| **Auth: error paths** | `test_auth.py` | Wrong model name in body → 404; unsupported path (/embeddings) → 400; empty messages; non-existent route | Pass |
| **Lifecycle** | `test_lifecycle.py` | Delete ExternalModel → HTTPRoute removed; delete provider → model goes Failed; recreate provider → model recovers | Pass |
| **Multi-provider weights** | `test_multiprovider.py` | Multiple provider refs, weighted traffic splitting, X-Selected-Provider header | xfail (PR #213 not merged) |
| **Migration v1alpha1 → v1alpha2** | `test_migration.py` | Auto-conversion of old ExternalModel CRs, credential preservation, provider deduplication | xfail (not implemented) |

Tests marked `xfail` run but are expected to fail — they track unimplemented features.
When a feature lands and the test starts passing, pytest flags it as `XPASS`, signaling the marker should be removed.

## Prerequisites

### Cluster requirements (both Kind and OpenShift)

- Istio with Gateway API support
- Gateway named `maas-default-gateway` in the gateway namespace
- Kuadrant operator + Authorino (for auth tests)
- BBR (payload-processing) deployed with `model-provider-resolver` plugin
- `inference.opendatahub.io` CRDs installed
- ExternalProvider + ExternalModel CRs deployed and reconciled
- An external model endpoint reachable from the cluster (e.g., llm-katan)

### For auth tests (test_auth.py)

- Gateway-level default-deny AuthPolicy (`gateway-default-auth`) applied
- MaaS controller deployed (with [PR #865](https://github.com/opendatahub-io/models-as-a-service/pull/865) fix for API-group agnostic MaaSModelRef)
- MaaSModelRef pointing at the ExternalModel (kind: ExternalModel, name matches)
- MaaSAuthPolicy granting access to the model
- MaaSSubscription with token budget
- MaaS API reachable at `{GATEWAY_HOST}/maas-api` (for API key creation)

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `GATEWAY_HOST` | **Yes** | — | Gateway endpoint (e.g., `localhost:19080` or `maas.example.com`) |
| `E2E_SIMULATOR_ENDPOINT` | For lifecycle tests | — | llm-katan FQDN (e.g., `3-13-21-181.sslip.io`) |
| `E2E_MODEL_NAMESPACE` | No | `llm` | Namespace where ExternalProvider/Model CRs live |
| `E2E_NEW_CRD_MODEL` | No | `new-katan-openai` | ExternalModel name for reconciler/auth tests |
| `E2E_NEW_CRD_PROVIDER` | No | `katan-openai-provider` | ExternalProvider name for reconciler tests |
| `E2E_NEW_CRD_TARGET_MODEL` | No | `llm-katan-echo` | targetModel value for request body |
| `E2E_NEW_CRD_SUBSCRIPTION` | No | `new-crd-subscription` | MaaSSubscription name for API key creation |
| `E2E_MULTI_PROVIDER_MODEL` | No | `multi-provider-test` | ExternalModel with multiple provider refs |
| `INSECURE_HTTP` | No | `false` | Use HTTP instead of HTTPS (for Kind port-forward) |
| `E2E_SKIP_TLS_VERIFY` | No | `false` | Skip TLS certificate verification |
| `E2E_TIMEOUT` | No | `30` | HTTP request timeout in seconds |

## Running on Kind (local-deploy)

```bash
# 1. Deploy the cluster using local-deploy.sh (from models-as-a-service repo)
# This sets up Istio, Kuadrant, MaaS, BBR, and test fixtures.

# 2. Port-forward the gateway
kubectl port-forward -n istio-system svc/maas-default-gateway-istio 19080:80 &

# 3. Install test dependencies
pip install -r test/e2e-integration/requirements.txt

# 4. Run all tests
GATEWAY_HOST="localhost:19080" \
INSECURE_HTTP="true" \
E2E_SKIP_TLS_VERIFY="true" \
E2E_SIMULATOR_ENDPOINT="3-13-21-181.sslip.io" \
E2E_MODEL_NAMESPACE="llm" \
python -m pytest test/e2e-integration/ -v

# Run a specific category
python -m pytest test/e2e-integration/test_auth.py -v

# Run only passing tests (skip xfail)
python -m pytest test/e2e-integration/ -v -m "not xfail_known"
```

## Running on OpenShift

The same tests work on OpenShift — the only differences are the gateway endpoint and TLS:

```bash
# 1. Ensure you're logged in to the OpenShift cluster
oc login ...

# 2. Get the gateway hostname
GATEWAY_HOST=$(oc get gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.status.addresses[0].value}')

# 3. Run tests (HTTPS by default, no INSECURE_HTTP)
GATEWAY_HOST="$GATEWAY_HOST" \
E2E_SIMULATOR_ENDPOINT="3-13-21-181.sslip.io" \
E2E_MODEL_NAMESPACE="llm" \
python -m pytest test/e2e-integration/ -v
```

Key differences from Kind:
- **No `INSECURE_HTTP`** — OpenShift gateways use HTTPS with valid certs
- **No `E2E_SKIP_TLS_VERIFY`** — TLS certs are valid (unless self-signed)
- **Gateway hostname** — use the actual route/LB hostname, not localhost port-forward
- **Auth** — same flow, but the gateway-default-auth AuthPolicy should already be deployed by the MaaS operator

## Test Design Principles

- **No mocks** — all tests hit the real gateway and validate real HTTP responses
- **Standalone** — no imports from MaaS repo; helpers use `kubectl` and `requests` directly
- **Idempotent** — tests that create resources clean up after themselves
- **Descriptive failures** — assertion messages explain what went wrong and what to check
- **xfail for gaps** — unimplemented features are tracked with `pytest.mark.xfail(reason=...)` referencing the blocking PR/issue
30 changes: 30 additions & 0 deletions test/e2e-integration/conftest.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
"""
Shared fixtures for e2e-integration tests.

Environment variables:
GATEWAY_HOST - Gateway endpoint (required, e.g. localhost:19080)
E2E_SIMULATOR_ENDPOINT - llm-katan FQDN (required for provider tests)
E2E_MODEL_NAMESPACE - Namespace for test resources (default: llm)
INSECURE_HTTP - Use HTTP instead of HTTPS (default: false)
E2E_SKIP_TLS_VERIFY - Skip TLS cert verification (default: false)
"""

import os
import pytest


def pytest_configure(config):
config.addinivalue_line("markers", "xfail_known: mark test as expected failure with tracked issue")


@pytest.fixture(scope="session")
def model_namespace():
return os.environ.get("E2E_MODEL_NAMESPACE", "llm")


@pytest.fixture(scope="session")
def simulator_endpoint():
ep = os.environ.get("E2E_SIMULATOR_ENDPOINT", "")
if not ep:
pytest.skip("E2E_SIMULATOR_ENDPOINT not set")
return ep
107 changes: 107 additions & 0 deletions test/e2e-integration/helpers.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
"""
Standalone helpers for e2e-integration tests.

No dependencies on MaaS repo — only kubectl, requests, and stdlib.
"""

import json
import logging
import os
import subprocess
import time

import requests

log = logging.getLogger(__name__)

TIMEOUT = int(os.environ.get("E2E_TIMEOUT", "30"))
TLS_VERIFY = os.environ.get("E2E_SKIP_TLS_VERIFY", "").lower() != "true"


def gateway_url():
host = os.environ.get("GATEWAY_HOST", "")
if not host:
raise RuntimeError("GATEWAY_HOST env var is required")
scheme = "http" if os.environ.get("INSECURE_HTTP", "").lower() == "true" else "https"
return f"{scheme}://{host}"


def apply_cr(cr_dict):
result = subprocess.run(
["kubectl", "apply", "-f", "-"],
input=json.dumps(cr_dict),
capture_output=True, text=True,
)
if result.returncode != 0:
raise RuntimeError(f"kubectl apply failed: {result.stderr}")


def delete_cr(kind, name, namespace):
subprocess.run(
["kubectl", "delete", kind, name, "-n", namespace, "--ignore-not-found", "--timeout=30s"],
capture_output=True, text=True,
)


def get_cr(kind, name, namespace):
result = subprocess.run(
["kubectl", "get", kind, name, "-n", namespace, "-o", "json"],
capture_output=True, text=True,
)
if result.returncode != 0:
if "not found" in result.stderr.lower() or "notfound" in result.stderr.lower():
return None
raise RuntimeError(f"kubectl get {kind}/{name} failed: {result.stderr}")
return json.loads(result.stdout)


def wait_for_cr(kind, name, namespace, jsonpath_check, timeout=60):
"""Poll until a CR field matches expected value.

jsonpath_check: callable that receives the CR dict and returns True when ready.
"""
deadline = time.time() + timeout
while time.time() < deadline:
cr = get_cr(kind, name, namespace)
if cr and jsonpath_check(cr):
return cr
time.sleep(2)
return None


def chat_request(model_url, body, auth_header=None):
headers = {"Content-Type": "application/json"}
if auth_header:
headers["Authorization"] = auth_header
return requests.post(model_url, headers=headers, json=body, timeout=TIMEOUT, verify=TLS_VERIFY)


def get_cluster_token(sa_name="maas-api", namespace="maas-system"):
result = subprocess.run(
["kubectl", "create", "token", sa_name, "-n", namespace,
"--duration=10m", "--audience=https://kubernetes.default.svc"],
capture_output=True, text=True,
)
token = result.stdout.strip()
if not token:
raise RuntimeError(f"Failed to create token for {sa_name}: {result.stderr}")
return token


def create_api_key(subscription, name=None):
import uuid
token = get_cluster_token()
key_name = name or f"e2e-int-{uuid.uuid4().hex[:8]}"
maas_api_url = f"{gateway_url()}/maas-api/v1/api-keys"
r = requests.post(
maas_api_url,
headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
json={"name": key_name, "subscription": subscription},
timeout=TIMEOUT, verify=TLS_VERIFY,
)
if r.status_code not in (200, 201):
raise RuntimeError(f"Failed to create API key: {r.status_code} {r.text}")
key = r.json().get("key")
if not key:
raise RuntimeError(f"API key response missing 'key': {r.json()}")
return key
2 changes: 2 additions & 0 deletions test/e2e-integration/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
requests>=2.31
pytest>=8.0
Loading