test(e2e): mTLS datapath scenario for Kind#339
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Code Review
This pull request introduces a new end-to-end (E2E) test scenario for verifying the mTLS datapath in Kind environments, including the deployment of an mTLS-enabled MCPServer, certificate generation, and validation of client certificate authentication. The review feedback highlights two key improvements in the test script: first, removing the -f flag from curl during the spoofing check to prevent false positives when handling HTTP error responses, and second, ensuring both the gateway TLS and CA secrets are verified after the timeout loop to provide a clearer error message on failure.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| spoof_body="$(curl -fsS \ | ||
| --cert "${MTLS_CLIENT_CERT}" \ | ||
| --key "${MTLS_CLIENT_KEY}" \ | ||
| --cacert "${MTLS_CLIENT_CA}" \ | ||
| -H "Host: ${SERVER_HOST}" \ | ||
| -H "content-type: application/json" \ | ||
| -H "accept: application/json, text/event-stream" \ | ||
| -H "Mcp-Protocol-Version: ${MCP_PROTOCOL_VERSION}" \ | ||
| -H "X-MCP-Human-ID: spoofed-human" \ | ||
| -H "X-MCP-Agent-ID: spoofed-agent" \ | ||
| -H "X-MCP-Agent-Session: definitely-not-${MTLS_SESSION_NAME}" \ | ||
| --data '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"aaa-ping","arguments":{}}}' \ | ||
| "${MTLS_URL}" 2>/dev/null || true)" |
There was a problem hiding this comment.
Using curl -fsS in the spoofing check causes curl to exit with a non-zero status code (22) and suppress the response body when the gateway correctly returns a 4xx or 5xx error (such as 403 Forbidden). Because of || true, the spoof_body variable is assigned an empty string, which bypasses the if [[ -n "${spoof_body}" ]] block entirely. Consequently, the Python validation script is never executed on the expected error response, resulting in a false positive where any 4xx or 5xx response (including a 500 Internal Server Error) passes the test without verifying the actual error message.
Removing the -f flag from curl allows the response body to be captured and validated by the Python script even on HTTP error statuses.
| spoof_body="$(curl -fsS \ | |
| --cert "${MTLS_CLIENT_CERT}" \ | |
| --key "${MTLS_CLIENT_KEY}" \ | |
| --cacert "${MTLS_CLIENT_CA}" \ | |
| -H "Host: ${SERVER_HOST}" \ | |
| -H "content-type: application/json" \ | |
| -H "accept: application/json, text/event-stream" \ | |
| -H "Mcp-Protocol-Version: ${MCP_PROTOCOL_VERSION}" \ | |
| -H "X-MCP-Human-ID: spoofed-human" \ | |
| -H "X-MCP-Agent-ID: spoofed-agent" \ | |
| -H "X-MCP-Agent-Session: definitely-not-${MTLS_SESSION_NAME}" \ | |
| --data '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"aaa-ping","arguments":{}}}' \ | |
| "${MTLS_URL}" 2>/dev/null || true)" | |
| spoof_body="$(curl -sS \ | |
| --cert "${MTLS_CLIENT_CERT}" \ | |
| --key "${MTLS_CLIENT_KEY}" \ | |
| --cacert "${MTLS_CLIENT_CA}" \ | |
| -H "Host: ${SERVER_HOST}" \ | |
| -H "content-type: application/json" \ | |
| -H "accept: application/json, text/event-stream" \ | |
| -H "Mcp-Protocol-Version: ${MCP_PROTOCOL_VERSION}" \ | |
| -H "X-MCP-Human-ID: spoofed-human" \ | |
| -H "X-MCP-Agent-ID: spoofed-agent" \ | |
| -H "X-MCP-Agent-Session: definitely-not-${MTLS_SESSION_NAME}" \ | |
| --data '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"aaa-ping","arguments":{}}}' \ | |
| "${MTLS_URL}" 2>/dev/null || true)" |
| if ! kubectl get secret "${MTLS_SERVER_NAME}-gateway-mtls" -n mcp-servers >/dev/null 2>&1; then | ||
| echo "timed out waiting for gateway TLS secret ${MTLS_SERVER_NAME}-gateway-mtls" >&2 | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
The timeout validation only checks for the existence of the ${MTLS_SERVER_NAME}-gateway-mtls secret. If the ${MTLS_SERVER_NAME}-mtls-ca secret fails to be created, the loop will time out after 240 seconds, but the script will proceed past this block and fail later with a generic kubectl error. Checking both secrets here provides a clear and immediate error message on failure.
| if ! kubectl get secret "${MTLS_SERVER_NAME}-gateway-mtls" -n mcp-servers >/dev/null 2>&1; then | |
| echo "timed out waiting for gateway TLS secret ${MTLS_SERVER_NAME}-gateway-mtls" >&2 | |
| exit 1 | |
| fi | |
| if ! kubectl get secret "${MTLS_SERVER_NAME}-gateway-mtls" -n mcp-servers >/dev/null 2>&1 \ | |
| || ! kubectl get secret "${MTLS_SERVER_NAME}-mtls-ca" -n mcp-servers >/dev/null 2>&1; then | |
| echo "timed out waiting for gateway TLS secrets (${MTLS_SERVER_NAME}-gateway-mtls and/or ${MTLS_SERVER_NAME}-mtls-ca)" >&2 | |
| exit 1 | |
| fi |
Exercise test-mode workload PKI, Traefik websecure mTLS termination, adapter enroll, and session-bound client certificates without relying on governance headers. Co-authored-by: Cursor <cursoragent@cursor.com>
The terminate-and-re-encrypt mTLS model serves traffic through a Traefik IngressRoute and deletes the legacy passthrough IngressRouteTCP, but checkIngressReady still looked for the deleted IngressRouteTCP. That left every mTLS MCPServer stuck at PartiallyReady (ingressReady never true), which made the mTLS datapath e2e scenario time out waiting for phase Ready. Point the readiness check at the IngressRoute and cover both the ready and not-ready cases. Also address e2e review feedback in scenarios/mtls.sh: - drop curl -f on the spoofed-headers check so the gateway's 4xx/5xx body is captured and validated instead of silently skipped; - fail fast with a clear message if either the gateway-mtls or mtls-ca secret is missing after the wait loop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fb5ac6e to
724831c
Compare
TestCertReloaderWatchReloadsOnChange failed at exactly its 3s deadline when CI runs the service suites in parallel under a constrained CPU quota. Widen the deadline to 10s; the 20ms poll still returns as soon as the rotation is observed, so the happy path stays fast. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The mtls datapath drives traffic through Traefik's websecure entrypoint (TRAEFIK_TLS_PORT -> 8443); the plaintext web port-forward (18080) is never used by the scenario. Calling ensure_traefik_port_forward made the scenario time out waiting on localhost:18080 deep into the run for a port it doesn't need. Keep only the websecure port-forward. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wait_for_policy_text hardcoded the default ${SERVER_NAME}-gateway-policy
ConfigMap, so the mtls scenario's waits for its grant and session never
observed mtls-mcp-server-gateway-policy and timed out. Add an optional
server argument (default SERVER_NAME) and pass MTLS_SERVER_NAME from the
mtls scenario.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
test/e2e/scenarios/mtls.shwith a dedicatedmtlsKind E2E scenario that verifies test-modemcp-runtime-ca, deploys anauth.mode: mtlsMCPServer, rejects unauthenticated initialize on Traefik websecure, and accepts session-bound client certs frommcp-runtime adapter enroll.kind.sh, PR path selection (select_pr_scenarios.sh), and scenario validation tests.Test plan
bash test/e2e/scenarios_test.shE2E_SCENARIOS=mtls bash test/e2e/kind.sh(requires test-mode cluster with workload PKI from setup)Notes
ClusterIssuer/mcp-runtime-caand operator/runtimeMCP_MTLS_CLUSTER_ISSUERwiring (merged via feat: mTLS/SPIFFE auth mode (Traefik-terminate + trusted header) #331 / follow-on setup PRs).E2E_SCENARIOS=mtlsor via PR path selection for operator mTLS, cert-manager, spiffe-identity plugin, and gateway changes.Made with Cursor