Skip to content

test(e2e): mTLS datapath scenario for Kind#339

Open
Agent-Hellboy wants to merge 5 commits into
mainfrom
test/mtls_e2e_datapath
Open

test(e2e): mTLS datapath scenario for Kind#339
Agent-Hellboy wants to merge 5 commits into
mainfrom
test/mtls_e2e_datapath

Conversation

@Agent-Hellboy

Copy link
Copy Markdown
Owner

Summary

  • Add test/e2e/scenarios/mtls.sh with a dedicated mtls Kind E2E scenario that verifies test-mode mcp-runtime-ca, deploys an auth.mode: mtls MCPServer, rejects unauthenticated initialize on Traefik websecure, and accepts session-bound client certs from mcp-runtime adapter enroll.
  • Wire the scenario into kind.sh, PR path selection (select_pr_scenarios.sh), and scenario validation tests.

Test plan

  • bash test/e2e/scenarios_test.sh
  • E2E_SCENARIOS=mtls bash test/e2e/kind.sh (requires test-mode cluster with workload PKI from setup)

Notes

  • Depends on test-mode auto-provisioning of ClusterIssuer/mcp-runtime-ca and operator/runtime MCP_MTLS_CLUSTER_ISSUER wiring (merged via feat: mTLS/SPIFFE auth mode (Traefik-terminate + trusted header) #331 / follow-on setup PRs).
  • Select with E2E_SCENARIOS=mtls or via PR path selection for operator mTLS, cert-manager, spiffe-identity plugin, and gateway changes.

Made with Cursor

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new end-to-end (E2E) test scenario for verifying the mTLS datapath in Kind environments, including the deployment of an mTLS-enabled MCPServer, certificate generation, and validation of client certificate authentication. The review feedback highlights two key improvements in the test script: first, removing the -f flag from curl during the spoofing check to prevent false positives when handling HTTP error responses, and second, ensuring both the gateway TLS and CA secrets are verified after the timeout loop to provide a clearer error message on failure.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread test/e2e/scenarios/mtls.sh Outdated
Comment on lines +218 to +230
spoof_body="$(curl -fsS \
--cert "${MTLS_CLIENT_CERT}" \
--key "${MTLS_CLIENT_KEY}" \
--cacert "${MTLS_CLIENT_CA}" \
-H "Host: ${SERVER_HOST}" \
-H "content-type: application/json" \
-H "accept: application/json, text/event-stream" \
-H "Mcp-Protocol-Version: ${MCP_PROTOCOL_VERSION}" \
-H "X-MCP-Human-ID: spoofed-human" \
-H "X-MCP-Agent-ID: spoofed-agent" \
-H "X-MCP-Agent-Session: definitely-not-${MTLS_SESSION_NAME}" \
--data '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"aaa-ping","arguments":{}}}' \
"${MTLS_URL}" 2>/dev/null || true)"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using curl -fsS in the spoofing check causes curl to exit with a non-zero status code (22) and suppress the response body when the gateway correctly returns a 4xx or 5xx error (such as 403 Forbidden). Because of || true, the spoof_body variable is assigned an empty string, which bypasses the if [[ -n "${spoof_body}" ]] block entirely. Consequently, the Python validation script is never executed on the expected error response, resulting in a false positive where any 4xx or 5xx response (including a 500 Internal Server Error) passes the test without verifying the actual error message.

Removing the -f flag from curl allows the response body to be captured and validated by the Python script even on HTTP error statuses.

Suggested change
spoof_body="$(curl -fsS \
--cert "${MTLS_CLIENT_CERT}" \
--key "${MTLS_CLIENT_KEY}" \
--cacert "${MTLS_CLIENT_CA}" \
-H "Host: ${SERVER_HOST}" \
-H "content-type: application/json" \
-H "accept: application/json, text/event-stream" \
-H "Mcp-Protocol-Version: ${MCP_PROTOCOL_VERSION}" \
-H "X-MCP-Human-ID: spoofed-human" \
-H "X-MCP-Agent-ID: spoofed-agent" \
-H "X-MCP-Agent-Session: definitely-not-${MTLS_SESSION_NAME}" \
--data '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"aaa-ping","arguments":{}}}' \
"${MTLS_URL}" 2>/dev/null || true)"
spoof_body="$(curl -sS \
--cert "${MTLS_CLIENT_CERT}" \
--key "${MTLS_CLIENT_KEY}" \
--cacert "${MTLS_CLIENT_CA}" \
-H "Host: ${SERVER_HOST}" \
-H "content-type: application/json" \
-H "accept: application/json, text/event-stream" \
-H "Mcp-Protocol-Version: ${MCP_PROTOCOL_VERSION}" \
-H "X-MCP-Human-ID: spoofed-human" \
-H "X-MCP-Agent-ID: spoofed-agent" \
-H "X-MCP-Agent-Session: definitely-not-${MTLS_SESSION_NAME}" \
--data '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"aaa-ping","arguments":{}}}' \
"${MTLS_URL}" 2>/dev/null || true)"

Comment thread test/e2e/scenarios/mtls.sh Outdated
Comment on lines +109 to +112
if ! kubectl get secret "${MTLS_SERVER_NAME}-gateway-mtls" -n mcp-servers >/dev/null 2>&1; then
echo "timed out waiting for gateway TLS secret ${MTLS_SERVER_NAME}-gateway-mtls" >&2
exit 1
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The timeout validation only checks for the existence of the ${MTLS_SERVER_NAME}-gateway-mtls secret. If the ${MTLS_SERVER_NAME}-mtls-ca secret fails to be created, the loop will time out after 240 seconds, but the script will proceed past this block and fail later with a generic kubectl error. Checking both secrets here provides a clear and immediate error message on failure.

Suggested change
if ! kubectl get secret "${MTLS_SERVER_NAME}-gateway-mtls" -n mcp-servers >/dev/null 2>&1; then
echo "timed out waiting for gateway TLS secret ${MTLS_SERVER_NAME}-gateway-mtls" >&2
exit 1
fi
if ! kubectl get secret "${MTLS_SERVER_NAME}-gateway-mtls" -n mcp-servers >/dev/null 2>&1 \
|| ! kubectl get secret "${MTLS_SERVER_NAME}-mtls-ca" -n mcp-servers >/dev/null 2>&1; then
echo "timed out waiting for gateway TLS secrets (${MTLS_SERVER_NAME}-gateway-mtls and/or ${MTLS_SERVER_NAME}-mtls-ca)" >&2
exit 1
fi

Agent-Hellboy and others added 2 commits June 30, 2026 15:28
Exercise test-mode workload PKI, Traefik websecure mTLS termination, adapter
enroll, and session-bound client certificates without relying on governance headers.

Co-authored-by: Cursor <cursoragent@cursor.com>
The terminate-and-re-encrypt mTLS model serves traffic through a Traefik
IngressRoute and deletes the legacy passthrough IngressRouteTCP, but
checkIngressReady still looked for the deleted IngressRouteTCP. That left
every mTLS MCPServer stuck at PartiallyReady (ingressReady never true),
which made the mTLS datapath e2e scenario time out waiting for phase
Ready. Point the readiness check at the IngressRoute and cover both the
ready and not-ready cases.

Also address e2e review feedback in scenarios/mtls.sh:
- drop curl -f on the spoofed-headers check so the gateway's 4xx/5xx
  body is captured and validated instead of silently skipped;
- fail fast with a clear message if either the gateway-mtls or mtls-ca
  secret is missing after the wait loop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Agent-Hellboy Agent-Hellboy force-pushed the test/mtls_e2e_datapath branch from fb5ac6e to 724831c Compare June 30, 2026 10:08
Agent-Hellboy and others added 3 commits June 30, 2026 15:55
TestCertReloaderWatchReloadsOnChange failed at exactly its 3s deadline
when CI runs the service suites in parallel under a constrained CPU
quota. Widen the deadline to 10s; the 20ms poll still returns as soon as
the rotation is observed, so the happy path stays fast.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The mtls datapath drives traffic through Traefik's websecure entrypoint
(TRAEFIK_TLS_PORT -> 8443); the plaintext web port-forward (18080) is
never used by the scenario. Calling ensure_traefik_port_forward made the
scenario time out waiting on localhost:18080 deep into the run for a
port it doesn't need. Keep only the websecure port-forward.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wait_for_policy_text hardcoded the default ${SERVER_NAME}-gateway-policy
ConfigMap, so the mtls scenario's waits for its grant and session never
observed mtls-mcp-server-gateway-policy and timed out. Add an optional
server argument (default SERVER_NAME) and pass MTLS_SERVER_NAME from the
mtls scenario.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant