LCORE-1253: Add e2e proxy and TLS networking tests#1410
Conversation
Add comprehensive end-to-end tests verifying that Llama Stack's NetworkConfig (proxy, TLS) works correctly through the Lightspeed Stack pipeline. Test infrastructure: - TunnelProxy: Async HTTP CONNECT tunnel proxy that creates TCP tunnels for HTTPS traffic. Tracks CONNECT count and target hosts. - InterceptionProxy: Async TLS-intercepting (MITM) proxy using trustme CA to generate per-target server certificates. Simulates corporate SSL inspection proxies. Behave scenarios (tests/e2e/features/proxy.feature): - Tunnel proxy: Configures run.yaml with NetworkConfig proxy pointing to a local tunnel proxy. Verifies CONNECT to api.openai.com:443 is observed and the LLM query succeeds through the proxy. - Interception proxy: Configures run.yaml with proxy and custom CA cert (trustme). Verifies TLS interception of api.openai.com traffic and successful LLM query through the MITM proxy. - TLS version: Configures run.yaml with min_version TLSv1.2 and verifies the LLM query succeeds with the TLS constraint. Each scenario dynamically generates a modified run-ci.yaml with the appropriate NetworkConfig, restarts Llama Stack with the new config, restarts the Lightspeed Stack, and sends a query to verify the full pipeline. Added trustme>=1.2.1 to dev dependencies.
WalkthroughAdds end-to-end proxy and TLS e2e tests and supporting test utilities: new Behave feature and steps, tunnel and interception proxy implementations, run.yaml mutation and restart helpers, docs tags, and a dev dependency for trustme. Changes
Sequence DiagramssequenceDiagram
actor Test
participant Lightspeed
participant "Llama Stack"
participant TunnelProxy
participant "LLM Endpoint"
Test->>Test: Start TunnelProxy (port 8888)
Test->>Test: Modify run.yaml to use proxy
Test->>Test: Restart Llama Stack & Lightspeed
Test->>Lightspeed: POST /v1/query
Lightspeed->>"Llama Stack": Forward inference request
"Llama Stack"->>TunnelProxy: HTTP CONNECT target:443
TunnelProxy-->>TunnelProxy: Establish upstream TCP connection
TunnelProxy-->>"Llama Stack": HTTP/1.1 200 Connection Established
"Llama Stack"->>"LLM Endpoint": TLS handshake + request
"LLM Endpoint"-->>"Llama Stack": Response
"Llama Stack"-->>Lightspeed: Inference response
Lightspeed-->>Test: Query result
Test->>TunnelProxy: Verify CONNECT records present
sequenceDiagram
actor Test
participant Lightspeed
participant "Llama Stack"
participant InterceptionProxy
participant "LLM Endpoint"
Test->>Test: Generate trustme CA
Test->>Test: Start InterceptionProxy (port 8889) with CA
Test->>Test: Export CA and update run.yaml
Test->>Test: Restart Llama Stack & Lightspeed
Test->>Lightspeed: POST /v1/query
Lightspeed->>"Llama Stack": Forward inference request
"Llama Stack"->>InterceptionProxy: HTTP CONNECT target:443
InterceptionProxy->>InterceptionProxy: Generate per-target cert from CA
InterceptionProxy-->>"Llama Stack": HTTP/1.1 200 Connection Established
"Llama Stack"->>InterceptionProxy: TLS handshake (proxy presents cert)
InterceptionProxy-->>"LLM Endpoint": Establish upstream TLS connection
InterceptionProxy->>InterceptionProxy: Relay bidirectional TLS streams
"LLM Endpoint"-->>InterceptionProxy: Response
InterceptionProxy-->>"Llama Stack": Relay response
"Llama Stack"-->>Lightspeed: Inference response
Lightspeed-->>Test: Query result
Test->>InterceptionProxy: Verify interception records
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (10)
tests/e2e/features/proxy.feature (1)
18-18: Avoid fixed proxy ports to reduce CI flakiness.Hard-coded ports are prone to collisions. Prefer allocating free ports in steps and reusing them from
contextin subsequent setup steps.Also applies to: 41-41, 50-50
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/proxy.feature` at line 18, Replace the hard-coded proxy port in the Gherkin steps that read "Given A tunnel proxy is running on port 8888" (and the other similar steps) with a dynamic free-port allocation: add a setup step that finds/allocates an available port (store it on the scenario context, e.g. context.proxyPort) and change the step to start the tunnel proxy using that context value; update any later steps that reference the fixed port to read from the same context key so the allocated port is reused across setup and assertions.docs/e2e_testing.md (1)
193-197: Consider linking@Proxytag behavior to the exact hook section.Tiny docs improvement: add a brief cross-reference here to the teardown behavior in
after_scenarioso readers can find restoration flow faster.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/e2e_testing.md` around lines 193 - 197, Add a short cross-reference from the `@Proxy` row to the teardown/restore behaviour described in the `after_scenario` hook: update the `@Proxy` description to append a parenthetical or sentence like "see teardown in `after_scenario` for restoration flow" and link or point readers to the `after_scenario` hook section so they can quickly find how original services are restored after the test; ensure the reference uses the exact symbol `after_scenario` so it’s easy to locate.tests/e2e/proxy/interception_proxy.py (4)
148-149: Missingawait writer.wait_closed()afterwriter.close()may cause resource leaks.In Python 3.7+,
StreamWriter.close()is not a coroutine, but the underlying transport may not be fully closed untilwait_closed()is awaited. Without it, the connection may linger and cause resource leaks in long-running tests.Proposed fix for lines 148-149
except (OSError, ConnectionRefusedError, ssl.SSLError) as e: logger.warning("Failed to connect to %s: %s", target, e) tls_writer.close() + await tls_writer.wait_closed() returnProposed fix for lines 160-161
remote_writer.close() + await remote_writer.wait_closed() tls_writer.close() + await tls_writer.wait_closed()Proposed fix for line 171
finally: writer.close() + await writer.wait_closed()Also applies to: 160-161, 171-171
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/proxy/interception_proxy.py` around lines 148 - 149, The StreamWriter.close() calls (e.g., tls_writer.close() and other writer.close() occurrences in the same function) must be followed by awaiting the stream to finish closing to avoid resource leaks; after each writer.close() (for example where tls_writer.close() is called and the other two close calls identified in this block) add await writer.wait_closed() (or await tls_writer.wait_closed() as appropriate) so the underlying transport is fully closed before returning/exiting.
69-73:configure_trustis unnecessary for server-side SSL context.
configure_trust(ctx)configures which CA certificates the context trusts for verifying peer certificates. For a server-side context that doesn't verify clients, this call is a no-op. It doesn't cause harm but adds confusion about intent.Proposed fix
server_cert = self.ca.issue_cert(hostname) ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) server_cert.configure_cert(ctx) - self.ca.configure_trust(ctx) return ctx🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/proxy/interception_proxy.py` around lines 69 - 73, The server SSL context creation currently calls self.ca.configure_trust(ctx) after server_cert.configure_cert(ctx); remove the unnecessary call to configure_trust(ctx) from the server-side context creation (keep server_cert.configure_cert(ctx) and returning ctx) since configure_trust only affects peer-trust settings used when verifying clients and is a no-op for a non-client-verifying server context.
96-96: Use an explicit exception instead ofassertfor runtime error handling.
assertstatements can be disabled withpython -O, which would silently skip this check. For a test utility that needs reliable error reporting, raise an explicit exception.Proposed fix
- assert new_transport is not None, "TLS handshake failed" + if new_transport is None: + raise RuntimeError("TLS handshake failed: start_tls returned None")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/proxy/interception_proxy.py` at line 96, The runtime check currently uses an assert on new_transport which can be skipped with python -O; replace the assert with an explicit exception raise (e.g., raise RuntimeError or a custom exception) when new_transport is None to ensure the TLS handshake failure is always reported; locate the check that reads "assert new_transport is not None, 'TLS handshake failed'" and change it to raise an exception with the same descriptive message so the test utility reliably fails on handshake errors.
91-91: Useasyncio.get_running_loop()instead of deprecatedget_event_loop().
asyncio.get_event_loop()is deprecated since Python 3.10 and will emit a DeprecationWarning when called from a coroutine. Useasyncio.get_running_loop()which is designed to be called from within a running event loop.Proposed fix
- loop = asyncio.get_event_loop() + loop = asyncio.get_running_loop()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/proxy/interception_proxy.py` at line 91, Replace the deprecated call "loop = asyncio.get_event_loop()" with "loop = asyncio.get_running_loop()" in the interception_proxy module; update the assignment where "loop = asyncio.get_event_loop()" appears so it uses asyncio.get_running_loop() (ensuring this code runs inside a coroutine or running loop), and remove or adjust any code paths that expect creating a new loop from that variable if needed.tests/e2e/features/steps/proxy.py (4)
154-154: Race condition: fixed sleep may not be sufficient for proxy readiness.
time.sleep(1)assumes the proxy is ready within 1 second, but under load or on slower systems this might not hold. Consider polling the proxy's readiness (e.g., attempting a connection) similar to the service restart logic.# Example polling approach for proxy readiness import socket def _wait_for_port(host: str, port: int, timeout: float = 5.0) -> None: """Wait until a port is accepting connections.""" deadline = time.time() + timeout while time.time() < deadline: try: with socket.create_connection((host, port), timeout=0.5): return except OSError: time.sleep(0.1) raise TimeoutError(f"Port {port} not ready within {timeout}s")Also applies to: 222-222
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/proxy.py` at line 154, Replace the fixed sleep that waits for the proxy (the time.sleep(1) call) with a polling readiness helper that attempts TCP connections until success or timeout (create a helper like _wait_for_port(host, port, timeout) and call it where time.sleep(1) is used); update the proxy startup logic and the other occurrence noted (the second time.sleep at the other location) to call this helper with the proxy host/port and a sensible timeout so the test waits for an actual open port rather than a fixed delay.
143-154: Event loops created for proxy threads are never closed, causing resource leaks.The event loops created at lines 143 and 210 run forever in daemon threads but are never explicitly stopped or closed. While daemon threads are terminated when the main process exits, proper cleanup would stop the loop and close it in the
after_scenariohook. This also prevents potential issues if tests run in a long-lived process.Consider storing the loop reference and adding cleanup:
# In after_scenario or a cleanup step: if hasattr(context, 'proxy_loop'): context.proxy_loop.call_soon_threadsafe(context.proxy_loop.stop) # Allow thread to exit gracefully time.sleep(0.5)Also applies to: 210-222
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/proxy.py` around lines 143 - 154, The new event loop created in run_proxy (asyncio.new_event_loop assigned to context.proxy_loop and started in a daemon thread) is never stopped or closed; update the teardown (after_scenario) to cleanly stop and close that loop and join the thread: call context.proxy_loop.call_soon_threadsafe(context.proxy_loop.stop), wait briefly for the thread to exit and join the thread (store the Thread as e.g. context.proxy_thread when starting), then call context.proxy_loop.close(); apply the same pattern for the second loop started at the other location (lines 210-222) so both context.proxy_loop/context.proxy_thread instances are stopped, joined and closed in after_scenario.
64-64: Usetempfilemodule instead of hardcoded/tmppath for cross-platform compatibility.Hardcoded
/tmppaths don't work on Windows and may cause issues in environments with non-standard temp directories. Usetempfile.gettempdir()ortempfile.NamedTemporaryFileconsistently.Proposed fix
- with open("/tmp/llama-stack-proxy-test.log", "w") as log_file: + log_path = os.path.join(tempfile.gettempdir(), "llama-stack-proxy-test.log") + with open(log_path, "w") as log_file:- with open("/tmp/lightspeed-stack-proxy-test.log", "w") as log_file: + log_path = os.path.join(tempfile.gettempdir(), "lightspeed-stack-proxy-test.log") + with open(log_path, "w") as log_file:Also applies to: 103-103
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/proxy.py` at line 64, Replace the hardcoded "/tmp/llama-stack-proxy-test.log" open call with a cross-platform tempfile-based solution: use tempfile.NamedTemporaryFile(delete=False) to create a stable temp file or build the path with tempfile.gettempdir() + os.path.join(...) and open that path; update both occurrences (the open("/tmp/llama-stack-proxy-test.log", "w") as log_file: call and the same usage at the second occurrence) and ensure the rest of the code uses the returned filename/handle consistently (refer to the open call and any variable that holds log_file or the temp path).
54-58:pkill -fpattern may match unintended processes.The pattern
"llama stack run"could match any process with these words in its command line, including editors or terminals displaying the command. Consider using a more specific pattern or PID file.- subprocess.run( - ["pkill", "-f", "llama stack run"], - capture_output=True, - check=False, - ) + # More specific pattern to avoid killing unrelated processes + subprocess.run( + ["pkill", "-f", "llama stack run.*--port"], + capture_output=True, + check=False, + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/proxy.py` around lines 54 - 58, The pkill invocation using the broad pattern "llama stack run" in subprocess.run risks killing unrelated processes; change the shutdown logic to target the exact process you started instead—e.g., when launching the proxy store the subprocess.Popen object or write its PID to a PID file, then use that PID to send SIGTERM/SIGKILL (via proc.terminate()/proc.kill() or os.kill(pid, ...)) instead of subprocess.run(["pkill", "-f", ...]); update references around the subprocess.run call and the process startup code so termination uses the saved Popen or PID rather than a text-pattern pkill.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/features/environment.py`:
- Around line 293-296: The except block that catches Exception during service
restoration prints a warning but then unconditionally sets
context.services_modified = False, which clears the modified flag even when
restoration failed; change the handler so that context.services_modified is only
cleared when restoration succeeds (i.e., move the assignment into the try block
or set it after successful restore), and leave context.services_modified True on
exception so subsequent scenarios can detect unresolved modifications; update
the code paths around the try/except that reference context.services_modified to
ensure it reflects actual restore success.
In `@tests/e2e/features/steps/proxy.py`:
- Around line 73-84: The health check in the loop is calling the wrong path
(/v1/health); update the request URL in the loop that builds
f"http://{llama_host}:{llama_port}/v1/health" to use the actual Llama Stack
liveness endpoint f"http://{llama_host}:{llama_port}/liveness" (or optionally
/readiness) so the poll in the function in tests/e2e/features/steps/proxy.py
succeeds against the registered health routes.
In `@tests/e2e/proxy/tunnel_proxy.py`:
- Around line 70-73: The parsing of the CONNECT target (variables target,
target_host, target_port_str, target_port) can raise ValueError when converting
target_port_str to int (e.g., "host:abc"); wrap the int(target_port_str)
conversion in a try/except ValueError block and handle it by sending a proper
HTTP error response (e.g., 400 Bad Request) back to the client and closing the
connection cleanly instead of letting the outer handler close without a
response; update the logic around the target.rsplit(...) branch to validate the
port and return the HTTP error on malformed ports.
- Around line 85-87: The upstream connect using asyncio.open_connection (which
returns remote_reader, remote_writer) can hang; wrap the call in
asyncio.wait_for with a sensible timeout (e.g., CONNECT_TIMEOUT) and catch
asyncio.TimeoutError alongside existing exceptions in the exception handling
around the connect block in tunnel_proxy.py so timeouts close the connection and
return/cleanup instead of blocking the E2E test; update the code that assigns
remote_reader, remote_writer to call await
asyncio.wait_for(asyncio.open_connection(target_host, target_port),
CONNECT_TIMEOUT) and add handling for asyncio.TimeoutError to the existing
except logic.
---
Nitpick comments:
In `@docs/e2e_testing.md`:
- Around line 193-197: Add a short cross-reference from the `@Proxy` row to the
teardown/restore behaviour described in the `after_scenario` hook: update the
`@Proxy` description to append a parenthetical or sentence like "see teardown in
`after_scenario` for restoration flow" and link or point readers to the
`after_scenario` hook section so they can quickly find how original services are
restored after the test; ensure the reference uses the exact symbol
`after_scenario` so it’s easy to locate.
In `@tests/e2e/features/proxy.feature`:
- Line 18: Replace the hard-coded proxy port in the Gherkin steps that read
"Given A tunnel proxy is running on port 8888" (and the other similar steps)
with a dynamic free-port allocation: add a setup step that finds/allocates an
available port (store it on the scenario context, e.g. context.proxyPort) and
change the step to start the tunnel proxy using that context value; update any
later steps that reference the fixed port to read from the same context key so
the allocated port is reused across setup and assertions.
In `@tests/e2e/features/steps/proxy.py`:
- Line 154: Replace the fixed sleep that waits for the proxy (the time.sleep(1)
call) with a polling readiness helper that attempts TCP connections until
success or timeout (create a helper like _wait_for_port(host, port, timeout) and
call it where time.sleep(1) is used); update the proxy startup logic and the
other occurrence noted (the second time.sleep at the other location) to call
this helper with the proxy host/port and a sensible timeout so the test waits
for an actual open port rather than a fixed delay.
- Around line 143-154: The new event loop created in run_proxy
(asyncio.new_event_loop assigned to context.proxy_loop and started in a daemon
thread) is never stopped or closed; update the teardown (after_scenario) to
cleanly stop and close that loop and join the thread: call
context.proxy_loop.call_soon_threadsafe(context.proxy_loop.stop), wait briefly
for the thread to exit and join the thread (store the Thread as e.g.
context.proxy_thread when starting), then call context.proxy_loop.close(); apply
the same pattern for the second loop started at the other location (lines
210-222) so both context.proxy_loop/context.proxy_thread instances are stopped,
joined and closed in after_scenario.
- Line 64: Replace the hardcoded "/tmp/llama-stack-proxy-test.log" open call
with a cross-platform tempfile-based solution: use
tempfile.NamedTemporaryFile(delete=False) to create a stable temp file or build
the path with tempfile.gettempdir() + os.path.join(...) and open that path;
update both occurrences (the open("/tmp/llama-stack-proxy-test.log", "w") as
log_file: call and the same usage at the second occurrence) and ensure the rest
of the code uses the returned filename/handle consistently (refer to the open
call and any variable that holds log_file or the temp path).
- Around line 54-58: The pkill invocation using the broad pattern "llama stack
run" in subprocess.run risks killing unrelated processes; change the shutdown
logic to target the exact process you started instead—e.g., when launching the
proxy store the subprocess.Popen object or write its PID to a PID file, then use
that PID to send SIGTERM/SIGKILL (via proc.terminate()/proc.kill() or
os.kill(pid, ...)) instead of subprocess.run(["pkill", "-f", ...]); update
references around the subprocess.run call and the process startup code so
termination uses the saved Popen or PID rather than a text-pattern pkill.
In `@tests/e2e/proxy/interception_proxy.py`:
- Around line 148-149: The StreamWriter.close() calls (e.g., tls_writer.close()
and other writer.close() occurrences in the same function) must be followed by
awaiting the stream to finish closing to avoid resource leaks; after each
writer.close() (for example where tls_writer.close() is called and the other two
close calls identified in this block) add await writer.wait_closed() (or await
tls_writer.wait_closed() as appropriate) so the underlying transport is fully
closed before returning/exiting.
- Around line 69-73: The server SSL context creation currently calls
self.ca.configure_trust(ctx) after server_cert.configure_cert(ctx); remove the
unnecessary call to configure_trust(ctx) from the server-side context creation
(keep server_cert.configure_cert(ctx) and returning ctx) since configure_trust
only affects peer-trust settings used when verifying clients and is a no-op for
a non-client-verifying server context.
- Line 96: The runtime check currently uses an assert on new_transport which can
be skipped with python -O; replace the assert with an explicit exception raise
(e.g., raise RuntimeError or a custom exception) when new_transport is None to
ensure the TLS handshake failure is always reported; locate the check that reads
"assert new_transport is not None, 'TLS handshake failed'" and change it to
raise an exception with the same descriptive message so the test utility
reliably fails on handshake errors.
- Line 91: Replace the deprecated call "loop = asyncio.get_event_loop()" with
"loop = asyncio.get_running_loop()" in the interception_proxy module; update the
assignment where "loop = asyncio.get_event_loop()" appears so it uses
asyncio.get_running_loop() (ensuring this code runs inside a coroutine or
running loop), and remove or adjust any code paths that expect creating a new
loop from that variable if needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bfa48a5c-e3a7-4506-bb79-29a2f66ec1b1
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
docs/e2e_scenarios.mddocs/e2e_testing.mdpyproject.tomltests/e2e/features/environment.pytests/e2e/features/proxy.featuretests/e2e/features/steps/proxy.pytests/e2e/proxy/__init__.pytests/e2e/proxy/interception_proxy.pytests/e2e/proxy/tunnel_proxy.pytests/e2e/test_list.txt
| for _ in range(45): | ||
| try: | ||
| resp = requests.get( | ||
| f"http://{llama_host}:{llama_port}/v1/health", timeout=2 | ||
| ) | ||
| if resp.status_code == 200: | ||
| return | ||
| except requests.ConnectionError: | ||
| pass | ||
| time.sleep(2) | ||
|
|
||
| raise TimeoutError("Llama Stack did not start within 90 seconds") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find all health-related endpoints in Llama Stack
# Search for route definitions containing health-related paths
rg -n --type py '@router\.(get|post).*health|/liveness|/readiness|/v1/health' src/Repository: lightspeed-core/lightspeed-stack
Length of output: 692
🏁 Script executed:
# Get details about the health endpoints in health.py
head -n 160 src/app/endpoints/health.py | tail -n 80Repository: lightspeed-core/lightspeed-stack
Length of output: 2721
🏁 Script executed:
# Check for any middleware or FastAPI configuration that might add /v1 prefix
rg -n --type py 'app\.include_router|router.*prefix|/v1' src/app/ | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 1640
🏁 Script executed:
# Check which router module is imported as health or root in routers.py
rg -n --type py 'from.*health import|import.*health' src/app/routers.pyRepository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
# Check what's in the root router
head -n 50 src/app/endpoints/root.py | grep -E '@router|from|import' 2>/dev/null || rg -n --type py '@router|health' src/app/endpoints/root.py | head -10Repository: lightspeed-core/lightspeed-stack
Length of output: 470
🏁 Script executed:
# View the imports section of routers.py to find where health router comes from
head -n 70 src/app/routers.py | grep -E 'import|from'Repository: lightspeed-core/lightspeed-stack
Length of output: 131
🏁 Script executed:
# Also check if health endpoints are in a different file/router
rg -n --type py 'router.*health|health.*router' src/app/Repository: lightspeed-core/lightspeed-stack
Length of output: 357
Health endpoint path is incorrect — Llama Stack exposes /liveness, not /v1/health.
The health router is registered at the root level without a /v1 prefix (src/app/routers.py:69). Available endpoints are /liveness and /readiness (src/app/endpoints/health.py). The code polls a non-existent /v1/health endpoint, causing all health checks to fail with 404 responses and triggering a 90-second timeout on every test run.
Fix
for _ in range(45):
try:
resp = requests.get(
- f"http://{llama_host}:{llama_port}/v1/health", timeout=2
+ f"http://{llama_host}:{llama_port}/liveness", timeout=2
)
if resp.status_code == 200:
return🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/steps/proxy.py` around lines 73 - 84, The health check in
the loop is calling the wrong path (/v1/health); update the request URL in the
loop that builds f"http://{llama_host}:{llama_port}/v1/health" to use the actual
Llama Stack liveness endpoint f"http://{llama_host}:{llama_port}/liveness" (or
optionally /readiness) so the poll in the function in
tests/e2e/features/steps/proxy.py succeeds against the registered health routes.
Expand proxy e2e test coverage to fully address all acceptance criteria: AC1 (tunnel proxy): - Add negative test: LLM query fails gracefully when proxy is unreachable AC2 (interception proxy with CA): - Add negative test: LLM query fails when interception proxy CA is not provided (verifies "only successful when correct CA is provided") AC3 (TLS version and ciphers): - Add TLSv1.3 minimum version scenario - Add custom cipher suite configuration scenario (ECDHE+AESGCM:DHE+AESGCM) Test infrastructure: - Add after_scenario cleanup hook in environment.py that restores original Llama Stack and Lightspeed Stack configs after @Proxy scenarios. Prevents config leaks between scenarios. - Use different ports for each interception proxy instance to avoid address-already-in-use errors in sequential scenarios. Documentation: - Update docs/e2e_scenarios.md with all 7 proxy test scenarios. - Update docs/e2e_testing.md with proxy-related Behave tags (@Proxy, @tunnelproxy, @InterceptionProxy, @TLSVersion, @tlscipher).
931b654 to
edcd81a
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
tests/e2e/features/steps/proxy.py (1)
148-157:⚠️ Potential issue | 🔴 CriticalHealth endpoint path is incorrect — use
/livenessinstead of/v1/health.The health router is registered at the root level without a
/v1prefix. This will cause all health checks to fail with 404 responses. Additionally, the loop silently continues without raising an error if the service never becomes healthy, which could lead to flaky tests.Proposed fix
for _ in range(45): try: resp = requests.get( - f"http://{llama_host}:{llama_port}/v1/health", timeout=2 + f"http://{llama_host}:{llama_port}/liveness", timeout=2 ) if resp.status_code == 200: - break + return except requests.ConnectionError: pass time.sleep(2) + raise TimeoutError(f"Llama Stack did not become healthy within 90 seconds")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/proxy.py` around lines 148 - 157, The loop polling the service uses the wrong path and silently gives up: replace the requests.get URL path f"http://{llama_host}:{llama_port}/v1/health" with f"http://{llama_host}:{llama_port}/liveness" (the health router is mounted at /liveness) and add explicit failure handling after the retry loop (e.g., track success with a flag or inspect final resp and raise a RuntimeError or AssertionError if the service never returned 200) so tests fail fast instead of continuing silently; change the polling block around the requests.get call that references llama_host, llama_port and requests.get to implement this.
🧹 Nitpick comments (1)
tests/e2e/features/steps/proxy.py (1)
204-221: Proxy startup uses fixed sleep which may be fragile.The 1-second sleep (line 221) assumes the proxy is ready. Consider polling the proxy port for readiness, though for test reliability with daemon threads this is often acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/proxy.py` around lines 204 - 221, The test uses a fixed time.sleep(1) after starting the daemon thread in start_tunnel_proxy which is fragile; replace that sleep with a readiness poll that attempts to connect to the proxy port until success or a short timeout. After thread.start() in start_tunnel_proxy, repeatedly attempt a socket connection (to host "127.0.0.1" or context.tunnel_proxy.host and the given port) with small sleeps between tries and fail the test if the timeout elapses; keep using context.proxy_loop and context.tunnel_proxy (and proxy.start()/run_proxy) so the change only replaces the fixed sleep with a connect-retry loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 190-198: The helper _restore_original_services silently swallows
all exceptions from _restart_services(), which can hide failures restoring
_LLAMA_STACK_CONFIG_BACKUP to _LLAMA_STACK_CONFIG; update
_restore_original_services to catch exceptions from _restart_services() but log
the exception details (including exception text and context) using the
test/logging framework and either re-raise or set/return a failure flag so
callers can detect the cleanup failure; ensure the code still removes/renames
_LLAMA_STACK_CONFIG_BACKUP only when safe and reference the symbols
_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG, and _restart_services in your
change.
- Around line 360-395: Both configure_llama_tls_version and
configure_llama_ciphers currently overwrite provider["config"]["network"];
instead load the existing network dict (e.g., network =
provider.setdefault("config", {}).get("network", {}) or
provider["config"].setdefault("network", {})), merge/update its "tls" sub-dict
with the new keys (preserving existing keys like "min_version" and "ciphers"),
then assign back to provider["config"]["network"] before calling _write_config;
update both functions (which use _find_openai_provider and _write_config) to
perform an in-place merge of tls settings rather than replacing the whole
network dict.
---
Duplicate comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 148-157: The loop polling the service uses the wrong path and
silently gives up: replace the requests.get URL path
f"http://{llama_host}:{llama_port}/v1/health" with
f"http://{llama_host}:{llama_port}/liveness" (the health router is mounted at
/liveness) and add explicit failure handling after the retry loop (e.g., track
success with a flag or inspect final resp and raise a RuntimeError or
AssertionError if the service never returned 200) so tests fail fast instead of
continuing silently; change the polling block around the requests.get call that
references llama_host, llama_port and requests.get to implement this.
---
Nitpick comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 204-221: The test uses a fixed time.sleep(1) after starting the
daemon thread in start_tunnel_proxy which is fragile; replace that sleep with a
readiness poll that attempts to connect to the proxy port until success or a
short timeout. After thread.start() in start_tunnel_proxy, repeatedly attempt a
socket connection (to host "127.0.0.1" or context.tunnel_proxy.host and the
given port) with small sleeps between tries and fail the test if the timeout
elapses; keep using context.proxy_loop and context.tunnel_proxy (and
proxy.start()/run_proxy) so the change only replaces the fixed sleep with a
connect-retry loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 56b6b0a8-6270-486b-b393-193c6af3e455
📒 Files selected for processing (5)
docs/e2e_scenarios.mddocs/e2e_testing.mdtests/e2e/features/environment.pytests/e2e/features/proxy.featuretests/e2e/features/steps/proxy.py
✅ Files skipped from review due to trivial changes (3)
- docs/e2e_scenarios.md
- docs/e2e_testing.md
- tests/e2e/features/proxy.feature
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/e2e/features/environment.py
| def _restore_original_services() -> None: | ||
| """Restore original run.yaml and restart services.""" | ||
| if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP): | ||
| shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG) | ||
| os.remove(_LLAMA_STACK_CONFIG_BACKUP) | ||
| try: | ||
| _restart_services() | ||
| except Exception: # pylint: disable=broad-exception-caught | ||
| pass |
There was a problem hiding this comment.
Exception swallowing hides restoration failures.
Catching and silently ignoring all exceptions during _restart_services() could mask critical cleanup failures. Consider logging the exception at minimum, or re-raising after setting a flag.
Proposed fix
def _restore_original_services() -> None:
"""Restore original run.yaml and restart services."""
if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP):
shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG)
os.remove(_LLAMA_STACK_CONFIG_BACKUP)
try:
_restart_services()
except Exception: # pylint: disable=broad-exception-caught
- pass
+ import logging
+ logging.getLogger(__name__).warning(
+ "Service restoration failed during cleanup", exc_info=True
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _restore_original_services() -> None: | |
| """Restore original run.yaml and restart services.""" | |
| if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP): | |
| shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG) | |
| os.remove(_LLAMA_STACK_CONFIG_BACKUP) | |
| try: | |
| _restart_services() | |
| except Exception: # pylint: disable=broad-exception-caught | |
| pass | |
| def _restore_original_services() -> None: | |
| """Restore original run.yaml and restart services.""" | |
| if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP): | |
| shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG) | |
| os.remove(_LLAMA_STACK_CONFIG_BACKUP) | |
| try: | |
| _restart_services() | |
| except Exception: # pylint: disable=broad-exception-caught | |
| import logging | |
| logging.getLogger(__name__).warning( | |
| "Service restoration failed during cleanup", exc_info=True | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/steps/proxy.py` around lines 190 - 198, The helper
_restore_original_services silently swallows all exceptions from
_restart_services(), which can hide failures restoring
_LLAMA_STACK_CONFIG_BACKUP to _LLAMA_STACK_CONFIG; update
_restore_original_services to catch exceptions from _restart_services() but log
the exception details (including exception text and context) using the
test/logging framework and either re-raise or set/return a failure flag so
callers can detect the cleanup failure; ensure the code still removes/renames
_LLAMA_STACK_CONFIG_BACKUP only when safe and reference the symbols
_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG, and _restart_services in your
change.
There was a problem hiding this comment.
probably CodeRabbitAI is right - let's log
There was a problem hiding this comment.
@tisnik, agreed! Here's the planned fix:
- Move
loggingimport to the top of the file with the other stdlib imports - Replace the bare
passin theexceptblock with alogging.warning(...)call that logs the exception details
Failed to handle agent chat message. Please try again.
| @given('Llama Stack is configured with minimum TLS version "{version}"') | ||
| def configure_llama_tls_version(context: Context, version: str) -> None: | ||
| """Modify run.yaml with TLS version config.""" | ||
| _backup_llama_config() | ||
| config = _load_llama_config() | ||
| provider = _find_openai_provider(config) | ||
|
|
||
| if "config" not in provider: | ||
| provider["config"] = {} | ||
| provider["config"]["network"] = { | ||
| "tls": { | ||
| "min_version": version, | ||
| } | ||
| } | ||
|
|
||
| _write_config(config, _LLAMA_STACK_CONFIG) | ||
| context.llama_config_modified = True | ||
|
|
||
|
|
||
| @given('Llama Stack is configured with ciphers "{ciphers}"') | ||
| def configure_llama_ciphers(context: Context, ciphers: str) -> None: | ||
| """Modify run.yaml with cipher suite config.""" | ||
| _backup_llama_config() | ||
| config = _load_llama_config() | ||
| provider = _find_openai_provider(config) | ||
|
|
||
| if "config" not in provider: | ||
| provider["config"] = {} | ||
| provider["config"]["network"] = { | ||
| "tls": { | ||
| "ciphers": ciphers.split(":"), | ||
| } | ||
| } | ||
|
|
||
| _write_config(config, _LLAMA_STACK_CONFIG) | ||
| context.llama_config_modified = True |
There was a problem hiding this comment.
TLS configuration steps overwrite rather than merge network settings.
Both configure_llama_tls_version and configure_llama_ciphers completely replace provider["config"]["network"]. If a scenario requires both min_version and ciphers, the second step will overwrite the first. Consider merging with existing network config.
Proposed fix for merging
`@given`('Llama Stack is configured with minimum TLS version "{version}"')
def configure_llama_tls_version(context: Context, version: str) -> None:
"""Modify run.yaml with TLS version config."""
_backup_llama_config()
config = _load_llama_config()
provider = _find_openai_provider(config)
if "config" not in provider:
provider["config"] = {}
- provider["config"]["network"] = {
- "tls": {
- "min_version": version,
- }
- }
+ network = provider["config"].setdefault("network", {})
+ tls = network.setdefault("tls", {})
+ tls["min_version"] = version
_write_config(config, _LLAMA_STACK_CONFIG)
context.llama_config_modified = TrueApply similar pattern to configure_llama_ciphers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/steps/proxy.py` around lines 360 - 395, Both
configure_llama_tls_version and configure_llama_ciphers currently overwrite
provider["config"]["network"]; instead load the existing network dict (e.g.,
network = provider.setdefault("config", {}).get("network", {}) or
provider["config"].setdefault("network", {})), merge/update its "tls" sub-dict
with the new keys (preserving existing keys like "min_version" and "ciphers"),
then assign back to provider["config"]["network"] before calling _write_config;
update both functions (which use _find_openai_provider and _write_config) to
perform an in-place merge of tls settings rather than replacing the whole
network dict.
tisnik
left a comment
There was a problem hiding this comment.
LGTM in overall. TYVM. Some nits, but LGTMed to not block you.
| * LLM query fails when interception proxy CA is not provided | ||
| * TLS minimum version TLSv1.2 is respected | ||
| * TLS minimum version TLSv1.3 is respected | ||
| * Custom cipher suite configuration is respected |
There was a problem hiding this comment.
yes, those are features we need to test, nice!
There was a problem hiding this comment.
Yea this testing breath is awesome!! Thanks for going this far :).
| _LLAMA_STACK_CONFIG_BACKUP = "run.yaml.proxy-backup" | ||
|
|
||
|
|
||
| def _is_docker_mode() -> bool: |
There was a problem hiding this comment.
in future (@radofuchs ) we can put an attribute into context, for the whole test run (before_all perhaps)
There was a problem hiding this comment.
yes, that will be part of the rework I will be working on, these kind of attributes are all over the e2e codebase.
For this one, please follow the advice and move it to before all, it saves time, when you execute the logic only once
| def _restore_original_services() -> None: | ||
| """Restore original run.yaml and restart services.""" | ||
| if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP): | ||
| shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG) | ||
| os.remove(_LLAMA_STACK_CONFIG_BACKUP) | ||
| try: | ||
| _restart_services() | ||
| except Exception: # pylint: disable=broad-exception-caught | ||
| pass |
There was a problem hiding this comment.
probably CodeRabbitAI is right - let's log
| """ | ||
| # Restore services after @Proxy scenarios that modified Llama Stack config | ||
| services_modified = getattr(context, "services_modified", False) | ||
| if services_modified and "Proxy" in scenario.effective_tags: |
There was a problem hiding this comment.
@radofuchs to be refactored later (the "Proxy" tag)
Changes requested by reviewer (tisnik) and CodeRabbit: - Detect Docker mode once in before_all and store as context.is_docker_mode. All proxy step functions now use the context attribute instead of calling _is_docker_mode() repeatedly. - Log exception in _restore_original_services instead of silently swallowing it. - Only clear context.services_modified on successful restoration, not when restoration fails (prevents leaking modified state). - Add 10-second timeout to tunnel proxy open_connection to prevent stalls on unreachable targets. - Handle malformed CONNECT port with ValueError catch and 400 response.
| restart_container("llama-stack") | ||
| restart_container("lightspeed-stack") | ||
| else: | ||
| _restart_services_local() |
There was a problem hiding this comment.
this will completely collapse in konflux/prow mode, do we actually need this part or is it only for debugging purposes?
| # --- Query Steps --- | ||
|
|
||
|
|
||
| @when('I send a query "{query}" to the LLM') |
There was a problem hiding this comment.
do you have any specific reason not to use the step defined in the common file?
| # --- Verification Steps --- | ||
|
|
||
|
|
||
| @then("The LLM responds successfully") |
There was a problem hiding this comment.
this step is already implemented
| ) | ||
|
|
||
|
|
||
| @then("The response indicates a proxy or connection error") |
There was a problem hiding this comment.
this step should also work through the common one
| """Verify the response indicates a connection or proxy error.""" | ||
| if context.response is not None: | ||
| assert ( | ||
| context.response.status_code >= 400 |
There was a problem hiding this comment.
this range is quite wide, cant we really check a single status code?
| Scenario: LLM traffic is routed through a configured tunnel proxy | ||
| Given A tunnel proxy is running on port 8888 | ||
| And Llama Stack is configured to route inference through the tunnel proxy | ||
| And The services are restarted with the modified Llama Stack config |
There was a problem hiding this comment.
there is actually specific order in which you need to restart the services so that it works properly, I would rather want to see here two steps for restarting of each service
Move config restoration from @Proxy after_scenario hook to an explicit Background Given step. This follows the team convention that tags are used only for test selection (filtering), not for triggering behavior. The Background step "The original Llama Stack config is restored if modified" runs before every scenario. If a previous scenario left a modified run.yaml (detected by backup file existence), it restores the original and restarts services. This handles cleanup even when the previous scenario failed mid-way. Removed: - @Proxy tag from feature file (was triggering after_scenario hook) - after_scenario hook for @Proxy in environment.py - _restore_original_services function (replaced by Background step) - context.services_modified tracking (no hook reads it) Updated docs/e2e_testing.md: tags documented as selection-only, not behavior-triggering.
Rewrite proxy e2e tests to follow project conventions: - Reuse existing step definitions: use "I use query to ask question" from llm_query_response.py and "The status code of the response is" from common_http.py instead of custom query/response steps. - Split service restart into two explicit Given steps: "Llama Stack is restarted" and "Lightspeed Stack is restarted" so restart ordering is visible in the feature file. - Remove local (non-Docker) mode code path. Proxy tests use restart_container() exclusively, consistent with the rest of the e2e test suite. - Check specific status code 500 for error scenarios instead of the broad >= 400 range. - Remove custom send_query, verify_llm_response, and verify_error_response steps that duplicated existing functionality. Net reduction: -183 lines from step definitions.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 339-342: Update the assertions in verify_tunnel_proxy_used and
verify_interception_proxy_used to validate the CONNECT target and intercepted
hosts against the expected provider target rather than only checking
connect_count; specifically, after the existing connect_count check
(proxy.connect_count >= count) also assert that proxy.last_connect_target ==
expected_provider_target (or contains the expected host) and for interception
verify that expected_provider_target (or its hostname) is present in
proxy.intercepted_hosts and/or proxy.last_connect_target is not None and equals
the expected target; use the existing proxy attributes proxy.connect_count,
proxy.last_connect_target, and proxy.intercepted_hosts to locate and strengthen
the checks.
- Around line 152-159: Replace the unguarded sleep-based startup in run_proxy
with a readiness handshake: create a threading.Event or queue (e.g.,
startup_event or startup_queue) that the thread-running function run_proxy uses
to report success or an exception from awaiting proxy.start(); inside run_proxy
call asyncio.set_event_loop(loop) then try to await proxy.start() and on success
set the event or put a "ready" token into the queue, and on exception put the
exception into the queue/event before exiting; in the main thread start the
daemon thread and then wait on the event or pop from the queue with a short
timeout and if an exception was reported raise/fail the test immediately (or
assert), otherwise continue — apply the same pattern to the second startup block
referenced (lines ~231-238) so tests never proceed against a dead proxy.
- Around line 108-120: The restore_if_modified routine currently in the
Background only runs before scenarios and leaves run.yaml and context state
lingering; add Behave teardown hooks (after_scenario and/or after_all) that
mirror restore_if_modified: check _LLAMA_STACK_CONFIG_BACKUP and copy it back to
_LLAMA_STACK_CONFIG, remove the backup file, call
restart_container("llama-stack") and restart_container("lightspeed-stack"), stop
any running proxy loops, and explicitly delete or reset any custom Context
attributes used by the proxy (clear the same attributes you set on context in
the proxy code) so per-scenario state does not leak across scenarios or after an
aborted run.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 46866ad0-cda3-4a8a-a70b-056809840cb0
📒 Files selected for processing (5)
docs/e2e_testing.mdtests/e2e/features/environment.pytests/e2e/features/proxy.featuretests/e2e/features/steps/proxy.pytests/e2e/proxy/tunnel_proxy.py
✅ Files skipped from review due to trivial changes (2)
- docs/e2e_testing.md
- tests/e2e/features/proxy.feature
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/e2e/features/environment.py
- tests/e2e/proxy/tunnel_proxy.py
| @given("The original Llama Stack config is restored if modified") | ||
| def restore_if_modified(context: Context) -> None: | ||
| """Restore original run.yaml if a previous scenario modified it. | ||
|
|
||
| Called from Background so every scenario starts with a clean config, | ||
| even if the previous scenario failed mid-way. | ||
| """ | ||
| if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP): | ||
| print("Restoring original Llama Stack config from backup...") | ||
| shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG) | ||
| os.remove(_LLAMA_STACK_CONFIG_BACKUP) | ||
| restart_container("llama-stack") | ||
| restart_container("lightspeed-stack") |
There was a problem hiding this comment.
Move cleanup to teardown as well.
Restoring the backup in Background only cleans up the previous scenario. After the last scenario — or after an abort before the next Background runs — run.yaml stays modified, run.yaml.proxy-backup is left behind, and the proxy state stored on context leaks into later scenarios. Add an after_scenario/after_all cleanup that restores the config, stops the proxy loops, and clears the custom context attributes.
Based on learnings: In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared, so per-scenario state should be reset in hooks to maintain test isolation.
Also applies to: 149-150, 223-229
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/steps/proxy.py` around lines 108 - 120, The
restore_if_modified routine currently in the Background only runs before
scenarios and leaves run.yaml and context state lingering; add Behave teardown
hooks (after_scenario and/or after_all) that mirror restore_if_modified: check
_LLAMA_STACK_CONFIG_BACKUP and copy it back to _LLAMA_STACK_CONFIG, remove the
backup file, call restart_container("llama-stack") and
restart_container("lightspeed-stack"), stop any running proxy loops, and
explicitly delete or reset any custom Context attributes used by the proxy
(clear the same attributes you set on context in the proxy code) so per-scenario
state does not leak across scenarios or after an aborted run.
| def run_proxy() -> None: | ||
| asyncio.set_event_loop(loop) | ||
| loop.run_until_complete(proxy.start()) | ||
| loop.run_forever() | ||
|
|
||
| thread = threading.Thread(target=run_proxy, daemon=True) | ||
| thread.start() | ||
| time.sleep(1) |
There was a problem hiding this comment.
Wait for proxy readiness instead of sleeping.
Both startup steps launch proxy.start() in a daemon thread and then unconditionally sleep(1). If bind/startup fails, the exception dies in the thread and the scenario keeps running against a dead proxy. Report either “ready” or the startup exception back to the step with an event/queue and fail immediately.
Also applies to: 231-238
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/steps/proxy.py` around lines 152 - 159, Replace the
unguarded sleep-based startup in run_proxy with a readiness handshake: create a
threading.Event or queue (e.g., startup_event or startup_queue) that the
thread-running function run_proxy uses to report success or an exception from
awaiting proxy.start(); inside run_proxy call asyncio.set_event_loop(loop) then
try to await proxy.start() and on success set the event or put a "ready" token
into the queue, and on exception put the exception into the queue/event before
exiting; in the main thread start the daemon thread and then wait on the event
or pop from the queue with a short timeout and if an exception was reported
raise/fail the test immediately (or assert), otherwise continue — apply the same
pattern to the second startup block referenced (lines ~231-238) so tests never
proceed against a dead proxy.
| assert proxy.connect_count >= count, ( | ||
| f"Expected at least {count} CONNECT requests, " f"got {proxy.connect_count}" | ||
| ) | ||
| assert proxy.last_connect_target is not None, "No CONNECT target recorded" |
There was a problem hiding this comment.
Strengthen the proxy verification assertions.
verify_tunnel_proxy_used currently accepts any CONNECT, and verify_interception_proxy_used only checks connect_count. That can pass on unrelated traffic or on a CONNECT that never completed MITM. The proxy classes already record last_connect_target and intercepted_hosts; assert those against the expected provider target so AC1/AC2 are actually being proved.
Also applies to: 349-352
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/steps/proxy.py` around lines 339 - 342, Update the
assertions in verify_tunnel_proxy_used and verify_interception_proxy_used to
validate the CONNECT target and intercepted hosts against the expected provider
target rather than only checking connect_count; specifically, after the existing
connect_count check (proxy.connect_count >= count) also assert that
proxy.last_connect_target == expected_provider_target (or contains the expected
host) and for interception verify that expected_provider_target (or its
hostname) is present in proxy.intercepted_hosts and/or proxy.last_connect_target
is not None and equals the expected target; use the existing proxy attributes
proxy.connect_count, proxy.last_connect_target, and proxy.intercepted_hosts to
locate and strengthen the checks.
Stop proxy servers and their event loops explicitly in the Background restore step. Previously, proxy daemon threads were left running after each scenario, causing asyncio "Task was destroyed but it is pending" warnings at process exit. The _stop_proxy helper schedules an async stop on the proxy's event loop, waits for it to complete, then stops the loop. Context references are cleared so the next scenario starts clean.
| * LLM query fails when interception proxy CA is not provided | ||
| * TLS minimum version TLSv1.2 is respected | ||
| * TLS minimum version TLSv1.3 is respected | ||
| * Custom cipher suite configuration is respected |
There was a problem hiding this comment.
Yea this testing breath is awesome!! Thanks for going this far :).
Add proxy cleanup in after_feature to stop proxy servers left running from the last scenario. The Background restore step handles cleanup between scenarios, but the last scenario's proxies persist until process exit, causing asyncio "Task was destroyed" warnings. The cleanup checks for proxy objects on context (no tag check needed) and calls _stop_proxy to gracefully shut down the event loops.
Description
Add comprehensive end-to-end tests verifying that Llama Stack's
NetworkConfig(proxy, TLS, ciphers) works correctly through the Lightspeed Stack pipeline. Resolves LCORE-1253.These tests configure
run.yamlwith proxy and TLS settings on the remote inference provider, then verify the full LSC → Llama Stack → LLM provider pipeline works correctly in restricted networking environments.Test scenarios (7 total)
AC1 — Tunnel proxy routing:
AC2 — Interception proxy with CA certificate:
AC3 — TLS version and cipher configuration:
AC4 — Road-core feature parity:
tls.min_versiontls.ciphersextra_ca/caCertPathtls.verify(path) /proxy.cacertproxy.urlproxy.url+proxy.cacerttls.client_cert+tls.client_keyno_proxyLlama Stack's NetworkConfig matches or exceeds road-core's networking capabilities.
Note:
ProxyConfig.no_proxyis defined on the Llama Stack model but not wired in_build_proxy_mounts()(llama_stack/providers/utils/inference/http_client.py). The field is silently ignored. Filed separately.Test infrastructure
How it works
Each scenario dynamically generates a modified
run-ci.yamlwithNetworkConfigon the OpenAI inference provider, restarts Llama Stack with the new config, restarts the Lightspeed Stack, and sends a query to verify the full pipeline.Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
Run e2e proxy tests against the full stack:
Result: 7 scenarios passed. Tunnel proxy verified CONNECT to
api.openai.com:443. Interception proxy verified TLS interception ofapi.openai.com:443. Negative tests confirmed graceful failure without CA cert and with unreachable proxy.Summary by CodeRabbit
Documentation
Tests
Chores