Skip to content

LCORE-1253: Add e2e proxy and TLS networking tests#1410

Merged
tisnik merged 7 commits into
lightspeed-core:mainfrom
max-svistunov:lcore-1253-e2e-proxy-networking
Mar 27, 2026
Merged

LCORE-1253: Add e2e proxy and TLS networking tests#1410
tisnik merged 7 commits into
lightspeed-core:mainfrom
max-svistunov:lcore-1253-e2e-proxy-networking

Conversation

@max-svistunov

@max-svistunov max-svistunov commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Description

Add comprehensive end-to-end tests verifying that Llama Stack's NetworkConfig (proxy, TLS, ciphers) works correctly through the Lightspeed Stack pipeline. Resolves LCORE-1253.

These tests configure run.yaml with proxy and TLS settings on the remote inference provider, then verify the full LSC → Llama Stack → LLM provider pipeline works correctly in restricted networking environments.

Test scenarios (7 total)

AC1 — Tunnel proxy routing:

  • Traffic is routed through a configured tunnel proxy (CONNECT to api.openai.com verified)
  • Query fails gracefully when proxy is unreachable

AC2 — Interception proxy with CA certificate:

  • Traffic works through interception proxy with correct trustme CA
  • Query fails when interception proxy CA is not provided

AC3 — TLS version and cipher configuration:

  • Minimum TLS version TLSv1.2 is respected
  • Minimum TLS version TLSv1.3 is respected
  • Custom cipher suite configuration is respected

AC4 — Road-core feature parity:

Capability Road-core Llama Stack NetworkConfig
TLS version Via TLSSecurityProfile tls.min_version
Cipher suites Via TLSSecurityProfile tls.ciphers
Custom CA cert extra_ca / caCertPath tls.verify (path) / proxy.cacert
Tunnel proxy Not implemented proxy.url
Interception proxy Not implemented proxy.url + proxy.cacert
mTLS Not implemented tls.client_cert + tls.client_key
no_proxy Not implemented Defined but not implemented (see note)

Llama Stack's NetworkConfig matches or exceeds road-core's networking capabilities.

Note: ProxyConfig.no_proxy is defined on the Llama Stack model but not wired in _build_proxy_mounts() (llama_stack/providers/utils/inference/http_client.py). The field is silently ignored. Filed separately.

Test infrastructure

  • TunnelProxy — Async HTTP CONNECT tunnel proxy (~150 lines). Creates TCP tunnels for HTTPS traffic without inspection. Tracks CONNECT count and target hosts.
  • InterceptionProxy — Async TLS-intercepting (MITM) proxy (~215 lines) using trustme CA. Generates per-target certificates, simulating corporate SSL inspection.
  • Both are Python-only (no Docker), run in background threads during tests.

How it works

Each scenario dynamically generates a modified run-ci.yaml with NetworkConfig on the OpenAI inference provider, restarts Llama Stack with the new config, restarts the Lightspeed Stack, and sends a query to verify the full pipeline.

Type of change

  • End to end tests improvement
  • Documentation Update

Tools used to create PR

  • Assisted-by: Claude Opus 4.6
  • Generated by: Claude Opus 4.6

Related Tickets & Documents

  • Related Issue # LCORE-1178
  • Closes # LCORE-1253

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Run e2e proxy tests against the full stack:

E2E_LSC_HOSTNAME=localhost E2E_LSC_PORT=8080 E2E_LLAMA_HOSTNAME=localhost \
  E2E_LLAMA_PORT=8321 E2E_LSC_CONFIG=<path-to-local-config> \
  FAISS_VECTOR_STORE_ID=test-vector-store \
  uv run behave tests/e2e/features/proxy.feature --no-capture

Result: 7 scenarios passed. Tunnel proxy verified CONNECT to api.openai.com:443. Interception proxy verified TLS interception of api.openai.com:443. Negative tests confirmed graceful failure without CA cert and with unreachable proxy.

Summary by CodeRabbit

  • Documentation

    • Added a new E2E scenario for proxy/TLS behaviors and updated the E2E guide with tags for tunnel proxy, interception proxy, TLS version, and cipher-suite scenarios.
  • Tests

    • Added a comprehensive end-to-end proxy and TLS test suite covering tunnel routing, unreachable-proxy failure handling, interception-proxy success/failure with/without CA, TLS minimum-version enforcement, and cipher-suite validation.
  • Chores

    • Added trustme to dev dependencies for test certificate handling.

Add comprehensive end-to-end tests verifying that Llama Stack's
NetworkConfig (proxy, TLS) works correctly through the Lightspeed
Stack pipeline.

Test infrastructure:
- TunnelProxy: Async HTTP CONNECT tunnel proxy that creates TCP
  tunnels for HTTPS traffic. Tracks CONNECT count and target hosts.
- InterceptionProxy: Async TLS-intercepting (MITM) proxy using
  trustme CA to generate per-target server certificates. Simulates
  corporate SSL inspection proxies.

Behave scenarios (tests/e2e/features/proxy.feature):
- Tunnel proxy: Configures run.yaml with NetworkConfig proxy pointing
  to a local tunnel proxy. Verifies CONNECT to api.openai.com:443
  is observed and the LLM query succeeds through the proxy.
- Interception proxy: Configures run.yaml with proxy and custom CA
  cert (trustme). Verifies TLS interception of api.openai.com traffic
  and successful LLM query through the MITM proxy.
- TLS version: Configures run.yaml with min_version TLSv1.2 and
  verifies the LLM query succeeds with the TLS constraint.

Each scenario dynamically generates a modified run-ci.yaml with the
appropriate NetworkConfig, restarts Llama Stack with the new config,
restarts the Lightspeed Stack, and sends a query to verify the full
pipeline.

Added trustme>=1.2.1 to dev dependencies.
@coderabbitai

coderabbitai Bot commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

Adds end-to-end proxy and TLS e2e tests and supporting test utilities: new Behave feature and steps, tunnel and interception proxy implementations, run.yaml mutation and restart helpers, docs tags, and a dev dependency for trustme.

Changes

Cohort / File(s) Summary
Documentation
docs/e2e_scenarios.md, docs/e2e_testing.md
Added a proxy.feature scenario and documented scenario tags: @Proxy, @TunnelProxy, @InterceptionProxy, @TLSVersion, @TLSCipher.
Dev dependencies
pyproject.toml
Added trustme>=1.2.1 to the dev dependency group.
Feature tests
tests/e2e/features/proxy.feature, tests/e2e/test_list.txt
New Behave feature exercising tunnel and interception proxies, unreachable-proxy failure, TLS min-version checks, and cipher-suite scenarios; added to test list.
Behave step defs
tests/e2e/features/steps/proxy.py
New step implementations to start/stop proxies, detect Docker mode, mutate and restore run.yaml (network/proxy/tls/ciphers), persist backups, restart services, and verify proxy usage counters and request outcomes.
Test hooks
tests/e2e/features/environment.py
before_all updated to set context.is_docker_mode via _is_docker_mode() from proxy steps before model selection.
Proxy utilities
tests/e2e/proxy/__init__.py, tests/e2e/proxy/tunnel_proxy.py, tests/e2e/proxy/interception_proxy.py
Added TunnelProxy (async CONNECT proxy with connect counting, last target, and relay) and InterceptionProxy (TLS-intercept MITM using trustme CA, per-target certs, interception counters, bidirectional TLS relay) plus package init.

Sequence Diagrams

sequenceDiagram
    actor Test
    participant Lightspeed
    participant "Llama Stack"
    participant TunnelProxy
    participant "LLM Endpoint"

    Test->>Test: Start TunnelProxy (port 8888)
    Test->>Test: Modify run.yaml to use proxy
    Test->>Test: Restart Llama Stack & Lightspeed
    Test->>Lightspeed: POST /v1/query
    Lightspeed->>"Llama Stack": Forward inference request
    "Llama Stack"->>TunnelProxy: HTTP CONNECT target:443
    TunnelProxy-->>TunnelProxy: Establish upstream TCP connection
    TunnelProxy-->>"Llama Stack": HTTP/1.1 200 Connection Established
    "Llama Stack"->>"LLM Endpoint": TLS handshake + request
    "LLM Endpoint"-->>"Llama Stack": Response
    "Llama Stack"-->>Lightspeed: Inference response
    Lightspeed-->>Test: Query result
    Test->>TunnelProxy: Verify CONNECT records present
Loading
sequenceDiagram
    actor Test
    participant Lightspeed
    participant "Llama Stack"
    participant InterceptionProxy
    participant "LLM Endpoint"

    Test->>Test: Generate trustme CA
    Test->>Test: Start InterceptionProxy (port 8889) with CA
    Test->>Test: Export CA and update run.yaml
    Test->>Test: Restart Llama Stack & Lightspeed
    Test->>Lightspeed: POST /v1/query
    Lightspeed->>"Llama Stack": Forward inference request
    "Llama Stack"->>InterceptionProxy: HTTP CONNECT target:443
    InterceptionProxy->>InterceptionProxy: Generate per-target cert from CA
    InterceptionProxy-->>"Llama Stack": HTTP/1.1 200 Connection Established
    "Llama Stack"->>InterceptionProxy: TLS handshake (proxy presents cert)
    InterceptionProxy-->>"LLM Endpoint": Establish upstream TLS connection
    InterceptionProxy->>InterceptionProxy: Relay bidirectional TLS streams
    "LLM Endpoint"-->>InterceptionProxy: Response
    InterceptionProxy-->>"Llama Stack": Relay response
    "Llama Stack"-->>Lightspeed: Inference response
    Lightspeed-->>Test: Query result
    Test->>InterceptionProxy: Verify interception records
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: adding end-to-end tests for proxy and TLS networking functionality.
Docstring Coverage ✅ Passed Docstring coverage is 94.74% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (10)
tests/e2e/features/proxy.feature (1)

18-18: Avoid fixed proxy ports to reduce CI flakiness.

Hard-coded ports are prone to collisions. Prefer allocating free ports in steps and reusing them from context in subsequent setup steps.

Also applies to: 41-41, 50-50

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/proxy.feature` at line 18, Replace the hard-coded proxy
port in the Gherkin steps that read "Given A tunnel proxy is running on port
8888" (and the other similar steps) with a dynamic free-port allocation: add a
setup step that finds/allocates an available port (store it on the scenario
context, e.g. context.proxyPort) and change the step to start the tunnel proxy
using that context value; update any later steps that reference the fixed port
to read from the same context key so the allocated port is reused across setup
and assertions.
docs/e2e_testing.md (1)

193-197: Consider linking @Proxy tag behavior to the exact hook section.

Tiny docs improvement: add a brief cross-reference here to the teardown behavior in after_scenario so readers can find restoration flow faster.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/e2e_testing.md` around lines 193 - 197, Add a short cross-reference from
the `@Proxy` row to the teardown/restore behaviour described in the
`after_scenario` hook: update the `@Proxy` description to append a parenthetical
or sentence like "see teardown in `after_scenario` for restoration flow" and
link or point readers to the `after_scenario` hook section so they can quickly
find how original services are restored after the test; ensure the reference
uses the exact symbol `after_scenario` so it’s easy to locate.
tests/e2e/proxy/interception_proxy.py (4)

148-149: Missing await writer.wait_closed() after writer.close() may cause resource leaks.

In Python 3.7+, StreamWriter.close() is not a coroutine, but the underlying transport may not be fully closed until wait_closed() is awaited. Without it, the connection may linger and cause resource leaks in long-running tests.

Proposed fix for lines 148-149
             except (OSError, ConnectionRefusedError, ssl.SSLError) as e:
                 logger.warning("Failed to connect to %s: %s", target, e)
                 tls_writer.close()
+                await tls_writer.wait_closed()
                 return
Proposed fix for lines 160-161
             remote_writer.close()
+            await remote_writer.wait_closed()
             tls_writer.close()
+            await tls_writer.wait_closed()
Proposed fix for line 171
         finally:
             writer.close()
+            await writer.wait_closed()

Also applies to: 160-161, 171-171

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/proxy/interception_proxy.py` around lines 148 - 149, The
StreamWriter.close() calls (e.g., tls_writer.close() and other writer.close()
occurrences in the same function) must be followed by awaiting the stream to
finish closing to avoid resource leaks; after each writer.close() (for example
where tls_writer.close() is called and the other two close calls identified in
this block) add await writer.wait_closed() (or await tls_writer.wait_closed() as
appropriate) so the underlying transport is fully closed before
returning/exiting.

69-73: configure_trust is unnecessary for server-side SSL context.

configure_trust(ctx) configures which CA certificates the context trusts for verifying peer certificates. For a server-side context that doesn't verify clients, this call is a no-op. It doesn't cause harm but adds confusion about intent.

Proposed fix
         server_cert = self.ca.issue_cert(hostname)
         ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
         server_cert.configure_cert(ctx)
-        self.ca.configure_trust(ctx)
         return ctx
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/proxy/interception_proxy.py` around lines 69 - 73, The server SSL
context creation currently calls self.ca.configure_trust(ctx) after
server_cert.configure_cert(ctx); remove the unnecessary call to
configure_trust(ctx) from the server-side context creation (keep
server_cert.configure_cert(ctx) and returning ctx) since configure_trust only
affects peer-trust settings used when verifying clients and is a no-op for a
non-client-verifying server context.

96-96: Use an explicit exception instead of assert for runtime error handling.

assert statements can be disabled with python -O, which would silently skip this check. For a test utility that needs reliable error reporting, raise an explicit exception.

Proposed fix
-        assert new_transport is not None, "TLS handshake failed"
+        if new_transport is None:
+            raise RuntimeError("TLS handshake failed: start_tls returned None")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/proxy/interception_proxy.py` at line 96, The runtime check
currently uses an assert on new_transport which can be skipped with python -O;
replace the assert with an explicit exception raise (e.g., raise RuntimeError or
a custom exception) when new_transport is None to ensure the TLS handshake
failure is always reported; locate the check that reads "assert new_transport is
not None, 'TLS handshake failed'" and change it to raise an exception with the
same descriptive message so the test utility reliably fails on handshake errors.

91-91: Use asyncio.get_running_loop() instead of deprecated get_event_loop().

asyncio.get_event_loop() is deprecated since Python 3.10 and will emit a DeprecationWarning when called from a coroutine. Use asyncio.get_running_loop() which is designed to be called from within a running event loop.

Proposed fix
-        loop = asyncio.get_event_loop()
+        loop = asyncio.get_running_loop()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/proxy/interception_proxy.py` at line 91, Replace the deprecated
call "loop = asyncio.get_event_loop()" with "loop = asyncio.get_running_loop()"
in the interception_proxy module; update the assignment where "loop =
asyncio.get_event_loop()" appears so it uses asyncio.get_running_loop()
(ensuring this code runs inside a coroutine or running loop), and remove or
adjust any code paths that expect creating a new loop from that variable if
needed.
tests/e2e/features/steps/proxy.py (4)

154-154: Race condition: fixed sleep may not be sufficient for proxy readiness.

time.sleep(1) assumes the proxy is ready within 1 second, but under load or on slower systems this might not hold. Consider polling the proxy's readiness (e.g., attempting a connection) similar to the service restart logic.

# Example polling approach for proxy readiness
import socket

def _wait_for_port(host: str, port: int, timeout: float = 5.0) -> None:
    """Wait until a port is accepting connections."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return
        except OSError:
            time.sleep(0.1)
    raise TimeoutError(f"Port {port} not ready within {timeout}s")

Also applies to: 222-222

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` at line 154, Replace the fixed sleep that
waits for the proxy (the time.sleep(1) call) with a polling readiness helper
that attempts TCP connections until success or timeout (create a helper like
_wait_for_port(host, port, timeout) and call it where time.sleep(1) is used);
update the proxy startup logic and the other occurrence noted (the second
time.sleep at the other location) to call this helper with the proxy host/port
and a sensible timeout so the test waits for an actual open port rather than a
fixed delay.

143-154: Event loops created for proxy threads are never closed, causing resource leaks.

The event loops created at lines 143 and 210 run forever in daemon threads but are never explicitly stopped or closed. While daemon threads are terminated when the main process exits, proper cleanup would stop the loop and close it in the after_scenario hook. This also prevents potential issues if tests run in a long-lived process.

Consider storing the loop reference and adding cleanup:

# In after_scenario or a cleanup step:
if hasattr(context, 'proxy_loop'):
    context.proxy_loop.call_soon_threadsafe(context.proxy_loop.stop)
    # Allow thread to exit gracefully
    time.sleep(0.5)

Also applies to: 210-222

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 143 - 154, The new event loop
created in run_proxy (asyncio.new_event_loop assigned to context.proxy_loop and
started in a daemon thread) is never stopped or closed; update the teardown
(after_scenario) to cleanly stop and close that loop and join the thread: call
context.proxy_loop.call_soon_threadsafe(context.proxy_loop.stop), wait briefly
for the thread to exit and join the thread (store the Thread as e.g.
context.proxy_thread when starting), then call context.proxy_loop.close(); apply
the same pattern for the second loop started at the other location (lines
210-222) so both context.proxy_loop/context.proxy_thread instances are stopped,
joined and closed in after_scenario.

64-64: Use tempfile module instead of hardcoded /tmp path for cross-platform compatibility.

Hardcoded /tmp paths don't work on Windows and may cause issues in environments with non-standard temp directories. Use tempfile.gettempdir() or tempfile.NamedTemporaryFile consistently.

Proposed fix
-    with open("/tmp/llama-stack-proxy-test.log", "w") as log_file:
+    log_path = os.path.join(tempfile.gettempdir(), "llama-stack-proxy-test.log")
+    with open(log_path, "w") as log_file:
-    with open("/tmp/lightspeed-stack-proxy-test.log", "w") as log_file:
+    log_path = os.path.join(tempfile.gettempdir(), "lightspeed-stack-proxy-test.log")
+    with open(log_path, "w") as log_file:

Also applies to: 103-103

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` at line 64, Replace the hardcoded
"/tmp/llama-stack-proxy-test.log" open call with a cross-platform tempfile-based
solution: use tempfile.NamedTemporaryFile(delete=False) to create a stable temp
file or build the path with tempfile.gettempdir() + os.path.join(...) and open
that path; update both occurrences (the open("/tmp/llama-stack-proxy-test.log",
"w") as log_file: call and the same usage at the second occurrence) and ensure
the rest of the code uses the returned filename/handle consistently (refer to
the open call and any variable that holds log_file or the temp path).

54-58: pkill -f pattern may match unintended processes.

The pattern "llama stack run" could match any process with these words in its command line, including editors or terminals displaying the command. Consider using a more specific pattern or PID file.

-    subprocess.run(
-        ["pkill", "-f", "llama stack run"],
-        capture_output=True,
-        check=False,
-    )
+    # More specific pattern to avoid killing unrelated processes
+    subprocess.run(
+        ["pkill", "-f", "llama stack run.*--port"],
+        capture_output=True,
+        check=False,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 54 - 58, The pkill invocation
using the broad pattern "llama stack run" in subprocess.run risks killing
unrelated processes; change the shutdown logic to target the exact process you
started instead—e.g., when launching the proxy store the subprocess.Popen object
or write its PID to a PID file, then use that PID to send SIGTERM/SIGKILL (via
proc.terminate()/proc.kill() or os.kill(pid, ...)) instead of
subprocess.run(["pkill", "-f", ...]); update references around the
subprocess.run call and the process startup code so termination uses the saved
Popen or PID rather than a text-pattern pkill.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/features/environment.py`:
- Around line 293-296: The except block that catches Exception during service
restoration prints a warning but then unconditionally sets
context.services_modified = False, which clears the modified flag even when
restoration failed; change the handler so that context.services_modified is only
cleared when restoration succeeds (i.e., move the assignment into the try block
or set it after successful restore), and leave context.services_modified True on
exception so subsequent scenarios can detect unresolved modifications; update
the code paths around the try/except that reference context.services_modified to
ensure it reflects actual restore success.

In `@tests/e2e/features/steps/proxy.py`:
- Around line 73-84: The health check in the loop is calling the wrong path
(/v1/health); update the request URL in the loop that builds
f"http://{llama_host}:{llama_port}/v1/health" to use the actual Llama Stack
liveness endpoint f"http://{llama_host}:{llama_port}/liveness" (or optionally
/readiness) so the poll in the function in tests/e2e/features/steps/proxy.py
succeeds against the registered health routes.

In `@tests/e2e/proxy/tunnel_proxy.py`:
- Around line 70-73: The parsing of the CONNECT target (variables target,
target_host, target_port_str, target_port) can raise ValueError when converting
target_port_str to int (e.g., "host:abc"); wrap the int(target_port_str)
conversion in a try/except ValueError block and handle it by sending a proper
HTTP error response (e.g., 400 Bad Request) back to the client and closing the
connection cleanly instead of letting the outer handler close without a
response; update the logic around the target.rsplit(...) branch to validate the
port and return the HTTP error on malformed ports.
- Around line 85-87: The upstream connect using asyncio.open_connection (which
returns remote_reader, remote_writer) can hang; wrap the call in
asyncio.wait_for with a sensible timeout (e.g., CONNECT_TIMEOUT) and catch
asyncio.TimeoutError alongside existing exceptions in the exception handling
around the connect block in tunnel_proxy.py so timeouts close the connection and
return/cleanup instead of blocking the E2E test; update the code that assigns
remote_reader, remote_writer to call await
asyncio.wait_for(asyncio.open_connection(target_host, target_port),
CONNECT_TIMEOUT) and add handling for asyncio.TimeoutError to the existing
except logic.

---

Nitpick comments:
In `@docs/e2e_testing.md`:
- Around line 193-197: Add a short cross-reference from the `@Proxy` row to the
teardown/restore behaviour described in the `after_scenario` hook: update the
`@Proxy` description to append a parenthetical or sentence like "see teardown in
`after_scenario` for restoration flow" and link or point readers to the
`after_scenario` hook section so they can quickly find how original services are
restored after the test; ensure the reference uses the exact symbol
`after_scenario` so it’s easy to locate.

In `@tests/e2e/features/proxy.feature`:
- Line 18: Replace the hard-coded proxy port in the Gherkin steps that read
"Given A tunnel proxy is running on port 8888" (and the other similar steps)
with a dynamic free-port allocation: add a setup step that finds/allocates an
available port (store it on the scenario context, e.g. context.proxyPort) and
change the step to start the tunnel proxy using that context value; update any
later steps that reference the fixed port to read from the same context key so
the allocated port is reused across setup and assertions.

In `@tests/e2e/features/steps/proxy.py`:
- Line 154: Replace the fixed sleep that waits for the proxy (the time.sleep(1)
call) with a polling readiness helper that attempts TCP connections until
success or timeout (create a helper like _wait_for_port(host, port, timeout) and
call it where time.sleep(1) is used); update the proxy startup logic and the
other occurrence noted (the second time.sleep at the other location) to call
this helper with the proxy host/port and a sensible timeout so the test waits
for an actual open port rather than a fixed delay.
- Around line 143-154: The new event loop created in run_proxy
(asyncio.new_event_loop assigned to context.proxy_loop and started in a daemon
thread) is never stopped or closed; update the teardown (after_scenario) to
cleanly stop and close that loop and join the thread: call
context.proxy_loop.call_soon_threadsafe(context.proxy_loop.stop), wait briefly
for the thread to exit and join the thread (store the Thread as e.g.
context.proxy_thread when starting), then call context.proxy_loop.close(); apply
the same pattern for the second loop started at the other location (lines
210-222) so both context.proxy_loop/context.proxy_thread instances are stopped,
joined and closed in after_scenario.
- Line 64: Replace the hardcoded "/tmp/llama-stack-proxy-test.log" open call
with a cross-platform tempfile-based solution: use
tempfile.NamedTemporaryFile(delete=False) to create a stable temp file or build
the path with tempfile.gettempdir() + os.path.join(...) and open that path;
update both occurrences (the open("/tmp/llama-stack-proxy-test.log", "w") as
log_file: call and the same usage at the second occurrence) and ensure the rest
of the code uses the returned filename/handle consistently (refer to the open
call and any variable that holds log_file or the temp path).
- Around line 54-58: The pkill invocation using the broad pattern "llama stack
run" in subprocess.run risks killing unrelated processes; change the shutdown
logic to target the exact process you started instead—e.g., when launching the
proxy store the subprocess.Popen object or write its PID to a PID file, then use
that PID to send SIGTERM/SIGKILL (via proc.terminate()/proc.kill() or
os.kill(pid, ...)) instead of subprocess.run(["pkill", "-f", ...]); update
references around the subprocess.run call and the process startup code so
termination uses the saved Popen or PID rather than a text-pattern pkill.

In `@tests/e2e/proxy/interception_proxy.py`:
- Around line 148-149: The StreamWriter.close() calls (e.g., tls_writer.close()
and other writer.close() occurrences in the same function) must be followed by
awaiting the stream to finish closing to avoid resource leaks; after each
writer.close() (for example where tls_writer.close() is called and the other two
close calls identified in this block) add await writer.wait_closed() (or await
tls_writer.wait_closed() as appropriate) so the underlying transport is fully
closed before returning/exiting.
- Around line 69-73: The server SSL context creation currently calls
self.ca.configure_trust(ctx) after server_cert.configure_cert(ctx); remove the
unnecessary call to configure_trust(ctx) from the server-side context creation
(keep server_cert.configure_cert(ctx) and returning ctx) since configure_trust
only affects peer-trust settings used when verifying clients and is a no-op for
a non-client-verifying server context.
- Line 96: The runtime check currently uses an assert on new_transport which can
be skipped with python -O; replace the assert with an explicit exception raise
(e.g., raise RuntimeError or a custom exception) when new_transport is None to
ensure the TLS handshake failure is always reported; locate the check that reads
"assert new_transport is not None, 'TLS handshake failed'" and change it to
raise an exception with the same descriptive message so the test utility
reliably fails on handshake errors.
- Line 91: Replace the deprecated call "loop = asyncio.get_event_loop()" with
"loop = asyncio.get_running_loop()" in the interception_proxy module; update the
assignment where "loop = asyncio.get_event_loop()" appears so it uses
asyncio.get_running_loop() (ensuring this code runs inside a coroutine or
running loop), and remove or adjust any code paths that expect creating a new
loop from that variable if needed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bfa48a5c-e3a7-4506-bb79-29a2f66ec1b1

📥 Commits

Reviewing files that changed from the base of the PR and between 56ce243 and e4af26a.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • docs/e2e_scenarios.md
  • docs/e2e_testing.md
  • pyproject.toml
  • tests/e2e/features/environment.py
  • tests/e2e/features/proxy.feature
  • tests/e2e/features/steps/proxy.py
  • tests/e2e/proxy/__init__.py
  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/proxy/tunnel_proxy.py
  • tests/e2e/test_list.txt

Comment thread tests/e2e/features/environment.py Outdated
Comment thread tests/e2e/features/steps/proxy.py Outdated
Comment on lines +73 to +84
for _ in range(45):
try:
resp = requests.get(
f"http://{llama_host}:{llama_port}/v1/health", timeout=2
)
if resp.status_code == 200:
return
except requests.ConnectionError:
pass
time.sleep(2)

raise TimeoutError("Llama Stack did not start within 90 seconds")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Find all health-related endpoints in Llama Stack

# Search for route definitions containing health-related paths
rg -n --type py '@router\.(get|post).*health|/liveness|/readiness|/v1/health' src/

Repository: lightspeed-core/lightspeed-stack

Length of output: 692


🏁 Script executed:

# Get details about the health endpoints in health.py
head -n 160 src/app/endpoints/health.py | tail -n 80

Repository: lightspeed-core/lightspeed-stack

Length of output: 2721


🏁 Script executed:

# Check for any middleware or FastAPI configuration that might add /v1 prefix
rg -n --type py 'app\.include_router|router.*prefix|/v1' src/app/ | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 1640


🏁 Script executed:

# Check which router module is imported as health or root in routers.py
rg -n --type py 'from.*health import|import.*health' src/app/routers.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

# Check what's in the root router
head -n 50 src/app/endpoints/root.py | grep -E '@router|from|import' 2>/dev/null || rg -n --type py '@router|health' src/app/endpoints/root.py | head -10

Repository: lightspeed-core/lightspeed-stack

Length of output: 470


🏁 Script executed:

# View the imports section of routers.py to find where health router comes from
head -n 70 src/app/routers.py | grep -E 'import|from'

Repository: lightspeed-core/lightspeed-stack

Length of output: 131


🏁 Script executed:

# Also check if health endpoints are in a different file/router
rg -n --type py 'router.*health|health.*router' src/app/

Repository: lightspeed-core/lightspeed-stack

Length of output: 357


Health endpoint path is incorrect — Llama Stack exposes /liveness, not /v1/health.

The health router is registered at the root level without a /v1 prefix (src/app/routers.py:69). Available endpoints are /liveness and /readiness (src/app/endpoints/health.py). The code polls a non-existent /v1/health endpoint, causing all health checks to fail with 404 responses and triggering a 90-second timeout on every test run.

Fix
     for _ in range(45):
         try:
             resp = requests.get(
-                f"http://{llama_host}:{llama_port}/v1/health", timeout=2
+                f"http://{llama_host}:{llama_port}/liveness", timeout=2
             )
             if resp.status_code == 200:
                 return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 73 - 84, The health check in
the loop is calling the wrong path (/v1/health); update the request URL in the
loop that builds f"http://{llama_host}:{llama_port}/v1/health" to use the actual
Llama Stack liveness endpoint f"http://{llama_host}:{llama_port}/liveness" (or
optionally /readiness) so the poll in the function in
tests/e2e/features/steps/proxy.py succeeds against the registered health routes.

Comment thread tests/e2e/proxy/tunnel_proxy.py
Comment thread tests/e2e/proxy/tunnel_proxy.py Outdated
Expand proxy e2e test coverage to fully address all acceptance criteria:

AC1 (tunnel proxy):
- Add negative test: LLM query fails gracefully when proxy is unreachable

AC2 (interception proxy with CA):
- Add negative test: LLM query fails when interception proxy CA is not
  provided (verifies "only successful when correct CA is provided")

AC3 (TLS version and ciphers):
- Add TLSv1.3 minimum version scenario
- Add custom cipher suite configuration scenario (ECDHE+AESGCM:DHE+AESGCM)

Test infrastructure:
- Add after_scenario cleanup hook in environment.py that restores
  original Llama Stack and Lightspeed Stack configs after @Proxy
  scenarios. Prevents config leaks between scenarios.
- Use different ports for each interception proxy instance to avoid
  address-already-in-use errors in sequential scenarios.

Documentation:
- Update docs/e2e_scenarios.md with all 7 proxy test scenarios.
- Update docs/e2e_testing.md with proxy-related Behave tags
  (@Proxy, @tunnelproxy, @InterceptionProxy, @TLSVersion, @tlscipher).
@max-svistunov max-svistunov force-pushed the lcore-1253-e2e-proxy-networking branch from 931b654 to edcd81a Compare March 26, 2026 22:27

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/e2e/features/steps/proxy.py (1)

148-157: ⚠️ Potential issue | 🔴 Critical

Health endpoint path is incorrect — use /liveness instead of /v1/health.

The health router is registered at the root level without a /v1 prefix. This will cause all health checks to fail with 404 responses. Additionally, the loop silently continues without raising an error if the service never becomes healthy, which could lead to flaky tests.

Proposed fix
     for _ in range(45):
         try:
             resp = requests.get(
-                f"http://{llama_host}:{llama_port}/v1/health", timeout=2
+                f"http://{llama_host}:{llama_port}/liveness", timeout=2
             )
             if resp.status_code == 200:
-                break
+                return
         except requests.ConnectionError:
             pass
         time.sleep(2)
+    raise TimeoutError(f"Llama Stack did not become healthy within 90 seconds")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 148 - 157, The loop polling
the service uses the wrong path and silently gives up: replace the requests.get
URL path f"http://{llama_host}:{llama_port}/v1/health" with
f"http://{llama_host}:{llama_port}/liveness" (the health router is mounted at
/liveness) and add explicit failure handling after the retry loop (e.g., track
success with a flag or inspect final resp and raise a RuntimeError or
AssertionError if the service never returned 200) so tests fail fast instead of
continuing silently; change the polling block around the requests.get call that
references llama_host, llama_port and requests.get to implement this.
🧹 Nitpick comments (1)
tests/e2e/features/steps/proxy.py (1)

204-221: Proxy startup uses fixed sleep which may be fragile.

The 1-second sleep (line 221) assumes the proxy is ready. Consider polling the proxy port for readiness, though for test reliability with daemon threads this is often acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 204 - 221, The test uses a
fixed time.sleep(1) after starting the daemon thread in start_tunnel_proxy which
is fragile; replace that sleep with a readiness poll that attempts to connect to
the proxy port until success or a short timeout. After thread.start() in
start_tunnel_proxy, repeatedly attempt a socket connection (to host "127.0.0.1"
or context.tunnel_proxy.host and the given port) with small sleeps between tries
and fail the test if the timeout elapses; keep using context.proxy_loop and
context.tunnel_proxy (and proxy.start()/run_proxy) so the change only replaces
the fixed sleep with a connect-retry loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 190-198: The helper _restore_original_services silently swallows
all exceptions from _restart_services(), which can hide failures restoring
_LLAMA_STACK_CONFIG_BACKUP to _LLAMA_STACK_CONFIG; update
_restore_original_services to catch exceptions from _restart_services() but log
the exception details (including exception text and context) using the
test/logging framework and either re-raise or set/return a failure flag so
callers can detect the cleanup failure; ensure the code still removes/renames
_LLAMA_STACK_CONFIG_BACKUP only when safe and reference the symbols
_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG, and _restart_services in your
change.
- Around line 360-395: Both configure_llama_tls_version and
configure_llama_ciphers currently overwrite provider["config"]["network"];
instead load the existing network dict (e.g., network =
provider.setdefault("config", {}).get("network", {}) or
provider["config"].setdefault("network", {})), merge/update its "tls" sub-dict
with the new keys (preserving existing keys like "min_version" and "ciphers"),
then assign back to provider["config"]["network"] before calling _write_config;
update both functions (which use _find_openai_provider and _write_config) to
perform an in-place merge of tls settings rather than replacing the whole
network dict.

---

Duplicate comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 148-157: The loop polling the service uses the wrong path and
silently gives up: replace the requests.get URL path
f"http://{llama_host}:{llama_port}/v1/health" with
f"http://{llama_host}:{llama_port}/liveness" (the health router is mounted at
/liveness) and add explicit failure handling after the retry loop (e.g., track
success with a flag or inspect final resp and raise a RuntimeError or
AssertionError if the service never returned 200) so tests fail fast instead of
continuing silently; change the polling block around the requests.get call that
references llama_host, llama_port and requests.get to implement this.

---

Nitpick comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 204-221: The test uses a fixed time.sleep(1) after starting the
daemon thread in start_tunnel_proxy which is fragile; replace that sleep with a
readiness poll that attempts to connect to the proxy port until success or a
short timeout. After thread.start() in start_tunnel_proxy, repeatedly attempt a
socket connection (to host "127.0.0.1" or context.tunnel_proxy.host and the
given port) with small sleeps between tries and fail the test if the timeout
elapses; keep using context.proxy_loop and context.tunnel_proxy (and
proxy.start()/run_proxy) so the change only replaces the fixed sleep with a
connect-retry loop.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 56b6b0a8-6270-486b-b393-193c6af3e455

📥 Commits

Reviewing files that changed from the base of the PR and between e4af26a and edcd81a.

📒 Files selected for processing (5)
  • docs/e2e_scenarios.md
  • docs/e2e_testing.md
  • tests/e2e/features/environment.py
  • tests/e2e/features/proxy.feature
  • tests/e2e/features/steps/proxy.py
✅ Files skipped from review due to trivial changes (3)
  • docs/e2e_scenarios.md
  • docs/e2e_testing.md
  • tests/e2e/features/proxy.feature
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/e2e/features/environment.py

Comment thread tests/e2e/features/steps/proxy.py Outdated
Comment on lines +190 to +198
def _restore_original_services() -> None:
"""Restore original run.yaml and restart services."""
if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP):
shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG)
os.remove(_LLAMA_STACK_CONFIG_BACKUP)
try:
_restart_services()
except Exception: # pylint: disable=broad-exception-caught
pass

@coderabbitai coderabbitai Bot Mar 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Exception swallowing hides restoration failures.

Catching and silently ignoring all exceptions during _restart_services() could mask critical cleanup failures. Consider logging the exception at minimum, or re-raising after setting a flag.

Proposed fix
 def _restore_original_services() -> None:
     """Restore original run.yaml and restart services."""
     if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP):
         shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG)
         os.remove(_LLAMA_STACK_CONFIG_BACKUP)
     try:
         _restart_services()
     except Exception:  # pylint: disable=broad-exception-caught
-        pass
+        import logging
+        logging.getLogger(__name__).warning(
+            "Service restoration failed during cleanup", exc_info=True
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _restore_original_services() -> None:
"""Restore original run.yaml and restart services."""
if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP):
shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG)
os.remove(_LLAMA_STACK_CONFIG_BACKUP)
try:
_restart_services()
except Exception: # pylint: disable=broad-exception-caught
pass
def _restore_original_services() -> None:
"""Restore original run.yaml and restart services."""
if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP):
shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG)
os.remove(_LLAMA_STACK_CONFIG_BACKUP)
try:
_restart_services()
except Exception: # pylint: disable=broad-exception-caught
import logging
logging.getLogger(__name__).warning(
"Service restoration failed during cleanup", exc_info=True
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 190 - 198, The helper
_restore_original_services silently swallows all exceptions from
_restart_services(), which can hide failures restoring
_LLAMA_STACK_CONFIG_BACKUP to _LLAMA_STACK_CONFIG; update
_restore_original_services to catch exceptions from _restart_services() but log
the exception details (including exception text and context) using the
test/logging framework and either re-raise or set/return a failure flag so
callers can detect the cleanup failure; ensure the code still removes/renames
_LLAMA_STACK_CONFIG_BACKUP only when safe and reference the symbols
_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG, and _restart_services in your
change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably CodeRabbitAI is right - let's log

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tisnik, agreed! Here's the planned fix:

  • Move logging import to the top of the file with the other stdlib imports
  • Replace the bare pass in the except block with a logging.warning(...) call that logs the exception details

Failed to handle agent chat message. Please try again.

Comment thread tests/e2e/features/steps/proxy.py Outdated
Comment on lines +360 to +395
@given('Llama Stack is configured with minimum TLS version "{version}"')
def configure_llama_tls_version(context: Context, version: str) -> None:
"""Modify run.yaml with TLS version config."""
_backup_llama_config()
config = _load_llama_config()
provider = _find_openai_provider(config)

if "config" not in provider:
provider["config"] = {}
provider["config"]["network"] = {
"tls": {
"min_version": version,
}
}

_write_config(config, _LLAMA_STACK_CONFIG)
context.llama_config_modified = True


@given('Llama Stack is configured with ciphers "{ciphers}"')
def configure_llama_ciphers(context: Context, ciphers: str) -> None:
"""Modify run.yaml with cipher suite config."""
_backup_llama_config()
config = _load_llama_config()
provider = _find_openai_provider(config)

if "config" not in provider:
provider["config"] = {}
provider["config"]["network"] = {
"tls": {
"ciphers": ciphers.split(":"),
}
}

_write_config(config, _LLAMA_STACK_CONFIG)
context.llama_config_modified = True

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

TLS configuration steps overwrite rather than merge network settings.

Both configure_llama_tls_version and configure_llama_ciphers completely replace provider["config"]["network"]. If a scenario requires both min_version and ciphers, the second step will overwrite the first. Consider merging with existing network config.

Proposed fix for merging
 `@given`('Llama Stack is configured with minimum TLS version "{version}"')
 def configure_llama_tls_version(context: Context, version: str) -> None:
     """Modify run.yaml with TLS version config."""
     _backup_llama_config()
     config = _load_llama_config()
     provider = _find_openai_provider(config)

     if "config" not in provider:
         provider["config"] = {}
-    provider["config"]["network"] = {
-        "tls": {
-            "min_version": version,
-        }
-    }
+    network = provider["config"].setdefault("network", {})
+    tls = network.setdefault("tls", {})
+    tls["min_version"] = version

     _write_config(config, _LLAMA_STACK_CONFIG)
     context.llama_config_modified = True

Apply similar pattern to configure_llama_ciphers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 360 - 395, Both
configure_llama_tls_version and configure_llama_ciphers currently overwrite
provider["config"]["network"]; instead load the existing network dict (e.g.,
network = provider.setdefault("config", {}).get("network", {}) or
provider["config"].setdefault("network", {})), merge/update its "tls" sub-dict
with the new keys (preserving existing keys like "min_version" and "ciphers"),
then assign back to provider["config"]["network"] before calling _write_config;
update both functions (which use _find_openai_provider and _write_config) to
perform an in-place merge of tls settings rather than replacing the whole
network dict.

@tisnik tisnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in overall. TYVM. Some nits, but LGTMed to not block you.

Comment thread docs/e2e_scenarios.md
* LLM query fails when interception proxy CA is not provided
* TLS minimum version TLSv1.2 is respected
* TLS minimum version TLSv1.3 is respected
* Custom cipher suite configuration is respected

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, those are features we need to test, nice!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this testing breath is awesome!! Thanks for going this far :).

_LLAMA_STACK_CONFIG_BACKUP = "run.yaml.proxy-backup"


def _is_docker_mode() -> bool:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future (@radofuchs ) we can put an attribute into context, for the whole test run (before_all perhaps)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that will be part of the rework I will be working on, these kind of attributes are all over the e2e codebase.

For this one, please follow the advice and move it to before all, it saves time, when you execute the logic only once

Comment thread tests/e2e/features/steps/proxy.py Outdated
Comment on lines +190 to +198
def _restore_original_services() -> None:
"""Restore original run.yaml and restart services."""
if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP):
shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG)
os.remove(_LLAMA_STACK_CONFIG_BACKUP)
try:
_restart_services()
except Exception: # pylint: disable=broad-exception-caught
pass

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably CodeRabbitAI is right - let's log

Comment thread tests/e2e/proxy/tunnel_proxy.py Outdated
Comment thread tests/e2e/features/environment.py Outdated
"""
# Restore services after @Proxy scenarios that modified Llama Stack config
services_modified = getattr(context, "services_modified", False)
if services_modified and "Proxy" in scenario.effective_tags:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@radofuchs to be refactored later (the "Proxy" tag)

Changes requested by reviewer (tisnik) and CodeRabbit:

- Detect Docker mode once in before_all and store as
  context.is_docker_mode. All proxy step functions now use the
  context attribute instead of calling _is_docker_mode() repeatedly.
- Log exception in _restore_original_services instead of silently
  swallowing it.
- Only clear context.services_modified on successful restoration,
  not when restoration fails (prevents leaking modified state).
- Add 10-second timeout to tunnel proxy open_connection to prevent
  stalls on unreachable targets.
- Handle malformed CONNECT port with ValueError catch and 400
  response.
Comment thread tests/e2e/features/steps/proxy.py Outdated
restart_container("llama-stack")
restart_container("lightspeed-stack")
else:
_restart_services_local()

@radofuchs radofuchs Mar 27, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will completely collapse in konflux/prow mode, do we actually need this part or is it only for debugging purposes?

Comment thread tests/e2e/features/steps/proxy.py Outdated
# --- Query Steps ---


@when('I send a query "{query}" to the LLM')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have any specific reason not to use the step defined in the common file?

Comment thread tests/e2e/features/steps/proxy.py Outdated
# --- Verification Steps ---


@then("The LLM responds successfully")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this step is already implemented

Comment thread tests/e2e/features/steps/proxy.py Outdated
)


@then("The response indicates a proxy or connection error")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this step should also work through the common one

Comment thread tests/e2e/features/steps/proxy.py Outdated
"""Verify the response indicates a connection or proxy error."""
if context.response is not None:
assert (
context.response.status_code >= 400

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this range is quite wide, cant we really check a single status code?

Comment thread tests/e2e/features/proxy.feature Outdated
Scenario: LLM traffic is routed through a configured tunnel proxy
Given A tunnel proxy is running on port 8888
And Llama Stack is configured to route inference through the tunnel proxy
And The services are restarted with the modified Llama Stack config

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is actually specific order in which you need to restart the services so that it works properly, I would rather want to see here two steps for restarting of each service

Move config restoration from @Proxy after_scenario hook to an
explicit Background Given step. This follows the team convention
that tags are used only for test selection (filtering), not for
triggering behavior.

The Background step "The original Llama Stack config is restored
if modified" runs before every scenario. If a previous scenario
left a modified run.yaml (detected by backup file existence), it
restores the original and restarts services. This handles cleanup
even when the previous scenario failed mid-way.

Removed:
- @Proxy tag from feature file (was triggering after_scenario hook)
- after_scenario hook for @Proxy in environment.py
- _restore_original_services function (replaced by Background step)
- context.services_modified tracking (no hook reads it)

Updated docs/e2e_testing.md: tags documented as selection-only,
not behavior-triggering.
Rewrite proxy e2e tests to follow project conventions:

- Reuse existing step definitions: use "I use query to ask question"
  from llm_query_response.py and "The status code of the response is"
  from common_http.py instead of custom query/response steps.
- Split service restart into two explicit Given steps: "Llama Stack
  is restarted" and "Lightspeed Stack is restarted" so restart
  ordering is visible in the feature file.
- Remove local (non-Docker) mode code path. Proxy tests use
  restart_container() exclusively, consistent with the rest of the
  e2e test suite.
- Check specific status code 500 for error scenarios instead of the
  broad >= 400 range.
- Remove custom send_query, verify_llm_response, and
  verify_error_response steps that duplicated existing functionality.

Net reduction: -183 lines from step definitions.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/features/steps/proxy.py`:
- Around line 339-342: Update the assertions in verify_tunnel_proxy_used and
verify_interception_proxy_used to validate the CONNECT target and intercepted
hosts against the expected provider target rather than only checking
connect_count; specifically, after the existing connect_count check
(proxy.connect_count >= count) also assert that proxy.last_connect_target ==
expected_provider_target (or contains the expected host) and for interception
verify that expected_provider_target (or its hostname) is present in
proxy.intercepted_hosts and/or proxy.last_connect_target is not None and equals
the expected target; use the existing proxy attributes proxy.connect_count,
proxy.last_connect_target, and proxy.intercepted_hosts to locate and strengthen
the checks.
- Around line 152-159: Replace the unguarded sleep-based startup in run_proxy
with a readiness handshake: create a threading.Event or queue (e.g.,
startup_event or startup_queue) that the thread-running function run_proxy uses
to report success or an exception from awaiting proxy.start(); inside run_proxy
call asyncio.set_event_loop(loop) then try to await proxy.start() and on success
set the event or put a "ready" token into the queue, and on exception put the
exception into the queue/event before exiting; in the main thread start the
daemon thread and then wait on the event or pop from the queue with a short
timeout and if an exception was reported raise/fail the test immediately (or
assert), otherwise continue — apply the same pattern to the second startup block
referenced (lines ~231-238) so tests never proceed against a dead proxy.
- Around line 108-120: The restore_if_modified routine currently in the
Background only runs before scenarios and leaves run.yaml and context state
lingering; add Behave teardown hooks (after_scenario and/or after_all) that
mirror restore_if_modified: check _LLAMA_STACK_CONFIG_BACKUP and copy it back to
_LLAMA_STACK_CONFIG, remove the backup file, call
restart_container("llama-stack") and restart_container("lightspeed-stack"), stop
any running proxy loops, and explicitly delete or reset any custom Context
attributes used by the proxy (clear the same attributes you set on context in
the proxy code) so per-scenario state does not leak across scenarios or after an
aborted run.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 46866ad0-cda3-4a8a-a70b-056809840cb0

📥 Commits

Reviewing files that changed from the base of the PR and between edcd81a and e416861.

📒 Files selected for processing (5)
  • docs/e2e_testing.md
  • tests/e2e/features/environment.py
  • tests/e2e/features/proxy.feature
  • tests/e2e/features/steps/proxy.py
  • tests/e2e/proxy/tunnel_proxy.py
✅ Files skipped from review due to trivial changes (2)
  • docs/e2e_testing.md
  • tests/e2e/features/proxy.feature
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/e2e/features/environment.py
  • tests/e2e/proxy/tunnel_proxy.py

Comment on lines +108 to +120
@given("The original Llama Stack config is restored if modified")
def restore_if_modified(context: Context) -> None:
"""Restore original run.yaml if a previous scenario modified it.

Called from Background so every scenario starts with a clean config,
even if the previous scenario failed mid-way.
"""
if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP):
print("Restoring original Llama Stack config from backup...")
shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG)
os.remove(_LLAMA_STACK_CONFIG_BACKUP)
restart_container("llama-stack")
restart_container("lightspeed-stack")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Move cleanup to teardown as well.

Restoring the backup in Background only cleans up the previous scenario. After the last scenario — or after an abort before the next Background runs — run.yaml stays modified, run.yaml.proxy-backup is left behind, and the proxy state stored on context leaks into later scenarios. Add an after_scenario/after_all cleanup that restores the config, stops the proxy loops, and clears the custom context attributes.

Based on learnings: In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared, so per-scenario state should be reset in hooks to maintain test isolation.

Also applies to: 149-150, 223-229

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 108 - 120, The
restore_if_modified routine currently in the Background only runs before
scenarios and leaves run.yaml and context state lingering; add Behave teardown
hooks (after_scenario and/or after_all) that mirror restore_if_modified: check
_LLAMA_STACK_CONFIG_BACKUP and copy it back to _LLAMA_STACK_CONFIG, remove the
backup file, call restart_container("llama-stack") and
restart_container("lightspeed-stack"), stop any running proxy loops, and
explicitly delete or reset any custom Context attributes used by the proxy
(clear the same attributes you set on context in the proxy code) so per-scenario
state does not leak across scenarios or after an aborted run.

Comment on lines +152 to +159
def run_proxy() -> None:
asyncio.set_event_loop(loop)
loop.run_until_complete(proxy.start())
loop.run_forever()

thread = threading.Thread(target=run_proxy, daemon=True)
thread.start()
time.sleep(1)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wait for proxy readiness instead of sleeping.

Both startup steps launch proxy.start() in a daemon thread and then unconditionally sleep(1). If bind/startup fails, the exception dies in the thread and the scenario keeps running against a dead proxy. Report either “ready” or the startup exception back to the step with an event/queue and fail immediately.

Also applies to: 231-238

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 152 - 159, Replace the
unguarded sleep-based startup in run_proxy with a readiness handshake: create a
threading.Event or queue (e.g., startup_event or startup_queue) that the
thread-running function run_proxy uses to report success or an exception from
awaiting proxy.start(); inside run_proxy call asyncio.set_event_loop(loop) then
try to await proxy.start() and on success set the event or put a "ready" token
into the queue, and on exception put the exception into the queue/event before
exiting; in the main thread start the daemon thread and then wait on the event
or pop from the queue with a short timeout and if an exception was reported
raise/fail the test immediately (or assert), otherwise continue — apply the same
pattern to the second startup block referenced (lines ~231-238) so tests never
proceed against a dead proxy.

Comment on lines +339 to +342
assert proxy.connect_count >= count, (
f"Expected at least {count} CONNECT requests, " f"got {proxy.connect_count}"
)
assert proxy.last_connect_target is not None, "No CONNECT target recorded"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Strengthen the proxy verification assertions.

verify_tunnel_proxy_used currently accepts any CONNECT, and verify_interception_proxy_used only checks connect_count. That can pass on unrelated traffic or on a CONNECT that never completed MITM. The proxy classes already record last_connect_target and intercepted_hosts; assert those against the expected provider target so AC1/AC2 are actually being proved.

Also applies to: 349-352

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/features/steps/proxy.py` around lines 339 - 342, Update the
assertions in verify_tunnel_proxy_used and verify_interception_proxy_used to
validate the CONNECT target and intercepted hosts against the expected provider
target rather than only checking connect_count; specifically, after the existing
connect_count check (proxy.connect_count >= count) also assert that
proxy.last_connect_target == expected_provider_target (or contains the expected
host) and for interception verify that expected_provider_target (or its
hostname) is present in proxy.intercepted_hosts and/or proxy.last_connect_target
is not None and equals the expected target; use the existing proxy attributes
proxy.connect_count, proxy.last_connect_target, and proxy.intercepted_hosts to
locate and strengthen the checks.

@radofuchs radofuchs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Stop proxy servers and their event loops explicitly in the Background
restore step. Previously, proxy daemon threads were left running after
each scenario, causing asyncio "Task was destroyed but it is pending"
warnings at process exit.

The _stop_proxy helper schedules an async stop on the proxy's event
loop, waits for it to complete, then stops the loop. Context
references are cleared so the next scenario starts clean.

@jrobertboos jrobertboos left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great Job!!

Comment thread docs/e2e_scenarios.md
* LLM query fails when interception proxy CA is not provided
* TLS minimum version TLSv1.2 is respected
* TLS minimum version TLSv1.3 is respected
* Custom cipher suite configuration is respected

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this testing breath is awesome!! Thanks for going this far :).

Add proxy cleanup in after_feature to stop proxy servers left running
from the last scenario. The Background restore step handles cleanup
between scenarios, but the last scenario's proxies persist until
process exit, causing asyncio "Task was destroyed" warnings.

The cleanup checks for proxy objects on context (no tag check needed)
and calls _stop_proxy to gracefully shut down the event loops.
@tisnik tisnik merged commit 270aa62 into lightspeed-core:main Mar 27, 2026
22 of 23 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Mar 30, 2026
19 tasks
@coderabbitai coderabbitai Bot mentioned this pull request Apr 14, 2026
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants