LCORE-1251: Added TLS E2E Tests#1413
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds comprehensive TLS testing infrastructure, including a mock HTTPS inference server, end-to-end test scenarios, and configuration updates. It introduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant Test as BDD Test
participant LStack as Lightspeed Stack
participant LlStack as Llama Stack
participant MockTLS as Mock TLS Inference
Test->>LStack: POST /v1/chat/completions
LStack->>LlStack: Forward request to inference
LlStack->>MockTLS: HTTPS request with TLS config
Note over MockTLS: Verify TLS:<br/>- Certificate trust<br/>- Client cert (if mTLS)<br/>- Min version
MockTLS->>MockTLS: Generate completion
MockTLS->>LlStack: TLS response
LlStack->>LStack: Response
LStack->>Test: Result (200 or 500)
Test->>Test: Assert status & content
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
I like how @max-svistunov used |
tisnik
left a comment
There was a problem hiding this comment.
PTAL at comments.
Also certs can be generated on the fly - please look at @max-svistunov 's PR
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/features/environment.py`:
- Around line 591-598: After restoring TLS defaults with
switch_config(context.default_config_backup), also remove the backup and restart
the service: call remove_config_backup(context.default_config_backup) and clear
the attribute (e.g., delattr(context, "default_config_backup") or set it to
None), then if not context.is_library_mode call
restart_container("lightspeed-stack") so the restored file is actually reloaded;
keep the existing handling for run_yaml_backup
(switch_run_config/remove_config_backup) unchanged.
- Around line 534-542: The TLS feature block currently hardcodes Docker TLS
configs and doesn't work on Prow; modify the block that checks "TLS" in
feature.tags to detect the Prow environment (e.g., via an env var such as PROW
or a CI indicator) and short-circuit by skipping the feature when running on
Prow (call feature.skip or raise behave.exceptions.Skipped with a clear message)
instead of calling create_config_backup, switch_config or switch_run_config;
alternatively, if you prefer to support Prow, implement the missing Prow-side
TLS plumbing where switch_run_config would apply run.yaml variants—reference the
existing symbols create_config_backup, switch_config and switch_run_config to
locate the code to change.
In `@tests/e2e/mock_tls_inference_server/certs/ca.key`:
- Around line 1-52: This file contains a committed CA private key (ca.key) which
must be removed; update the TLS test fixture to generate ephemeral certs at test
runtime (e.g., use trustme or an OpenSSL-based helper in the mock TLS server
setup used by the mock_tls_inference_server tests), change the test setup code
that currently reads the static ca.key to instead create and load
runtime-generated key/certpair into the test server and client, ensure generated
artifacts are not committed by removing ca.key from the repo and adding runtime
output paths to .gitignore, and update any test/CI initialization helpers to
produce and clean up certs during test execution.
In `@tests/e2e/mock_tls_inference_server/Dockerfile`:
- Around line 1-6: The Dockerfile runs the mock TLS server as root; update it to
create and switch to a non-root user before CMD so the container runs with least
privilege: add steps to create a dedicated user/group (e.g., "appuser"), chown
the application files and certs (referencing the WORKDIR /app and files copied
by COPY server.py and COPY certs/), and add a USER appuser instruction before
CMD ["python", "server.py"] so the process runs as that non-root user.
In `@tests/e2e/mock_tls_inference_server/server.py`:
- Around line 64-69: The current handler silently swallows JSON errors and falls
back to {} then a default MODEL_ID which lets malformed/empty requests succeed;
change the except block so that when json.loads(body) raises
JSONDecodeError/UnicodeDecodeError you return an HTTP 400/422 error response (no
longer set request_data = {}) and do not proceed; after parsing, validate
request_data contains required completion fields (e.g., ensure
request_data.get("model") is present or that one of "input"/"prompt"/"messages"
exists) and if validation fails return a 400/422 error instead of defaulting
model = MODEL_ID; update the logic around request_data, model and any response
path in the request handler so malformed or empty payloads are rejected.
- Around line 105-115: The TLS server context currently doesn't pin protocol
versions; update _make_tls_context to accept optional min_version and
max_version parameters (e.g., of type ssl.TLSVersion or accepted string mapped
to ssl.TLSVersion) and set ctx.minimum_version and ctx.maximum_version
accordingly before returning. Ensure callers in tests (the run-tls-min-version
scenario) pass min_version=ssl.TLSVersion.TLSv1_3 (or equivalent) so the mock
server enforces the same TLS bounds as the client, and adjust any call sites to
supply these arguments.
In `@tests/e2e/utils/utils.py`:
- Around line 347-348: The function currently substitutes an empty string when
context.faiss_vector_store_id is missing, which hides configuration errors;
change it to fail fast by checking if the placeholder "{VECTOR_STORE_ID}"
appears in result and context.faiss_vector_store_id is falsy, and in that case
raise a clear AssertionError/ValueError (mentioning that FAISS vector store ID
is required) instead of returning result.replace with an empty string; otherwise
continue to replace using context.faiss_vector_store_id as before (referencing
result, context.faiss_vector_store_id, and the "{VECTOR_STORE_ID}" placeholder).
- Around line 180-181: The Prow branch in switch_run_config() currently returns
early and skips applying TLS run.yaml updates; modify switch_run_config to
detect Prow via is_prow_environment(), call update_config_configmap(source_path)
to push the updated run.yaml/configmap into the cluster, then return so
subsequent restart logic isn't run locally; target the switch_run_config
function and ensure update_config_configmap is invoked with the same source_path
used by switch_run_config.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d22b2762-aa0d-4c94-9347-02f856f1c52c
📒 Files selected for processing (21)
docker-compose-library.yamldocker-compose.yamltests/e2e/configs/run-tls-min-version.yamltests/e2e/configs/run-tls-mtls.yamltests/e2e/configs/run-tls-verify-ca.yamltests/e2e/configs/run-tls-verify-default.yamltests/e2e/configs/run-tls-verify-false.yamltests/e2e/configuration/library-mode/lightspeed-stack-tls.yamltests/e2e/configuration/server-mode/lightspeed-stack-tls.yamltests/e2e/features/environment.pytests/e2e/features/tls.featuretests/e2e/mock_tls_inference_server/Dockerfiletests/e2e/mock_tls_inference_server/certs/ca.crttests/e2e/mock_tls_inference_server/certs/ca.keytests/e2e/mock_tls_inference_server/certs/client.crttests/e2e/mock_tls_inference_server/certs/client.keytests/e2e/mock_tls_inference_server/certs/server.crttests/e2e/mock_tls_inference_server/certs/server.keytests/e2e/mock_tls_inference_server/generate_certs.shtests/e2e/mock_tls_inference_server/server.pytests/e2e/utils/utils.py
| FROM python:3.12-slim | ||
| WORKDIR /app | ||
| COPY server.py . | ||
| COPY certs/ certs/ | ||
| EXPOSE 8443 8444 | ||
| CMD ["python", "server.py"] |
There was a problem hiding this comment.
Run the mock TLS server as a non-root user.
Container currently runs as root, which weakens container security posture and can fail policy checks.
Proposed fix
FROM python:3.12-slim
WORKDIR /app
+RUN useradd --create-home --uid 10001 appuser
COPY server.py .
COPY certs/ certs/
+RUN chown -R appuser:appuser /app
+USER appuser
EXPOSE 8443 8444
CMD ["python", "server.py"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| FROM python:3.12-slim | |
| WORKDIR /app | |
| COPY server.py . | |
| COPY certs/ certs/ | |
| EXPOSE 8443 8444 | |
| CMD ["python", "server.py"] | |
| FROM python:3.12-slim | |
| WORKDIR /app | |
| RUN useradd --create-home --uid 10001 appuser | |
| COPY server.py . | |
| COPY certs/ certs/ | |
| RUN chown -R appuser:appuser /app | |
| USER appuser | |
| EXPOSE 8443 8444 | |
| CMD ["python", "server.py"] |
🧰 Tools
🪛 Trivy (0.69.3)
[error] 1-1: Image user should not be 'root'
Specify at least 1 USER command in Dockerfile with non-root user as argument
Rule: DS-0002
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/mock_tls_inference_server/Dockerfile` around lines 1 - 6, The
Dockerfile runs the mock TLS server as root; update it to create and switch to a
non-root user before CMD so the container runs with least privilege: add steps
to create a dedicated user/group (e.g., "appuser"), chown the application files
and certs (referencing the WORKDIR /app and files copied by COPY server.py and
COPY certs/), and add a USER appuser instruction before CMD ["python",
"server.py"] so the process runs as that non-root user.
| try: | ||
| request_data = json.loads(body.decode("utf-8")) | ||
| except (json.JSONDecodeError, UnicodeDecodeError): | ||
| request_data = {} | ||
|
|
||
| model = request_data.get("model", MODEL_ID) |
There was a problem hiding this comment.
Reject empty or malformed completion payloads.
Falling back to {} and then defaulting model lets this mock return 200 even when the client stops sending a valid OpenAI request, so the TLS scenario can false-pass while request construction is broken.
🐛 Proposed fix
try:
request_data = json.loads(body.decode("utf-8"))
except (json.JSONDecodeError, UnicodeDecodeError):
- request_data = {}
+ self.send_error(400, "Invalid JSON body")
+ return
- model = request_data.get("model", MODEL_ID)
+ if not isinstance(request_data, dict) or "model" not in request_data:
+ self.send_error(400, "Missing required field: model")
+ return
+
+ model = request_data["model"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| request_data = json.loads(body.decode("utf-8")) | |
| except (json.JSONDecodeError, UnicodeDecodeError): | |
| request_data = {} | |
| model = request_data.get("model", MODEL_ID) | |
| try: | |
| request_data = json.loads(body.decode("utf-8")) | |
| except (json.JSONDecodeError, UnicodeDecodeError): | |
| self.send_error(400, "Invalid JSON body") | |
| return | |
| if not isinstance(request_data, dict) or "model" not in request_data: | |
| self.send_error(400, "Missing required field: model") | |
| return | |
| model = request_data["model"] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/mock_tls_inference_server/server.py` around lines 64 - 69, The
current handler silently swallows JSON errors and falls back to {} then a
default MODEL_ID which lets malformed/empty requests succeed; change the except
block so that when json.loads(body) raises JSONDecodeError/UnicodeDecodeError
you return an HTTP 400/422 error response (no longer set request_data = {}) and
do not proceed; after parsing, validate request_data contains required
completion fields (e.g., ensure request_data.get("model") is present or that one
of "input"/"prompt"/"messages" exists) and if validation fails return a 400/422
error instead of defaulting model = MODEL_ID; update the logic around
request_data, model and any response path in the request handler so malformed or
empty payloads are rejected.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
tests/e2e/features/environment.py (2)
595-602:⚠️ Potential issue | 🟠 MajorRestart
lightspeed-stackafter restoring the TLS defaults.This only restores files on disk. The running service keeps the TLS config loaded into later features, and this branch never removes
context.default_config_backup.♻️ Proposed fix
if "TLS" in feature.tags: switch_config(context.default_config_backup) run_backup = getattr(context, "run_yaml_backup", None) if run_backup: switch_run_config(run_backup) remove_config_backup(run_backup) if not context.is_library_mode: restart_container("llama-stack") + restart_container("lightspeed-stack") + remove_config_backup(context.default_config_backup)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/environment.py` around lines 595 - 602, After restoring TLS defaults with switch_config(context.default_config_backup), also delete the backup (clear context.default_config_backup) and, if a run_backup was applied, remove_config_backup(run_backup) as before; finally restart the running service "lightspeed-stack" (not "llama-stack") when not in library mode by calling restart_container("lightspeed-stack") so the restored TLS files are picked up by the running service. Ensure the branch still checks getattr(context, "run_yaml_backup", None) and uses switch_run_config(run_backup) prior to remove_config_backup(run_backup).
538-545:⚠️ Potential issue | 🟠 MajorGuard
@TLSin Prow.This branch assumes Docker-side config swapping, but
switch_run_config()is a no-op intests/e2e/utils/utils.pyon Prow. Without an explicit skip or separate Prow plumbing, the@TLS*Configscenarios silently exercise the wrongrun.yamlthere.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/environment.py` around lines 538 - 545, The TLS branch in environment.py assumes Docker-side config swapping but on Prow switch_run_config() is a no-op, causing `@TLS` scenarios to exercise the wrong run.yaml; update the TLS handling (the block that sets context.feature_config, context.default_config_backup via create_config_backup, context.run_yaml_backup, and calls switch_config) to detect Prow (e.g., an existing PROW env var or a context.is_prow flag) and either skip TLS scenarios when running on Prow or call an alternative plumbing path; specifically, guard the TLS branch so it does not perform config swaps on Prow (or explicitly skip the scenario) and ensure any calls to switch_run_config()/switch_config() are only executed when not on Prow.tests/e2e/mock_tls_inference_server/Dockerfile (1)
1-14:⚠️ Potential issue | 🟠 MajorStill running as root.
Add a non-root
USERbeforeCMD; otherwise this container keeps broader-than-needed privileges for the test server.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/mock_tls_inference_server/Dockerfile` around lines 1 - 14, The Dockerfile is still running as root; add a dedicated non-root user and switch to it before CMD to reduce privileges: create a user (e.g., adduser or useradd with a fixed UID/GID like 1000), ensure ownership of copied artifacts and the /certs directory is changed (chown) so that the new user can access server.py and /certs, perform package installs and file copies as root first, then add a USER instruction (referencing USER and CMD in the Dockerfile) to run the container as that non-root user.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose-library.yaml`:
- Around line 125-127: Remove the unnecessary host port bindings from the mock
TLS service so the library-mode stack uses the compose network only; locate the
mock-tls-inference service (look for the service name "mock-tls-inference") and
delete or comment out the ports block entries "- "8443:8443"" and "-
"8444:8444"" (or remove the entire ports: section if it becomes empty) so the
service is reachable only via the compose network and does not publish ports to
the host.
In `@docker-compose.yaml`:
- Around line 153-155: Remove the host port bindings "8443:8443" and "8444:8444"
from the docker-compose service that talks to mock-tls-inference over
lightspeednet (the entries under ports currently mapping 8443 and 8444), so the
service only exposes those ports internally to the compose network; keep the
internal container ports as needed for in-container healthcheck and
inter-service communication but do not publish them to the host to avoid startup
conflicts.
---
Duplicate comments:
In `@tests/e2e/features/environment.py`:
- Around line 595-602: After restoring TLS defaults with
switch_config(context.default_config_backup), also delete the backup (clear
context.default_config_backup) and, if a run_backup was applied,
remove_config_backup(run_backup) as before; finally restart the running service
"lightspeed-stack" (not "llama-stack") when not in library mode by calling
restart_container("lightspeed-stack") so the restored TLS files are picked up by
the running service. Ensure the branch still checks getattr(context,
"run_yaml_backup", None) and uses switch_run_config(run_backup) prior to
remove_config_backup(run_backup).
- Around line 538-545: The TLS branch in environment.py assumes Docker-side
config swapping but on Prow switch_run_config() is a no-op, causing `@TLS`
scenarios to exercise the wrong run.yaml; update the TLS handling (the block
that sets context.feature_config, context.default_config_backup via
create_config_backup, context.run_yaml_backup, and calls switch_config) to
detect Prow (e.g., an existing PROW env var or a context.is_prow flag) and
either skip TLS scenarios when running on Prow or call an alternative plumbing
path; specifically, guard the TLS branch so it does not perform config swaps on
Prow (or explicitly skip the scenario) and ensure any calls to
switch_run_config()/switch_config() are only executed when not on Prow.
In `@tests/e2e/mock_tls_inference_server/Dockerfile`:
- Around line 1-14: The Dockerfile is still running as root; add a dedicated
non-root user and switch to it before CMD to reduce privileges: create a user
(e.g., adduser or useradd with a fixed UID/GID like 1000), ensure ownership of
copied artifacts and the /certs directory is changed (chown) so that the new
user can access server.py and /certs, perform package installs and file copies
as root first, then add a USER instruction (referencing USER and CMD in the
Dockerfile) to run the container as that non-root user.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a26de868-fde3-4fe4-a391-b36f163c1457
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (17)
docker-compose-library.yamldocker-compose.yamlpyproject.tomlsrc/app/endpoints/responses.pytests/e2e/configs/run-tls-min-version.yamltests/e2e/configs/run-tls-mtls.yamltests/e2e/configs/run-tls-verify-ca.yamltests/e2e/configs/run-tls-verify-default.yamltests/e2e/configs/run-tls-verify-false.yamltests/e2e/configuration/library-mode/lightspeed-stack-tls.yamltests/e2e/configuration/server-mode/lightspeed-stack-tls.yamltests/e2e/features/environment.pytests/e2e/features/tls.featuretests/e2e/mock_tls_inference_server/Dockerfiletests/e2e/mock_tls_inference_server/server.pytests/e2e/test_list.txttests/e2e/utils/utils.py
✅ Files skipped from review due to trivial changes (12)
- pyproject.toml
- src/app/endpoints/responses.py
- tests/e2e/configuration/server-mode/lightspeed-stack-tls.yaml
- tests/e2e/test_list.txt
- tests/e2e/configuration/library-mode/lightspeed-stack-tls.yaml
- tests/e2e/features/tls.feature
- tests/e2e/configs/run-tls-mtls.yaml
- tests/e2e/configs/run-tls-verify-false.yaml
- tests/e2e/configs/run-tls-min-version.yaml
- tests/e2e/configs/run-tls-verify-ca.yaml
- tests/e2e/mock_tls_inference_server/server.py
- tests/e2e/configs/run-tls-verify-default.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/e2e/utils/utils.py
| ports: | ||
| - "8443:8443" | ||
| - "8444:8444" |
There was a problem hiding this comment.
Don't publish the mock TLS ports to the host here either.
The library-mode stack reaches mock-tls-inference over the compose network, so these host bindings only make unrelated local runs fail when 8443 or 8444 is already occupied.
✂️ Proposed fix
- ports:
- - "8443:8443"
- - "8444:8444"
+ expose:
+ - "8443"
+ - "8444"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ports: | |
| - "8443:8443" | |
| - "8444:8444" | |
| expose: | |
| - "8443" | |
| - "8444" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose-library.yaml` around lines 125 - 127, Remove the unnecessary
host port bindings from the mock TLS service so the library-mode stack uses the
compose network only; locate the mock-tls-inference service (look for the
service name "mock-tls-inference") and delete or comment out the ports block
entries "- "8443:8443"" and "- "8444:8444"" (or remove the entire ports: section
if it becomes empty) so the service is reachable only via the compose network
and does not publish ports to the host.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/e2e/features/steps/tls.py (1)
122-235: Consider extracting repeated config initialization into a helper.All five TLS configuration steps share identical boilerplate for ensuring nested dict keys exist before setting TLS options. This could be consolidated into a helper function.
Proposed refactor
def _get_tls_config_section(provider: dict) -> dict: """Ensure provider has config.network.tls structure and return tls dict reference. Parameters: provider: The provider configuration dictionary. Returns: Reference to the tls config dict (creates nested structure if needed). """ if "config" not in provider: provider["config"] = {} if "network" not in provider["config"]: provider["config"]["network"] = {} if "tls" not in provider["config"]["network"]: provider["config"]["network"]["tls"] = {} return provider["config"]["network"] `@given`("Llama Stack is configured with TLS verification disabled") def configure_tls_verify_false(context: Context) -> None: """Configure run.yaml with TLS verify: false.""" _backup_llama_config() config = _load_llama_config() provider = _find_tls_openai_provider(config) _get_tls_config_section(provider)["tls"] = {"verify": False} _write_config(config, _LLAMA_STACK_CONFIG)Apply the same pattern to the other four configuration steps.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/tls.py` around lines 122 - 235, Multiple step handlers (configure_tls_verify_false, configure_tls_verify_ca, configure_tls_verify_true, configure_tls_mtls, configure_tls_min_version) repeat the same nested dict setup for provider["config"]["network"]["tls"]; add a helper function (e.g., _get_tls_config_section(provider)) that ensures provider["config"], provider["config"]["network"], and provider["config"]["network"]["tls"] exist and returns the tls dict (or network dict reference) and then replace the repeated if-blocks in each step to call that helper and assign the TLS fields (keep the configure_tls_mtls change to provider["config"]["base_url"] as-is), then call _write_config as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/features/environment.py`:
- Around line 502-509: The TLS feature handler sets context.feature_config and
calls create_config_backup("lightspeed-stack.yaml") and
switch_config(context.feature_config) but does not call
restart_container("lightspeed-stack"), so the lightspeed-stack container won't
pick up the new TLS config; after switch_config(context.feature_config) add a
call to restart_container("lightspeed-stack") (matching how other tag handlers
like Authorized/RBAC use restart_container) to ensure the container is restarted
and the TLS configuration is applied.
---
Nitpick comments:
In `@tests/e2e/features/steps/tls.py`:
- Around line 122-235: Multiple step handlers (configure_tls_verify_false,
configure_tls_verify_ca, configure_tls_verify_true, configure_tls_mtls,
configure_tls_min_version) repeat the same nested dict setup for
provider["config"]["network"]["tls"]; add a helper function (e.g.,
_get_tls_config_section(provider)) that ensures provider["config"],
provider["config"]["network"], and provider["config"]["network"]["tls"] exist
and returns the tls dict (or network dict reference) and then replace the
repeated if-blocks in each step to call that helper and assign the TLS fields
(keep the configure_tls_mtls change to provider["config"]["base_url"] as-is),
then call _write_config as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7697c5ae-3f60-4820-ae07-8770eefacf4e
📒 Files selected for processing (3)
tests/e2e/features/environment.pytests/e2e/features/steps/tls.pytests/e2e/features/tls.feature
✅ Files skipped from review due to trivial changes (1)
- tests/e2e/features/tls.feature
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/e2e/features/steps/tls.py (1)
153-242: Consider extracting the repeated nested structure setup.The pattern for ensuring nested
config.networkstructure appears in all five TLS configuration functions:if "config" not in provider: provider["config"] = {} if "network" not in provider["config"]: provider["config"]["network"] = {}This could be consolidated into a helper that returns the prepared provider or merges into
_ensure_tls_provider. Given your comment about future trustme refactoring, this is low priority—but it would reduce repetition and simplify each step function to focus on its specific TLS settings.♻️ Optional: Extract network config setup
+def _ensure_network_config(provider: dict) -> dict: + """Ensure provider has config.network structure and return the network dict.""" + provider.setdefault("config", {}).setdefault("network", {}) + return provider["config"]["network"] + + `@given`("Llama Stack is configured with TLS verification disabled") def configure_tls_verify_false(context: Context) -> None: _backup_llama_config() config = _load_llama_config() provider = _ensure_tls_provider(config) - - if "config" not in provider: - provider["config"] = {} - if "network" not in provider["config"]: - provider["config"]["network"] = {} - - provider["config"]["network"]["tls"] = {"verify": False} + network = _ensure_network_config(provider) + network["tls"] = {"verify": False} _write_config(config, _LLAMA_STACK_CONFIG)Apply similar simplification to the other four TLS configuration functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/steps/tls.py` around lines 153 - 242, The TLS step functions (configure_tls_verify_false, configure_tls_verify_ca, configure_tls_verify_true, configure_tls_mtls) all repeat the same nested setup for provider["config"]["network"]; extract this into a small helper (e.g., ensure_provider_network(provider) or extend _ensure_tls_provider) that ensures provider["config"] and provider["config"]["network"] exist and returns the provider, then replace the repeated if-blocks in each step with a single call to that helper so each step only sets the tls-specific fields and base_url.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/e2e/features/steps/tls.py`:
- Around line 153-242: The TLS step functions (configure_tls_verify_false,
configure_tls_verify_ca, configure_tls_verify_true, configure_tls_mtls) all
repeat the same nested setup for provider["config"]["network"]; extract this
into a small helper (e.g., ensure_provider_network(provider) or extend
_ensure_tls_provider) that ensures provider["config"] and
provider["config"]["network"] exist and returns the provider, then replace the
repeated if-blocks in each step with a single call to that helper so each step
only sets the tls-specific fields and base_url.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8de213f2-765f-4880-8a13-a644223e4193
📒 Files selected for processing (1)
tests/e2e/features/steps/tls.py
|
@jrobertboos you need to rebase + resolve conflicts |
d4d67ea to
b8daafc
Compare
|
@tisnik rebased and resolved conflicts :) |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose-library.yaml`:
- Around line 132-137: The current Docker healthcheck block only tests the TLS
listener at https://localhost:8443/health; add a second check for the mTLS
listener on port 8444 (or replace the single test with a script that checks both
ports) by updating the healthcheck "test" to verify
https://localhost:8444/health as well (or run a small shell script that curls
both endpoints), being mindful that the mTLS endpoint may require a client
certificate—if so supply the cert/key to curl or accept the limitation and
document it; look for the existing healthcheck block and its test array to
implement the additional port check.
In `@docker-compose.yaml`:
- Around line 160-165: The current healthcheck (healthcheck.test) only probes
the TLS listener on port 8443 and misses the mTLS listener on port 8444; update
healthcheck.test to verify both endpoints: keep the existing check for
https://localhost:8443 and add a second check for https://localhost:8444 that
performs an mTLS handshake (load a client cert/key and CA into an SSL context or
call a helper script that uses openssl s_client) so the container fails
healthcheck if either TLS or mTLS listeners are not bound/responding.
In `@tests/e2e/features/steps/proxy.py`:
- Around line 140-157: The TLS backup constant _LLAMA_STACK_TLS_BACKUP is
defined inside the function while _LLAMA_STACK_CONFIG_BACKUP lives at module
scope; move the declaration of _LLAMA_STACK_TLS_BACKUP to module-level alongside
_LLAMA_STACK_CONFIG_BACKUP so both constants are defined once per module. Update
any references in the function (the code that checks os.path.exists, sets
backup_to_restore, and the cleanup loop) to use the now-module-level
_LLAMA_STACK_TLS_BACKUP; ensure imports (os, shutil) and restart_container usage
remain unchanged.
In `@tests/e2e/features/tls.feature`:
- Around line 32-41: Add more TLS failure scenarios to the existing feature to
cover mTLS missing/invalid client certificate, expired server certificate, and
certificate hostname mismatch: add new Scenario blocks similar to the existing
one that enable the specific TLS condition (e.g., configure Llama Stack for mTLS
without client cert, connect to a server using an expired cert, and connect to a
server with hostname mismatch), restart Llama and Lightspeed stacks, send the
same "query" request (use distinct provider/model identifiers such as
"mtls-openai", "expired-cert-openai", "hostname-mismatch-openai") and assert the
response status is 500 and the body does not contain the successful mock
response string ("Hello from the TLS mock inference server"). Ensure each
scenario includes clear titles and the same Given/When/Then steps pattern used
in the existing Scenario.
- Around line 43-51: The healthcheck for the mock-tls-inference service
currently only verifies port 8443, so add verification for the mutual-TLS
listener on port 8444 by updating the mock-tls-inference healthcheck to also
probe https://localhost:8444/health (or add a separate healthcheck entry) using
an SSL-aware check that disables certificate verification for the test
environment; update the service's healthcheck block (healthcheck / test for
mock-tls-inference) to include a command that accesses port 8444 with an SSL
context (or add a second test that targets 8444) so Docker marks the service
healthy only after the mTLS endpoint is ready.
In `@tests/e2e/mock_tls_inference_server/Dockerfile`:
- Line 5: The Dockerfile currently runs an unpinned pip install (RUN pip install
--no-cache-dir trustme); change this to install a specific, pinned trustme
version (e.g., trustme==<desired_version>) to ensure reproducible builds and
avoid transient dependency changes; update the RUN line in the Dockerfile to
include the chosen exact version and rebuild to verify compatibility.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8343d531-334b-4e2d-88bb-d5d2651480f1
📒 Files selected for processing (11)
docker-compose-library.yamldocker-compose.yamltests/e2e/configuration/library-mode/lightspeed-stack-tls.yamltests/e2e/configuration/server-mode/lightspeed-stack-tls.yamltests/e2e/features/environment.pytests/e2e/features/steps/proxy.pytests/e2e/features/steps/tls.pytests/e2e/features/tls.featuretests/e2e/mock_tls_inference_server/Dockerfiletests/e2e/mock_tls_inference_server/server.pytests/e2e/test_list.txt
📜 Review details
🧰 Additional context used
📓 Path-based instructions (5)
tests/e2e/test_list.txt
📄 CodeRabbit inference engine (AGENTS.md)
Maintain test list in
tests/e2e/test_list.txt
Files:
tests/e2e/test_list.txt
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g.,
from authentication import get_auth_dependency)
Files:
tests/e2e/features/steps/proxy.pytests/e2e/features/environment.pytests/e2e/mock_tls_inference_server/server.pytests/e2e/features/steps/tls.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests
Files:
tests/e2e/features/steps/proxy.pytests/e2e/features/environment.pytests/e2e/mock_tls_inference_server/server.pytests/e2e/features/steps/tls.py
tests/e2e/features/steps/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Define E2E test step implementations in
tests/e2e/features/steps/directory
Files:
tests/e2e/features/steps/proxy.pytests/e2e/features/steps/tls.py
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/tls.feature
🧠 Learnings (8)
📚 Learning: 2026-03-29T10:11:27.124Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-29T10:11:27.124Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt`
Applied to files:
tests/e2e/test_list.txt
📚 Learning: 2026-03-29T10:11:27.124Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-29T10:11:27.124Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Applied to files:
tests/e2e/test_list.txttests/e2e/features/environment.pytests/e2e/features/tls.feature
📚 Learning: 2026-01-12T10:58:50.251Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:50.251Z
Learning: In the lightspeed-stack repository, when a user claims a fix is done but the code still shows the original issue, verify the current state of the code before accepting the fix.
Applied to files:
tests/e2e/features/environment.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack-tls.yamltests/e2e/configuration/server-mode/lightspeed-stack-tls.yaml
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack-tls.yamltests/e2e/configuration/server-mode/lightspeed-stack-tls.yaml
📚 Learning: 2026-02-19T10:06:58.708Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:58.708Z
Learning: In tests/e2e-prow/rhoai/, ConfigMaps like mock-jwks-script and mcp-mock-server-script are created dynamically by the pipeline.sh deployment script using `oc create configmap` commands, rather than being defined as static ConfigMap resources in the manifest YAML files.
Applied to files:
docker-compose.yamldocker-compose-library.yaml
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-tls.yaml
📚 Learning: 2026-03-29T10:11:27.124Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-29T10:11:27.124Z
Learning: Applies to tests/e2e/features/steps/**/*.py : Define E2E test step implementations in `tests/e2e/features/steps/` directory
Applied to files:
tests/e2e/features/steps/tls.py
🪛 Checkov (3.2.510)
tests/e2e/mock_tls_inference_server/Dockerfile
[low] 1-14: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[low] 1-14: Ensure that a user for the container has been created
(CKV_DOCKER_3)
🪛 Hadolint (2.14.0)
tests/e2e/mock_tls_inference_server/Dockerfile
[warning] 5-5: Pin versions in pip. Instead of pip install <package> use pip install <package>==<version> or pip install --requirement <requirements file>
(DL3013)
🪛 Trivy (0.69.3)
tests/e2e/mock_tls_inference_server/Dockerfile
[error] 1-1: Image user should not be 'root'
Specify at least 1 USER command in Dockerfile with non-root user as argument
Rule: DS-0002
(IaC/Dockerfile)
[info] 1-1: No HEALTHCHECK defined
Add HEALTHCHECK instruction in your Dockerfile
Rule: DS-0026
(IaC/Dockerfile)
🔇 Additional comments (21)
tests/e2e/mock_tls_inference_server/server.py (3)
67-72: Reject empty or malformed completion payloads.The handler silently swallows JSON parse errors and falls back to an empty dict, allowing malformed requests to succeed with HTTP 200. This could mask broken client request construction in TLS tests.
🐛 Proposed fix
try: request_data = json.loads(body.decode("utf-8")) except (json.JSONDecodeError, UnicodeDecodeError): - request_data = {} - - model = request_data.get("model", MODEL_ID) + self.send_error(400, "Invalid JSON body") + return + + if not isinstance(request_data, dict) or "model" not in request_data: + self.send_error(400, "Missing required field: model") + return + + model = request_data["model"]
108-128: Consider adding TLS version parameters to_make_tls_context.For the
run-tls-min-version.yamlscenario that pins the client toTLSv1.3, the server context has no explicit version bounds. Adding optionalminimum_version/maximum_versionparameters would make version negotiation tests more deterministic across base-image updates.
140-212: LGTM!The main function properly generates certificates with trustme, exports them to
/certsfor test consumption, and starts both TLS and mTLS servers in daemon threads. The certificate SANs include both Docker service name and localhost, which is correct for the test environment.tests/e2e/mock_tls_inference_server/Dockerfile (1)
1-14: Run the mock TLS server as a non-root user.Container currently runs as root, which weakens container security posture.
Proposed fix
FROM python:3.12-slim WORKDIR /app +RUN useradd --create-home --uid 10001 appuser # Install trustme for dynamic certificate generation RUN pip install --no-cache-dir trustme # Copy server script COPY server.py . # Create /certs directory for generated certificates -RUN mkdir -p /certs +RUN mkdir -p /certs && chown appuser:appuser /certs + +USER appuser EXPOSE 8443 8444 CMD ["python", "server.py"]docker-compose-library.yaml (2)
125-127: Don't publish the mock TLS ports to the host.The library-mode stack reaches
mock-tls-inferenceover the compose network, so these host bindings only add potential startup failures when ports are already in use.Proposed fix
- ports: - - "8443:8443" - - "8444:8444" + expose: + - "8443" + - "8444"
33-34: LGTM!The
mock-tls-inferencedependency andmock-tls-certsvolume mount forlightspeed-stackare correctly configured to ensure the TLS mock server is healthy before the stack starts, and that the generated certificates are available.Also applies to: 45-45
docker-compose.yaml (2)
153-155: Don't publish the mock TLS ports to the host.Everything communicates via
lightspeednet, and the healthcheck runs in-container. Binding8443/8444on the host only creates potential port conflicts.Proposed fix
- ports: - - "8443:8443" - - "8444:8444" + expose: + - "8443" + - "8444"
28-37: LGTM!The
llama-stackservice correctly depends onmock-tls-inferencebeing healthy and mounts themock-tls-certsvolume at/certs:ro. This allows TLS test steps to configurerun.yamlwith certificate paths like/certs/ca.crtthat will be accessible inside the container.tests/e2e/test_list.txt (1)
23-23: LGTM!The
features/tls.featureentry is correctly added to the test list. Based on learnings, maintaining this file is required for the behave test framework. Consider adding a trailing newline for POSIX compliance.tests/e2e/features/environment.py (1)
555-562: LGTM!The TLS teardown logic correctly restores the config, removes the backup, conditionally restarts
llama-stackin server mode, and always restartslightspeed-stackto apply the restored configuration. Thetls_config_activeflag prevents duplicate restoration.tests/e2e/features/steps/proxy.py (1)
148-157: LGTM!The backup restoration logic correctly handles both proxy and TLS scenarios, with appropriate cleanup of stale backups. This ensures each scenario starts with a clean configuration.
tests/e2e/configuration/server-mode/lightspeed-stack-tls.yaml (1)
1-22: LGTM!The server-mode TLS configuration correctly sets up Lightspeed Stack to use
llama-stack:8321as a remote service with thetls-openaiprovider andmock-tls-model. This aligns with the TLS e2e test setup where the TLS step definitions configurerun.yamlto register thetls-openaiprovider pointing to the mock TLS inference server.tests/e2e/features/steps/tls.py (7)
127-146: Library-mode path (line 138-139) is currently dead code due to@skip-in-library-mode.As noted in the config file review, the library-mode branch in the path selection will never execute because the feature is tagged
@skip-in-library-mode. This is consistent—either remove the library-mode handling here and the library-mode config file, or remove the skip tag if TLS tests should work in library mode.
1-23: LGTM! Module structure and imports follow project conventions.The module uses absolute imports from
tests.e2e.utils.utilsas required by coding guidelines, and the docstring clearly explains the config-switching pattern used across e2e tests.
53-100: LGTM! Provider template and ensure function are well-structured.Good use of
copy.deepcopyto prevent mutation of the template dictionaries. The_ensure_tls_providerfunction correctly handles both finding existing providers and creating new ones with the associated model resource.
152-161: LGTM! TLS verification disabled step is correctly implemented.Sets
verify: Falsewhich will be serialized asfalsein YAML by PyYAML'syaml.dump.
193-210: LGTM! Mutual TLS configuration step correctly updates both base_url and TLS settings.The step properly:
- Changes
base_urlto use the mTLS server port 8444- Sets CA verification, client certificate, and client key paths
213-226: LGTM! TLS minimum version step is parameterized and flexible.The step accepts the version string as a parameter, allowing tests to specify different TLS versions (e.g., "TLSv1.2", "TLSv1.3").
102-118: The restoration coordination forrun.yaml.tls-backupis already properly implemented in the sharedrestore_if_modifiedstep (proxy.py:128-157). The step correctly handles both proxy and TLS backup files, restores the appropriate one, cleans it up after restoration, and even handles cleanup of the other backup type if present. No action required.tests/e2e/features/tls.feature (1)
1-11: LGTM! Feature file structure and Background setup are well-organized.The feature correctly uses
@skip-in-library-modetag, sets up the TLS testing environment in Background, and includes a cleanup step to restore configuration. The Gherkin syntax follows behave framework conventions.tests/e2e/configuration/library-mode/lightspeed-stack-tls.yaml (1)
1-21:⚠️ Potential issue | 🟡 MinorLibrary-mode TLS configuration is dead code due to
@skip-in-library-modetag.The
tls.featureis tagged with@skip-in-library-mode, which skips the entire feature in library mode. However,configure_lightspeed_for_tlsintls.pycontains logic to select this library-mode configuration whencontext.is_library_modeis true. Since the feature that calls this function is skipped in library mode, the library-mode config file is unreachable.Either remove
@skip-in-library-modeif TLS tests should run in library mode, or remove this library-mode configuration file if they cannot.⛔ Skipped due to learnings
Learnt from: radofuchs Repo: lightspeed-core/lightspeed-stack PR: 485 File: tests/e2e/features/environment.py:87-95 Timestamp: 2025-09-02T11:09:40.404Z Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.Learnt from: luis5tb Repo: lightspeed-core/lightspeed-stack PR: 727 File: src/app/endpoints/a2a.py:43-43 Timestamp: 2025-10-29T13:05:22.438Z Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
| healthcheck: | ||
| test: ["CMD", "python", "-c", "import urllib.request,ssl;c=ssl.create_default_context();c.check_hostname=False;c.verify_mode=ssl.CERT_NONE;urllib.request.urlopen('https://localhost:8443/health',context=c)"] | ||
| interval: 5s | ||
| timeout: 3s | ||
| retries: 3 | ||
| start_period: 5s |
There was a problem hiding this comment.
Healthcheck only verifies the TLS listener, not the mTLS listener.
Same concern as in docker-compose-library.yaml: the healthcheck only validates port 8443. If port 8444 (mTLS) fails to bind, mTLS test scenarios could run against a non-functional endpoint.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker-compose.yaml` around lines 160 - 165, The current healthcheck
(healthcheck.test) only probes the TLS listener on port 8443 and misses the mTLS
listener on port 8444; update healthcheck.test to verify both endpoints: keep
the existing check for https://localhost:8443 and add a second check for
https://localhost:8444 that performs an mTLS handshake (load a client cert/key
and CA into an SSL context or call a helper script that uses openssl s_client)
so the container fails healthcheck if either TLS or mTLS listeners are not
bound/responding.
| # Check for backups from both proxy and TLS scenarios | ||
| _LLAMA_STACK_TLS_BACKUP = "run.yaml.tls-backup" | ||
| backup_to_restore = None | ||
| if os.path.exists(_LLAMA_STACK_CONFIG_BACKUP): | ||
| print("Restoring original Llama Stack config from backup...") | ||
| shutil.copy(_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_CONFIG) | ||
| os.remove(_LLAMA_STACK_CONFIG_BACKUP) | ||
| backup_to_restore = _LLAMA_STACK_CONFIG_BACKUP | ||
| elif os.path.exists(_LLAMA_STACK_TLS_BACKUP): | ||
| backup_to_restore = _LLAMA_STACK_TLS_BACKUP | ||
|
|
||
| if backup_to_restore: | ||
| print(f"Restoring original Llama Stack config from {backup_to_restore}...") | ||
| shutil.copy(backup_to_restore, _LLAMA_STACK_CONFIG) | ||
| os.remove(backup_to_restore) | ||
| # Clean up the other backup too if it exists | ||
| for other_backup in [_LLAMA_STACK_CONFIG_BACKUP, _LLAMA_STACK_TLS_BACKUP]: | ||
| if other_backup != backup_to_restore and os.path.exists(other_backup): | ||
| os.remove(other_backup) | ||
| restart_container("llama-stack") | ||
| restart_container("lightspeed-stack") |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider moving _LLAMA_STACK_TLS_BACKUP to module level.
The TLS backup path constant is defined inside the function while _LLAMA_STACK_CONFIG_BACKUP is at module level. For consistency and to avoid recreating the string on each call, consider moving it to module level.
Proposed change
# Llama Stack config — mounted into the container from the host
_LLAMA_STACK_CONFIG = "run.yaml"
_LLAMA_STACK_CONFIG_BACKUP = "run.yaml.proxy-backup"
+_LLAMA_STACK_TLS_BACKUP = "run.yaml.tls-backup"Then in the function:
`@given`("The original Llama Stack config is restored if modified")
def restore_if_modified(context: Context) -> None:
...
# Stop any leftover proxy servers from previous scenario
_stop_proxy(context, "tunnel_proxy", "proxy_loop")
_stop_proxy(context, "interception_proxy", "interception_proxy_loop")
# Check for backups from both proxy and TLS scenarios
- _LLAMA_STACK_TLS_BACKUP = "run.yaml.tls-backup"
backup_to_restore = None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/features/steps/proxy.py` around lines 140 - 157, The TLS backup
constant _LLAMA_STACK_TLS_BACKUP is defined inside the function while
_LLAMA_STACK_CONFIG_BACKUP lives at module scope; move the declaration of
_LLAMA_STACK_TLS_BACKUP to module-level alongside _LLAMA_STACK_CONFIG_BACKUP so
both constants are defined once per module. Update any references in the
function (the code that checks os.path.exists, sets backup_to_restore, and the
cleanup loop) to use the now-module-level _LLAMA_STACK_TLS_BACKUP; ensure
imports (os, shutil) and restart_container usage remain unchanged.
| WORKDIR /app | ||
|
|
||
| # Install trustme for dynamic certificate generation | ||
| RUN pip install --no-cache-dir trustme |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Pin the trustme version for reproducible builds.
Static analysis flagged unpinned pip install. Pinning ensures consistent behavior across builds.
Proposed fix
-RUN pip install --no-cache-dir trustme
+RUN pip install --no-cache-dir trustme==1.2.1📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| RUN pip install --no-cache-dir trustme | |
| RUN pip install --no-cache-dir trustme==1.2.1 |
🧰 Tools
🪛 Hadolint (2.14.0)
[warning] 5-5: Pin versions in pip. Instead of pip install <package> use pip install <package>==<version> or pip install --requirement <requirements file>
(DL3013)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/mock_tls_inference_server/Dockerfile` at line 5, The Dockerfile
currently runs an unpinned pip install (RUN pip install --no-cache-dir trustme);
change this to install a specific, pinned trustme version (e.g.,
trustme==<desired_version>) to ensure reproducible builds and avoid transient
dependency changes; update the RUN line in the Dockerfile to include the chosen
exact version and rebuild to verify compatibility.
| @@ -0,0 +1,182 @@ | |||
| @skip-in-library-mode | |||
There was a problem hiding this comment.
run these tests also against the other providers(i.e. enable temporarily the workflow for e2e-providers to run on your PR)
There was a problem hiding this comment.
I won't be able to run these TLS tests against any of the other providers because this basically uses a specific openai style provider in order to connect with the mock tls server (as the mock tis server mocks an openai inference server)
| @@ -0,0 +1,316 @@ | |||
| #!/usr/bin/env python3 | |||
| """Mock OpenAI-compatible HTTPS inference server for TLS e2e testing. | |||
There was a problem hiding this comment.
please create this also in the prow env, otherwise it will not work there at all
There was a problem hiding this comment.
- I will make a ticket for prow/konflux
- I will make a ticket for testing broken signature
max-svistunov
left a comment
There was a problem hiding this comment.
Overall good test coverage of tls. PTAL at the proxy.py feedback. Also see about Radovan's feedback regarding prow/konflux.
|
@jrobertboos JIC, test_list.txt -- make sure to restore it before merging, it currently only has tls.featur |
| try: | ||
| request_data = json.loads(body.decode("utf-8")) | ||
| except (json.JSONDecodeError, UnicodeDecodeError): | ||
| request_data = {} |
There was a problem hiding this comment.
so exception is fine and accepted?
There was a problem hiding this comment.
Since this is a mock server for TLS e2e testing, the focus is on the TLS, not strict request validation so as long as the client can successfully connect to server all is good (at least imo).
| # .github/workflows/e2e_tests.yml | ||
| name: E2E Inference Provider Tests | ||
|
|
||
| on: |
There was a problem hiding this comment.
remember to undo this change before merging
(lcore-1251) fixed tls tests & removed other e2e tests for quicker test running (lcore-1251) restored test_list.txt (lcore-1251) use `trustme` for certs (lcore-1251) quick tls server fix (lcore-1251) removed tags in place of steps (fix) removed unused code fix tls config verified correct llm response clean LCORE-1253: Add e2e proxy and TLS networking tests Add comprehensive end-to-end tests verifying that Llama Stack's NetworkConfig (proxy, TLS) works correctly through the Lightspeed Stack pipeline. Test infrastructure: - TunnelProxy: Async HTTP CONNECT tunnel proxy that creates TCP tunnels for HTTPS traffic. Tracks CONNECT count and target hosts. - InterceptionProxy: Async TLS-intercepting (MITM) proxy using trustme CA to generate per-target server certificates. Simulates corporate SSL inspection proxies. Behave scenarios (tests/e2e/features/proxy.feature): - Tunnel proxy: Configures run.yaml with NetworkConfig proxy pointing to a local tunnel proxy. Verifies CONNECT to api.openai.com:443 is observed and the LLM query succeeds through the proxy. - Interception proxy: Configures run.yaml with proxy and custom CA cert (trustme). Verifies TLS interception of api.openai.com traffic and successful LLM query through the MITM proxy. - TLS version: Configures run.yaml with min_version TLSv1.2 and verifies the LLM query succeeds with the TLS constraint. Each scenario dynamically generates a modified run-ci.yaml with the appropriate NetworkConfig, restarts Llama Stack with the new config, restarts the Lightspeed Stack, and sends a query to verify the full pipeline. Added trustme>=1.2.1 to dev dependencies. LCORE-1253: Add negative tests, TLS/cipher scenarios, and cleanup hooks Expand proxy e2e test coverage to fully address all acceptance criteria: AC1 (tunnel proxy): - Add negative test: LLM query fails gracefully when proxy is unreachable AC2 (interception proxy with CA): - Add negative test: LLM query fails when interception proxy CA is not provided (verifies "only successful when correct CA is provided") AC3 (TLS version and ciphers): - Add TLSv1.3 minimum version scenario - Add custom cipher suite configuration scenario (ECDHE+AESGCM:DHE+AESGCM) Test infrastructure: - Add after_scenario cleanup hook in environment.py that restores original Llama Stack and Lightspeed Stack configs after @Proxy scenarios. Prevents config leaks between scenarios. - Use different ports for each interception proxy instance to avoid address-already-in-use errors in sequential scenarios. Documentation: - Update docs/e2e_scenarios.md with all 7 proxy test scenarios. - Update docs/e2e_testing.md with proxy-related Behave tags (@Proxy, @tunnelproxy, @InterceptionProxy, @TLSVersion, @tlscipher). LCORE-1253: Address review feedback Changes requested by reviewer (tisnik) and CodeRabbit: - Detect Docker mode once in before_all and store as context.is_docker_mode. All proxy step functions now use the context attribute instead of calling _is_docker_mode() repeatedly. - Log exception in _restore_original_services instead of silently swallowing it. - Only clear context.services_modified on successful restoration, not when restoration fails (prevents leaking modified state). - Add 10-second timeout to tunnel proxy open_connection to prevent stalls on unreachable targets. - Handle malformed CONNECT port with ValueError catch and 400 response. LCORE-1253: Replace tag-based cleanup with Background restore step Move config restoration from @Proxy after_scenario hook to an explicit Background Given step. This follows the team convention that tags are used only for test selection (filtering), not for triggering behavior. The Background step "The original Llama Stack config is restored if modified" runs before every scenario. If a previous scenario left a modified run.yaml (detected by backup file existence), it restores the original and restarts services. This handles cleanup even when the previous scenario failed mid-way. Removed: - @Proxy tag from feature file (was triggering after_scenario hook) - after_scenario hook for @Proxy in environment.py - _restore_original_services function (replaced by Background step) - context.services_modified tracking (no hook reads it) Updated docs/e2e_testing.md: tags documented as selection-only, not behavior-triggering. LCORE-1253: Address radofuchs review feedback Rewrite proxy e2e tests to follow project conventions: - Reuse existing step definitions: use "I use query to ask question" from llm_query_response.py and "The status code of the response is" from common_http.py instead of custom query/response steps. - Split service restart into two explicit Given steps: "Llama Stack is restarted" and "Lightspeed Stack is restarted" so restart ordering is visible in the feature file. - Remove local (non-Docker) mode code path. Proxy tests use restart_container() exclusively, consistent with the rest of the e2e test suite. - Check specific status code 500 for error scenarios instead of the broad >= 400 range. - Remove custom send_query, verify_llm_response, and verify_error_response steps that duplicated existing functionality. Net reduction: -183 lines from step definitions. LCORE-1253: Clean up proxy servers between scenarios Stop proxy servers and their event loops explicitly in the Background restore step. Previously, proxy daemon threads were left running after each scenario, causing asyncio "Task was destroyed but it is pending" warnings at process exit. The _stop_proxy helper schedules an async stop on the proxy's event loop, waits for it to complete, then stops the loop. Context references are cleared so the next scenario starts clean. LCORE-1253: Stop proxy servers after last scenario in after_feature Add proxy cleanup in after_feature to stop proxy servers left running from the last scenario. The Background restore step handles cleanup between scenarios, but the last scenario's proxies persist until process exit, causing asyncio "Task was destroyed" warnings. The cleanup checks for proxy objects on context (no tag check needed) and calls _stop_proxy to gracefully shut down the event loops. fixed dup steps addressed comments debug fix fix readded tests to test fix addressed comments update added invalid cert test readded test list added expired and untrusted certs added more tests fix added hostname mismatch tests fix format addressed comments addressed comments fixed style run e2e providers fix improved healthcheck removed multiple health checks re added tests fix fix fix
tisnik
left a comment
There was a problem hiding this comment.
Approved
for the record: the plan is to create new tasks for:
- Extend tests to prow/konflux
- Test broken signatures
| t.start() | ||
|
|
||
| # Keep main thread alive (daemon threads run until container stops) | ||
| threading.Event().wait() |
There was a problem hiding this comment.
so basically - you start daemon threads, so there is no wait to join them, and we just trust the container to take care of cleanup?
There was a problem hiding this comment.
yea but if you really think we need dedicated clean up I can do that quickly, I just think it is a little overkill for the purpose of a mock server
|
For Exposure: Here are the follow up tasks: |
max-svistunov
left a comment
There was a problem hiding this comment.
Feedback addressed, thanks, just one note about EXPOSE.
| # Create /certs directory for generated certificates | ||
| RUN mkdir -p /certs | ||
|
|
||
| EXPOSE 8443 8444 |
There was a problem hiding this comment.
EXPOSE lists these two but doesn't list 8445 (hostname-mismatch port). This should either list all 3 or be removed (container is only accessed over the docker network).
max-svistunov
left a comment
There was a problem hiding this comment.
LGTM (one nit, but don't want to block)
Description
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Release Notes