LCORE-1860: Add E2E tests for degraded mode startup scenarios#1992
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
📜 Recent review details⏰ Context from checks skipped due to timeout. (17)
🧰 Additional context used📓 Path-based instructions (2)tests/e2e/**/*.{py,feature}📄 CodeRabbit inference engine (AGENTS.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (4)📚 Learning: 2026-05-20T08:09:30.641ZApplied to files:
📚 Learning: 2026-04-07T09:20:26.590ZApplied to files:
📚 Learning: 2026-04-13T13:39:54.963ZApplied to files:
📚 Learning: 2026-06-24T13:45:37.249ZApplied to files:
🔇 Additional comments (3)
WalkthroughAdds a degraded-mode e2e service configuration, a new step that checks response-body substrings, and feature scenarios that verify metrics and readiness when llama-stack is available or disrupted at startup. ChangesDegraded mode startup E2E
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/e2e/features/degraded_mode_startup.feature`:
- Around line 58-62: Remove the `@skip` tag from the degraded mode startup
scenario so it is not excluded by the e2e behave runs; keep the scenario gated
with a CI-specific tag expression and retain `@manual` only if it still needs to
be excluded from automation. Update the tags on the Scenario in
degraded_mode_startup.feature so the Llama Stack startup check can actually
execute in CI.
In `@tests/e2e/utils/utils.py`:
- Around line 224-225: The liveness check fallback in the utility that waits for
a container response currently prints a message and returns, which can mask a
restart failure. Update the helper in utils.py where the retry loop ends so it
fails immediately by raising an assertion or error instead of returning
silently, keeping the existing container_name context in the failure message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b1335973-46d9-423a-90d9-c2a2b542ee79
📒 Files selected for processing (5)
tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yamltests/e2e/features/degraded_mode_startup.featuretests/e2e/features/steps/common.pytests/e2e/features/steps/common_http.pytests/e2e/utils/utils.py
📜 Review details
⏰ Context from checks skipped due to timeout. (18)
- GitHub Check: bandit
- GitHub Check: Pyright
- GitHub Check: build-pr
- GitHub Check: integration_tests (3.12)
- GitHub Check: spectral
- GitHub Check: Pylinter
- GitHub Check: integration_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/e2e/features/steps/common.pytests/e2e/features/steps/common_http.pytests/e2e/utils/utils.py
tests/e2e/**/*.{py,feature}
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework for end-to-end testing with Gherkin feature files
Files:
tests/e2e/features/steps/common.pytests/e2e/features/degraded_mode_startup.featuretests/e2e/features/steps/common_http.pytests/e2e/utils/utils.py
🧠 Learnings (4)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.
Applied to files:
tests/e2e/features/steps/common.pytests/e2e/features/steps/common_http.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.
Applied to files:
tests/e2e/features/steps/common.pytests/e2e/features/steps/common_http.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.
Applied to files:
tests/e2e/features/steps/common.pytests/e2e/features/steps/common_http.pytests/e2e/utils/utils.py
🪛 GitHub Actions: Black / 0_black.txt
tests/e2e/utils/utils.py
[error] 1-1: Black --check failed: formatting mismatch. Output indicates this file 'would be reformatted' (1 file would be reformatted), causing exit code 1. Run 'black src tests' (or 'black --write tests/e2e/utils/utils.py') to apply formatting.
🪛 GitHub Actions: Black / black
tests/e2e/utils/utils.py
[error] 1-1: Black --check would reformat files. 1 file would be reformatted (tests/e2e/utils/utils.py). Run 'uv tool run black' (without --check) or 'black --write' to fix formatting.
🔇 Additional comments (5)
tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml (1)
1-25: LGTM!tests/e2e/features/degraded_mode_startup.feature (1)
1-57: LGTM!Also applies to: 63-67
tests/e2e/features/steps/common_http.py (1)
115-126: LGTM!tests/e2e/features/steps/common.py (1)
193-207: LGTM!tests/e2e/utils/utils.py (1)
184-223: LGTM!Also applies to: 461-467, 500-500
| print(f"{container_name} did not respond after 30 attempts (degraded mode)") | ||
| return |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Fail fast when liveness never becomes 200.
This path currently logs and returns, which can hide a failed restart and cause later, less-actionable test failures. Raise an assertion/error here.
Proposed fix
- print(f"{container_name} did not respond after 30 attempts (degraded mode)")
- return
+ raise AssertionError(
+ f"{container_name} did not respond on /liveness after 30 attempts (degraded mode)"
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print(f"{container_name} did not respond after 30 attempts (degraded mode)") | |
| return | |
| raise AssertionError( | |
| f"{container_name} did not respond on /liveness after 30 attempts (degraded mode)" | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/e2e/utils/utils.py` around lines 224 - 225, The liveness check fallback
in the utility that waits for a container response currently prints a message
and returns, which can mask a restart failure. Update the helper in utils.py
where the retry loop ends so it fails immediately by raising an assertion or
error instead of returning silently, keeping the existing container_name context
in the failure message.
580cef9 to
24d5db9
Compare
| max_attempts (int): Maximum number of health check attempts (default 6). | ||
| skip_health_check (bool): If True, skip health check (for degraded mode tests). | ||
| """ | ||
| if skip_health_check: |
There was a problem hiding this comment.
we already use the liveness endpoint to determine if the lightspeed-stack container is healthy, so why do we actually need this?
| And REST API service prefix is /v1 | ||
| And the Lightspeed stack configuration directory is "tests/e2e/configuration" | ||
|
|
||
| @skip-health-check |
There was a problem hiding this comment.
hiding functionality behind tags is what we are actively trying to avoid now, please find different way to do this if actually necessary
| And the Lightspeed stack configuration directory is "tests/e2e/configuration" | ||
|
|
||
| @skip-health-check | ||
| Scenario: Service starts in degraded mode when llama-stack is not running |
There was a problem hiding this comment.
this test can essentially never fail, the liveness check is part of the helthcheck for the container. So the assertion and basically the whole test is redundant
| """ | ||
|
|
||
| @skip-health-check | ||
| Scenario: Metrics endpoint works in degraded mode |
There was a problem hiding this comment.
this test is redundant, the two that follow do the same thing
| And The response body contains "ls_started_in_degraded_mode 1.0" | ||
|
|
||
| Scenario: Degraded mode metric is set to 0.0 when started with llama-stack | ||
| Given Llama Stack is restarted |
There was a problem hiding this comment.
you can avoid this unnecessary extra restart of llama-stack service if this test is run as first in the feature file
Add end-to-end tests to verify LCORE behavior when starting without llama-stack and allow_degraded_mode is enabled.
24d5db9 to
8150a5f
Compare
Description
Add end-to-end tests to verify LCORE behavior when starting without llama-stack and
allow_degraded_modeis enabled.Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Bug Fixes