Skip to content

LCORE-1860: Add E2E tests for degraded mode startup scenarios#1992

Merged
tisnik merged 1 commit into
lightspeed-core:mainfrom
anik120:degraded-mode-tests
Jun 25, 2026
Merged

LCORE-1860: Add E2E tests for degraded mode startup scenarios#1992
tisnik merged 1 commit into
lightspeed-core:mainfrom
anik120:degraded-mode-tests

Conversation

@anik120

@anik120 anik120 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Description

Add end-to-end tests to verify LCORE behavior when starting without llama-stack and allow_degraded_mode is enabled.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added end-to-end coverage for starting the service in degraded mode when a downstream dependency is unavailable.
    • Added checks for degraded-mode metrics and readiness responses during startup.
  • Bug Fixes

    • Improved service startup validation so degraded mode behavior is verified through HTTP endpoints and response content.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ff1f45fb-502f-47ec-bff1-1865962fc33f

📥 Commits

Reviewing files that changed from the base of the PR and between 606cf00 and 8150a5f.

📒 Files selected for processing (3)
  • tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml
  • tests/e2e/features/degraded_mode_startup.feature
  • tests/e2e/features/steps/common_http.py
📜 Recent review details
⏰ Context from checks skipped due to timeout. (17)
  • GitHub Check: spectral
  • GitHub Check: mypy
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (2)
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/degraded_mode_startup.feature
  • tests/e2e/features/steps/common_http.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/e2e/features/steps/common_http.py
🧠 Learnings (4)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

  • tests/e2e/features/steps/common_http.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

  • tests/e2e/features/steps/common_http.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.

Applied to files:

  • tests/e2e/features/steps/common_http.py
🔇 Additional comments (3)
tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml (1)

1-25: LGTM!

tests/e2e/features/steps/common_http.py (1)

115-125: LGTM!

tests/e2e/features/degraded_mode_startup.feature (1)

1-40: LGTM!


Walkthrough

Adds a degraded-mode e2e service configuration, a new step that checks response-body substrings, and feature scenarios that verify metrics and readiness when llama-stack is available or disrupted at startup.

Changes

Degraded mode startup E2E

Layer / File(s) Summary
Degraded-mode fixture and HTTP assertion
tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml, tests/e2e/features/steps/common_http.py
Adds a server-mode degraded-mode config and a quoted-substring response-body assertion step.
Degraded-mode startup scenarios
tests/e2e/features/degraded_mode_startup.feature
Adds a shared background plus metrics and readiness scenarios for degraded startup with and without llama-stack.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • tisnik
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: new E2E tests for degraded mode startup scenarios.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/features/degraded_mode_startup.feature`:
- Around line 58-62: Remove the `@skip` tag from the degraded mode startup
scenario so it is not excluded by the e2e behave runs; keep the scenario gated
with a CI-specific tag expression and retain `@manual` only if it still needs to
be excluded from automation. Update the tags on the Scenario in
degraded_mode_startup.feature so the Llama Stack startup check can actually
execute in CI.

In `@tests/e2e/utils/utils.py`:
- Around line 224-225: The liveness check fallback in the utility that waits for
a container response currently prints a message and returns, which can mask a
restart failure. Update the helper in utils.py where the retry loop ends so it
fails immediately by raising an assertion or error instead of returning
silently, keeping the existing container_name context in the failure message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b1335973-46d9-423a-90d9-c2a2b542ee79

📥 Commits

Reviewing files that changed from the base of the PR and between ab86328 and 606cf00.

📒 Files selected for processing (5)
  • tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml
  • tests/e2e/features/degraded_mode_startup.feature
  • tests/e2e/features/steps/common.py
  • tests/e2e/features/steps/common_http.py
  • tests/e2e/utils/utils.py
📜 Review details
⏰ Context from checks skipped due to timeout. (18)
  • GitHub Check: bandit
  • GitHub Check: Pyright
  • GitHub Check: build-pr
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: spectral
  • GitHub Check: Pylinter
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/e2e/features/steps/common.py
  • tests/e2e/features/steps/common_http.py
  • tests/e2e/utils/utils.py
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/steps/common.py
  • tests/e2e/features/degraded_mode_startup.feature
  • tests/e2e/features/steps/common_http.py
  • tests/e2e/utils/utils.py
🧠 Learnings (4)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

  • tests/e2e/features/steps/common.py
  • tests/e2e/features/steps/common_http.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

  • tests/e2e/features/steps/common.py
  • tests/e2e/features/steps/common_http.py
📚 Learning: 2026-06-24T13:45:37.249Z
Learnt from: Jdubrick
Repo: lightspeed-core/lightspeed-stack PR: 1971
File: src/utils/markdown_repair.py:31-36
Timestamp: 2026-06-24T13:45:37.249Z
Learning: In the lightspeed-stack repository, docstrings must use the section header name "Parameters:" (not "Args:") for function arguments, even if the project references Google Python docstring conventions. Ensure docstrings follow the project’s established "Parameters:" header format for any documented function parameters.

Applied to files:

  • tests/e2e/features/steps/common.py
  • tests/e2e/features/steps/common_http.py
  • tests/e2e/utils/utils.py
🪛 GitHub Actions: Black / 0_black.txt
tests/e2e/utils/utils.py

[error] 1-1: Black --check failed: formatting mismatch. Output indicates this file 'would be reformatted' (1 file would be reformatted), causing exit code 1. Run 'black src tests' (or 'black --write tests/e2e/utils/utils.py') to apply formatting.

🪛 GitHub Actions: Black / black
tests/e2e/utils/utils.py

[error] 1-1: Black --check would reformat files. 1 file would be reformatted (tests/e2e/utils/utils.py). Run 'uv tool run black' (without --check) or 'black --write' to fix formatting.

🔇 Additional comments (5)
tests/e2e/configuration/server-mode/lightspeed-stack-degraded-mode.yaml (1)

1-25: LGTM!

tests/e2e/features/degraded_mode_startup.feature (1)

1-57: LGTM!

Also applies to: 63-67

tests/e2e/features/steps/common_http.py (1)

115-126: LGTM!

tests/e2e/features/steps/common.py (1)

193-207: LGTM!

tests/e2e/utils/utils.py (1)

184-223: LGTM!

Also applies to: 461-467, 500-500

Comment thread tests/e2e/features/degraded_mode_startup.feature Outdated
Comment thread tests/e2e/utils/utils.py Outdated
Comment on lines +224 to +225
print(f"{container_name} did not respond after 30 attempts (degraded mode)")
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Fail fast when liveness never becomes 200.

This path currently logs and returns, which can hide a failed restart and cause later, less-actionable test failures. Raise an assertion/error here.

Proposed fix
-        print(f"{container_name} did not respond after 30 attempts (degraded mode)")
-        return
+        raise AssertionError(
+            f"{container_name} did not respond on /liveness after 30 attempts (degraded mode)"
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(f"{container_name} did not respond after 30 attempts (degraded mode)")
return
raise AssertionError(
f"{container_name} did not respond on /liveness after 30 attempts (degraded mode)"
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/utils/utils.py` around lines 224 - 225, The liveness check fallback
in the utility that waits for a container response currently prints a message
and returns, which can mask a restart failure. Update the helper in utils.py
where the retry loop ends so it fails immediately by raising an assertion or
error instead of returning silently, keeping the existing container_name context
in the failure message.

@anik120 anik120 force-pushed the degraded-mode-tests branch 2 times, most recently from 580cef9 to 24d5db9 Compare June 24, 2026 19:52

@tisnik tisnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread tests/e2e/utils/utils.py Outdated
max_attempts (int): Maximum number of health check attempts (default 6).
skip_health_check (bool): If True, skip health check (for degraded mode tests).
"""
if skip_health_check:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already use the liveness endpoint to determine if the lightspeed-stack container is healthy, so why do we actually need this?

And REST API service prefix is /v1
And the Lightspeed stack configuration directory is "tests/e2e/configuration"

@skip-health-check

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hiding functionality behind tags is what we are actively trying to avoid now, please find different way to do this if actually necessary

And the Lightspeed stack configuration directory is "tests/e2e/configuration"

@skip-health-check
Scenario: Service starts in degraded mode when llama-stack is not running

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test can essentially never fail, the liveness check is part of the helthcheck for the container. So the assertion and basically the whole test is redundant

"""

@skip-health-check
Scenario: Metrics endpoint works in degraded mode

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is redundant, the two that follow do the same thing

And The response body contains "ls_started_in_degraded_mode 1.0"

Scenario: Degraded mode metric is set to 0.0 when started with llama-stack
Given Llama Stack is restarted

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can avoid this unnecessary extra restart of llama-stack service if this test is run as first in the feature file

Add end-to-end tests to verify LCORE behavior when starting without
llama-stack and allow_degraded_mode is enabled.
@anik120 anik120 force-pushed the degraded-mode-tests branch from 24d5db9 to 8150a5f Compare June 25, 2026 13:14

@radofuchs radofuchs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik merged commit 62251cf into lightspeed-core:main Jun 25, 2026
38 of 39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants