Skip to content

LCORE-1599: Add retry with backoff to check_llama_stack_version#1409

Merged
tisnik merged 1 commit into
lightspeed-core:mainfrom
savitojs:savitojs/LCORE-1599-retry-llama-stack-version-check
Mar 27, 2026
Merged

LCORE-1599: Add retry with backoff to check_llama_stack_version#1409
tisnik merged 1 commit into
lightspeed-core:mainfrom
savitojs:savitojs/LCORE-1599-retry-llama-stack-version-check

Conversation

@savitojs

@savitojs savitojs commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Description

check_llama_stack_version() now retries up to 5 times with a 2-second delay when it gets an APIConnectionError from Llama Stack. Previously it made a single call and crashed immediately if Llama Stack wasn't ready. This handles the sidecar startup race condition where both containers start concurrently but Llama Stack needs a few extra seconds to initialize.

After all retries are exhausted, the original error is still raised so misconfiguration isn't silently ignored.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Version checking now automatically retries on connection errors instead of failing immediately, improving resilience.
    • Added configurable retry parameters to customize retry attempts and delay with sensible defaults.
    • Enhanced logging during retry attempts provides better troubleshooting and monitoring visibility.
    • Added validation for retry parameters to catch invalid values early.

@coderabbitai

coderabbitai Bot commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Warning

Rate limit exceeded

@savitojs has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 25 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 851b7b38-b743-4de0-8aad-76b1422112a9

📥 Commits

Reviewing files that changed from the base of the PR and between 5cf466e and 5b25619.

📒 Files selected for processing (2)
  • src/utils/llama_stack_version.py
  • tests/unit/utils/test_llama_stack_version.py

Walkthrough

check_llama_stack_version now accepts max_retries and retry_delay, retries client.inspect.version() on APIConnectionError up to max_retries with asyncio.sleep between attempts, logs warnings on intermediate failures, and re-raises the error if all attempts fail. Unit tests added to validate retry-success and retry-exhaustion paths.

Changes

Cohort / File(s) Summary
Version Check Retry Implementation
src/utils/llama_stack_version.py
Added max_retries and retry_delay parameters (with defaults). Implements retry loop around await client.inspect.version() catching APIConnectionError, logging warnings on failures, sleeping between attempts with asyncio.sleep, and re-raising after the final failed attempt. Input validation added for max_retries. Import path adjusted to AsyncLlamaStackClient from llama_stack_client.
Retry Behavior Tests
tests/unit/utils/test_llama_stack_version.py
Added two async tests: one simulates two APIConnectionError failures then success and asserts call/sleep counts; the other simulates persistent APIConnectionError and asserts the exception is raised after exhausting retries. Both patch asyncio.sleep to avoid real delays.

Sequence Diagram(s)

sequenceDiagram
    participant App as App (startup)
    participant Checker as check_llama_stack_version
    participant Client as AsyncLlamaStackClient
    participant Verifier as compare_versions
    participant Async as asyncio.sleep

    App->>Checker: call check_llama_stack_version(...)
    loop until success or attempts exhausted
        Checker->>Client: await inspect.version()
        alt version returned
            Client-->>Checker: VersionInfo
            Checker->>Verifier: compare_versions(version, min, max)
            Verifier-->>Checker: ok / raises
            Checker-->>App: return
        else APIConnectionError
            Client-->>Checker: raises APIConnectionError
            Checker->>App: log warning (attempt N)
            Checker->>Async: await sleep(retry_delay)
            Async-->>Checker: resume
        end
    end
    alt exhausted
        Checker-->>App: re-raise APIConnectionError
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding retry-with-backoff functionality to check_llama_stack_version.
Linked Issues check ✅ Passed The implementation fully addresses issue #1408 requirements: retry logic with configurable max_retries and retry_delay, error re-raising on exhaustion, and appropriate logging of failures.
Out of Scope Changes check ✅ Passed All changes are directly related to adding retry-with-backoff functionality; no unrelated modifications to unrelated files or features were introduced.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unit/utils/test_llama_stack_version.py (1)

121-139: Add the exhausted-retries test path.

This test covers eventual success, but not the required behavior that APIConnectionError is re-raised after all retries are exhausted.

Suggested companion test
+@pytest.mark.asyncio
+async def test_check_llama_stack_version_raises_after_max_retries(
+    mocker: MockerFixture,
+) -> None:
+    mock_client = mocker.AsyncMock()
+    mock_sleep = mocker.patch("utils.llama_stack_version.asyncio.sleep")
+    mock_client.inspect.version.side_effect = APIConnectionError(request=mocker.MagicMock())
+
+    with pytest.raises(APIConnectionError):
+        await check_llama_stack_version(mock_client, max_retries=3, retry_delay=1)
+
+    assert mock_client.inspect.version.call_count == 3
+    assert mock_sleep.call_count == 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_llama_stack_version.py` around lines 121 - 139, Add a
new unit test that verifies check_llama_stack_version re-raises
APIConnectionError after all retries are exhausted: create an async test
(similar to test_check_llama_stack_version_retries_on_connection_error) that
sets mock_client.inspect.version.side_effect to raise APIConnectionError for
every call, call await check_llama_stack_version(mock_client, max_retries=N,
retry_delay=1) inside pytest.raises(APIConnectionError) and assert
mock_client.inspect.version.call_count == N and mock_sleep.call_count == N-1 to
confirm it retried the expected number of times before re-raising; reference the
existing test name and the check_llama_stack_version function, as well as
mock_client.inspect.version and APIConnectionError to locate and mirror
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/llama_stack_version.py`:
- Around line 32-34: The retry parameters allow max_retries to be zero or
negative which causes the retry loop to run zero times and skip the version/API
check; in the function that declares max_retries and retry_delay, validate
max_retries at the start (e.g., if max_retries < 1) and either raise a
ValueError with a clear message or coerce it to a minimum of 1 before entering
the retry loop so the version check runs at least once; reference the parameters
max_retries and retry_delay to locate the function and update the validation
accordingly.
- Around line 6-7: The import uses a private module: replace the internal import
"from llama_stack_client._client import AsyncLlamaStackClient" with the public
export from the package (import AsyncLlamaStackClient directly from
llama_stack_client), i.e. change the import to use the top-level export of
AsyncLlamaStackClient so code uses the public API surface instead of the private
_client module.

---

Nitpick comments:
In `@tests/unit/utils/test_llama_stack_version.py`:
- Around line 121-139: Add a new unit test that verifies
check_llama_stack_version re-raises APIConnectionError after all retries are
exhausted: create an async test (similar to
test_check_llama_stack_version_retries_on_connection_error) that sets
mock_client.inspect.version.side_effect to raise APIConnectionError for every
call, call await check_llama_stack_version(mock_client, max_retries=N,
retry_delay=1) inside pytest.raises(APIConnectionError) and assert
mock_client.inspect.version.call_count == N and mock_sleep.call_count == N-1 to
confirm it retried the expected number of times before re-raising; reference the
existing test name and the check_llama_stack_version function, as well as
mock_client.inspect.version and APIConnectionError to locate and mirror
behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bc949baa-c4b2-49dc-bdb7-8c3c309760b3

📥 Commits

Reviewing files that changed from the base of the PR and between 1ee31fc and 96616f1.

📒 Files selected for processing (2)
  • src/utils/llama_stack_version.py
  • tests/unit/utils/test_llama_stack_version.py

Comment thread src/utils/llama_stack_version.py Outdated
Comment thread src/utils/llama_stack_version.py
@savitojs savitojs force-pushed the savitojs/LCORE-1599-retry-llama-stack-version-check branch from 96616f1 to 5cf466e Compare March 26, 2026 16:54
Signed-off-by: Savitoj Singh <savsingh@redhat.com>
@savitojs savitojs force-pushed the savitojs/LCORE-1599-retry-llama-stack-version-check branch from 5cf466e to 5b25619 Compare March 26, 2026 17:03
@tisnik tisnik requested a review from asimurka March 26, 2026 18:26

@asimurka asimurka left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in overall, please correct linter errors

@tisnik tisnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@asimurka asimurka left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik merged commit 409cdc2 into lightspeed-core:main Mar 27, 2026
20 of 22 checks passed
@savitojs savitojs deleted the savitojs/LCORE-1599-retry-llama-stack-version-check branch March 27, 2026 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Startup crash when Llama Stack sidecar is not ready (no retry in check_llama_stack_version)

3 participants