LCORE-1566: Update llama stack to 0.6.0 by jrobertboos · Pull Request #1396 · lightspeed-core/lightspeed-stack

jrobertboos · 2026-03-24T18:34:56Z

Description

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

Updates

- Upgraded core library dependencies to version 0.6.0 for improved stability and features.
- Added new PDF processing dependency for enhanced document handling.

Bug Fixes

- Improved error detection for prompt-length exceeded scenarios across multiple endpoints.
- Enhanced error handling for conversation operations with more accurate error reporting.

coderabbitai · 2026-03-24T18:35:04Z

Warning

Rate limit exceeded

@jrobertboos has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 10 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 23 minutes and 10 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d624d8cb-9e72-4f42-b1f4-c768d46ae960

📥 Commits

Reviewing files that changed from the base of the PR and between 291a873 and d92c7d2.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (12)

pyproject.toml
requirements-build.txt
requirements.hashes.source.txt
requirements.hashes.wheel.txt
src/app/endpoints/conversations_v1.py
src/app/endpoints/query.py
src/app/endpoints/responses.py
src/app/endpoints/rlsapi_v1.py
src/app/endpoints/streaming_query.py
src/constants.py
src/utils/query.py
tests/e2e/features/info.feature

Walkthrough

Dependencies are bumped to llama-stack 0.6.0 (from 0.5.2), uvicorn to 0.44.0, and maturin to 1.12.6. New runtime dependency pypdf (>=6.9.2) is added. Error handling is updated to catch ConversationNotFoundError, and prompt-length detection logic is expanded to match both "context_length" and "context length" substrings across multiple endpoints. Version constants and test assertions are updated accordingly.
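The prompt-length detection described in the walkthrough can be sketched roughly as follows. This is a minimal illustration rather than the code from the PR: the helper name `is_prompt_too_long` and the sample messages are hypothetical, and lowercasing is only assumed to be the normalization step.

```python
# Hypothetical helper; the PR applies equivalent logic inline in each endpoint.
PROMPT_TOO_LONG_MARKERS = ("context_length", "context length")


def is_prompt_too_long(error: RuntimeError) -> bool:
    """Return True if the error message looks like a context-length failure."""
    # Normalize once (assumed: lowercasing) so both substring checks reuse it.
    message = str(error).lower()
    return any(marker in message for marker in PROMPT_TOO_LONG_MARKERS)


if __name__ == "__main__":
    # Sample messages are made up for illustration only.
    for text in ("context_length exceeded", "Maximum Context Length reached", "connection refused"):
        print(text, "->", is_prompt_too_long(RuntimeError(text)))
```

Keeping the markers in one tuple means a third spelling, should one ever appear, is a one-line change instead of an edit in every handler.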

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dependency & Version Pinning: `pyproject.toml`, `requirements-build.txt`, `requirements.hashes.source.txt`, `requirements.hashes.wheel.txt`, `src/constants.py` | Bump `llama-stack`, `llama-stack-client`, `llama-stack-api` from 0.5.2 to 0.6.0; add `pypdf>=6.9.2`; bump `maturin` to 1.12.6 and `uvicorn` to 0.44.0; update hash mappings; remove stale wheel hashes for `pillow` and `python-multipart`; update max supported version constant. |
| Exception Handling: `src/app/endpoints/conversations_v1.py` | Add `ConversationNotFoundError` import and catch alongside `APIStatusError` in three endpoints (`get_conversation_endpoint_handler`, `delete_conversation_endpoint_handler`, `update_conversation_endpoint_handler`) to map to consistent `NotFoundResponse`. |
| Prompt Length Error Detection: `src/app/endpoints/query.py`, `src/app/endpoints/responses.py`, `src/app/endpoints/streaming_query.py` | Normalize `RuntimeError` message once and check for both `"context_length"` and `"context length"` substrings to detect prompt-too-long conditions; apply across streaming and non-streaming response handlers. |
| Test Updates: `tests/e2e/features/info.feature` | Update expected `llama-stack` version assertion from 0.5.2 to 0.6.0. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: updating llama stack from an earlier version to 0.6.0, which is the primary focus of the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

tisnik · 2026-03-30T16:05:25Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing Hash Entry

The pillow dependency hashes were removed from the wheel lock file. If pillow is still a transitive dependency, this will break reproducible wheel installs. Either confirm pillow is no longer needed or restore its hash entries.

`--hash=sha256:e3a18fae723b808514670a4a0172f9939cdbb095abd5eef1f34cf5ae1b99f424 peft==0.18.1 \ --hash=sha256:026817e68c93fcc0569360afa0ee4fb74b06b0a4268240f922bc2bc0a691bcc1 prometheus-client==0.24.1 \ --hash=sha256:fe601e041eac55bad8f46da2f3c54f2ab6cd8a8272d9595742c83980e95ed5e4 prompt-toolkit==3.0.52 \`
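One way to confirm whether a pin such as pillow still needs hash entries is to compare the pins in an input requirements file against the pins present in the lock file. The sketch below assumes the usual pip hash-lock layout (a `name==version` line followed by indented `--hash=` lines); the function names and the sample text are hypothetical, and the hash values are placeholders.

```python
import re

# Matches "name==version" at the start of a line; hash lines are indented
# and therefore never match.
PIN_PATTERN = re.compile(r"^([A-Za-z0-9_.\-]+)==([^\s\\;]+)", re.MULTILINE)


def pinned_packages(text: str) -> set[str]:
    """Collect every 'name==version' pin found in a requirements-style text."""
    return {f"{name.lower()}=={version}" for name, version in PIN_PATTERN.findall(text)}


def missing_from_lock(required_text: str, lock_text: str) -> set[str]:
    """Return pins that are required but have no entry in the lock file."""
    return pinned_packages(required_text) - pinned_packages(lock_text)


if __name__ == "__main__":
    required = "pillow==12.1.1\n"
    # Placeholder hash, not a real digest.
    lock = "peft==0.18.1 \\\n    --hash=sha256:aaa\n"
    print(missing_from_lock(required, lock))
```

An empty result would suggest the pin is covered; a non-empty one points at entries whose hashes need to be restored or whose requirement should be dropped.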

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Line 76: Add a brief justification for introducing "pypdf>=6.9.2" to the repo
by updating the PR description and the project metadata (e.g., README or
CHANGELOG) to state why pypdf is required; mention the exact dependency string
"pypdf>=6.9.2" and whether it is a direct runtime requirement, a transitive
dependency pinned to avoid breaking changes, or included for a specific feature;
if it was added mistakenly, remove "pypdf>=6.9.2" from pyproject.toml and
document that removal in the PR so the dependency list matches actual
requirements.

In `@requirements.hashes.source.txt`:
- Around line 877-885: The requirements.lock contains duplicate pypdf==6.9.2
entries (the two identical blocks listing sha256:662c... and sha256:7f85...) —
remove the redundant block so only a single pypdf==6.9.2 entry with its two
hashes remains; search for the duplicate pypdf==6.9.2 blocks and delete one,
then run any lockfile validation or dependency tooling you use to ensure no
other duplicates remain.

In `@src/app/endpoints/conversations_v1.py`:
- Line 6: The import for the ConversationNotFoundError is pointing at the wrong
module; update the import to reference the correct module path where the
exception is defined by replacing the current root-package import of
ConversationNotFoundError with an import from llama_stack_api.common.errors so
code that catches or raises ConversationNotFoundError (e.g., in conversations_v1
handlers) uses the actual exception class.
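The duplicate-entry finding for `requirements.hashes.source.txt` above can be checked mechanically. The following is a rough sketch, assuming pin lines start at column zero and hash lines are indented; `find_duplicate_pins` is a hypothetical helper and the hashes in the sample are placeholders, not the real digests.

```python
import re
from collections import Counter

# A pinned requirement such as "pypdf==6.9.2 \" at the start of a line.
PIN_PATTERN = re.compile(r"^([A-Za-z0-9_.\-]+)==([^\s\\]+)")


def find_duplicate_pins(lock_text: str) -> list[str]:
    """Return 'name==version' pins that appear more than once in lock_text."""
    pins = []
    for line in lock_text.splitlines():
        match = PIN_PATTERN.match(line)
        if match:
            pins.append(f"{match.group(1).lower()}=={match.group(2)}")
    counts = Counter(pins)
    return sorted(pin for pin, count in counts.items() if count > 1)


if __name__ == "__main__":
    sample = (
        "pypdf==6.9.2 \\\n    --hash=sha256:aaa \\\n    --hash=sha256:bbb\n"
        "uvicorn==0.44.0 \\\n    --hash=sha256:ccc\n"
        "pypdf==6.9.2 \\\n    --hash=sha256:aaa \\\n    --hash=sha256:bbb\n"
    )
    print(find_duplicate_pins(sample))
```

A check like this could run in CI after the lock file is regenerated, so duplicated blocks are caught before review.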

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a6991d5-a6ba-4d57-8077-41e4cedc565d

📥 Commits

Reviewing files that changed from the base of the PR and between 7b29914 and 291a873.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (10)

pyproject.toml
requirements-build.txt
requirements.hashes.source.txt
requirements.hashes.wheel.txt
src/app/endpoints/conversations_v1.py
src/app/endpoints/query.py
src/app/endpoints/responses.py
src/app/endpoints/streaming_query.py
src/constants.py
tests/e2e/features/info.feature

💤 Files with no reviewable changes (1)

requirements.hashes.wheel.txt

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🧰 Additional context used

📓 Path-based instructions (3)

tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

tests/e2e/features/info.feature

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules (e.g., from authentication import get_auth_dependency)
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack imports
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use complete type annotations for all class attributes; avoid using Any
Follow Google Python docstring conventions for all modules, classes, and functions
Include Parameters:, Returns:, Raises: sections in function docstrings as needed

Files:

src/constants.py
src/app/endpoints/query.py
src/app/endpoints/conversations_v1.py
src/app/endpoints/streaming_query.py
src/app/endpoints/responses.py

src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: Use from fastapi import APIRouter, HTTPException, Request, status, Depends for FastAPI dependencies
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

src/app/endpoints/query.py
src/app/endpoints/conversations_v1.py
src/app/endpoints/streaming_query.py
src/app/endpoints/responses.py

🧠 Learnings (6)

📓 Common learnings

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Handle `APIConnectionError` from Llama Stack in error handling

📚 Learning: 2026-01-14T09:37:51.612Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

src/app/endpoints/query.py

📚 Learning: 2026-04-05T12:19:36.009Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports

Applied to files:

pyproject.toml
requirements.hashes.source.txt

📚 Learning: 2026-04-05T12:19:36.009Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

src/app/endpoints/conversations_v1.py

📚 Learning: 2026-04-05T12:19:36.009Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies

Applied to files:

src/app/endpoints/conversations_v1.py

📚 Learning: 2026-04-05T12:19:36.009Z

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

src/app/endpoints/responses.py

🔇 Additional comments (16)

tests/e2e/features/info.feature (1)

19-19: LGTM!

The expected llama-stack version is correctly updated to 0.6.0, consistent with the dependency bump in pyproject.toml and the MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION constant in src/constants.py.

src/constants.py (1)

4-5: LGTM!

The maximum supported version is correctly bumped to 0.6.0, matching the dependency version in pyproject.toml. The supported version range [0.2.17, 0.6.0] is consistent with the upgrade.
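For illustration, a supported-range check over [0.2.17, 0.6.0] might look like the sketch below. Only `MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION` is named in this review; the minimal constant name and both helper functions are assumptions, and the parser only handles plain dotted numeric versions.

```python
# The maximal constant is named in the review; the minimal constant name
# and the helpers below are assumptions made for this example.
MINIMAL_SUPPORTED_LLAMA_STACK_VERSION = "0.2.17"
MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION = "0.6.0"


def parse_version(version: str) -> tuple[int, ...]:
    """Convert a plain dotted version such as '0.6.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_supported(version: str) -> bool:
    """Check that version lies within the inclusive supported range."""
    low = parse_version(MINIMAL_SUPPORTED_LLAMA_STACK_VERSION)
    high = parse_version(MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION)
    return low <= parse_version(version) <= high


if __name__ == "__main__":
    for candidate in ("0.2.16", "0.5.2", "0.6.0", "0.10.0"):
        print(candidate, is_supported(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" sorts before "0.6.0" and would be wrongly accepted.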

pyproject.toml (1)

31-33: LGTM!

The llama-stack ecosystem packages are correctly pinned to exact version 0.6.0, ensuring consistent builds across all three packages.

src/app/endpoints/query.py (1)

305-310: LGTM!

The error handling correctly normalizes the message once and checks for both "context_length" and "context length" patterns, maintaining consistency with the same handling in streaming_query.py, responses.py, and rlsapi_v1.py. This accommodates potential variations in llama-stack error message formatting.

requirements-build.txt (2)

67-68: LGTM!

The maturin version bump from 1.10.2 to 1.12.6 is a build dependency update that aligns with the overall dependency refresh.

19-19: Transitive dependency addition noted.

The pypdf entry under flit-core reflects the new pypdf>=6.9.2 runtime dependency added in pyproject.toml.

src/app/endpoints/responses.py (2)

337-342: LGTM!

The streaming response handler correctly normalizes the error message once and checks for both prompt-too-long patterns, consistent with other endpoints.

699-704: LGTM!

The non-streaming response handler mirrors the same dual-check pattern, maintaining consistency with the streaming handler and other query endpoints.

src/app/endpoints/streaming_query.py (3)

356-361: LGTM!

The retrieve_response_generator correctly handles prompt-too-long RuntimeError with the dual-check pattern, consistent with other endpoints.

591-597: LGTM!

The generate_response error handling correctly normalizes once and checks both patterns in the conditional, producing either PromptTooLongResponse or a generic internal server error.

838-844: LGTM!

The response_generator handles response.incomplete and response.failed events by checking both context length patterns in the error message, consistent with the prompt-too-long detection elsewhere.

src/app/endpoints/conversations_v1.py (3)

279-285: LGTM!

The GET handler correctly catches both APIStatusError (remote mode) and ConversationNotFoundError (library mode), ensuring consistent 404 responses. The inline comment clearly explains the library-mode exception behavior.

387-392: LGTM!

The DELETE handler appropriately treats both exception types as "already deleted" scenarios, with clear logging. This is the correct behavior for idempotent delete operations.

525-531: LGTM!

The UPDATE handler consistently maps both exception types to a 404 response, maintaining parity with the GET handler behavior.
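The pattern the three conversation handlers share, catching both exception types and mapping them to the same not-found result, can be sketched as below. The exception classes here are local stand-ins so the example runs on its own; the in-memory store, function names, and response shape are all hypothetical and do not reproduce the repository's handlers.

```python
import logging

logger = logging.getLogger(__name__)


class APIStatusError(Exception):
    """Stand-in for the client error seen in remote mode (hypothetical)."""


class ConversationNotFoundError(Exception):
    """Stand-in for the error seen in library mode (hypothetical)."""


# Hypothetical in-memory store standing in for the Llama Stack backend.
_CONVERSATIONS: dict[str, dict] = {"conv-1": {"id": "conv-1", "topic": "demo"}}


def fetch_conversation(conversation_id: str, library_mode: bool) -> dict:
    """Return the conversation or raise the mode-specific not-found error."""
    if conversation_id not in _CONVERSATIONS:
        if library_mode:
            raise ConversationNotFoundError(conversation_id)
        raise APIStatusError(f"404: {conversation_id}")
    return _CONVERSATIONS[conversation_id]


def get_conversation(conversation_id: str, library_mode: bool) -> tuple[int, dict]:
    """Map both error types to one 404-style result."""
    try:
        return 200, fetch_conversation(conversation_id, library_mode)
    except (APIStatusError, ConversationNotFoundError) as exc:
        logger.warning("Conversation %s not found: %s", conversation_id, exc)
        return 404, {"detail": f"Conversation {conversation_id} not found"}
```

With a single `except` tuple, callers see the same status whichever mode produced the error, which is the consistency the review comments describe.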

requirements.hashes.source.txt (2)

512-520: Version/hash bump blocks look consistent.

The updated pinned versions and hash pairs for llama-stack*, python-multipart, and uvicorn are structurally correct for a hash-locked requirements file.

Also applies to: 889-891, 1048-1050

512-520: ⚠️ Potential issue | 🟠 Major

Remove duplicate pypdf==6.9.2 entry and add hashes for pillow==12.1.1 to both lock files.

The hash lock file is missing entries for pillow==12.1.1, which is declared in requirements.overrides.txt. Without these hashes, reproducible installs will fail. Additionally, pypdf==6.9.2 appears twice in this file with identical hashes—remove the duplicate to clean up the lock manifest.

⛔ Skipped due to learnings

Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports

major · 2026-04-06T16:03:40Z

Very exciting! 🚀

fix missing dep

potential fix konflux

addressed comments

chore(deps): update konflux references
Signed-off-by: red-hat-konflux-kflux-prd-rh02 <190377777+red-hat-konflux-kflux-prd-rh02[bot]@users.noreply.github.com>

Fixes docstrings in conversation cache unit tests

Fixed docstrings in Splunk unit tests

Test docstrings in quota limiters unit tests

LCORE-1596: Branching graphs

feat: add shield moderation to rlsapi_v1 /infer endpoint
Wire run_shield_moderation() into the rlsapi_v1 /infer endpoint so CLA requests go through the same safety checks as responses, query, and streaming_query. When a shield blocks input, the refusal message is returned as normal response text and the LLM call is skipped entirely. No-op when no shields are configured.
RSPEED-2809
Signed-off-by: Major Hayden <major@redhat.com>

refactor: extract _check_shield_moderation helper, fix integration tests
Move the moderation logic out of infer_endpoint into a private helper to avoid increasing the function's cyclomatic complexity (stays at C 13). Add shield moderation mock to the integration tests so they don't hit the real run_shield_moderation path with non-async client mocks.
Signed-off-by: Major Hayden <major@redhat.com>

LCORE-1441: Updated Konflux dependencies

Specific rule name when types are ignored: integration tests

Specific rule name when types are ignored: sources

Specific rule name when types are ignored: end to end tests

LCORE-1715: Fixes in LiteLLM package

LCORE-1472: use single config set (lightspeed-core#1467)
* LCORE-1472: use single config set

refactor: reduce infer_endpoint cyclomatic complexity from C(13) to B(7)
Extract helpers from infer_endpoint to eliminate the verbose mode branching that inflated its complexity:
- _call_llm: transport-only LLM call (no metrics side effects)
- _is_verbose_enabled: the 3-way config+request check
- _build_infer_response: verbose vs minimal response construction, keyed on response object presence rather than a boolean flag
retrieve_simple_response now delegates to _call_llm internally and handles its own token usage extraction. The verbose failure path in infer_endpoint is preserved: if the LLM call succeeded but later processing fails, token usage is still recorded.
No behavior changes, pure refactor.
Signed-off-by: Major Hayden <major@redhat.com>

tisnik

LGTM

tisnik · 2026-04-07T15:34:13Z

/retest

tisnik added the Review effort 1/5 label Mar 30, 2026

jrobertboos force-pushed the lcore-1566 branch 3 times, most recently from 62e29cc to 291a873 Compare April 6, 2026 13:12

jrobertboos marked this pull request as ready for review April 6, 2026 15:19

coderabbitai Bot reviewed Apr 6, 2026

Comment thread pyproject.toml Outdated

Comment thread requirements.hashes.source.txt Outdated

Comment thread src/app/endpoints/conversations_v1.py

tisnik reviewed Apr 6, 2026

Comment thread requirements.hashes.source.txt

Comment thread src/app/endpoints/streaming_query.py Outdated

Comment thread pyproject.toml Outdated

jrobertboos force-pushed the lcore-1566 branch from 4b65bc8 to 7fe65c0 Compare April 7, 2026 15:14

jrobertboos force-pushed the lcore-1566 branch from e463a52 to d92c7d2 Compare April 7, 2026 15:16

jrobertboos requested a review from tisnik April 7, 2026 15:23

tisnik approved these changes Apr 7, 2026

tisnik merged commit ff55533 into lightspeed-core:main Apr 7, 2026
25 of 27 checks passed

tisnik merged 1 commit into lightspeed-core:main from jrobertboos:lcore-1566
