LCORE-1566: Update llama stack to 0.6.0#1396
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 23 minutes and 10 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (12)
WalkthroughDependencies are bumped to Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
62e29cc to
291a873
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 76: Add a brief justification for introducing "pypdf>=6.9.2" to the repo
by updating the PR description and the project metadata (e.g., README or
CHANGELOG) to state why pypdf is required; mention the exact dependency string
"pypdf>=6.9.2" and whether it is a direct runtime requirement, a transitive
dependency pinned to avoid breaking changes, or included for a specific feature;
if it was added mistakenly, remove "pypdf>=6.9.2" from pyproject.toml and
document that removal in the PR so the dependency list matches actual
requirements.
In `@requirements.hashes.source.txt`:
- Around line 877-885: The requirements.lock contains duplicate pypdf==6.9.2
entries (the two identical blocks listing sha256:662c... and sha256:7f85...) —
remove the redundant block so only a single pypdf==6.9.2 entry with its two
hashes remains; search for the duplicate pypdf==6.9.2 blocks and delete one,
then run any lockfile validation or dependency tooling you use to ensure no
other duplicates remain.
In `@src/app/endpoints/conversations_v1.py`:
- Line 6: The import for the ConversationNotFoundError is pointing at the wrong
module; update the import to reference the correct module path where the
exception is defined by replacing the current root-package import of
ConversationNotFoundError with an import from llama_stack_api.common.errors so
code that catches or raises ConversationNotFoundError (e.g., in conversations_v1
handlers) uses the actual exception class.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7a6991d5-a6ba-4d57-8077-41e4cedc565d
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
pyproject.tomlrequirements-build.txtrequirements.hashes.source.txtrequirements.hashes.wheel.txtsrc/app/endpoints/conversations_v1.pysrc/app/endpoints/query.pysrc/app/endpoints/responses.pysrc/app/endpoints/streaming_query.pysrc/constants.pytests/e2e/features/info.feature
💤 Files with no reviewable changes (1)
- requirements.hashes.wheel.txt
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (3)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/info.feature
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules (e.g.,from authentication import get_auth_dependency)
Usefrom llama_stack_client import AsyncLlamaStackClientfor Llama Stack imports
Checkconstants.pyfor shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Use complete type annotations for function parameters and return types
Use union types with modern syntax:str | intinstead ofUnion[str, int]
UseOptional[Type]for optional types in type annotations
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Useasync deffor I/O operations and external API calls
HandleAPIConnectionErrorfrom Llama Stack in error handling
Uselogger.debug()for detailed diagnostic information
Uselogger.info()for general information about program execution
Uselogger.warning()for unexpected events or potential problems
Uselogger.error()for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes:Configuration,Error/Exception,Resolver,Interface
Use complete type annotations for all class attributes; avoid usingAny
Follow Google Python docstring conventions for all modules, classes, and functions
IncludeParameters:,Returns:,Raises:sections in function docstrings as needed
Files:
src/constants.pysrc/app/endpoints/query.pysrc/app/endpoints/conversations_v1.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: Usefrom fastapi import APIRouter, HTTPException, Request, status, Dependsfor FastAPI dependencies
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.pysrc/app/endpoints/conversations_v1.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/responses.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Handle `APIConnectionError` from Llama Stack in error handling
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
Applied to files:
pyproject.tomlrequirements.hashes.source.txt
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/**/*.py : Handle `APIConnectionError` from Llama Stack in error handling
Applied to files:
src/app/endpoints/conversations_v1.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use `from fastapi import APIRouter, HTTPException, Request, status, Depends` for FastAPI dependencies
Applied to files:
src/app/endpoints/conversations_v1.py
📚 Learning: 2026-04-05T12:19:36.009Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-05T12:19:36.009Z
Learning: Applies to src/app/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
src/app/endpoints/responses.py
🔇 Additional comments (16)
tests/e2e/features/info.feature (1)
19-19: LGTM!The expected llama-stack version is correctly updated to
0.6.0, consistent with the dependency bump inpyproject.tomland theMAXIMAL_SUPPORTED_LLAMA_STACK_VERSIONconstant insrc/constants.py.src/constants.py (1)
4-5: LGTM!The maximum supported version is correctly bumped to
0.6.0, matching the dependency version inpyproject.toml. The supported version range[0.2.17, 0.6.0]is consistent with the upgrade.pyproject.toml (1)
31-33: LGTM!The llama-stack ecosystem packages are correctly pinned to exact version
0.6.0, ensuring consistent builds across all three packages.src/app/endpoints/query.py (1)
305-310: LGTM!The error handling correctly normalizes the message once and checks for both
"context_length"and"context length"patterns, maintaining consistency with the same handling instreaming_query.py,responses.py, andrlsapi_v1.py. This accommodates potential variations in llama-stack error message formatting.requirements-build.txt (2)
67-68: LGTM!The
maturinversion bump from 1.10.2 to 1.12.6 is a build dependency update that aligns with the overall dependency refresh.
19-19: Transitive dependency addition noted.The
pypdfentry underflit-corereflects the newpypdf>=6.9.2runtime dependency added inpyproject.toml.src/app/endpoints/responses.py (2)
337-342: LGTM!The streaming response handler correctly normalizes the error message once and checks for both prompt-too-long patterns, consistent with other endpoints.
699-704: LGTM!The non-streaming response handler mirrors the same dual-check pattern, maintaining consistency with the streaming handler and other query endpoints.
src/app/endpoints/streaming_query.py (3)
356-361: LGTM!The
retrieve_response_generatorcorrectly handles prompt-too-longRuntimeErrorwith the dual-check pattern, consistent with other endpoints.
591-597: LGTM!The
generate_responseerror handling correctly normalizes once and checks both patterns in the conditional, producing eitherPromptTooLongResponseor a generic internal server error.
838-844: LGTM!The
response_generatorhandlesresponse.incompleteandresponse.failedevents by checking both context length patterns in the error message, consistent with the prompt-too-long detection elsewhere.src/app/endpoints/conversations_v1.py (3)
279-285: LGTM!The GET handler correctly catches both
APIStatusError(remote mode) andConversationNotFoundError(library mode), ensuring consistent 404 responses. The inline comment clearly explains the library-mode exception behavior.
387-392: LGTM!The DELETE handler appropriately treats both exception types as "already deleted" scenarios, with clear logging. This is the correct behavior for idempotent delete operations.
525-531: LGTM!The UPDATE handler consistently maps both exception types to a 404 response, maintaining parity with the GET handler behavior.
requirements.hashes.source.txt (2)
512-520: Version/hash bump blocks look consistent.The updated pinned versions and hash pairs for
llama-stack*,python-multipart, anduvicornare structurally correct for a hash-locked requirements file.Also applies to: 889-891, 1048-1050
512-520:⚠️ Potential issue | 🟠 MajorRemove duplicate
pypdf==6.9.2entry and add hashes forpillow==12.1.1to both lock files.The hash lock file is missing entries for
pillow==12.1.1, which is declared inrequirements.overrides.txt. Without these hashes, reproducible installs will fail. Additionally,pypdf==6.9.2appears twice in this file with identical hashes—remove the duplicate to clean up the lock manifest.⛔ Skipped due to learnings
Learnt from: CR Repo: lightspeed-core/lightspeed-stack PR: 0 File: AGENTS.md:0-0 Timestamp: 2026-04-05T12:19:36.009Z Learning: Applies to src/**/*.py : Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack imports
|
Very exciting! 🚀 |
fix missing dep potential fix konflux addressed comments chore(deps): update konflux references Signed-off-by: red-hat-konflux-kflux-prd-rh02 <190377777+red-hat-konflux-kflux-prd-rh02[bot]@users.noreply.github.com> Fixes docstrings in conversation cache unit tests Fixed docstrings in Splunk unit tests Test docstrings in quota limiters unit tests LCORE-1596: Branching graphs feat: add shield moderation to rlsapi_v1 /infer endpoint Wire run_shield_moderation() into the rlsapi_v1 /infer endpoint so CLA requests go through the same safety checks as responses, query, and streaming_query. When a shield blocks input, the refusal message is returned as normal response text and the LLM call is skipped entirely. No-op when no shields are configured. RSPEED-2809 Signed-off-by: Major Hayden <major@redhat.com> refactor: extract _check_shield_moderation helper, fix integration tests Move the moderation logic out of infer_endpoint into a private helper to avoid increasing the function's cyclomatic complexity (stays at C 13). Add shield moderation mock to the integration tests so they don't hit the real run_shield_moderation path with non-async client mocks. Signed-off-by: Major Hayden <major@redhat.com> LCORE-1441: Updated Konflux dependencies Specific rule name when types are ignored: integration tests Specific rule name when types are ignored: sources Specific rule name when types are ignored: end to end tests LCORE-1715: Fixes in LiteLLM package LCORE-1472: use single config set (lightspeed-core#1467) * LCORE-1472: use single config set refactor: reduce infer_endpoint cyclomatic complexity from C(13) to B(7) Extract helpers from infer_endpoint to eliminate the verbose mode branching that inflated its complexity: - _call_llm: transport-only LLM call (no metrics side effects) - _is_verbose_enabled: the 3-way config+request check - _build_infer_response: verbose vs minimal response construction, keyed on response object presence rather than a boolean flag retrieve_simple_response now delegates to _call_llm internally and handles its own token usage extraction. The verbose failure path in infer_endpoint is preserved: if the LLM call succeeded but later processing fails, token usage is still recorded. No behavior changes, pure refactor. Signed-off-by: Major Hayden <major@redhat.com>
|
/retest |
Description
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Release Notes
Updates
Bug Fixes