LCORE-1872: Launch Llama Stack container via Makefile orchestration#1760
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughThis PR transitions the service from local ChangesContainer-backed Llama Stack execution
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Around line 50-51: The Makefile hardcodes run.yaml and lightspeed-stack.yaml
in the container volume mounts, so overrides passed as CONFIG or
LLAMA_STACK_CONFIG aren’t propagated; update the mount lines that currently
reference run.yaml and lightspeed-stack.yaml to use the Makefile variables
(e.g., $(CONFIG) and $(LLAMA_STACK_CONFIG)) instead (with sensible defaults if
needed) so the target that invokes the container mounts the configured files
from the host into the container; change the two -v mount entries in the recipe
to reference $(PWD)/$(CONFIG):/opt/app-root/run.yaml:ro,z and
$(PWD)/$(LLAMA_STACK_CONFIG):/opt/app-root/lightspeed-stack.yaml:ro,z (or their
default variable values) so overrides are honored.
- Line 24: Add a .PHONY declaration listing all orchestration targets so Make
won’t treat same-named files as up-to-date; explicitly mark run,
ensure-llama-stack-container, build-llama-stack-image,
stop-llama-stack-container, start-llama-stack-container,
wait-for-llama-stack-health, and clean-llama-stack as phony by adding a .PHONY:
line that includes those target names (referencing the target symbols
ensure-llama-stack-container, build-llama-stack-image,
stop-llama-stack-container, start-llama-stack-container,
wait-for-llama-stack-health, clean-llama-stack, and run).
- Around line 35-37: The current conditional uses substring matching with `ps
--filter "name="` and `grep` which can falsely match similarly named containers;
replace that check with an exact existence test using the container runtime's
inspect command (referencing CONTAINER_RUNTIME and LLAMA_STACK_CONTAINER_NAME) —
call $(CONTAINER_RUNTIME) inspect $(LLAMA_STACK_CONTAINER_NAME) and test its
exit status (redirecting output to /dev/null) to decide whether to echo
"Stopping existing llama-stack container..." and run $(CONTAINER_RUNTIME) rm -f
$(LLAMA_STACK_CONTAINER_NAME) so only the exact container name triggers removal.
- Around line 54-77: The docker run env flags currently interpolate secrets on
the command line (entries like -e OPENAI_API_KEY=$${OPENAI_API_KEY}, -e
CLIENT_SECRET=$${CLIENT_SECRET:-}, -e WATSONX_API_KEY=$${WATSONX_API_KEY:-}, -e
AWS_BEARER_TOKEN_BEDROCK=$${AWS_BEARER_TOKEN_BEDROCK:-}, etc. in the Makefile)
which exposes them; change each of these to pass-through form (-e KEY) instead
of embedding values so the container inherits the secret from the parent
environment (replace patterns like -e VAR=$${VAR:-...} with -e VAR for all
secret vars in the docker run invocation).
In `@README.md`:
- Around line 181-184: Add a blank line before and after the fenced code block
under the list item "start LCS server" in README.md to satisfy markdownlint rule
MD031; locate the list item containing the ```bash make run ``` block and ensure
there is an empty line above the opening ``` and an empty line below the closing
``` so the fenced block is separated from surrounding list text.
In `@src/models/config.py`:
- Around line 604-605: The AnyHttpUrl field default is currently a raw string in
the declaration "url: AnyHttpUrl = Field(default=\"http://localhost:8321\",
...)" which causes type-check failures; update the default to be an AnyHttpUrl
instance by wrapping the string with the AnyHttpUrl constructor (use
AnyHttpUrl("http://localhost:8321") as the default for the url Field) so
Pydantic and static type checkers accept the value.
- Around line 1901-1902: The Field declaration for llama_stack uses a class
reference where default_factory expects a zero-arg callable; change the
default_factory to a zero-argument factory (e.g. a lambda or small helper) that
returns a new LlamaStackConfiguration instance so static checks pass—update the
llama_stack: LlamaStackConfiguration = Field(default_factory=...) to use a
callable like lambda: LlamaStackConfiguration() consistent with the other nested
config defaults in this module.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c2672555-59a0-48a8-845d-4970a95ee638
📒 Files selected for processing (6)
MakefileREADME.mdlightspeed-stack.yamlsrc/client.pysrc/models/config.pytests/unit/test_client.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/client.pysrc/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/config.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/test_client.py
🧠 Learnings (2)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🪛 checkmake (0.3.2)
Makefile
[warning] 26-26: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (6).
(maxbodylength)
[warning] 40-40: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (41).
(maxbodylength)
[warning] 83-83: Target body for "wait-for-llama-stack-health" exceeds allowed length of 5 lines (14).
(maxbodylength)
[warning] 24-24: Target "ensure-llama-stack-container" should be declared PHONY.
(phonydeclared)
🪛 GitHub Actions: Pyright / 0_Pyright.txt
src/models/config.py
[error] 604-604: pyright error (reportAssignmentType): Type "str" is not assignable to declared type "AnyHttpUrl". "str" is not assignable to "AnyHttpUrl".
[error] 1902-1902: pyright error (reportArgumentType): Argument of type "type[LlamaStackConfiguration]" cannot be assigned to parameter "default_factory" of type "(() -> _T@Field) | ((dict[str, Any]) -> _T@Field)" in function "Field"; function signature mismatch (expected 0 but received 1).
🪛 GitHub Actions: Pyright / Pyright
src/models/config.py
[error] 604-604: pyright (reportAssignmentType): Type "str" is not assignable to declared type "AnyHttpUrl". "str" is not assignable to "AnyHttpUrl".
[error] 1902-1902: pyright (reportArgumentType): Argument of type "type[LlamaStackConfiguration]" cannot be assigned to parameter "default_factory" of type "(() -> _T@Field) | ((dict[str, Any]) -> _T@Field)" in function "Field"; function accepts too many positional parameters and URL "Literal['http://localhost:8321']" is not assignable to "AnyHttpUrl".
🪛 GitHub Actions: Type checks / 0_mypy.txt
src/models/config.py
[error] 604-604: mypy: Incompatible types in assignment (expression has type "str", variable has type "AnyHttpUrl") [assignment]
[error] 1902-1902: mypy: Argument "default_factory" to "Field" has incompatible type "type[LlamaStackConfiguration]"; expected "Callable[[], Never] | Callable[[dict[str, Any]], Never]" [arg-type]
🪛 GitHub Actions: Type checks / mypy
src/models/config.py
[error] 604-604: mypy: Incompatible types in assignment (expression has type "str", variable has type "AnyHttpUrl") [assignment]
[error] 1902-1902: mypy: Argument "default_factory" to "Field" has incompatible type "type[LlamaStackConfiguration]"; expected "Callable[[], Never] | Callable[[dict[str, Any]], Never]" [arg-type]
🪛 markdownlint-cli2 (0.22.1)
README.md
[warning] 182-182: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
🔇 Additional comments (3)
tests/unit/test_client.py (1)
40-47: LGTM!Also applies to: 49-54, 57-58, 157-163
README.md (1)
188-196: LGTM!src/client.py (1)
21-47: LGTM!Also applies to: 66-97
d8be544 to
546efbf
Compare
| fi | ||
| $(CONTAINER_RUNTIME) build -f deploy/llama-stack/test.containerfile -t $(LLAMA_STACK_IMAGE) . | ||
|
|
||
| stop-llama-stack-container: ## Stop and remove existing llama-stack container |
There was a problem hiding this comment.
the stop name is misleading, the content of this is actually removing the container, not just stopping it
There was a problem hiding this comment.
renamed to remove-llama-stack-container
| done; \ | ||
| echo "✗ ERROR: Llama-stack did not become healthy within 60 seconds"; \ | ||
| echo "Container logs:"; \ | ||
| $(CONTAINER_RUNTIME) logs --tail 50 $(LLAMA_STACK_CONTAINER_NAME); \ |
There was a problem hiding this comment.
there is no reason to limit the logs when you are starting the container
There was a problem hiding this comment.
that's fair, removed the limitation 👍🏽
546efbf to
9a1adb3
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lightspeed-stack.yaml`:
- Around line 10-12: Update the misleading comment about llama_stack: change the
note to state that when using `make run` a container is launched at
http://localhost:8321 but the `llama_stack` configuration is still applied to
lightspeed-core because `make run` starts lightspeed-core with the `-c
$(CONFIG)` flag; explicitly state that this block controls service-side
connection settings so it is not ignored. Reference `llama_stack`, `make run`,
`lightspeed-core`, and the `-c $(CONFIG)` flag in the revised comment to avoid
confusion.
In `@Makefile`:
- Around line 26-27: The current parallelizable prerequisite list for
ensure-llama-stack-container causes race conditions; enforce a strict sequence
by chaining the targets: make build-llama-stack-image depend on
remove-llama-stack-container, make start-llama-stack-container depend on
build-llama-stack-image, and change ensure-llama-stack-container to depend only
on start-llama-stack-container (alternatively use GNU make's .WAIT between the
existing prerequisites) so removal → build → start always run in order.
In `@README.md`:
- Around line 181-197: The README still documents the removed Make target
run-llama-stack; update the README targets table and any duplicate mentions
(including the other occurrence noted) to remove or replace the run-llama-stack
entry so it matches the current Makefile workflow—locate the targets table and
any lines referencing run-llama-stack and either delete that row or replace it
with the correct command (e.g., make run or the new workflow), ensuring the
“Llama Stack as separate server” / container runtime sections remain consistent
with the Makefile changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 27a7cb0b-ea0c-4197-a5fd-b1f8b5831374
📒 Files selected for processing (3)
MakefileREADME.mdlightspeed-stack.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
🪛 checkmake (0.3.2)
Makefile
[warning] 28-28: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (6).
(maxbodylength)
[warning] 42-42: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (41).
(maxbodylength)
[warning] 85-85: Target body for "wait-for-llama-stack-health" exceeds allowed length of 5 lines (14).
(maxbodylength)
[warning] 21-21: Required target "all" is missing from the Makefile.
(minphony)
[warning] 21-21: Required target "clean" is missing from the Makefile.
(minphony)
[warning] 21-21: Required target "test" is missing from the Makefile.
(minphony)
🪛 markdownlint-cli2 (0.22.1)
README.md
[warning] 182-182: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 184-184: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
Enables LCORE to automatically launch Llama Stack as a containerized service through Makefile orchestration. Running make run now handles all infrastructure setup: i) building the container image ii) stopping any existing instance iii) launching a fresh llama-stack container iv) waiting for started container's health check v) and finally starting the lightspeed-stack service. Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
9a1adb3 to
3cfb68c
Compare
Description
Enables LCORE to automatically launch Llama Stack as a containerized service through Makefile orchestration. Running
make runnow handles all infrastructure setup:i) building the container image
ii) stopping any existing instance
iii) launching a fresh llama-stack container
iv) waiting for started container's health check
v) and finally starting the lightspeed-stack service.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
make runcommand with automatic container lifecycle orchestration.Documentation