LCORE-1872: Launch Llama Stack container via Makefile orchestration by anik120 · Pull Request #1760 · lightspeed-core/lightspeed-stack

anik120 · 2026-05-18T19:17:14Z

Description

Enables LCORE to automatically launch Llama Stack as a containerized service through Makefile orchestration. Running make run now handles all infrastructure setup:
i) building the container image
ii) stopping any existing instance
iii) launching a fresh llama-stack container
iv) waiting for started container's health check
v) and finally starting the lightspeed-stack service.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

New Features
- Containerized llama-stack service with automatic Docker or Podman runtime detection and health monitoring.
- Simplified local development setup via single make run command with automatic container lifecycle orchestration.
Documentation
- Updated run instructions to reflect containerized workflow and local endpoint configuration.
- Added container runtime requirements documentation for Docker or Podman support.

coderabbitai · 2026-05-18T19:17:27Z

Warning

Rate limit exceeded

@anik120 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 58 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a29a476c-5f86-4fc6-aee7-3f704679008b

📥 Commits

Reviewing files that changed from the base of the PR and between 9a1adb3 and 3cfb68c.

📒 Files selected for processing (3)

Makefile
README.md
lightspeed-stack.yaml

Walkthrough

This PR transitions the service from local uv run execution to containerized llama-stack service delivery. It adds Makefile-based container orchestration with auto-detected runtime support, updates configuration to reference localhost, and simplifies documentation of the new workflow.

Changes

Container-backed Llama Stack execution

Layer / File(s)	Summary
Makefile container lifecycle management `Makefile`	Adds container configuration variables (name, image, port), auto-detects podman/docker availability, and introduces six new targets (build-llama-stack-image, remove-llama-stack-container, start-llama-stack-container, wait-for-llama-stack-health, ensure-llama-stack-container, clean-llama-stack) that orchestrate image building, container startup with port mapping and health checks, container removal, and cleanup. The run target now depends on ensure-llama-stack-container orchestration.
Configuration defaults and documentation updates `lightspeed-stack.yaml`, `README.md`	Updates lightspeed-stack.yaml llama_stack configuration to use localhost:8321 endpoint with api_key commented out, adds documentation noting that make run launches a container at this address. README simplifies local startup to use make run and adds a new "Container Runtime Requirements" section documenting Podman/Docker prerequisites and Makefile auto-detection.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately summarizes the main change: introducing Makefile orchestration to automatically launch Llama Stack as a containerized service when running `make run`.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Makefile`:
- Around line 50-51: The Makefile hardcodes run.yaml and lightspeed-stack.yaml
in the container volume mounts, so overrides passed as CONFIG or
LLAMA_STACK_CONFIG aren’t propagated; update the mount lines that currently
reference run.yaml and lightspeed-stack.yaml to use the Makefile variables
(e.g., $(CONFIG) and $(LLAMA_STACK_CONFIG)) instead (with sensible defaults if
needed) so the target that invokes the container mounts the configured files
from the host into the container; change the two -v mount entries in the recipe
to reference $(PWD)/$(CONFIG):/opt/app-root/run.yaml:ro,z and
$(PWD)/$(LLAMA_STACK_CONFIG):/opt/app-root/lightspeed-stack.yaml:ro,z (or their
default variable values) so overrides are honored.
- Line 24: Add a .PHONY declaration listing all orchestration targets so Make
won’t treat same-named files as up-to-date; explicitly mark run,
ensure-llama-stack-container, build-llama-stack-image,
stop-llama-stack-container, start-llama-stack-container,
wait-for-llama-stack-health, and clean-llama-stack as phony by adding a .PHONY:
line that includes those target names (referencing the target symbols
ensure-llama-stack-container, build-llama-stack-image,
stop-llama-stack-container, start-llama-stack-container,
wait-for-llama-stack-health, clean-llama-stack, and run).
- Around line 35-37: The current conditional uses substring matching with `ps
--filter "name="` and `grep` which can falsely match similarly named containers;
replace that check with an exact existence test using the container runtime's
inspect command (referencing CONTAINER_RUNTIME and LLAMA_STACK_CONTAINER_NAME) —
call $(CONTAINER_RUNTIME) inspect $(LLAMA_STACK_CONTAINER_NAME) and test its
exit status (redirecting output to /dev/null) to decide whether to echo
"Stopping existing llama-stack container..." and run $(CONTAINER_RUNTIME) rm -f
$(LLAMA_STACK_CONTAINER_NAME) so only the exact container name triggers removal.
- Around line 54-77: The docker run env flags currently interpolate secrets on
the command line (entries like -e OPENAI_API_KEY=$${OPENAI_API_KEY}, -e
CLIENT_SECRET=$${CLIENT_SECRET:-}, -e WATSONX_API_KEY=$${WATSONX_API_KEY:-}, -e
AWS_BEARER_TOKEN_BEDROCK=$${AWS_BEARER_TOKEN_BEDROCK:-}, etc. in the Makefile)
which exposes them; change each of these to pass-through form (-e KEY) instead
of embedding values so the container inherits the secret from the parent
environment (replace patterns like -e VAR=$${VAR:-...} with -e VAR for all
secret vars in the docker run invocation).

In `@README.md`:
- Around line 181-184: Add a blank line before and after the fenced code block
under the list item "start LCS server" in README.md to satisfy markdownlint rule
MD031; locate the list item containing the ```bash make run ``` block and ensure
there is an empty line above the opening ``` and an empty line below the closing
``` so the fenced block is separated from surrounding list text.

In `@src/models/config.py`:
- Around line 604-605: The AnyHttpUrl field default is currently a raw string in
the declaration "url: AnyHttpUrl = Field(default=\"http://localhost:8321\",
...)" which causes type-check failures; update the default to be an AnyHttpUrl
instance by wrapping the string with the AnyHttpUrl constructor (use
AnyHttpUrl("http://localhost:8321") as the default for the url Field) so
Pydantic and static type checkers accept the value.
- Around line 1901-1902: The Field declaration for llama_stack uses a class
reference where default_factory expects a zero-arg callable; change the
default_factory to a zero-argument factory (e.g. a lambda or small helper) that
returns a new LlamaStackConfiguration instance so static checks pass—update the
llama_stack: LlamaStackConfiguration = Field(default_factory=...) to use a
callable like lambda: LlamaStackConfiguration() consistent with the other nested
config defaults in this module.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c2672555-59a0-48a8-845d-4970a95ee638

📥 Commits

Reviewing files that changed from the base of the PR and between fd4acfa and ad72ddd.

📒 Files selected for processing (6)

Makefile
README.md
lightspeed-stack.yaml
src/client.py
src/models/config.py
tests/unit/test_client.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: build-pr
GitHub Check: Pylinter
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

🧰 Additional context used

📓 Path-based instructions (3)

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

src/client.py
src/models/config.py

src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

src/models/config.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

tests/unit/test_client.py

🧠 Learnings (2)

📚 Learning: 2026-01-12T10:58:40.230Z

Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

src/models/config.py

📚 Learning: 2026-02-25T07:46:33.545Z

Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

src/models/config.py

🪛 checkmake (0.3.2)

Makefile

[warning] 26-26: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (6).

(maxbodylength)

[warning] 40-40: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (41).

(maxbodylength)

[warning] 83-83: Target body for "wait-for-llama-stack-health" exceeds allowed length of 5 lines (14).

(maxbodylength)

[warning] 24-24: Target "ensure-llama-stack-container" should be declared PHONY.

(phonydeclared)

🪛 GitHub Actions: Pyright / 0_Pyright.txt

src/models/config.py

[error] 604-604: pyright error (reportAssignmentType): Type "str" is not assignable to declared type "AnyHttpUrl". "str" is not assignable to "AnyHttpUrl".

[error] 1902-1902: pyright error (reportArgumentType): Argument of type "type[LlamaStackConfiguration]" cannot be assigned to parameter "default_factory" of type "(() -> _T@Field) | ((dict[str, Any]) -> _T@Field)" in function "Field"; function signature mismatch (expected 0 but received 1).

🪛 GitHub Actions: Pyright / Pyright

src/models/config.py

[error] 604-604: pyright (reportAssignmentType): Type "str" is not assignable to declared type "AnyHttpUrl". "str" is not assignable to "AnyHttpUrl".

[error] 1902-1902: pyright (reportArgumentType): Argument of type "type[LlamaStackConfiguration]" cannot be assigned to parameter "default_factory" of type "(() -> _T@Field) | ((dict[str, Any]) -> _T@Field)" in function "Field"; function accepts too many positional parameters and URL "Literal['http://localhost:8321']" is not assignable to "AnyHttpUrl".

🪛 GitHub Actions: Type checks / 0_mypy.txt

src/models/config.py

[error] 604-604: mypy: Incompatible types in assignment (expression has type "str", variable has type "AnyHttpUrl") [assignment]

[error] 1902-1902: mypy: Argument "default_factory" to "Field" has incompatible type "type[LlamaStackConfiguration]"; expected "Callable[[], Never] | Callable[[dict[str, Any]], Never]" [arg-type]

🪛 GitHub Actions: Type checks / mypy

src/models/config.py

[error] 604-604: mypy: Incompatible types in assignment (expression has type "str", variable has type "AnyHttpUrl") [assignment]

[error] 1902-1902: mypy: Argument "default_factory" to "Field" has incompatible type "type[LlamaStackConfiguration]"; expected "Callable[[], Never] | Callable[[dict[str, Any]], Never]" [arg-type]

🪛 markdownlint-cli2 (0.22.1)

README.md

[warning] 182-182: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🔇 Additional comments (3)

tests/unit/test_client.py (1)

40-47: LGTM!

Also applies to: 49-54, 57-58, 157-163

README.md (1)

188-196: LGTM!

src/client.py (1)

21-47: LGTM!

Also applies to: 66-97

tisnik

LGTM

radofuchs · 2026-05-19T12:59:45Z

+	fi
+	$(CONTAINER_RUNTIME) build -f deploy/llama-stack/test.containerfile -t $(LLAMA_STACK_IMAGE) .
+
+stop-llama-stack-container: ## Stop and remove existing llama-stack container


the stop name is misleading, the content of this is actually removing the container, not just stopping it

renamed to remove-llama-stack-container

radofuchs · 2026-05-19T13:18:19Z

+	done; \
+	echo "✗ ERROR: Llama-stack did not become healthy within 60 seconds"; \
+	echo "Container logs:"; \
+	$(CONTAINER_RUNTIME) logs --tail 50 $(LLAMA_STACK_CONTAINER_NAME); \


there is no reason to limit the logs when you are starting the container

that's fair, removed the limitation 👍🏽

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lightspeed-stack.yaml`:
- Around line 10-12: Update the misleading comment about llama_stack: change the
note to state that when using `make run` a container is launched at
http://localhost:8321 but the `llama_stack` configuration is still applied to
lightspeed-core because `make run` starts lightspeed-core with the `-c
$(CONFIG)` flag; explicitly state that this block controls service-side
connection settings so it is not ignored. Reference `llama_stack`, `make run`,
`lightspeed-core`, and the `-c $(CONFIG)` flag in the revised comment to avoid
confusion.

In `@Makefile`:
- Around line 26-27: The current parallelizable prerequisite list for
ensure-llama-stack-container causes race conditions; enforce a strict sequence
by chaining the targets: make build-llama-stack-image depend on
remove-llama-stack-container, make start-llama-stack-container depend on
build-llama-stack-image, and change ensure-llama-stack-container to depend only
on start-llama-stack-container (alternatively use GNU make's .WAIT between the
existing prerequisites) so removal → build → start always run in order.

In `@README.md`:
- Around line 181-197: The README still documents the removed Make target
run-llama-stack; update the README targets table and any duplicate mentions
(including the other occurrence noted) to remove or replace the run-llama-stack
entry so it matches the current Makefile workflow—locate the targets table and
any lines referencing run-llama-stack and either delete that row or replace it
with the correct command (e.g., make run or the new workflow), ensuring the
“Llama Stack as separate server” / container runtime sections remain consistent
with the Makefile changes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 27a7cb0b-ea0c-4197-a5fd-b1f8b5831374

📥 Commits

Reviewing files that changed from the base of the PR and between ad72ddd and 9a1adb3.

📒 Files selected for processing (3)

Makefile
README.md
lightspeed-stack.yaml

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: build-pr
GitHub Check: Pylinter
GitHub Check: unit_tests (3.12)
GitHub Check: unit_tests (3.13)
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

🪛 checkmake (0.3.2)

Makefile

[warning] 28-28: Target body for "build-llama-stack-image" exceeds allowed length of 5 lines (6).

(maxbodylength)

[warning] 42-42: Target body for "start-llama-stack-container" exceeds allowed length of 5 lines (41).

(maxbodylength)

[warning] 85-85: Target body for "wait-for-llama-stack-health" exceeds allowed length of 5 lines (14).

(maxbodylength)

[warning] 21-21: Required target "all" is missing from the Makefile.

(minphony)

[warning] 21-21: Required target "clean" is missing from the Makefile.

(minphony)

[warning] 21-21: Required target "test" is missing from the Makefile.

(minphony)

🪛 markdownlint-cli2 (0.22.1)

README.md

[warning] 182-182: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

[warning] 184-184: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

Enables LCORE to automatically launch Llama Stack as a containerized service through Makefile orchestration. Running make run now handles all infrastructure setup: i) building the container image ii) stopping any existing instance iii) launching a fresh llama-stack container iv) waiting for started container's health check v) and finally starting the lightspeed-stack service. Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>

radofuchs

LGTM

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

Comment thread Makefile Outdated

Comment thread Makefile Outdated

Comment thread Makefile Outdated

Comment thread Makefile Outdated

Comment thread README.md

Comment thread src/models/config.py Outdated

Comment thread src/models/config.py Outdated

anik120 force-pushed the launch-lls-container branch 4 times, most recently from d8be544 to 546efbf Compare May 18, 2026 20:15

tisnik requested a review from radofuchs May 19, 2026 11:35

tisnik approved these changes May 19, 2026

View reviewed changes

radofuchs requested changes May 19, 2026

View reviewed changes

anik120 force-pushed the launch-lls-container branch from 546efbf to 9a1adb3 Compare May 19, 2026 17:20

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

Comment thread lightspeed-stack.yaml Outdated

Comment thread Makefile Outdated

Comment thread README.md

anik120 force-pushed the launch-lls-container branch from 9a1adb3 to 3cfb68c Compare May 19, 2026 17:58

radofuchs approved these changes May 20, 2026

View reviewed changes

tisnik merged commit 77425c2 into lightspeed-core:main May 20, 2026
30 checks passed

This was referenced May 26, 2026

LCORE-1872: Fix llama-stack container startup issues #1800

Merged

LCORE-1869: Add graceful teardown and automatic cleanup for llama-stack container #1802

Merged

LCORE-2395: remove read-only mount for llama-stack config #1815

Merged

This was referenced Jun 3, 2026

LCORE-1874: Document LLS container startup, teardown, and customization options #1845

Merged

Split llama-stack container build into deps and app layers for faster rebuilds #1895

Open

Conversation

anik120 commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

radofuchs May 19, 2026

Choose a reason for hiding this comment

Uh oh!

anik120 May 19, 2026

Choose a reason for hiding this comment

Uh oh!

radofuchs May 19, 2026

Choose a reason for hiding this comment

Uh oh!

anik120 May 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

radofuchs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

anik120 commented May 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 18, 2026 •

edited

Loading