LCORE-1869: Add graceful teardown and automatic cleanup for llama-stack container#1802
Conversation
WalkthroughMakefile updates add a trap in ChangesContainer Lifecycle Management
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
b180b20 to
0b2ff52
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Makefile (1)
15-15:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake E2E container name match
LLAMA_STACK_CONTAINER_NAME
MakefilesetsLLAMA_STACK_CONTAINER_NAME ?= lightspeed-llama-stack, but E2E scripts hardcode the Docker container name"llama-stack"when callingdocker inspect/stop/start(e.g.,tests/e2e/features/steps/health.py,tests/e2e/features/environment.py) and don’t referenceLLAMA_STACK_CONTAINER_NAME.- Fix by updating E2E to use
LLAMA_STACK_CONTAINER_NAME(via a shared helper/config/env var) or by ensuringmake runstarts the container with the"llama-stack"name.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` at line 15, E2E scripts hardcode the Docker name "llama-stack" while the Makefile defines LLAMA_STACK_CONTAINER_NAME; update the E2E code (e.g., in tests/e2e/features/steps/health.py and tests/e2e/features/environment.py) to read the container name from the environment variable LLAMA_STACK_CONTAINER_NAME (use os.environ.get('LLAMA_STACK_CONTAINER_NAME', 'llama-stack')) and use that value for all docker inspect/stop/start invocations so the tests follow the Makefile-controlled container name.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@Makefile`:
- Line 15: E2E scripts hardcode the Docker name "llama-stack" while the Makefile
defines LLAMA_STACK_CONTAINER_NAME; update the E2E code (e.g., in
tests/e2e/features/steps/health.py and tests/e2e/features/environment.py) to
read the container name from the environment variable LLAMA_STACK_CONTAINER_NAME
(use os.environ.get('LLAMA_STACK_CONTAINER_NAME', 'llama-stack')) and use that
value for all docker inspect/stop/start invocations so the tests follow the
Makefile-controlled container name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 984d99d4-8f79-40fb-89cc-a77c41f14088
📒 Files selected for processing (1)
Makefile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: spectral
- GitHub Check: integration_tests (3.12)
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🪛 checkmake (0.3.2)
Makefile
[warning] 21-21: Required target "all" is missing from the Makefile.
(minphony)
[warning] 21-21: Required target "clean" is missing from the Makefile.
(minphony)
[warning] 21-21: Required target "test" is missing from the Makefile.
(minphony)
[warning] 35-35: Target body for "stop-llama-stack-container" exceeds allowed length of 5 lines (11).
(maxbodylength)
[warning] 48-48: Target body for "remove-llama-stack-container" exceeds allowed length of 5 lines (7).
(maxbodylength)
🔇 Additional comments (4)
Makefile (4)
20-20: LGTM!
24-25: LGTM!
35-46: LGTM!
48-55: LGTM!
radofuchs
left a comment
There was a problem hiding this comment.
LGTM, just one detail that might be useful to fix
| echo "✓ Container stopped gracefully"; \ | ||
| else \ | ||
| echo "⚠ Container did not stop gracefully, capturing logs..."; \ | ||
| $(CONTAINER_RUNTIME) logs --tail 100 $(LLAMA_STACK_CONTAINER_NAME) > /tmp/llama-stack-failure.log 2>&1 || true; \ |
There was a problem hiding this comment.
100 lines of code is usually not enough to see the error especially if it happened in llama-stack enrichment, increase it to at least 200
There was a problem hiding this comment.
I actually got rid of the tail altogether, so that we get the whole container logs dumped in the temp file 👍🏽
…ck container **Key Changes:** - Auto-stop container when exiting `make run` (Ctrl+C) via trap handler - Graceful shutdown with 10s timeout before force-kill - Preserve logs to `/tmp/llama-stack-*.log` on shutdown/removal **Developer Experience:** - No manual cleanup needed - just Ctrl+C - Logs preserved after container removal - Clearer feedback during shutdown Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
0b2ff52 to
72d473c
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Line 27: The trap in the Makefile's run path only calls
stop-llama-stack-container which leaves the stopped lightspeed-llama-stack
container and skips the log-preserving cleanup in remove-llama-stack-container;
update the trap to call remove-llama-stack-container (or both
stop-llama-stack-container then remove-llama-stack-container) so
/tmp/llama-stack-last-run.log is always written on EXIT/INT/TERM, and make the
same change for the other trap(s) referenced around lines 51-57 to ensure
consistent cleanup across all exit paths.
- Around line 25-28: The run recipe's trap is never installed before Makefile
prerequisites run (so start-llama-stack-container can fail and leave containers
running) and the trap only calls stop-llama-stack-container (not
remove-llama-stack-container which saves/removes logs). Fix by moving the trap
into a wrapper recipe that executes start-llama-stack-container and run-stack
inside the same shell (so the trap is active for startup failures), change the
trap handler to call remove-llama-stack-container (ensuring
/tmp/llama-stack-last-run.log is saved/removed), and update
start-llama-stack-container or wait-for-llama-stack-health to perform cleanup on
timeout/failure (call the same removal path) so no container or logs are left
orphaned; reference targets: run, start-llama-stack-container, run-stack,
wait-for-llama-stack-health, stop-llama-stack-container,
remove-llama-stack-container and the /tmp/llama-stack-last-run.log artifact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dfb92330-7a65-41be-8084-d673b2e58cf1
📒 Files selected for processing (1)
Makefile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: Pylinter
- GitHub Check: spectral
- GitHub Check: build-pr
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🪛 checkmake (0.3.2)
Makefile
[warning] 21-21: Required target "all" is missing from the Makefile.
(minphony)
[warning] 21-21: Required target "clean" is missing from the Makefile.
(minphony)
[warning] 21-21: Required target "test" is missing from the Makefile.
(minphony)
[warning] 38-38: Target body for "stop-llama-stack-container" exceeds allowed length of 5 lines (11).
(maxbodylength)
[warning] 51-51: Target body for "remove-llama-stack-container" exceeds allowed length of 5 lines (7).
(maxbodylength)
| run: start-llama-stack-container ## Run the service locally with dependent services | ||
| @echo "Starting Lightspeed Core Stack..." | ||
| @trap 'echo ""; echo "Stopping services..."; $(MAKE) stop-llama-stack-container' EXIT INT TERM; \ | ||
| $(MAKE) run-stack |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect Makefile around the cited line ranges with line numbers
if [ -f Makefile ]; then
wc -l Makefile
sed -n '1,220p' Makefile | nl -ba | sed -n '1,220p'
else
echo "Makefile not found at repo root"
fiRepository: lightspeed-core/lightspeed-stack
Length of output: 150
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Makefile line 15-40 =="
awk 'NR>=15 && NR<=40 {printf "%5d\t%s\n", NR, $0}' Makefile
echo "== Makefile line 55-130 =="
awk 'NR>=55 && NR<=130 {printf "%5d\t%s\n", NR, $0}' Makefile
echo "== Makefile line 130-200 =="
awk 'NR>=130 && NR<=200 {printf "%5d\t%s\n", NR, $0}' MakefileRepository: lightspeed-core/lightspeed-stack
Length of output: 9588
🏁 Script executed:
#!/bin/bash
set -euo pipefail
awk 'NR>=30 && NR<=90 {printf "%5d\t%s\n", NR, $0}' MakefileRepository: lightspeed-core/lightspeed-stack
Length of output: 3415
Fix Makefile cleanup: trap doesn’t cover prerequisite startup failures, and shutdown doesn’t remove/save logs
runinstalls thetraponly inside therunrecipe, butstart-llama-stack-containerruns first as a prerequisite; it starts the container (run -d) and callswait-for-llama-stack-health, whichexit 1s after the timeout. When this fails, therunrecipe (and trap) never executes, so the container can be left running.- The trap calls
stop-llama-stack-container(stop/kill only), while log saving + removal (/tmp/llama-stack-last-run.log) happens only inremove-llama-stack-container. On normalmake runshutdown you’ll stop but not remove the container or preserve the “last run” logs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` around lines 25 - 28, The run recipe's trap is never installed
before Makefile prerequisites run (so start-llama-stack-container can fail and
leave containers running) and the trap only calls stop-llama-stack-container
(not remove-llama-stack-container which saves/removes logs). Fix by moving the
trap into a wrapper recipe that executes start-llama-stack-container and
run-stack inside the same shell (so the trap is active for startup failures),
change the trap handler to call remove-llama-stack-container (ensuring
/tmp/llama-stack-last-run.log is saved/removed), and update
start-llama-stack-container or wait-for-llama-stack-health to perform cleanup on
timeout/failure (call the same removal path) so no container or logs are left
orphaned; reference targets: run, start-llama-stack-container, run-stack,
wait-for-llama-stack-health, stop-llama-stack-container,
remove-llama-stack-container and the /tmp/llama-stack-last-run.log artifact.
|
|
||
| run: start-llama-stack-container ## Run the service locally with dependent services | ||
| @echo "Starting Lightspeed Core Stack..." | ||
| @trap 'echo ""; echo "Stopping services..."; $(MAKE) stop-llama-stack-container' EXIT INT TERM; \ |
There was a problem hiding this comment.
make run still leaves the container behind on the common exit path.
Line 27 only invokes stop-llama-stack-container, but the actual log-preserving cleanup lives in remove-llama-stack-container. After a normal Ctrl+C/EXIT you keep a stopped lightspeed-llama-stack container around and never write /tmp/llama-stack-last-run.log, which misses the stated automatic cleanup behavior.
Also applies to: 51-57
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` at line 27, The trap in the Makefile's run path only calls
stop-llama-stack-container which leaves the stopped lightspeed-llama-stack
container and skips the log-preserving cleanup in remove-llama-stack-container;
update the trap to call remove-llama-stack-container (or both
stop-llama-stack-container then remove-llama-stack-container) so
/tmp/llama-stack-last-run.log is always written on EXIT/INT/TERM, and make the
same change for the other trap(s) referenced around lines 51-57 to ensure
consistent cleanup across all exit paths.
Description
Key Changes:
make run(Ctrl+C) via trap handler/tmp/llama-stack-*.logon shutdown/removalDeveloper Experience:
Signed-off-by: Anik Bhattacharjee anbhatta@redhat.com
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit