Skip to content

LCORE-1869: Add graceful teardown and automatic cleanup for llama-stack container#1802

Merged
tisnik merged 1 commit into
lightspeed-core:mainfrom
anik120:container-cleanup-routine
Jun 2, 2026
Merged

LCORE-1869: Add graceful teardown and automatic cleanup for llama-stack container#1802
tisnik merged 1 commit into
lightspeed-core:mainfrom
anik120:container-cleanup-routine

Conversation

@anik120

@anik120 anik120 commented May 26, 2026

Copy link
Copy Markdown
Contributor

Description

Key Changes:

  • Auto-stop container when exiting make run (Ctrl+C) via trap handler
  • Graceful shutdown with 10s timeout before force-kill
  • Preserve logs to /tmp/llama-stack-*.log on shutdown/removal

Developer Experience:

  • No manual cleanup needed - just Ctrl+C
  • Logs preserved after container removal
  • Clearer feedback during shutdown

Signed-off-by: Anik Bhattacharjee anbhatta@redhat.com

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Chores
    • Improved container lifecycle: added graceful shutdown and automatic cleanup when processes exit or are interrupted.
    • Enhanced diagnostics: capture and persist container logs on failures and before removal to aid troubleshooting.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Makefile updates add a trap in run to call stop-llama-stack-container on EXIT/INT/TERM, introduce stop-llama-stack-container for a 10s graceful stop with log capture and force-kill fallback, and change remove-llama-stack-container to save container logs before removal.

Changes

Container Lifecycle Management

Layer / File(s) Summary
Signal handling and lifecycle declarations
Makefile
.PHONY targets expanded to include stop-llama-stack-container; run target registers an EXIT/INT/TERM trap that calls the stop target.
Container graceful shutdown and cleanup
Makefile
Added stop-llama-stack-container implementing a 10s graceful stop, fallback log capture to /tmp/llama-stack-failure.log, and force-kill; updated remove-llama-stack-container to save logs to /tmp/llama-stack-last-run.log before removal, both guarded by the container-exists check.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • tisnik
  • radofuchs
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: adding graceful teardown and automatic cleanup for the llama-stack container, which are the core objectives of this changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@anik120 anik120 force-pushed the container-cleanup-routine branch from b180b20 to 0b2ff52 Compare May 27, 2026 18:58

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Makefile (1)

15-15: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make E2E container name match LLAMA_STACK_CONTAINER_NAME

  • Makefile sets LLAMA_STACK_CONTAINER_NAME ?= lightspeed-llama-stack, but E2E scripts hardcode the Docker container name "llama-stack" when calling docker inspect/stop/start (e.g., tests/e2e/features/steps/health.py, tests/e2e/features/environment.py) and don’t reference LLAMA_STACK_CONTAINER_NAME.
  • Fix by updating E2E to use LLAMA_STACK_CONTAINER_NAME (via a shared helper/config/env var) or by ensuring make run starts the container with the "llama-stack" name.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` at line 15, E2E scripts hardcode the Docker name "llama-stack"
while the Makefile defines LLAMA_STACK_CONTAINER_NAME; update the E2E code
(e.g., in tests/e2e/features/steps/health.py and
tests/e2e/features/environment.py) to read the container name from the
environment variable LLAMA_STACK_CONTAINER_NAME (use
os.environ.get('LLAMA_STACK_CONTAINER_NAME', 'llama-stack')) and use that value
for all docker inspect/stop/start invocations so the tests follow the
Makefile-controlled container name.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@Makefile`:
- Line 15: E2E scripts hardcode the Docker name "llama-stack" while the Makefile
defines LLAMA_STACK_CONTAINER_NAME; update the E2E code (e.g., in
tests/e2e/features/steps/health.py and tests/e2e/features/environment.py) to
read the container name from the environment variable LLAMA_STACK_CONTAINER_NAME
(use os.environ.get('LLAMA_STACK_CONTAINER_NAME', 'llama-stack')) and use that
value for all docker inspect/stop/start invocations so the tests follow the
Makefile-controlled container name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 984d99d4-8f79-40fb-89cc-a77c41f14088

📥 Commits

Reviewing files that changed from the base of the PR and between b180b20 and 0b2ff52.

📒 Files selected for processing (1)
  • Makefile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: spectral
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: Pylinter
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🪛 checkmake (0.3.2)
Makefile

[warning] 21-21: Required target "all" is missing from the Makefile.

(minphony)


[warning] 21-21: Required target "clean" is missing from the Makefile.

(minphony)


[warning] 21-21: Required target "test" is missing from the Makefile.

(minphony)


[warning] 35-35: Target body for "stop-llama-stack-container" exceeds allowed length of 5 lines (11).

(maxbodylength)


[warning] 48-48: Target body for "remove-llama-stack-container" exceeds allowed length of 5 lines (7).

(maxbodylength)

🔇 Additional comments (4)
Makefile (4)

20-20: LGTM!


24-25: LGTM!


35-46: LGTM!


48-55: LGTM!

@tisnik tisnik left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik requested a review from radofuchs May 28, 2026 18:14

@radofuchs radofuchs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one detail that might be useful to fix

Comment thread Makefile Outdated
echo "✓ Container stopped gracefully"; \
else \
echo "⚠ Container did not stop gracefully, capturing logs..."; \
$(CONTAINER_RUNTIME) logs --tail 100 $(LLAMA_STACK_CONTAINER_NAME) > /tmp/llama-stack-failure.log 2>&1 || true; \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100 lines of code is usually not enough to see the error especially if it happened in llama-stack enrichment, increase it to at least 200

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually got rid of the tail altogether, so that we get the whole container logs dumped in the temp file 👍🏽

…ck container

**Key Changes:**
- Auto-stop container when exiting `make run` (Ctrl+C) via trap handler
- Graceful shutdown with 10s timeout before force-kill
- Preserve logs to `/tmp/llama-stack-*.log` on shutdown/removal

**Developer Experience:**
- No manual cleanup needed - just Ctrl+C
- Logs preserved after container removal
- Clearer feedback during shutdown

Signed-off-by: Anik Bhattacharjee <anbhatta@redhat.com>
@anik120 anik120 force-pushed the container-cleanup-routine branch from 0b2ff52 to 72d473c Compare June 2, 2026 15:37

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Makefile`:
- Line 27: The trap in the Makefile's run path only calls
stop-llama-stack-container which leaves the stopped lightspeed-llama-stack
container and skips the log-preserving cleanup in remove-llama-stack-container;
update the trap to call remove-llama-stack-container (or both
stop-llama-stack-container then remove-llama-stack-container) so
/tmp/llama-stack-last-run.log is always written on EXIT/INT/TERM, and make the
same change for the other trap(s) referenced around lines 51-57 to ensure
consistent cleanup across all exit paths.
- Around line 25-28: The run recipe's trap is never installed before Makefile
prerequisites run (so start-llama-stack-container can fail and leave containers
running) and the trap only calls stop-llama-stack-container (not
remove-llama-stack-container which saves/removes logs). Fix by moving the trap
into a wrapper recipe that executes start-llama-stack-container and run-stack
inside the same shell (so the trap is active for startup failures), change the
trap handler to call remove-llama-stack-container (ensuring
/tmp/llama-stack-last-run.log is saved/removed), and update
start-llama-stack-container or wait-for-llama-stack-health to perform cleanup on
timeout/failure (call the same removal path) so no container or logs are left
orphaned; reference targets: run, start-llama-stack-container, run-stack,
wait-for-llama-stack-health, stop-llama-stack-container,
remove-llama-stack-container and the /tmp/llama-stack-last-run.log artifact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dfb92330-7a65-41be-8084-d673b2e58cf1

📥 Commits

Reviewing files that changed from the base of the PR and between 0b2ff52 and 72d473c.

📒 Files selected for processing (1)
  • Makefile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: spectral
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🪛 checkmake (0.3.2)
Makefile

[warning] 21-21: Required target "all" is missing from the Makefile.

(minphony)


[warning] 21-21: Required target "clean" is missing from the Makefile.

(minphony)


[warning] 21-21: Required target "test" is missing from the Makefile.

(minphony)


[warning] 38-38: Target body for "stop-llama-stack-container" exceeds allowed length of 5 lines (11).

(maxbodylength)


[warning] 51-51: Target body for "remove-llama-stack-container" exceeds allowed length of 5 lines (7).

(maxbodylength)

Comment thread Makefile
Comment on lines 25 to 28
run: start-llama-stack-container ## Run the service locally with dependent services
@echo "Starting Lightspeed Core Stack..."
@trap 'echo ""; echo "Stopping services..."; $(MAKE) stop-llama-stack-container' EXIT INT TERM; \
$(MAKE) run-stack

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect Makefile around the cited line ranges with line numbers
if [ -f Makefile ]; then
  wc -l Makefile
  sed -n '1,220p' Makefile | nl -ba | sed -n '1,220p'
else
  echo "Makefile not found at repo root"
fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 150


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Makefile line 15-40 =="
awk 'NR>=15 && NR<=40 {printf "%5d\t%s\n", NR, $0}' Makefile

echo "== Makefile line 55-130 =="
awk 'NR>=55 && NR<=130 {printf "%5d\t%s\n", NR, $0}' Makefile

echo "== Makefile line 130-200 =="
awk 'NR>=130 && NR<=200 {printf "%5d\t%s\n", NR, $0}' Makefile

Repository: lightspeed-core/lightspeed-stack

Length of output: 9588


🏁 Script executed:

#!/bin/bash
set -euo pipefail

awk 'NR>=30 && NR<=90 {printf "%5d\t%s\n", NR, $0}' Makefile

Repository: lightspeed-core/lightspeed-stack

Length of output: 3415


Fix Makefile cleanup: trap doesn’t cover prerequisite startup failures, and shutdown doesn’t remove/save logs

  • run installs the trap only inside the run recipe, but start-llama-stack-container runs first as a prerequisite; it starts the container (run -d) and calls wait-for-llama-stack-health, which exit 1s after the timeout. When this fails, the run recipe (and trap) never executes, so the container can be left running.
  • The trap calls stop-llama-stack-container (stop/kill only), while log saving + removal (/tmp/llama-stack-last-run.log) happens only in remove-llama-stack-container. On normal make run shutdown you’ll stop but not remove the container or preserve the “last run” logs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 25 - 28, The run recipe's trap is never installed
before Makefile prerequisites run (so start-llama-stack-container can fail and
leave containers running) and the trap only calls stop-llama-stack-container
(not remove-llama-stack-container which saves/removes logs). Fix by moving the
trap into a wrapper recipe that executes start-llama-stack-container and
run-stack inside the same shell (so the trap is active for startup failures),
change the trap handler to call remove-llama-stack-container (ensuring
/tmp/llama-stack-last-run.log is saved/removed), and update
start-llama-stack-container or wait-for-llama-stack-health to perform cleanup on
timeout/failure (call the same removal path) so no container or logs are left
orphaned; reference targets: run, start-llama-stack-container, run-stack,
wait-for-llama-stack-health, stop-llama-stack-container,
remove-llama-stack-container and the /tmp/llama-stack-last-run.log artifact.

Comment thread Makefile

run: start-llama-stack-container ## Run the service locally with dependent services
@echo "Starting Lightspeed Core Stack..."
@trap 'echo ""; echo "Stopping services..."; $(MAKE) stop-llama-stack-container' EXIT INT TERM; \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

make run still leaves the container behind on the common exit path.

Line 27 only invokes stop-llama-stack-container, but the actual log-preserving cleanup lives in remove-llama-stack-container. After a normal Ctrl+C/EXIT you keep a stopped lightspeed-llama-stack container around and never write /tmp/llama-stack-last-run.log, which misses the stated automatic cleanup behavior.

Also applies to: 51-57

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` at line 27, The trap in the Makefile's run path only calls
stop-llama-stack-container which leaves the stopped lightspeed-llama-stack
container and skips the log-preserving cleanup in remove-llama-stack-container;
update the trap to call remove-llama-stack-container (or both
stop-llama-stack-container then remove-llama-stack-container) so
/tmp/llama-stack-last-run.log is always written on EXIT/INT/TERM, and make the
same change for the other trap(s) referenced around lines 51-57 to ensure
consistent cleanup across all exit paths.

@tisnik tisnik merged commit 6738771 into lightspeed-core:main Jun 2, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants