Skip to content

fix(soul): keep agent loop alive while background tasks are running#1802

Open
RealKai42 wants to merge 10 commits intomainfrom
kaiyi/taipei-v1
Open

fix(soul): keep agent loop alive while background tasks are running#1802
RealKai42 wants to merge 10 commits intomainfrom
kaiyi/taipei-v1

Conversation

@RealKai42
Copy link
Copy Markdown
Collaborator

@RealKai42 RealKai42 commented Apr 8, 2026

Summary

  • The agent loop exited prematurely when the LLM returned a text-only response while background tasks were still running, killing all pending background agents
  • Added a wait mechanism in _agent_loop() that waits for the next background task completion event instead of exiting, with per-completion semantics (each completion triggers one new step)
  • The wait polls every 2s to check for user steers, preserving interactive responsiveness
  • Only applies to stop_reason="no_tool_calls"tool_rejected and other stop reasons exit immediately as before

Test plan

  • 6 new tests covering: core fix, no-bg-tasks regression, cancellation, race condition, steer handling, tool_rejected bypass
  • Full test suite: 2204 passed, 0 failed
  • Pre-commit hooks (ruff + pyright) pass clean

Open with Devin

The agent loop exited prematurely when the LLM returned a text-only
response (no tool calls) while background tasks were still running.
This caused all pending background agents to be killed on session
teardown, losing their work.

Add a wait mechanism in _agent_loop() that detects active background
tasks before exiting and waits for the next completion event instead.
Each completion triggers one new step so the LLM can process
notifications promptly. The wait polls every 2s to check for user
steers, ensuring interactive responsiveness is preserved.

Only applies to stop_reason="no_tool_calls" — other stop reasons
like "tool_rejected" exit the turn immediately as before.
Copilot AI review requested due to automatic review settings April 8, 2026 12:36
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa8cb768db

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +780 to +783
event.clear()
# Re-check after clearing to avoid a race where a task
# completed between has_active_tasks() and clear().
if self._runtime.background_tasks.has_active_tasks():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid clearing a valid completion signal before waiting

In _agent_loop, completion_event is cleared unconditionally before the wait starts. If one background task has just completed (so the event is already set) while another task is still active, this drops the completion signal and forces the loop to wait for a later completion or steer. In that case, the notification for the already-completed task stays pending and the LLM won’t process it promptly (or at all if the remaining task is long-lived), which breaks the intended per-completion progression.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents the root agent loop from exiting when the LLM returns a text-only response while background tasks are still running, by waiting for background task completion signals and continuing the loop.

Changes:

  • Add background-task-aware waiting logic to KimiSoul._agent_loop() for stop_reason="no_tool_calls".
  • Introduce BackgroundTaskManager.has_active_tasks() to detect non-terminal tasks.
  • Add a focused test suite covering background-wait behavior, cancellation, steers, and stop-reason bypass.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

File Description
src/kimi_cli/soul/kimisoul.py Keeps the agent loop alive by waiting for background completion events when appropriate.
src/kimi_cli/background/manager.py Adds a helper to detect whether any background tasks are still active.
tests/core/test_kimisoul_background_wait.py Adds regression and edge-case tests for the new background wait behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +779 to +783
event = self._runtime.background_tasks.completion_event
event.clear()
# Re-check after clearing to avoid a race where a task
# completed between has_active_tasks() and clear().
if self._runtime.background_tasks.has_active_tasks():
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential missed background completion signal: this code clears completion_event before checking whether it’s already set (or whether there are pending LLM notifications). If a background task completes and the notification pump sets the event while the LLM step is running, event may already be set when we reach here; clearing it can drop the signal and then the loop may wait until the next completion (or a steer), delaying/never processing the already-pending notification. Consider first checking event.is_set() and/or self._runtime.notifications.has_pending_for_sink("llm") (similar to the shell watcher) and immediately continuing to the next step in that case, only entering the timed wait when there’s truly nothing pending yet.

Copilot uses AI. Check for mistakes.
Signed-off-by: Kai <me@kaiyi.cool>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f06bef46a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +787 to +789
await asyncio.wait_for(event.wait(), timeout=2.0)
event.clear()
break # A task completed — let next step handle it
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Consume steers before handling a completion wake-up

In the background-wait loop, a successful event.wait() path clears the event and breaks immediately, but it does not check _consume_pending_steers() first. If a user steer arrives during the same wait window as a completion signal, the next iteration runs _step() for the completion-triggered follow-up before injecting the steer, which defeats the intended steer priority and can execute one extra LLM/tool step the user was trying to interrupt.

Useful? React with 👍 / 👎.

If a background task completes during _step() and the notification
pump sets completion_event before the wait logic runs, the event was
being cleared unconditionally — dropping the signal and forcing a 2s
timeout before the notification could be processed.

Check event.is_set() first and immediately continue to the next step
when a completion is already pending, preserving per-completion
progression.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21097db0ef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +775 to +778
step_outcome.stop_reason == "no_tool_calls"
and self._runtime.role == "root"
and self._runtime.background_tasks.has_active_tasks()
):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict background wait to turns that launched work

This condition makes every no_tool_calls turn block whenever any session background task is active, including long-running tasks started in earlier turns that are unrelated to the current user request. Because shell run_soul(...) only returns after _agent_loop() returns, the prompt never comes back and users cannot continue normal interaction while those jobs run, effectively turning background tasks into foreground blocking work; please gate this path to contexts where process exit would kill tasks (e.g., one-shot modes) or to tasks started in the current turn.

Useful? React with 👍 / 👎.

Signed-off-by: Kai <me@kaiyi.cool>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6a4e5dba2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +775 to +778
step_outcome.stop_reason == "no_tool_calls"
and self._runtime.role == "root"
and self._runtime.background_tasks.has_active_tasks()
):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Scope background wait to pending completion notifications

This condition waits on every no_tool_calls step whenever any non-terminal background task exists, even if that task was started in a prior turn and has no new completion to process. In practice, a long-lived background task (for example one started with run_in_background=true) keeps has_active_tasks() true, so later text-only turns can stay in the wait loop indefinitely and run_soul() does not return except via steer/cancel; that blocks normal interaction flow (queued inputs are only drained after run_soul returns in src/kimi_cli/ui/shell/__init__.py).

Useful? React with 👍 / 👎.

Signed-off-by: Kai <me@kaiyi.cool>
The prior fix added a wait loop inside KimiSoul._agent_loop() that
blocked every no_tool_calls turn whenever any background task was
active, even tasks unrelated to the current turn. Review feedback
flagged this as turning background work into foreground blocking.

Move the wait logic to Print.run() instead, where it only runs in
one-shot text mode (the only context where process exit would kill
still-running background agents). stream-json multi-turn mode is
unaffected so prompts never block each other.

The loop drives reconcile() itself (the notification pump inside
run_soul is no longer running after it returns), re-enters the soul
whenever has_pending_for_sink("llm") is True — even with active
sibling tasks so per-task progress is not blocked — and does a final
reconcile before break to close the race where a worker writes
terminal status between the top-of-loop reconcile and the
has_active_tasks check. Ctrl+C during the wait now raises
RunCancelled so the process exits with failure instead of success.

Tests: 8 unit tests (mocked) + 4 integration tests using real
BackgroundTaskManager + NotificationManager + file-backed stores,
including a race-window regression guard.
…ei-v1

# Conflicts:
#	CHANGELOG.md
#	docs/en/release-notes/changelog.md
#	docs/zh/release-notes/changelog.md
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60870da825

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

#
# stream-json mode is multi-turn: background tasks
# from one command must not block the next command.
if runtime and runtime.role == "root" and self.input_format == "text":
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip print background wait when keep_alive_on_exit is enabled

The new post-run_soul wait loop runs for every root/text print turn, but it ignores runtime.config.background.keep_alive_on_exit. That flag is already honored by KimiCLI.shutdown_background_tasks (src/kimi_cli/app.py), which means exiting would not kill active tasks in this mode; with the current condition, one-shot --print can still block until all background tasks finish (or forever for long-lived jobs), which defeats the configured keep-alive behavior.

Useful? React with 👍 / 👎.

Review feedback: the new Print wait loop blocked one-shot runs
whenever any background task was active, even when
background.keep_alive_on_exit is configured. That flag opts into
"fire and forget" semantics where background subprocesses are
meant to outlive the CLI process, so blocking defeats the
configured behavior.

Skip the wait loop entirely when keep_alive_on_exit is true. Any
pending completion notifications remain in the notification store
and will be surfaced in the next interactive session.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants