kimi-code-worker hangs on Shell tool via WebSocket API

# Bug Report: `kimi-code-worker` hangs on `Shell` tool via WebSocket API

**Repository:** [MoonshotAI/kimi-cli](https://github.com/MoonshotAI/kimi-cli)
**Version:** `kimi-cli 1.44.0`
**Environment:** Linux, Python 3.12/3.13, `kimi web --host 127.0.0.1 --port 5494`

## Summary

When using the WebSocket API (`kimi web`), `kimi-code-worker` hangs indefinitely after receiving a `Shell` tool call. The worker remains in `busy` state with 0% CPU and never returns a `ToolResult`, causing the session to stall forever.

## Steps to Reproduce

1. Start `kimi web`:
   ```bash
   kimi web --host 127.0.0.1 --port 5494
   ```

2. Create a session via REST:
   ```bash
   curl -X POST http://127.0.0.1:5494/api/sessions/ \
     -H "Content-Type: application/json" \
     -d '{"work_dir": "/path/to/project", "create_dir": false}'
   ```

3. Connect to the session via WebSocket and send a prompt that triggers a `Shell` tool call:
   ```
   "what is my IP address?"
   ```
   (or in Russian: `"скажи ip своего хоста"`, which translates to "tell me your host's IP")

4. Observe the WebSocket event stream.

## Expected Behavior

- `ToolCall` is received.
- Worker executes the `Shell` command.
- `ToolResult` is returned with command output.
- `TurnEnd` is sent.
- Session completes successfully.

## Actual Behavior

- `ToolCall` (or `ToolCallPart` streaming fragments) is received.
- `StatusUpdate` may follow.
- **No `ToolResult` is ever returned.**
- Worker process remains in `S (sleeping)` state with 0% CPU.
- Session status stays `busy` indefinitely.
- `kimi.log` shows no `Tool Shell completed` entry for this session.

## Observations

1. **Commands themselves are not the issue:**
   - `hostname -I` and `curl -s --max-time 5 ifconfig.me` execute instantly when run manually in the same shell.

2. **Shell tool works in direct CLI mode:**
   - When running `kimi` directly in the terminal (not via `kimi web`), the `Shell` tool executes correctly for commands like `ruff check`, `pytest`, `systemctl`, etc.

3. **Streaming tool calls:**
   - In the WebSocket mode, we observed `ToolCallPart` events (streaming fragments of a tool call) being delivered before the full `ToolCall`. The worker may not handle streaming tool calls correctly.

4. **Parallel tool calls:**
   - In another reproduction, the LLM returned **two consecutive `ToolCall` events** (parallel calls). The worker also hung in that scenario.

5. **Process state:**
   - `kimi-code-worker` PID shows `Sl` state, 0% CPU, no child processes.
   - `strace` shows only repeated `epoll_wait` calls, so the worker is idle in its event loop rather than blocked on a specific I/O operation.
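
The process-state check above can be repeated programmatically. The following is a minimal sketch, assuming a Linux host with `/proc` mounted; `parse_proc_status` and `describe_worker` are hypothetical helper names, not part of kimi-cli. Note that `/proc/<pid>/status` reports the state as `S (sleeping)`, while `ps` prints `Sl` for the same process when it is multi-threaded.

```python
from pathlib import Path


def parse_proc_status(text: str) -> dict[str, str]:
    """Parse the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore any line without a 'Key:' prefix
            fields[key.strip()] = value.strip()
    return fields


def describe_worker(pid: int) -> str:
    """Summarize state and thread count for a PID (Linux only)."""
    status = parse_proc_status(Path(f"/proc/{pid}/status").read_text())
    return f"pid={pid} state={status.get('State')} threads={status.get('Threads')}"
```

Calling `describe_worker(pid)` with the worker's PID while a session is stuck gives a one-line summary that can be attached to a report alongside the `strace` output.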

## Wire Protocol Evidence

### Hung session (WebSocket):
```json
{"type": "ToolCall", "payload": {"type": "function", "id": "tool_xxx", "function": {"name": "Shell", "arguments": "{\"command\": \"hostname -I\"}"}}}
{"type": "ToolCall", "payload": {"type": "function", "id": "tool_yyy", "function": {"name": "Shell", "arguments": "{\"command\": \"curl -s ifconfig.me\"}"}}}
// ... silence — no ToolResult, no TurnEnd
```

### Working session (direct CLI):
```json
{"type": "ToolCall", "payload": {"type": "function", "id": "tool_zzz", "function": {"name": "Shell", "arguments": "{\"command\": \"ruff check .\"}"}}}
{"type": "ToolResult", "payload": {"tool_call_id": "tool_zzz", "return_value": {"is_error": false, "output": "All checks passed!\n"}}}
```
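
The difference between the two traces can be checked mechanically. The following is a minimal sketch, assuming the event stream was captured as one JSON object per line in the shape shown above (`payload.id` on `ToolCall`, `payload.tool_call_id` on `ToolResult`); `unanswered_tool_calls` is a hypothetical helper, not part of kimi-cli.

```python
import json


def unanswered_tool_calls(lines):
    """Return {tool_call_id: tool_name} for ToolCall events with no ToolResult."""
    pending = {}  # insertion-ordered: tool call id -> tool name
    for raw in lines:
        raw = raw.strip()
        if not raw or raw.startswith("//"):
            continue  # skip blank lines and annotation lines in a captured trace
        event = json.loads(raw)
        payload = event.get("payload", {})
        if event.get("type") == "ToolCall":
            pending[payload["id"]] = payload["function"]["name"]
        elif event.get("type") == "ToolResult":
            # A result resolves the call it refers to; unknown ids are ignored.
            pending.pop(payload["tool_call_id"], None)
    return pending
```

Run against the hung trace, this should report both `Shell` calls as pending; run against the working trace, it should report none. A client could apply the same bookkeeping live and flag a session once a call has been pending longer than some threshold.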

## Hypothesis

The `kimi-code-worker` appears to have a bug in its WebSocket event loop when handling either:
- **Streaming tool calls (`ToolCallPart`)**, or
- **Parallel/batched `ToolCall` events**

The worker receives the tool call(s) but fails to dispatch them to the tool execution layer, leaving the session stuck in `busy` state.

## Impact

Any WebSocket client (including third-party integrations such as Telegram bots) that relies on `kimi web` can hang **permanently** when the LLM invokes the `Shell` tool, as it did for the prompts in the reproduction steps.

## Suggested Fix / Workaround

- Ensure `ToolCallPart` streaming fragments are correctly buffered and assembled into a complete `ToolCall` before execution.
- Verify that parallel `ToolCall` events are properly queued or executed without deadlock.
- Consider adding a watchdog/timeout inside `kimi-code-worker` to abort stuck tool executions.
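
To illustrate the watchdog idea only, here is a minimal sketch built on the standard `asyncio` subprocess API. It is not kimi-cli's tool execution code: `run_with_watchdog` is a hypothetical name, and the returned dict merely mirrors the `is_error` / `output` fields seen in the `ToolResult` trace above. The point is that the caller always gets a result, even when the command never finishes.

```python
import asyncio


async def run_with_watchdog(argv: list[str], timeout_s: float) -> dict:
    """Run argv and return a ToolResult-like dict, even if the command hangs."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into the captured output
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()        # abort the stuck command
        await proc.wait()  # reap it so no zombie process is left behind
        return {"is_error": True, "output": f"timed out after {timeout_s}s"}
    return {
        "is_error": proc.returncode != 0,
        "output": out.decode(errors="replace"),
    }
```

For example, `asyncio.run(run_with_watchdog(["hostname", "-I"], 10.0))` would either return the command output or an error result after ten seconds. With a guard of this kind, a stalled tool call surfaces as an error `ToolResult` instead of leaving the session in `busy` indefinitely.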

---

*Reported by a downstream integrator using `kimi web` as a backend for a Telegram bot gateway.*

kimi-code-worker hangs on Shell tool via WebSocket API #2365
