
fix: stabilize DeepSeek streaming output #461

Open
avir4er wants to merge 4 commits into aliasrobotics:main from avir4er:fix/deepseek-streaming-reasoning

avir4er commented Jul 3, 2026


Summary

This patch stabilizes DeepSeek/LiteLLM streaming in CAI by:

  • avoiding duplicate litellm.acompletion() calls when stream=True
  • fixing the DeepSeek/Claude thinking-context construction path, which referenced an undefined panel variable
  • making DeepSeek raw reasoning_content display opt-in via CAI_SHOW_REASONING=true (or legacy CAI_SHOW_THINKING=true) to prevent token-by-token terminal flooding
  • keeping Claude reasoning display behavior unchanged
  • bounding LiteLLM model requests via CAI_MODEL_TIMEOUT (default 180 seconds; CAI_LLM_TIMEOUT is accepted as an alias; a value <= 0 disables CAI's injected timeout)
  • enforcing an outer asyncio timeout around LiteLLM calls, in case a provider or client stalls before LiteLLM itself raises
  • enforcing a per-chunk idle timeout for streamed LiteLLM responses, so a returned stream cannot wait forever for the next SSE chunk
  • retrying transient provider/proxy disconnects on streaming requests, such as DeepSeek "Server disconnected" surfaced as a LiteLLM InternalServerError
  • routing the remaining runtime litellm.acompletion() call sites through the shared timeout wrapper, except for an existing REPL recovery helper that already uses asyncio.wait_for
  • converting streamed provider disconnects into typed LLMProviderUnavailable errors, so headless mode shows the existing concise provider-load message instead of a full traceback (unless CAI_DEBUG=2 is set)
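A minimal sketch of the timeout resolution and the outer asyncio guard, assuming CAI_MODEL_TIMEOUT wins over the CAI_LLM_TIMEOUT alias when both are set and that an unparsable value falls back to the 180-second default. resolve_model_timeout and guarded_acompletion are hypothetical names, not CAI's actual helpers; the acompletion argument stands in for litellm.acompletion, which accepts a timeout keyword.

```python
import asyncio
import os
from typing import Any, Awaitable, Callable, Mapping, Optional, TypeVar

T = TypeVar("T")

DEFAULT_MODEL_TIMEOUT = 180.0  # seconds, the default stated in this PR


def resolve_model_timeout(env: Optional[Mapping[str, str]] = None) -> Optional[float]:
    """Return the model timeout in seconds, or None if CAI's timeout is disabled."""
    env = os.environ if env is None else env
    # Assumption: the primary name takes precedence over the alias.
    raw = env.get("CAI_MODEL_TIMEOUT") or env.get("CAI_LLM_TIMEOUT")
    if raw is None or not raw.strip():
        return DEFAULT_MODEL_TIMEOUT
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_MODEL_TIMEOUT  # assumption: bad values fall back to the default
    return None if value <= 0 else value  # <= 0 disables the injected timeout


async def guarded_acompletion(
    acompletion: Callable[..., Awaitable[T]],
    model_timeout: Optional[float],
    **kwargs: Any,
) -> T:
    """Hypothetical wrapper: inject a per-request timeout plus an outer asyncio bound."""
    if model_timeout is None:
        return await acompletion(**kwargs)
    # Per-request timeout handed to the client, unless the caller already set one.
    kwargs.setdefault("timeout", model_timeout)
    # Outer guard: fires even if the client stalls without raising on its own.
    return await asyncio.wait_for(acompletion(**kwargs), timeout=model_timeout)
```

A call site would then look like await guarded_acompletion(litellm.acompletion, resolve_model_timeout(), model=..., messages=..., stream=True). With stream=True this bounds only the call that opens the stream; waits for later chunks need the separate per-chunk idle timeout.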

Why

During interactive Web App Pentester usage with deepseek/deepseek-v4-pro, long tasks could remain stuck "in flight", flood the terminal with token-by-token DeepSeek reasoning, or print a long traceback when DeepSeek/LiteLLM disconnected while opening a stream.

Four issues contributed:

  1. The streaming LiteLLM adapter opened one stream and discarded it, then opened a second stream and returned that one, so each streamed request issued two litellm.acompletion() calls.
  2. LiteLLM model calls did not receive a bounded timeout, so a slow/stalled provider or proxy could leave CAI waiting indefinitely.
  3. Some LiteLLM/provider combinations can return a stream object, then stall while awaiting the next streamed chunk; the new stream-idle wrapper bounds that phase too.
  4. Streaming high-level recovery retried timeouts and rate limits, but not transient provider disconnects such as LiteLLM InternalServerError: DeepseekException - Server disconnected.
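Items 3 and 4 can be sketched as two small wrappers, assuming a transient disconnect can be recognized by its exception type. StreamIdleTimeout, TransientProviderError, with_idle_timeout and open_with_retry are hypothetical names; LLMProviderUnavailable here is a local stand-in for the typed error this PR adds, not its real definition. As a design choice, this sketch retries only the call that opens the stream, since replaying a stream that has already emitted chunks would duplicate output.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class StreamIdleTimeout(Exception):
    """Hypothetical error: no chunk arrived within the idle window."""


class TransientProviderError(Exception):
    """Stand-in for a transient disconnect such as LiteLLM's InternalServerError."""


class LLMProviderUnavailable(Exception):
    """Local stand-in for CAI's typed error of the same name (real definition not shown)."""


async def with_idle_timeout(stream: AsyncIterator[T], idle_timeout: float) -> AsyncIterator[T]:
    """Re-yield chunks from stream, failing if any single chunk takes too long."""
    iterator = stream.__aiter__()
    while True:
        try:
            # Bound each wait for the next chunk, not the total stream duration.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            return  # provider closed the stream normally
        except asyncio.TimeoutError as exc:
            raise StreamIdleTimeout(f"no chunk received for {idle_timeout}s") from exc
        yield chunk


async def open_with_retry(
    open_stream: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    backoff: float = 1.0,
) -> T:
    """Retry only transient errors from the opening call; other errors propagate unchanged."""
    last_exc: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await open_stream()
        except TransientProviderError as exc:
            last_exc = exc
            if attempt < max_attempts:
                await asyncio.sleep(backoff * attempt)  # linear backoff between attempts
    # A typed error lets the caller print a short message instead of a traceback.
    raise LLMProviderUnavailable("provider disconnected while opening the stream") from last_exc
```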

DeepSeek reasoning deltas were also printed by default, and the Rich thinking-context path failed with NameError: name 'panel' is not defined.
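A minimal sketch of the opt-in check, plus one way to avoid token-by-token output by collecting reasoning_content deltas and emitting them once. It assumes the flags are compared case-insensitively against a small set of truthy strings and that either CAI_SHOW_REASONING or the legacy CAI_SHOW_THINKING enables display; show_deepseek_reasoning and ReasoningBuffer are hypothetical names, and the Rich rendering (including the panel fix) is left out.

```python
import os
from typing import List, Mapping, Optional

_TRUTHY = {"1", "true", "yes"}  # assumption: accepted spellings of "true"


def show_deepseek_reasoning(env: Optional[Mapping[str, str]] = None) -> bool:
    """DeepSeek reasoning display is off unless a flag explicitly enables it."""
    env = os.environ if env is None else env
    for name in ("CAI_SHOW_REASONING", "CAI_SHOW_THINKING"):  # new flag, then legacy
        if env.get(name, "").strip().lower() in _TRUTHY:
            return True
    return False


class ReasoningBuffer:
    """Hypothetical helper: collect reasoning deltas instead of printing each one."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled
        self._parts: List[str] = []

    def feed(self, delta: Optional[str]) -> None:
        # Streams often carry None or "" reasoning deltas; skip those.
        if self.enabled and delta:
            self._parts.append(delta)

    def flush(self) -> Optional[str]:
        """Return the accumulated reasoning once, or None if disabled or empty."""
        if not self._parts:
            return None
        text = "".join(self._parts)
        self._parts.clear()
        return text
```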

Tests

uv run --frozen pytest \
  tests/cli/test_cli_headless_cancellation.py \
  tests/core/test_openai_chatcompletions_stream.py::test_stream_response_retries_transient_provider_disconnect \
  tests/sdk/test_litellm_adapter_streaming.py \
  tests/util/test_thinking_display.py \
  -q

uv run --frozen python -m py_compile \
  src/cai/sdk/agents/models/chatcompletions/litellm_adapter.py \
  src/cai/sdk/agents/models/openai_chatcompletions.py \
  src/cai/cli_headless.py \
  src/cai/continuation.py \
  src/cai/ctr/digest.py \
  src/cai/tui/components/agent_creator_panel.py \
  tests/cli/test_cli_headless_cancellation.py \
  tests/core/test_openai_chatcompletions_stream.py \
  tests/sdk/test_litellm_adapter_streaming.py \
  tests/util/test_thinking_display.py

Result: 15 passed for the targeted pytest slice.
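A regression check for the duplicate-call bug could look like the following sketch. open_litellm_stream is a hypothetical stand-in for the adapter's streaming path and fake_acompletion replaces litellm.acompletion; this does not reproduce the actual contents of tests/sdk/test_litellm_adapter_streaming.py.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List


async def open_litellm_stream(acompletion: Callable[..., Awaitable[Any]], **kwargs: Any) -> Any:
    """Fixed pattern: exactly one acompletion() call per streamed request."""
    kwargs["stream"] = True
    return await acompletion(**kwargs)


def test_stream_opens_exactly_once() -> None:
    calls: List[Dict[str, Any]] = []

    async def fake_acompletion(**kwargs: Any) -> AsyncIterator[str]:
        calls.append(kwargs)  # record every attempt to open a stream

        async def chunks() -> AsyncIterator[str]:
            for token in ("hel", "lo"):
                yield token

        return chunks()

    async def run() -> List[str]:
        stream = await open_litellm_stream(
            fake_acompletion, model="deepseek/deepseek-v4-pro", messages=[]
        )
        return [chunk async for chunk in stream]

    assert asyncio.run(run()) == ["hel", "lo"]
    # The bug described above would show up here as two recorded calls.
    assert len(calls) == 1 and calls[0]["stream"] is True
```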

Additional audit:

rg -n "litellm\.acompletion" src/cai tests

Runtime direct call sites now go through the shared timeout wrapper; the remaining runtime exception-recovery call already has an explicit asyncio.wait_for, and the remaining direct references are tests.
