Symptom
ml-intern requests extended thinking via thinking={"type": "adaptive"} (see agent/core/llm_params.py:153), but the assistant messages it stores in history never include thinking_blocks or reasoning_content. On every tool-continuation turn, LiteLLM logs:
LiteLLM:WARNING: Dropping 'thinking' param because the last assistant message
with tool_calls has no thinking_blocks. The model won't use extended thinking
for this turn.
This silently disables extended thinking for ~every turn after the first tool call — effectively turning Opus into non-thinking-mode Opus for the bulk of an agent's work, which is a meaningful reasoning-quality degradation for what ml-intern is used for.
Reproduction
Any agent run that goes through a tool call:
ml-intern "run bash 'date +%Y'. then tell me the year"
…produces the warning on turn 2. With a thinking-worthy prompt, the model does produce thinking on turn 1, but it's discarded before turn 2, so the warning still fires.
Root cause
In agent/core/agent_loop.py:
_call_llm_streaming and _call_llm_non_streaming don't capture thinking_blocks / reasoning_content from responses.
LLMResult has no fields for them.
- The three
Message(role="assistant", ...) construction sites in the loop drop the thinking state entirely.
So the next acompletion call sees a tool-call-bearing assistant message with no thinking_blocks, and LiteLLM strips the thinking param to avoid an API error.
Proposed fix
~30 lines, no behavior change for non-thinking models:
- Add
thinking_blocks + reasoning_content fields to LLMResult.
- Streaming path: collect raw chunks during iteration; after the stream finishes, call
litellm.stream_chunk_builder(chunks) to get a reassembled ModelResponse, and pull message.thinking_blocks / message.reasoning_content off it. Best-effort — wrap in try/except so unfamiliar providers just fall back to no thinking reassembly.
- Non-streaming path: read them directly off
response.choices[0].message.
- Attach both to every
Message(role="assistant", ...) in the loop (truncation-hint site, no-tool-calls site, with-tool-calls site).
Verified locally: stream_chunk_builder handles Anthropic adaptive-thinking deltas correctly (1 thinking block + full reasoning_content reassembled from the streamed chunks); the warning disappears on turns where the model actually produced thinking; trivial prompts where adaptive thinking legitimately skips still show the warning because there's genuinely nothing to attach — which is semantically correct.
PR to follow.
Symptom
ml-internrequests extended thinking viathinking={"type": "adaptive"}(seeagent/core/llm_params.py:153), but the assistant messages it stores in history never includethinking_blocksorreasoning_content. On every tool-continuation turn, LiteLLM logs:This silently disables extended thinking for ~every turn after the first tool call — effectively turning Opus into non-thinking-mode Opus for the bulk of an agent's work, which is a meaningful reasoning-quality degradation for what ml-intern is used for.
Reproduction
Any agent run that goes through a tool call:
…produces the warning on turn 2. With a thinking-worthy prompt, the model does produce thinking on turn 1, but it's discarded before turn 2, so the warning still fires.
Root cause
In
agent/core/agent_loop.py:_call_llm_streamingand_call_llm_non_streamingdon't capturethinking_blocks/reasoning_contentfrom responses.LLMResulthas no fields for them.Message(role="assistant", ...)construction sites in the loop drop the thinking state entirely.So the next
acompletioncall sees a tool-call-bearing assistant message with nothinking_blocks, and LiteLLM strips thethinkingparam to avoid an API error.Proposed fix
~30 lines, no behavior change for non-thinking models:
thinking_blocks+reasoning_contentfields toLLMResult.litellm.stream_chunk_builder(chunks)to get a reassembledModelResponse, and pullmessage.thinking_blocks/message.reasoning_contentoff it. Best-effort — wrap in try/except so unfamiliar providers just fall back to no thinking reassembly.response.choices[0].message.Message(role="assistant", ...)in the loop (truncation-hint site, no-tool-calls site, with-tool-calls site).Verified locally:
stream_chunk_builderhandles Anthropic adaptive-thinking deltas correctly (1 thinking block + fullreasoning_contentreassembled from the streamed chunks); the warning disappears on turns where the model actually produced thinking; trivial prompts where adaptive thinking legitimately skips still show the warning because there's genuinely nothing to attach — which is semantically correct.PR to follow.