Extended thinking silently dropped on tool-continuation turns (thinking_blocks not threaded through history)

## Symptom

`ml-intern` requests extended thinking via `thinking={"type": "adaptive"}` (see `agent/core/llm_params.py:153`), but the assistant messages it stores in history never include `thinking_blocks` or `reasoning_content`. On every tool-continuation turn, LiteLLM logs:

```
LiteLLM:WARNING: Dropping 'thinking' param because the last assistant message
with tool_calls has no thinking_blocks. The model won't use extended thinking
for this turn.
```

This silently disables extended thinking for ~every turn after the first tool call — effectively turning Opus into non-thinking-mode Opus for the bulk of an agent's work, which is a meaningful reasoning-quality degradation for what ml-intern is used for.

## Reproduction

Any agent run that goes through a tool call:

```
ml-intern "run bash 'date +%Y'. then tell me the year"
```

…produces the warning on turn 2. With a thinking-worthy prompt, the model *does* produce thinking on turn 1, but it's discarded before turn 2, so the warning still fires.

## Root cause

In `agent/core/agent_loop.py`:

1. `_call_llm_streaming` and `_call_llm_non_streaming` don't capture `thinking_blocks` / `reasoning_content` from responses.
2. `LLMResult` has no fields for them.
3. The three `Message(role="assistant", ...)` construction sites in the loop drop the thinking state entirely.

So the next `acompletion` call sees a tool-call-bearing assistant message with no `thinking_blocks`, and LiteLLM strips the `thinking` param to avoid an API error.

## Proposed fix

~30 lines, no behavior change for non-thinking models:

- Add `thinking_blocks` + `reasoning_content` fields to `LLMResult`.
- **Streaming path**: collect raw chunks during iteration; after the stream finishes, call `litellm.stream_chunk_builder(chunks)` to get a reassembled `ModelResponse`, and pull `message.thinking_blocks` / `message.reasoning_content` off it. Best-effort — wrap in try/except so unfamiliar providers just fall back to no thinking reassembly.
- **Non-streaming path**: read them directly off `response.choices[0].message`.
- Attach both to every `Message(role="assistant", ...)` in the loop (truncation-hint site, no-tool-calls site, with-tool-calls site).

Verified locally: `stream_chunk_builder` handles Anthropic adaptive-thinking deltas correctly (1 thinking block + full `reasoning_content` reassembled from the streamed chunks); the warning disappears on turns where the model actually produced thinking; trivial prompts where adaptive thinking legitimately skips still show the warning because there's genuinely nothing to attach — which is semantically correct.

PR to follow.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extended thinking silently dropped on tool-continuation turns (thinking_blocks not threaded through history) #87

Symptom

Reproduction

Root cause

Proposed fix

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Extended thinking silently dropped on tool-continuation turns (thinking_blocks not threaded through history) #87

Description

Symptom

Reproduction

Root cause

Proposed fix

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions