Improve <think>/<thinking> tag identification in streaming responses by mludvig · Pull Request #40 · danny-avila/agents

mludvig · 2025-12-10T01:23:35Z

Some models like Amazon Nova send their thoughts wrapped in .. or .. tags. It is not guaranteed that these strings arrive as single streaming events, in fact more often than not they don't and we may receive '<', 'thinking', '>' tokens separately.

This patch introduces a state machine that deals with such a fragmented stream, identifies the tags and properly emits the thoughts content as ContentTypes.THINK for a correct frontend rendering. Fixes the output from at least Amazon Nova and quite likely from other models that emit these tags too.

Copilot

Pull request overview

This PR replaces the previous batch-processing approach to detecting <think> and <thinking> tags with a streaming state machine that handles tags split across multiple chunks. This is essential for models like Amazon Nova that emit these tags in fragments rather than complete units.

Key changes:

Implements a 4-state machine ('normal', 'buffering_open', 'thinking', 'buffering_close') to detect and strip thinking tags from streaming content
Removes the old parseThinkingContent function that required complete strings
Adds state tracking fields (thinkingState, tagBuffer) to AgentContext for maintaining parser state across chunks

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
src/stream.ts	Removed old batch `parseThinkingContent` function and replaced conditional logic with streaming state machine that processes content character-by-character while buffering potential tag sequences
src/specs/fragmented-thinking.test.ts	Added new test file to verify the state machine correctly handles thinking tags split by whitespace boundaries using the fake streaming model
src/agents/AgentContext.ts	Added `thinkingState` and `tagBuffer` fields to maintain state machine context across streaming chunks

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mludvig · 2025-12-10T03:12:05Z

@danny-avila copilot comments resolved

mludvig · 2026-01-27T00:43:51Z

Hi @danny-avila
Any news on this one? Would like to have Amazon Nova thinking properly supported.
Thanks!

Add state machine to detect <thinking> and <think> tags that arrive split across multiple streaming chunks. This fixes Amazon Nova and similar models that send tokens like '<thinking', '>', content, '</', 'thinking', '>' as separate stream events. The state machine buffers content starting with '<' and waits for enough characters to determine if it's a thinking tag before routing to TEXT or THINK content types.

Verifies that <thinking> tags are correctly detected and stripped from streamed content, with thinking content routed to THINK type and regular text routed to TEXT type.

The state machine now strips <think>/<thinking> tags from content, so update the test to expect capture group 1 (content inside tags) instead of capture group 0 (entire match including tags).

Ensures partial tags aren't lost when stream ends mid-buffer

Audit verified: all 5 valid (real bypass shapes / API contract violations / OOM risks). P1 #37 — destructive path normalization. Patterns like \`rm -rf \$HOME/\`, \`rm -rf ~/\`, \`rm -rf "\$HOME/"\` slipped past the bare + quoted destructive guards because the trailing slash broke the end-anchor / quote-pair shapes. Extracted a shared \`DESTRUCTIVE_TARGET\` (\`(?:\\/|~|\\\$\\{?HOME\\}?|\\.)\\/?\`) used by both pattern lists so spelling equivalences are kept consistent. 9 tests pinned (all the spelling variants + a benign no-regression). P2 #38 — \`resolveWorkspacePathSafe\` used host \`fs/promises.realpath\` instead of the configured \`WorkspaceFS.realpath\`. On a custom or remote engine the host realpath would fail and silently fall back to lexical containment, leaving the symlink-escape clamp ineffective. \`realpathOrSelf\` and \`realpathOfPathOrAncestor\` now take the realpath impl as a parameter; \`resolveWorkspacePathSafe\` threads \`getWorkspaceFS(config).realpath\` through both. P2 #39 — direct-path \`additionalContexts\` were silently swallowed. Hosts that returned \`additionalContext\` from PreToolUse / PostToolUse / PostToolUseFailure for direct tools (which is every local-engine tool) had their context discarded — broken hook API contract. Added \`RunToolBatchContext.additionalContextsSink\`; \`runDirectToolWithLifecycleHooks\` pushes hook contexts into it; \`run()\` materializes the accumulated strings as a single \`HumanMessage\` appended to outputs, matching the event-driven path's \`injected[]\` shape. PostToolUseFailure was also changed from fire-and-forget to await so its contexts are captured (the hook is still observational w.r.t. the tool result). P2 #40 — syntax-check probe cache was keyed only on spawn backend. Same shape as P1 #34 (rg cache). Now nested \`WeakMap<spawn, Map<envHash, ProbeCache>>\`. Stable JSON over sorted env entries. P2 #41 — fallback grep read each candidate file fully via \`readFile\` then \`split('\\n')\`. The wall-clock budget only checked between files, so a single multi-GB log could OOM the process even with the regex DoS guards in place. Added a per-file \`FALLBACK_GREP_MAX_FILE_BYTES = 5 MiB\` cap (stat first, skip with sentinel if oversize) plus a deadline re-check after each read. Hosts needing larger files should install ripgrep. 830 tests passing across all suites (was 817), lint baseline unchanged.

Copilot AI review requested due to automatic review settings December 10, 2025 01:23

Copilot started reviewing on behalf of mludvig December 10, 2025 01:24 View session

Copilot AI reviewed Dec 10, 2025

View reviewed changes

Comment thread src/specs/fragmented-thinking.test.ts

Comment thread src/stream.ts

Comment thread src/specs/fragmented-thinking.test.ts Outdated

Comment thread src/specs/fragmented-thinking.test.ts Outdated

Comment thread src/agents/AgentContext.ts Outdated

mludvig force-pushed the feat/nova-thinking branch from a9abb21 to 0a493ef Compare December 10, 2025 03:09

mludvig added 6 commits January 27, 2026 17:57

test: Add test for fragmented thinking tag parsing

39f4236

Verifies that <thinking> tags are correctly detected and stripped from streamed content, with thinking content routed to THINK type and regular text routed to TEXT type.

test: Update reasoning test to expect stripped tags

fe19197

The state machine now strips <think>/<thinking> tags from content, so update the test to expect capture group 1 (content inside tags) instead of capture group 0 (entire match including tags).

fix: Flush thinking buffer at stream end in ModelEndHandler

c8f8c05

Ensures partial tags aren't lost when stream ends mid-buffer

test: Add more test cases for thinking tag handling

6e8ef9b

doc: Describe state machine states

6ec4814

mludvig force-pushed the feat/nova-thinking branch from 0a493ef to 6ec4814 Compare January 27, 2026 05:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve <think>/<thinking> tag identification in streaming responses#40

Improve <think>/<thinking> tag identification in streaming responses#40
mludvig wants to merge 6 commits into
danny-avila:mainfrom
mludvig:feat/nova-thinking

mludvig commented Dec 10, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mludvig commented Dec 10, 2025

Uh oh!

mludvig commented Jan 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mludvig commented Dec 10, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mludvig commented Dec 10, 2025

Uh oh!

mludvig commented Jan 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants