Improve <think>/<thinking> tag identification in streaming responses#40
Open
mludvig wants to merge 6 commits into
Open
Improve <think>/<thinking> tag identification in streaming responses#40mludvig wants to merge 6 commits into
mludvig wants to merge 6 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR replaces the previous batch-processing approach to detecting <think> and <thinking> tags with a streaming state machine that handles tags split across multiple chunks. This is essential for models like Amazon Nova that emit these tags in fragments rather than complete units.
Key changes:
- Implements a 4-state machine ('normal', 'buffering_open', 'thinking', 'buffering_close') to detect and strip thinking tags from streaming content
- Removes the old
parseThinkingContentfunction that required complete strings - Adds state tracking fields (
thinkingState,tagBuffer) to AgentContext for maintaining parser state across chunks
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/stream.ts | Removed old batch parseThinkingContent function and replaced conditional logic with streaming state machine that processes content character-by-character while buffering potential tag sequences |
| src/specs/fragmented-thinking.test.ts | Added new test file to verify the state machine correctly handles thinking tags split by whitespace boundaries using the fake streaming model |
| src/agents/AgentContext.ts | Added thinkingState and tagBuffer fields to maintain state machine context across streaming chunks |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a9abb21 to
0a493ef
Compare
Author
|
@danny-avila copilot comments resolved |
Author
|
Hi @danny-avila |
Add state machine to detect <thinking> and <think> tags that arrive split across multiple streaming chunks. This fixes Amazon Nova and similar models that send tokens like '<thinking', '>', content, '</', 'thinking', '>' as separate stream events. The state machine buffers content starting with '<' and waits for enough characters to determine if it's a thinking tag before routing to TEXT or THINK content types.
Verifies that <thinking> tags are correctly detected and stripped from streamed content, with thinking content routed to THINK type and regular text routed to TEXT type.
The state machine now strips <think>/<thinking> tags from content, so update the test to expect capture group 1 (content inside tags) instead of capture group 0 (entire match including tags).
Ensures partial tags aren't lost when stream ends mid-buffer
0a493ef to
6ec4814
Compare
danny-avila
added a commit
that referenced
this pull request
May 5, 2026
Audit verified: all 5 valid (real bypass shapes / API contract violations / OOM risks). P1 #37 — destructive path normalization. Patterns like \`rm -rf \$HOME/\`, \`rm -rf ~/\`, \`rm -rf "\$HOME/"\` slipped past the bare + quoted destructive guards because the trailing slash broke the end-anchor / quote-pair shapes. Extracted a shared \`DESTRUCTIVE_TARGET\` (\`(?:\\/|~|\\\$\\{?HOME\\}?|\\.)\\/?\`) used by both pattern lists so spelling equivalences are kept consistent. 9 tests pinned (all the spelling variants + a benign no-regression). P2 #38 — \`resolveWorkspacePathSafe\` used host \`fs/promises.realpath\` instead of the configured \`WorkspaceFS.realpath\`. On a custom or remote engine the host realpath would fail and silently fall back to lexical containment, leaving the symlink-escape clamp ineffective. \`realpathOrSelf\` and \`realpathOfPathOrAncestor\` now take the realpath impl as a parameter; \`resolveWorkspacePathSafe\` threads \`getWorkspaceFS(config).realpath\` through both. P2 #39 — direct-path \`additionalContexts\` were silently swallowed. Hosts that returned \`additionalContext\` from PreToolUse / PostToolUse / PostToolUseFailure for direct tools (which is every local-engine tool) had their context discarded — broken hook API contract. Added \`RunToolBatchContext.additionalContextsSink\`; \`runDirectToolWithLifecycleHooks\` pushes hook contexts into it; \`run()\` materializes the accumulated strings as a single \`HumanMessage\` appended to outputs, matching the event-driven path's \`injected[]\` shape. PostToolUseFailure was also changed from fire-and-forget to await so its contexts are captured (the hook is still observational w.r.t. the tool result). P2 #40 — syntax-check probe cache was keyed only on spawn backend. Same shape as P1 #34 (rg cache). Now nested \`WeakMap<spawn, Map<envHash, ProbeCache>>\`. Stable JSON over sorted env entries. P2 #41 — fallback grep read each candidate file fully via \`readFile\` then \`split('\\n')\`. The wall-clock budget only checked between files, so a single multi-GB log could OOM the process even with the regex DoS guards in place. Added a per-file \`FALLBACK_GREP_MAX_FILE_BYTES = 5 MiB\` cap (stat first, skip with sentinel if oversize) plus a deadline re-check after each read. Hosts needing larger files should install ripgrep. 830 tests passing across all suites (was 817), lint baseline unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Some models like Amazon Nova send their thoughts wrapped in .. or .. tags. It is not guaranteed that these strings arrive as single streaming events, in fact more often than not they don't and we may receive '<', 'thinking', '>' tokens separately.
This patch introduces a state machine that deals with such a fragmented stream, identifies the tags and properly emits the thoughts content as
ContentTypes.THINKfor a correct frontend rendering. Fixes the output from at least Amazon Nova and quite likely from other models that emit these tags too.