Feature: chat completions reasoning, support gemini-3-pro thinking, and support enabling thinking and interleaved thinking for Claude models #163
…g order when stream=false and exclude reasoning_opaque from token calculation in calculateMessageTokens
…llback VSCode version
…inking budget integration
Pull request overview
This pull request adds comprehensive support for reasoning/thinking blocks in the translation layer between OpenAI/Copilot and Anthropic message formats, with a focus on enabling "interleaved thinking" for Claude models. The changes include significant refactoring of the streaming and non-streaming translation logic, API versioning updates, and infrastructure improvements.
Key changes:
- Added signature field to thinking blocks and implemented bidirectional translation of reasoning content between OpenAI and Anthropic formats
- Introduced thinking budget calculation with min/max constraints and automatic injection of Claude-specific system prompts for interleaved thinking workflows
- Refactored streaming translation into separate handler functions for improved maintainability and added state tracking for thinking blocks
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| src/lib/api-config.ts | Updated Copilot API version to 2025-10-01, version to 0.35.0, changed openai-intent header, and refactored base URL logic to remove conditional for individual accounts |
| src/lib/tokenizer.ts | Added logic to skip reasoning_opaque field when calculating token counts |
| src/routes/messages/anthropic-types.ts | Added signature field to AnthropicThinkingBlock and thinkingBlockOpen state to AnthropicStreamState |
| src/routes/messages/handler.ts | Initialized thinkingBlockOpen field in streaming state |
| src/routes/messages/non-stream-translation.ts | Implemented thinking budget calculation, Claude-specific system prompt injection for interleaved thinking, enhanced assistant message handling to extract and filter thinking blocks with signatures, and updated response translation to include thinking blocks |
| src/routes/messages/stream-translation.ts | Major refactoring to separate concerns into handleMessageStart, handleThinkingText, handleContent, handleToolCalls, and handleFinish functions; added logic to handle reasoning_text and reasoning_opaque in streaming responses with proper thinking block state management |
| src/services/copilot/create-chat-completions.ts | Added reasoning_text, reasoning_opaque fields to Delta, ResponseMessage, and Message interfaces; added thinking_budget to ChatCompletionsPayload; exported Delta and Choice interfaces |
| src/services/copilot/get-models.ts | Added max_thinking_budget and min_thinking_budget fields to ModelSupports interface |
| src/services/get-vscode-version.ts | Updated fallback VSCode version from 1.104.3 to 1.107.0 |
| src/start.ts | Added idleTimeout: 0 configuration to prevent server timeout during idle periods |
| tests/anthropic-request.test.ts | Updated tests to include signature field in thinking blocks and verify reasoning_text instead of content for thinking content |
| tests/anthropic-response.test.ts | Added thinkingBlockOpen field initialization to streaming state in tests |
| events.push( | ||
| { | ||
| type: "content_block_delta", | ||
| index: state.contentBlockIndex, | ||
| delta: { | ||
| type: "signature_delta", | ||
| signature: "", | ||
| }, | ||
| }, | ||
| { | ||
| type: "content_block_stop", | ||
| index: state.contentBlockIndex, | ||
| }, | ||
| ) |
The signature_delta event is emitted with an empty string when closing a thinking block. According to Anthropic's streaming protocol, an empty signature may not be a valid value for signature_delta events. Consider either omitting this event entirely or using a proper signature value. If an empty signature is intentional for Claude models, add a comment explaining this behavior.
| events.push( | |
| { | |
| type: "content_block_delta", | |
| index: state.contentBlockIndex, | |
| delta: { | |
| type: "signature_delta", | |
| signature: "", | |
| }, | |
| }, | |
| { | |
| type: "content_block_stop", | |
| index: state.contentBlockIndex, | |
| }, | |
| ) | |
| events.push({ | |
| type: "content_block_stop", | |
| index: state.contentBlockIndex, | |
| }) |
|
|
||
| allTextBlocks.push(...textBlocks) | ||
| allToolUseBlocks.push(...toolUseBlocks) | ||
| assistantContentBlocks.push(...thinkBlocks, ...textBlocks, ...toolUseBlocks) |
The thinking blocks are placed before text blocks in the response (line 362), which means the thinking content will always appear first in the response regardless of its original position. This may not accurately represent the interleaved thinking flow if text was generated before thinking or if there were multiple rounds of thinking and text. Consider tracking the original order of blocks or documenting why thinking blocks must always come first in the response.
| assistantContentBlocks.push(...thinkBlocks, ...textBlocks, ...toolUseBlocks) | |
| assistantContentBlocks.push(...textBlocks, ...thinkBlocks, ...toolUseBlocks) |
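One way to read "tracking the original order of blocks" is to record a position for each block as it is produced and sort on that, rather than grouping by block type. The following is only a sketch under that assumption: the `OrderedPart` wrapper, the `position` field, and `toContentBlocks` are hypothetical names, and the block shapes are trimmed down from the real Anthropic types. Whether the upstream response exposes enough information to recover such positions is not confirmed by this PR.

```typescript
// Hypothetical, simplified block shapes; the real Anthropic types carry more fields.
type ThinkingBlock = { type: "thinking"; thinking: string; signature: string }
type TextBlock = { type: "text"; text: string }
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown }
type ContentBlock = ThinkingBlock | TextBlock | ToolUseBlock

// Assumption: each block is stored together with the position at which it was produced.
interface OrderedPart {
  position: number
  block: ContentBlock
}

// Sort by recorded position instead of concatenating per-type arrays.
function toContentBlocks(parts: Array<OrderedPart>): Array<ContentBlock> {
  return [...parts]
    .sort((a, b) => a.position - b.position)
    .map((part) => part.block)
}

const ordered = toContentBlocks([
  { position: 2, block: { type: "tool_use", id: "t1", name: "search", input: {} } },
  { position: 0, block: { type: "text", text: "Let me check." } },
  { position: 1, block: { type: "thinking", thinking: "Need a lookup.", signature: "sig" } },
])
```

With this shape, text that was generated before a thinking block stays ahead of it, and multiple rounds of thinking and text keep their relative order.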
| }) | ||
| state.contentBlockOpen = false | ||
| state.contentBlockIndex++ | ||
| if (!toolBlockOpen) { |
In the handleFinish function, when a content block is open and needs to be closed before finishing, the function calls handleReasoningOpaque only if the tool block is not open (line 67-69). However, this logic doesn't consider whether there's actually reasoning_opaque data in the delta. If choice.delta.reasoning_opaque is empty or undefined, handleReasoningOpaque will not emit any events, which is correct, but the conditional check creates unnecessary coupling. Consider adding a guard in handleReasoningOpaque itself or passing the delta more explicitly.
| if (!toolBlockOpen) { | |
| if (!toolBlockOpen && choice.delta?.reasoning_opaque) { |
| content: | ||
| "<system-reminder>Please strictly follow Interleaved thinking</system-reminder>", | ||
| } as Message | ||
| return [...systemMessages, thinkingMessage, ...otherMessages] |
The system-reminder message is placed immediately after system messages and before all other messages. This placement may break the expected message order when there are existing user/assistant message exchanges. The reminder should ideally be placed at the end of the messages array to avoid disrupting the conversation flow, or inserted more strategically based on the context. Consider moving it to the end or documenting why this specific placement is required.
| return [...systemMessages, thinkingMessage, ...otherMessages] | |
| return [...systemMessages, ...otherMessages, thinkingMessage] |
| const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget) | ||
| return Math.max( | ||
| budgetTokens, | ||
| model.capabilities.supports.min_thinking_budget ?? 1024, | ||
| ) |
The thinking budget calculation can return a value above maxThinkingBudget. When thinking.budget_tokens is less than min_thinking_budget, Math.max raises the result to the minimum, which exceeds maxThinkingBudget whenever the minimum is larger than the maximum. Consider validating that min_thinking_budget <= maxThinkingBudget before the calculation, or returning undefined if the constraints cannot be satisfied.
| const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget) | |
| return Math.max( | |
| budgetTokens, | |
| model.capabilities.supports.min_thinking_budget ?? 1024, | |
| ) | |
| const minThinkingBudget = | |
| model.capabilities.supports.min_thinking_budget ?? 1024 | |
| // If the minimum required budget exceeds the maximum allowed, the | |
| // constraints cannot be satisfied; fall back to no thinking budget. | |
| if (minThinkingBudget > maxThinkingBudget) { | |
| return undefined | |
| } | |
| const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget) | |
| return Math.max(budgetTokens, minThinkingBudget) |
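The clamping rule discussed in this comment can be tried in isolation. The sketch below is a standalone approximation, not the PR's getThinkingBudget: `BudgetLimits` and `clampThinkingBudget` are hypothetical names, and the flat limits object stands in for the nested model.capabilities structure. The 1024 default and the "max output tokens minus one" cap are taken from the snippet above.

```typescript
// Hypothetical, flattened stand-in for model.capabilities.supports / limits.
interface BudgetLimits {
  maxThinkingBudget?: number
  minThinkingBudget?: number
  maxOutputTokens?: number
}

function clampThinkingBudget(
  requested: number | undefined,
  limits: BudgetLimits,
): number | undefined {
  if (requested === undefined) return undefined
  // The budget must stay below the output limit and within the model's maximum.
  const max = Math.min(
    limits.maxThinkingBudget ?? 0,
    (limits.maxOutputTokens ?? 0) - 1,
  )
  const min = limits.minThinkingBudget ?? 1024
  // No usable maximum, or min > max: the constraints cannot be satisfied.
  if (max <= 0 || min > max) return undefined
  return Math.max(Math.min(requested, max), min)
}
```

Returning undefined when the range is empty is one of the two options the comment mentions; the other (validating earlier and rejecting the request) would need a different return type.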
| }, | ||
| ) | ||
| state.contentBlockIndex++ | ||
| state.thinkingBlockOpen = false |
The thinking block state is set to false without checking if it was actually open. On line 237, state.thinkingBlockOpen is set to false unconditionally, but there's no check ensuring it was true before. This could lead to inconsistent state tracking. Consider only setting it to false if it was previously true, or add assertions to ensure the state transitions are valid.
| state.thinkingBlockOpen = false | |
| if (state.thinkingBlockOpen) { | |
| state.thinkingBlockOpen = false | |
| } |
| function getThinkingBudget( | ||
| payload: AnthropicMessagesPayload, | ||
| model: Model | undefined, | ||
| ): number | undefined { | ||
| const thinking = payload.thinking | ||
| if (model && thinking) { | ||
| const maxThinkingBudget = Math.min( | ||
| model.capabilities.supports.max_thinking_budget ?? 0, | ||
| (model.capabilities.limits.max_output_tokens ?? 0) - 1, | ||
| ) | ||
| if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) { | ||
| const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget) | ||
| return Math.max( | ||
| budgetTokens, | ||
| model.capabilities.supports.min_thinking_budget ?? 1024, | ||
| ) | ||
| } | ||
| } | ||
| return undefined | ||
| } |
The new thinking budget calculation logic (getThinkingBudget function) and interleaved thinking prompt injection for Claude models lack test coverage. These are significant new features that handle complex logic including min/max budget constraints and model-specific behavior. Consider adding tests that verify: 1) budget calculation with various model capabilities, 2) the system prompt injection for Claude models with thinking enabled, 3) the system-reminder message insertion.
| index: state.contentBlockIndex, | ||
| }) | ||
| state.contentBlockIndex++ | ||
| state.contentBlockOpen = false |
In handleReasoningOpaqueInToolCalls, when closing a content block that is not a tool block, the function doesn't check or update the thinkingBlockOpen state. If a thinking block was open when tool calls arrive, this could lead to inconsistent state where thinkingBlockOpen remains true even though the block has been closed. Consider calling closeThinkingBlockIfOpen before handling reasoning_opaque in tool calls, or explicitly set thinkingBlockOpen to false when appropriate.
| state.contentBlockOpen = false | |
| state.contentBlockOpen = false | |
| // Ensure thinking block state is consistent when closing a non-tool content block. | |
| state.thinkingBlockOpen = false |
| state.accountType === "individual" ? | ||
| "https://api.githubcopilot.com" | ||
| : `https://api.${state.accountType}.githubcopilot.com` | ||
| `https://api.${state.accountType}.githubcopilot.com` |
The refactored copilotBaseUrl function now unconditionally uses the template https://api.${state.accountType}.githubcopilot.com, which will result in https://api.individual.githubcopilot.com for individual accounts. The previous implementation used https://api.githubcopilot.com (without the subdomain) for individual accounts. This is a breaking change that may cause API requests to fail for individual account users. Please verify that the GitHub Copilot API supports the new URL format for individual accounts, or restore the conditional logic.
| `https://api.${state.accountType}.githubcopilot.com` | |
| state.accountType === "individual" | |
| ? "https://api.githubcopilot.com" | |
| : `https://api.${state.accountType}.githubcopilot.com` |
| let thinkingBlocks = message.content.filter( | ||
| (block): block is AnthropicThinkingBlock => block.type === "thinking", | ||
| ) | ||
|
|
||
| // Combine text and thinking blocks, as OpenAI doesn't have separate thinking blocks | ||
| const allTextContent = [ | ||
| ...textBlocks.map((b) => b.text), | ||
| ...thinkingBlocks.map((b) => b.thinking), | ||
| ].join("\n\n") | ||
| if (modelId.startsWith("claude")) { | ||
| thinkingBlocks = thinkingBlocks.filter( | ||
| (b) => | ||
| b.thinking | ||
| && b.thinking.length > 0 | ||
| && b.signature | ||
| && b.signature.length > 0 | ||
| // gpt signature has @ in it, so filter those out for claude models | ||
| && !b.signature.includes("@"), | ||
| ) | ||
| } | ||
|
|
||
| const thinkingContents = thinkingBlocks | ||
| .filter((b) => b.thinking && b.thinking.length > 0) | ||
| .map((b) => b.thinking) |
The thinking blocks are filtered twice - once on line 200-202 to extract all thinking blocks, then again on lines 216-218 to filter those with non-empty thinking content. This redundant filtering is inefficient. Consider combining these filters or restructuring the logic to avoid processing the same blocks multiple times.
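A single-pass version of the filter might look like the sketch below. It assumes the blocks have already been narrowed to thinking blocks; `selectThinkingBlocks` is a hypothetical helper name, and the "@" check simply mirrors the comment in the diff about GPT signatures rather than any documented signature format.

```typescript
// Simplified thinking block; signature is optional here to model incomplete blocks.
interface ThinkingBlock {
  type: "thinking"
  thinking: string
  signature?: string
}

// One predicate covers both the non-empty check and the Claude-specific signature check.
function selectThinkingBlocks(
  blocks: Array<ThinkingBlock>,
  modelId: string,
): Array<ThinkingBlock> {
  const isClaude = modelId.startsWith("claude")
  return blocks.filter((b) => {
    if (!b.thinking || b.thinking.length === 0) return false
    if (!isClaude) return true
    const signature = b.signature ?? ""
    // Per the diff comment, signatures containing "@" are treated as non-Claude.
    return signature.length > 0 && !signature.includes("@")
  })
}
```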
@zzb54321 Does the master branch work OK?
Yeah, master is working well.
@zzb54321 You are not on the individual plan? If it works, I will commit code to fix it.
@caozhiyuan Hello, and happy new year. Where and how do you get API_VERSION and COPILOT_VERSION? Thanks.
@caozhiyuan It's working now after applying the suggested fix. BTW, how can I verify that gemini-3-pro thinking is working on the chat client side? Does it work on the OpenAI-compatible API, or only on the Anthropic-format API?
@zzb54321 Use the message API; it is not a standard OpenAI-compatible protocol. If you don't apply the suggested fix, you can use --account-type business.
…reasoning_opaque in different deltas
When account type is not specified or set to 'individual', use the default api.githubcopilot.com URL instead of constructing a subdomain-based URL. This restores previous behavior where business users could work without explicitly specifying their account type, as the default URL works for both individual and business accounts. Only constructs account-type-specific URLs (api.business.githubcopilot.com, api.enterprise.githubcopilot.com) when those account types are explicitly specified.
fix: use default API URL when account type is individual
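The behavior this commit describes can be summarized in a few lines. This is a sketch only: the real copilotBaseUrl reads the account type from shared state, whereas the version below takes it as a parameter, and the `AccountType` union is an assumption based on the three account types named in the commit message.

```typescript
// Assumed set of account types, taken from the commit message above.
type AccountType = "individual" | "business" | "enterprise"

// Default host when the account type is unset or "individual";
// subdomain-based host only for explicitly specified business/enterprise accounts.
function copilotBaseUrl(accountType?: AccountType): string {
  if (!accountType || accountType === "individual") {
    return "https://api.githubcopilot.com"
  }
  return `https://api.${accountType}.githubcopilot.com`
}
```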
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.
| if (reasoningText && reasoningText.length > 0) { | ||
| return [ | ||
| { | ||
| type: "thinking", | ||
| thinking: reasoningText, | ||
| signature: reasoningOpaque || "", |
The signature field in AnthropicThinkingBlock is now required (line 59 in anthropic-types.ts), but when reasoningOpaque is not provided, it defaults to an empty string (line 431). This is a breaking API change that could affect API consumers. Consider: 1) making the signature field optional to maintain backwards compatibility, 2) documenting that signature can be an empty string and what that means semantically, or 3) only including thinking blocks when both thinking and signature are non-empty to avoid exposing incomplete thinking blocks.
| if (reasoningText && reasoningText.length > 0) { | |
| return [ | |
| { | |
| type: "thinking", | |
| thinking: reasoningText, | |
| signature: reasoningOpaque || "", | |
| if ( | |
| reasoningText && | |
| reasoningText.length > 0 && | |
| reasoningOpaque && | |
| reasoningOpaque.length > 0 | |
| ) { | |
| return [ | |
| { | |
| type: "thinking", | |
| thinking: reasoningText, | |
| signature: reasoningOpaque, |
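Option 3 from the comment (emit a thinking block only when both parts are present) could be written as a small helper. `toThinkingBlocks` is a hypothetical name and the block type is simplified; note that this would also drop signature-only blocks, which other parts of the PR emit deliberately, so it is a sketch of the reviewer's suggestion rather than a drop-in change.

```typescript
interface ThinkingBlock {
  type: "thinking"
  thinking: string
  signature: string
}

// Returns an empty array unless both the reasoning text and its signature are non-empty.
function toThinkingBlocks(
  reasoningText?: string | null,
  reasoningOpaque?: string | null,
): Array<ThinkingBlock> {
  if (!reasoningText || !reasoningOpaque) return []
  return [
    { type: "thinking", thinking: reasoningText, signature: reasoningOpaque },
  ]
}
```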
| handleReasoningOpaque(choice.delta, events, state) | ||
| } | ||
| } | ||
|
|
In the handleFinish function, when a finish_reason is received, the code checks if contentBlockOpen is true and closes it, but it doesn't check if thinkingBlockOpen is true. This means if a thinking block is still open when the message finishes (which could happen if reasoning_text arrives without a subsequent reasoning_opaque or content), the thinking block won't be properly closed, leaving the stream in an inconsistent state. Consider adding a check for state.thinkingBlockOpen and calling closeThinkingBlockIfOpen(state, events) before closing the message.
| if (state.thinkingBlockOpen) { | |
| closeThinkingBlockIfOpen(state, events) | |
| } |
| // handle for claude model | ||
| if ( | ||
| delta.content === "" | ||
| && delta.reasoning_opaque | ||
| && delta.reasoning_opaque.length > 0 | ||
| && state.thinkingBlockOpen | ||
| ) { |
The comment on line 216 states "handle for claude model", but the code that follows (lines 217-222) doesn't actually check if the model is a Claude model. This logic will execute for any model that sends an empty content string with reasoning_opaque when a thinking block is open. Either add a model check (e.g., checking if the model ID starts with "claude") or update the comment to accurately reflect that this is a general handling for a specific streaming pattern, not Claude-specific behavior.
| extraPrompt = ` | ||
| <interleaved_thinking_protocol> | ||
| ABSOLUTE REQUIREMENT - NON-NEGOTIABLE: | ||
| The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block |
There's a grammatical issue on line 135: "think carefully ," has an extra space before the comma. It should be "think carefully," without the space.
| The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block | |
| The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully, MUST output a thinking block |
| delta.content = delta.reasoning_text | ||
| delta.reasoning_text = undefined | ||
| return |
Direct mutation of the delta object is problematic here. The function modifies the incoming delta parameter by setting delta.content = delta.reasoning_text and delta.reasoning_text = undefined. This mutates shared state that may be used elsewhere in the call stack, potentially causing unexpected side effects or making debugging difficult. Consider creating a copy of the delta object or handling this case differently without mutation, such as by tracking the state separately or processing the reasoning_text as intended.
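A non-mutating alternative is to return a new delta and let the caller use the returned value. The sketch below assumes a trimmed `Delta` shape and a hypothetical `normalizeDelta` helper; like the original code, it replaces content with the reasoning text when a content block is already open.

```typescript
// Trimmed version of the Delta interface; only the fields used here.
interface Delta {
  content?: string | null
  reasoning_text?: string | null
  reasoning_opaque?: string | null
}

// Returns a copy with reasoning_text moved into content; the input is left untouched.
function normalizeDelta(delta: Delta, contentBlockOpen: boolean): Delta {
  const { reasoning_text: reasoningText, ...rest } = delta
  if (contentBlockOpen && reasoningText) {
    return { ...rest, content: reasoningText }
  }
  return delta
}
```

The caller would then pass the normalized delta on to the content handler instead of relying on the side effect.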
| if (modelId.startsWith("claude") && thinkingBudget) { | ||
| const reminder = | ||
| "<system-reminder>you MUST follow interleaved_thinking_protocol</system-reminder>" | ||
| const firstUserIndex = otherMessages.findIndex((m) => m.role === "user") | ||
| if (firstUserIndex !== -1) { | ||
| const userMessage = otherMessages[firstUserIndex] | ||
| if (typeof userMessage.content === "string") { | ||
| userMessage.content = reminder + "\n\n" + userMessage.content | ||
| } else if (Array.isArray(userMessage.content)) { | ||
| userMessage.content = [ | ||
| { type: "text", text: reminder }, | ||
| ...userMessage.content, | ||
| ] as Array<ContentPart> | ||
| } | ||
| } | ||
| } |
The system prompt injection logic (lines 102-117 and 131-143) only activates when thinkingBudget is truthy. However, getThinkingBudget returns undefined in several cases: when model is not found, when payload.thinking is not provided, when thinking.budget_tokens is undefined, or when maxThinkingBudget is 0 or negative. This means the interleaved thinking protocol instructions won't be injected unless all these conditions are met. Consider whether the protocol instructions should be injected whenever payload.thinking exists, regardless of budget calculation success, or document this behavior clearly so users understand when thinking protocol is enabled.
|
|
||
| // handle for claude model | ||
| if ( | ||
| delta.content === "" |
The condition on line 218 checks if delta.content === "", which only matches exactly an empty string. However, this doesn't handle cases where delta.content is null or undefined. If the API can send delta.content as null or undefined along with reasoning_opaque, this condition won't match and the logic won't execute. Consider using !delta.content or explicitly checking for all falsy values: (delta.content === "" || delta.content === null || delta.content === undefined).
| delta.content === "" | |
| (delta.content === "" || delta.content == null) |
| if (modelId.startsWith("claude")) { | ||
| thinkingBlocks = thinkingBlocks.filter( | ||
| (b) => | ||
| b.thinking | ||
| && b.thinking.length > 0 | ||
| && b.signature | ||
| && b.signature.length > 0 | ||
| // gpt signature has @ in it, so filter those out for claude models | ||
| && !b.signature.includes("@"), | ||
| ) | ||
| } | ||
|
|
||
| const thinkingContents = thinkingBlocks | ||
| .filter((b) => b.thinking && b.thinking.length > 0) | ||
| .map((b) => b.thinking) |
The filtering on line 231 checks if b.thinking && b.thinking.length > 0, which is redundant for Claude models because the same check was already done on lines 221-222. While this doesn't cause incorrect behavior, it adds unnecessary processing. Consider restructuring to avoid double filtering - for example, apply the thinking content filter before the Claude-specific signature filter, or ensure thinking blocks always have valid thinking content when they're created.
| if (modelId.startsWith("claude")) { | |
| thinkingBlocks = thinkingBlocks.filter( | |
| (b) => | |
| b.thinking | |
| && b.thinking.length > 0 | |
| && b.signature | |
| && b.signature.length > 0 | |
| // gpt signature has @ in it, so filter those out for claude models | |
| && !b.signature.includes("@"), | |
| ) | |
| } | |
| const thinkingContents = thinkingBlocks | |
| .filter((b) => b.thinking && b.thinking.length > 0) | |
| .map((b) => b.thinking) | |
| // First, ensure all thinking blocks have non-empty thinking content | |
| thinkingBlocks = thinkingBlocks.filter( | |
| (b) => b.thinking && b.thinking.length > 0, | |
| ) | |
| if (modelId.startsWith("claude")) { | |
| thinkingBlocks = thinkingBlocks.filter( | |
| (b) => | |
| b.signature | |
| && b.signature.length > 0 | |
| // gpt signature has @ in it, so filter those out for claude models | |
| && !b.signature.includes("@"), | |
| ) | |
| } | |
| const thinkingContents = thinkingBlocks.map((b) => b.thinking) |
| function getThinkingBudget( | ||
| payload: AnthropicMessagesPayload, | ||
| model: Model | undefined, | ||
| ): number | undefined { | ||
| const thinking = payload.thinking | ||
| if (model && thinking) { | ||
| const maxThinkingBudget = Math.min( | ||
| model.capabilities.supports.max_thinking_budget ?? 0, | ||
| (model.capabilities.limits.max_output_tokens ?? 0) - 1, | ||
| ) | ||
| if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) { | ||
| const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget) | ||
| return Math.max( | ||
| budgetTokens, | ||
| model.capabilities.supports.min_thinking_budget ?? 1024, | ||
| ) | ||
| } | ||
| } | ||
| return undefined | ||
| } | ||
|
|
||
| function translateModelName(model: string): string { | ||
| // Subagent requests use a specific model number which Copilot doesn't support | ||
| if (model.startsWith("claude-sonnet-4-")) { | ||
| return model.replace(/^claude-sonnet-4-.*/, "claude-sonnet-4") | ||
| } else if (model.startsWith("claude-opus-4-")) { | ||
| return model.replace(/^claude-opus-4-.*/, "claude-opus-4") | ||
| } | ||
| return model | ||
| } | ||
|
|
||
| function translateAnthropicMessagesToOpenAI( | ||
| payload: AnthropicMessagesPayload, | ||
| modelId: string, | ||
| thinkingBudget: number | undefined, | ||
| ): Array<Message> { | ||
| const systemMessages = handleSystemPrompt( | ||
| payload.system, | ||
| modelId, | ||
| thinkingBudget, | ||
| ) | ||
| const otherMessages = payload.messages.flatMap((message) => | ||
| message.role === "user" ? | ||
| handleUserMessage(message) | ||
| : handleAssistantMessage(message, modelId), | ||
| ) | ||
|
|
||
| if (modelId.startsWith("claude") && thinkingBudget) { | ||
| const reminder = | ||
| "<system-reminder>you MUST follow interleaved_thinking_protocol</system-reminder>" | ||
| const firstUserIndex = otherMessages.findIndex((m) => m.role === "user") | ||
| if (firstUserIndex !== -1) { | ||
| const userMessage = otherMessages[firstUserIndex] | ||
| if (typeof userMessage.content === "string") { | ||
| userMessage.content = reminder + "\n\n" + userMessage.content | ||
| } else if (Array.isArray(userMessage.content)) { | ||
| userMessage.content = [ | ||
| { type: "text", text: reminder }, | ||
| ...userMessage.content, | ||
| ] as Array<ContentPart> | ||
| } | ||
| } | ||
| } | ||
| return [...systemMessages, ...otherMessages] | ||
| } | ||
|
|
||
| function handleSystemPrompt( | ||
| system: string | Array<AnthropicTextBlock> | undefined, | ||
| modelId: string, | ||
| thinkingBudget: number | undefined, | ||
| ): Array<Message> { | ||
| if (!system) { | ||
| return [] | ||
| } | ||
|
|
||
| let extraPrompt = "" | ||
| if (modelId.startsWith("claude") && thinkingBudget) { | ||
| extraPrompt = ` | ||
| <interleaved_thinking_protocol> | ||
| ABSOLUTE REQUIREMENT - NON-NEGOTIABLE: | ||
| The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block | ||
| RULES: | ||
| Tool result → thinking block (ALWAYS, no exceptions) | ||
| This is NOT optional - it is a hard requirement | ||
| The thinking block must contain substantive reasoning (minimum 3-5 sentences) | ||
| Think about: what the results mean, what to do next, how to answer the user | ||
| NEVER skip this step, even if the result seems simple or obvious | ||
| </interleaved_thinking_protocol>` | ||
| } |
The new thinking budget calculation logic (lines 56-75) and system prompt injection logic (lines 102-117, 131-143) lack test coverage. These are critical features that manipulate model behavior and user inputs. Consider adding tests that verify: 1) thinking budget is correctly calculated when thinking.budget_tokens is provided, 2) thinking budget respects min/max boundaries from model capabilities, 3) system prompt injection happens only for Claude models with thinking budget, 4) the interleaved thinking protocol reminder is correctly prepended to the first user message.
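For point 4, the reminder logic is easier to test if it is pulled into a pure function. The sketch below is one possible shape, with hypothetical names (`prependReminder`, simplified `Message` and `ContentPart` types); unlike the code under review it returns new message objects instead of editing the first user message in place.

```typescript
// Simplified message types; the real ContentPart union has more variants.
type ContentPart = { type: "text"; text: string }
interface Message {
  role: "system" | "user" | "assistant"
  content: string | Array<ContentPart>
}

// Prepends the reminder to the first user message only, for both content shapes.
function prependReminder(
  messages: Array<Message>,
  reminder: string,
): Array<Message> {
  const firstUserIndex = messages.findIndex((m) => m.role === "user")
  if (firstUserIndex === -1) return messages
  return messages.map((m, i): Message => {
    if (i !== firstUserIndex) return m
    const content: string | Array<ContentPart> =
      typeof m.content === "string"
        ? `${reminder}\n\n${m.content}`
        : [{ type: "text", text: reminder }, ...m.content]
    return { ...m, content }
  })
}
```

A test can then assert on the returned array and confirm that the input messages are unchanged.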
| function handleReasoningOpaque( | ||
| delta: Delta, | ||
| events: Array<AnthropicStreamEventData>, | ||
| state: AnthropicStreamState, | ||
| ) { | ||
| if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) { | ||
| events.push( | ||
| { | ||
| type: "content_block_start", | ||
| index: state.contentBlockIndex, | ||
| content_block: { | ||
| type: "thinking", | ||
| thinking: "", | ||
| }, | ||
| }, | ||
| { | ||
| type: "content_block_delta", | ||
| index: state.contentBlockIndex, | ||
| delta: { | ||
| type: "thinking_delta", | ||
| thinking: "", | ||
| }, | ||
| }, | ||
| { | ||
| type: "content_block_delta", | ||
| index: state.contentBlockIndex, | ||
| delta: { | ||
| type: "signature_delta", | ||
| signature: delta.reasoning_opaque, | ||
| }, | ||
| }, | ||
| { | ||
| type: "content_block_stop", | ||
| index: state.contentBlockIndex, | ||
| }, | ||
| ) | ||
| state.contentBlockIndex++ | ||
| } | ||
| } | ||
|
|
||
| function handleThinkingText( | ||
| delta: Delta, | ||
| state: AnthropicStreamState, | ||
| events: Array<AnthropicStreamEventData>, | ||
| ) { | ||
| if (delta.reasoning_text && delta.reasoning_text.length > 0) { | ||
| // compatible with copilot API returning content->reasoning_text->reasoning_opaque in different deltas | ||
| // this is an extremely abnormal situation, probably a server-side bug | ||
| // only occurs in the claude model, with a very low probability of occurrence | ||
| if (state.contentBlockOpen) { | ||
| delta.content = delta.reasoning_text | ||
| delta.reasoning_text = undefined | ||
| return | ||
| } | ||
|
|
||
| if (!state.thinkingBlockOpen) { | ||
| events.push({ | ||
| type: "content_block_start", | ||
| index: state.contentBlockIndex, | ||
| content_block: { | ||
| type: "thinking", | ||
| thinking: "", | ||
| }, | ||
| }) | ||
| state.thinkingBlockOpen = true | ||
| } | ||
|
|
||
| events.push({ | ||
| type: "content_block_delta", | ||
| index: state.contentBlockIndex, | ||
| delta: { | ||
| type: "thinking_delta", | ||
| thinking: delta.reasoning_text, | ||
| }, | ||
| }) | ||
| } | ||
| } | ||
|
|
||
| function closeThinkingBlockIfOpen( | ||
| state: AnthropicStreamState, | ||
| events: Array<AnthropicStreamEventData>, | ||
| ): void { | ||
| if (state.thinkingBlockOpen) { | ||
| events.push( | ||
| { | ||
| type: "content_block_delta", | ||
| index: state.contentBlockIndex, | ||
| delta: { | ||
| type: "signature_delta", | ||
| signature: "", | ||
| }, | ||
| }, | ||
| { | ||
| type: "content_block_stop", | ||
| index: state.contentBlockIndex, | ||
| }, | ||
| ) | ||
| state.contentBlockIndex++ | ||
| state.thinkingBlockOpen = false | ||
| } | ||
| } |
The new streaming translation logic for handling thinking blocks and reasoning_opaque (lines 275-313, 315-375) lacks test coverage. This is critical functionality that manages complex state transitions during streaming, including thinking block opening/closing and signature handling. Consider adding tests that verify: 1) reasoning_text is correctly translated to thinking_delta events, 2) reasoning_opaque creates appropriate signature_delta events, 3) thinking blocks are properly closed before content or tool call blocks, 4) the state.thinkingBlockOpen flag is managed correctly throughout the streaming lifecycle.
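Points 1 and 3 can be exercised with a stripped-down translator that has no signature or tool-call handling. Everything in this sketch is a simplification: `translateDeltas`, the local `State` type, and the reduced event union are hypothetical and only approximate the event shapes used in the diff.

```typescript
interface Delta {
  content?: string | null
  reasoning_text?: string | null
}

// Reduced event union; signature_delta and tool events are left out.
type StreamEvent =
  | {
      type: "content_block_start"
      index: number
      content_block:
        | { type: "thinking"; thinking: string }
        | { type: "text"; text: string }
    }
  | {
      type: "content_block_delta"
      index: number
      delta:
        | { type: "thinking_delta"; thinking: string }
        | { type: "text_delta"; text: string }
    }
  | { type: "content_block_stop"; index: number }

interface State {
  index: number
  open: "thinking" | "text" | undefined
}

function closeOpenBlock(state: State, events: Array<StreamEvent>): void {
  if (state.open === undefined) return
  events.push({ type: "content_block_stop", index: state.index })
  state.index++
  state.open = undefined
}

// Translates deltas in order, closing the current block whenever the kind changes.
function translateDeltas(deltas: Array<Delta>): Array<StreamEvent> {
  const events: Array<StreamEvent> = []
  const state: State = { index: 0, open: undefined }
  for (const delta of deltas) {
    if (delta.reasoning_text) {
      if (state.open !== "thinking") {
        closeOpenBlock(state, events)
        events.push({
          type: "content_block_start",
          index: state.index,
          content_block: { type: "thinking", thinking: "" },
        })
        state.open = "thinking"
      }
      events.push({
        type: "content_block_delta",
        index: state.index,
        delta: { type: "thinking_delta", thinking: delta.reasoning_text },
      })
    }
    if (delta.content) {
      if (state.open !== "text") {
        closeOpenBlock(state, events)
        events.push({
          type: "content_block_start",
          index: state.index,
          content_block: { type: "text", text: "" },
        })
        state.open = "text"
      }
      events.push({
        type: "content_block_delta",
        index: state.index,
        delta: { type: "text_delta", text: delta.content },
      })
    }
  }
  closeOpenBlock(state, events)
  return events
}
```

A test for the real handlers could follow the same pattern: feed a fixed sequence of deltas and assert on the order and indexes of the emitted events.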

This pull request introduces significant improvements to how "thinking blocks" are handled and translated between Anthropic and OpenAI message formats. The changes ensure that reasoning and signatures are preserved during translation, add support for model-specific thinking budgets, and update protocol reminders for Claude models. Additionally, there are updates to API versioning and header intent values.
Thinking block and reasoning support:
- … signature to AnthropicThinkingBlock and support for thinkingBlockOpen in AnthropicStreamState to track reasoning blocks in streaming and non-streaming message translations. [1] [2] [3]
- … reasoning_text and reasoning_opaque (signature) when converting between Anthropic and OpenAI formats, including new functions for extracting and injecting thinking blocks. [1] [2] [3] [4] [5]
Model-specific protocol and budget enforcement:
- … thinking_budget based on model capabilities. [1] [2]
API and header updates:
- … openai-intent header from conversation-panel to conversation-agent for requests. [1] [2]
Token counting improvements:
- … reasoning_opaque from token counting when calculating message tokens to avoid miscounting opaque reasoning signatures.
Streaming translation refactor:
These updates collectively enhance the fidelity and protocol compliance of message translations between Anthropic and OpenAI, especially for Claude models.
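The token-counting change mentioned in this description (skipping reasoning_opaque) amounts to excluding one key while iterating over a message's fields. The sketch below illustrates only that idea: `countMessageTokens` is a hypothetical simplification of calculateMessageTokens, and `countTokens` is a whitespace-based placeholder, not the tokenizer the project uses.

```typescript
type MessageLike = Record<string, unknown>

// Placeholder tokenizer (assumption): counts whitespace-separated words.
const countTokens = (text: string): number =>
  text.split(/\s+/).filter(Boolean).length

// Sums tokens over string fields, skipping the opaque reasoning signature.
function countMessageTokens(message: MessageLike): number {
  let total = 0
  for (const [key, value] of Object.entries(message)) {
    if (key === "reasoning_opaque") continue
    if (typeof value === "string") total += countTokens(value)
  }
  return total
}
```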