feature about chat completions reasoning, support gemini-3-pro thinking and support claude model enable thinking and interleaved thinking by caozhiyuan · Pull Request #163 · ericc-ch/copilot-api

caozhiyuan · 2025-12-31T07:29:53Z

This pull request introduces significant improvements to how "thinking blocks" are handled and translated between Anthropic and OpenAI message formats. The changes ensure that reasoning and signatures are preserved during translation, add support for model-specific thinking budgets, and update protocol reminders for Claude models. Additionally, there are updates to API versioning and header intent values.

Thinking block and reasoning support:

Added signature to AnthropicThinkingBlock and support for thinkingBlockOpen in AnthropicStreamState to track reasoning blocks in streaming and non-streaming message translations. [1] [2] [3]
Updated translation logic to preserve reasoning_text and reasoning_opaque (signature) when converting between Anthropic and OpenAI formats, including new functions for extracting and injecting thinking blocks. [1] [2] [3] [4] [5]
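
The round trip described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the interfaces are simplified, and `toThinkingBlocks` and `fromThinkingBlock` are hypothetical helper names. Only the field names `reasoning_text`, `reasoning_opaque`, `thinking`, and `signature` come from the PR.

```typescript
// Hypothetical, simplified shapes; the real interfaces in the PR carry more fields.
interface ThinkingBlock {
  type: "thinking"
  thinking: string
  signature: string
}

interface AssistantMessage {
  role: "assistant"
  content: string | null
  reasoning_text?: string
  reasoning_opaque?: string
}

// OpenAI-style response message -> zero or one Anthropic thinking block.
function toThinkingBlocks(message: AssistantMessage): Array<ThinkingBlock> {
  if (!message.reasoning_text) return []
  return [
    {
      type: "thinking",
      thinking: message.reasoning_text,
      // An absent signature becomes an empty string in this sketch.
      signature: message.reasoning_opaque ?? "",
    },
  ]
}

// Anthropic thinking block -> reasoning fields on an OpenAI-style message.
function fromThinkingBlock(
  block: ThinkingBlock,
  content: string | null,
): AssistantMessage {
  return {
    role: "assistant",
    content,
    reasoning_text: block.thinking,
    reasoning_opaque: block.signature || undefined,
  }
}

const original: AssistantMessage = {
  role: "assistant",
  content: "The answer is 4.",
  reasoning_text: "2 + 2 = 4",
  reasoning_opaque: "sig-abc",
}
const blocks = toThinkingBlocks(original)
const roundTripped = fromThinkingBlock(blocks[0], original.content)
```

The point of the sketch is that both the reasoning text and its signature survive a conversion in each direction, so a later request can replay the thinking block unchanged.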

Model-specific protocol and budget enforcement:

Added logic to inject system reminders and enforce interleaved thinking protocol for Claude models, and to calculate and pass a thinking_budget based on model capabilities. [1] [2]

API and header updates:

Updated Copilot and API version numbers, and changed openai-intent header from conversation-panel to conversation-agent for requests. [1] [2]

Token counting improvements:

Excluded reasoning_opaque from token counting when calculating message tokens to avoid miscounting opaque reasoning signatures.
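
A rough sketch of the idea, assuming a per-field token count: `countTokens` below is a hypothetical stand-in that counts whitespace-separated words, not the tokenizer used in src/lib/tokenizer.ts, and `countMessageTokens` is an illustrative name rather than the PR's `calculateMessageTokens`.

```typescript
// Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
const countTokens = (text: string): number =>
  text.split(/\s+/).filter((word) => word.length > 0).length

// Sum tokens over the string fields of a message, skipping the opaque signature.
function countMessageTokens(message: Record<string, unknown>): number {
  let total = 0
  for (const [key, value] of Object.entries(message)) {
    // The signature is an opaque blob, not model-visible text, so it is not counted.
    if (key === "reasoning_opaque") continue
    if (typeof value === "string") total += countTokens(value)
  }
  return total
}

const sampleCount = countMessageTokens({
  role: "assistant",
  content: "hello world",
  reasoning_opaque: "a b c d",
})
```

Here `sampleCount` covers only `role` and `content`; the four "words" in `reasoning_opaque` are ignored.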

Streaming translation refactor:

Refactored streaming translation logic to support thinking block handling and reasoning extraction, improving event sequencing for message starts, content, tool calls, and finish events. [1] [2] [3]

These updates collectively enhance the fidelity and protocol compliance of message translations between Anthropic and OpenAI, especially for Claude models.

Copilot

Pull request overview

This pull request adds comprehensive support for reasoning/thinking blocks in the translation layer between OpenAI/Copilot and Anthropic message formats, with a focus on enabling "interleaved thinking" for Claude models. The changes include significant refactoring of the streaming and non-streaming translation logic, API versioning updates, and infrastructure improvements.

Key changes:

Added signature field to thinking blocks and implemented bidirectional translation of reasoning content between OpenAI and Anthropic formats
Introduced thinking budget calculation with min/max constraints and automatic injection of Claude-specific system prompts for interleaved thinking workflows
Refactored streaming translation into separate handler functions for improved maintainability and added state tracking for thinking blocks

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.

File	Description
src/lib/api-config.ts	Updated Copilot API version to 2025-10-01, version to 0.35.0, changed openai-intent header, and refactored base URL logic to remove conditional for individual accounts
src/lib/tokenizer.ts	Added logic to skip reasoning_opaque field when calculating token counts
src/routes/messages/anthropic-types.ts	Added signature field to AnthropicThinkingBlock and thinkingBlockOpen state to AnthropicStreamState
src/routes/messages/handler.ts	Initialized thinkingBlockOpen field in streaming state
src/routes/messages/non-stream-translation.ts	Implemented thinking budget calculation, Claude-specific system prompt injection for interleaved thinking, enhanced assistant message handling to extract and filter thinking blocks with signatures, and updated response translation to include thinking blocks
src/routes/messages/stream-translation.ts	Major refactoring to separate concerns into handleMessageStart, handleThinkingText, handleContent, handleToolCalls, and handleFinish functions; added logic to handle reasoning_text and reasoning_opaque in streaming responses with proper thinking block state management
src/services/copilot/create-chat-completions.ts	Added reasoning_text, reasoning_opaque fields to Delta, ResponseMessage, and Message interfaces; added thinking_budget to ChatCompletionsPayload; exported Delta and Choice interfaces
src/services/copilot/get-models.ts	Added max_thinking_budget and min_thinking_budget fields to ModelSupports interface
src/services/get-vscode-version.ts	Updated fallback VSCode version from 1.104.3 to 1.107.0
src/start.ts	Added idleTimeout: 0 configuration to prevent server timeout during idle periods
tests/anthropic-request.test.ts	Updated tests to include signature field in thinking blocks and verify reasoning_text instead of content for thinking content
tests/anthropic-response.test.ts	Added thinkingBlockOpen field initialization to streaming state in tests

Copilot · 2025-12-31T07:34:43Z

+    events.push(
+      {
+        type: "content_block_delta",
+        index: state.contentBlockIndex,
+        delta: {
+          type: "signature_delta",
+          signature: "",
+        },
+      },
+      {
+        type: "content_block_stop",
+        index: state.contentBlockIndex,
+      },
+    )


The signature_delta event is emitted with an empty string when closing a thinking block. According to Anthropic's streaming protocol, an empty signature may not be a valid value for signature_delta events. Consider either omitting this event entirely or using a proper signature value. If an empty signature is intentional for Claude models, add a comment explaining this behavior.

Suggested change

-    events.push(
-      {
-        type: "content_block_delta",
-        index: state.contentBlockIndex,
-        delta: {
-          type: "signature_delta",
-          signature: "",
-        },
-      },
-      {
-        type: "content_block_stop",
-        index: state.contentBlockIndex,
-      },
-    )
+    events.push({
+      type: "content_block_stop",
+      index: state.contentBlockIndex,
+    })

Copilot · 2025-12-31T07:34:43Z


-    allTextBlocks.push(...textBlocks)
-    allToolUseBlocks.push(...toolUseBlocks)
+    assistantContentBlocks.push(...thinkBlocks, ...textBlocks, ...toolUseBlocks)


The thinking blocks are placed before text blocks in the response (line 362), which means the thinking content will always appear first in the response regardless of its original position. This may not accurately represent the interleaved thinking flow if text was generated before thinking or if there were multiple rounds of thinking and text. Consider tracking the original order of blocks or documenting why thinking blocks must always come first in the response.

Suggested change

-    assistantContentBlocks.push(...thinkBlocks, ...textBlocks, ...toolUseBlocks)
+    assistantContentBlocks.push(...textBlocks, ...thinkBlocks, ...toolUseBlocks)

Copilot · 2025-12-31T07:34:43Z

+      })
+      state.contentBlockOpen = false
+      state.contentBlockIndex++
+      if (!toolBlockOpen) {


In the handleFinish function, when a content block is open and needs to be closed before finishing, the function calls handleReasoningOpaque only if the tool block is not open (line 67-69). However, this logic doesn't consider whether there's actually reasoning_opaque data in the delta. If choice.delta.reasoning_opaque is empty or undefined, handleReasoningOpaque will not emit any events, which is correct, but the conditional check creates unnecessary coupling. Consider adding a guard in handleReasoningOpaque itself or passing the delta more explicitly.

Suggested change

-      if (!toolBlockOpen) {
+      if (!toolBlockOpen && choice.delta?.reasoning_opaque) {

Copilot · 2025-12-31T07:34:44Z

+      content:
+        "<system-reminder>Please strictly follow Interleaved thinking</system-reminder>",
+    } as Message
+    return [...systemMessages, thinkingMessage, ...otherMessages]


The system-reminder message is placed immediately after system messages and before all other messages. This placement may break the expected message order when there are existing user/assistant message exchanges. The reminder should ideally be placed at the end of the messages array to avoid disrupting the conversation flow, or inserted more strategically based on the context. Consider moving it to the end or documenting why this specific placement is required.

Suggested change

-    return [...systemMessages, thinkingMessage, ...otherMessages]
+    return [...systemMessages, ...otherMessages, thinkingMessage]

Copilot · 2025-12-31T07:34:44Z

+      const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
+      return Math.max(
+        budgetTokens,
+        model.capabilities.supports.min_thinking_budget ?? 1024,
+      )


The thinking budget calculation could return values that don't satisfy the minimum requirement. When thinking.budget_tokens is less than min_thinking_budget, the Math.max ensures the minimum is met. However, this could exceed maxThinkingBudget if the minimum is larger than the maximum. Consider validating that min_thinking_budget <= maxThinkingBudget before the calculation, or returning undefined if the constraints cannot be satisfied.

Suggested change

-      const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
-      return Math.max(
-        budgetTokens,
-        model.capabilities.supports.min_thinking_budget ?? 1024,
-      )
+      const minThinkingBudget =
+        model.capabilities.supports.min_thinking_budget ?? 1024
+      // If the minimum required budget exceeds the maximum allowed, the
+      // constraints cannot be satisfied; fall back to no thinking budget.
+      if (minThinkingBudget > maxThinkingBudget) {
+        return undefined
+      }
+      const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
+      return Math.max(budgetTokens, minThinkingBudget)

Copilot · 2025-12-31T07:34:45Z

+      },
+    )
+    state.contentBlockIndex++
+    state.thinkingBlockOpen = false


The thinking block state is set to false without checking if it was actually open. On line 237, state.thinkingBlockOpen is set to false unconditionally, but there's no check ensuring it was true before. This could lead to inconsistent state tracking. Consider only setting it to false if it was previously true, or add assertions to ensure the state transitions are valid.

Suggested change

-    state.thinkingBlockOpen = false
+    if (state.thinkingBlockOpen) {
+      state.thinkingBlockOpen = false
+    }

Copilot · 2025-12-31T07:34:45Z

+function getThinkingBudget(
+  payload: AnthropicMessagesPayload,
+  model: Model | undefined,
+): number | undefined {
+  const thinking = payload.thinking
+  if (model && thinking) {
+    const maxThinkingBudget = Math.min(
+      model.capabilities.supports.max_thinking_budget ?? 0,
+      (model.capabilities.limits.max_output_tokens ?? 0) - 1,
+    )
+    if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
+      const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
+      return Math.max(
+        budgetTokens,
+        model.capabilities.supports.min_thinking_budget ?? 1024,
+      )
+    }
+  }
+  return undefined
+}


The new thinking budget calculation logic (getThinkingBudget function) and interleaved thinking prompt injection for Claude models lack test coverage. These are significant new features that handle complex logic including min/max budget constraints and model-specific behavior. Consider adding tests that verify: 1) budget calculation with various model capabilities, 2) the system prompt injection for Claude models with thinking enabled, 3) the system-reminder message insertion.
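
As a starting point for such tests, the clamping rule can be isolated in a small pure function. The sketch below mirrors the min/max logic shown in the hunk above but is not the PR's code: `BudgetLimits` and `clampThinkingBudget` are hypothetical names, and the numeric limits in the examples are made up for illustration.

```typescript
// Hypothetical flattened view of the model capability fields used by the PR.
interface BudgetLimits {
  minThinkingBudget?: number
  maxThinkingBudget?: number
  maxOutputTokens?: number
}

// Clamp a requested budget the same way the hunk above does:
// cap at the maximum first, then raise to the minimum.
function clampThinkingBudget(
  requested: number | undefined,
  limits: BudgetLimits,
): number | undefined {
  if (requested === undefined) return undefined
  const max = Math.min(
    limits.maxThinkingBudget ?? 0,
    (limits.maxOutputTokens ?? 0) - 1,
  )
  if (max <= 0) return undefined
  const min = limits.minThinkingBudget ?? 1024
  return Math.max(Math.min(requested, max), min)
}

// Illustrative limits only; real values come from the models endpoint.
const limits: BudgetLimits = {
  minThinkingBudget: 1024,
  maxThinkingBudget: 32000,
  maxOutputTokens: 64000,
}

const raisedToMin = clampThinkingBudget(500, limits) // 1024
const cappedAtMax = clampThinkingBudget(50000, limits) // 32000
const unchanged = clampThinkingBudget(8000, limits) // 8000
const noCapability = clampThinkingBudget(8000, { maxOutputTokens: 64000 }) // undefined

// Edge case from the earlier review comment: min > max yields a value above max.
const conflicting = clampThinkingBudget(100, {
  minThinkingBudget: 1024,
  maxThinkingBudget: 512,
  maxOutputTokens: 64000,
}) // 1024
```

The last case shows why a guard for `min > max` may be worth adding: without it the result exceeds the stated maximum.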

Copilot · 2025-12-31T07:34:45Z

+      index: state.contentBlockIndex,
+    })
+    state.contentBlockIndex++
+    state.contentBlockOpen = false


In handleReasoningOpaqueInToolCalls, when closing a content block that is not a tool block, the function doesn't check or update the thinkingBlockOpen state. If a thinking block was open when tool calls arrive, this could lead to inconsistent state where thinkingBlockOpen remains true even though the block has been closed. Consider calling closeThinkingBlockIfOpen before handling reasoning_opaque in tool calls, or explicitly set thinkingBlockOpen to false when appropriate.

Suggested change

-    state.contentBlockOpen = false
+    state.contentBlockOpen = false
+    // Ensure thinking block state is consistent when closing a non-tool content block.
+    state.thinkingBlockOpen = false

Copilot · 2025-12-31T07:34:46Z

-  state.accountType === "individual" ?
-    "https://api.githubcopilot.com"
-  : `https://api.${state.accountType}.githubcopilot.com`
+  `https://api.${state.accountType}.githubcopilot.com`


The refactored copilotBaseUrl function now unconditionally uses the template https://api.${state.accountType}.githubcopilot.com, which will result in https://api.individual.githubcopilot.com for individual accounts. The previous implementation used https://api.githubcopilot.com (without the subdomain) for individual accounts. This is a breaking change that may cause API requests to fail for individual account users. Please verify that the GitHub Copilot API supports the new URL format for individual accounts, or restore the conditional logic.

Suggested change

-  `https://api.${state.accountType}.githubcopilot.com`
+  state.accountType === "individual"
+    ? "https://api.githubcopilot.com"
+    : `https://api.${state.accountType}.githubcopilot.com`

Copilot · 2025-12-31T07:34:46Z

+  let thinkingBlocks = message.content.filter(
    (block): block is AnthropicThinkingBlock => block.type === "thinking",
  )

-  // Combine text and thinking blocks, as OpenAI doesn't have separate thinking blocks
-  const allTextContent = [
-    ...textBlocks.map((b) => b.text),
-    ...thinkingBlocks.map((b) => b.thinking),
-  ].join("\n\n")
+  if (modelId.startsWith("claude")) {
+    thinkingBlocks = thinkingBlocks.filter(
+      (b) =>
+        b.thinking
+        && b.thinking.length > 0
+        && b.signature
+        && b.signature.length > 0
+        // gpt signature has @ in it, so filter those out for claude models
+        && !b.signature.includes("@"),
+    )
+  }
+
+  const thinkingContents = thinkingBlocks
+    .filter((b) => b.thinking && b.thinking.length > 0)
+    .map((b) => b.thinking)


The thinking blocks are filtered twice - once on line 200-202 to extract all thinking blocks, then again on lines 216-218 to filter those with non-empty thinking content. This redundant filtering is inefficient. Consider combining these filters or restructuring the logic to avoid processing the same blocks multiple times.
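
One way to fold both checks into a single pass is a combined predicate. The following is a sketch, assuming a simplified block shape and a hypothetical helper name `usableThinkingBlocks`; the `@` rule for signatures is taken from the comment in the hunk above.

```typescript
// Simplified block shape; signature is optional here to cover incomplete blocks.
interface ThinkingBlock {
  type: "thinking"
  thinking: string
  signature?: string
}

// Keep blocks with non-empty thinking; for Claude models also require a
// non-empty signature that does not contain "@".
function usableThinkingBlocks(
  blocks: Array<ThinkingBlock>,
  modelId: string,
): Array<ThinkingBlock> {
  const isClaude = modelId.startsWith("claude")
  return blocks.filter((block) => {
    if (!block.thinking) return false
    if (!isClaude) return true
    const signature = block.signature ?? ""
    return signature.length > 0 && !signature.includes("@")
  })
}

const sampleBlocks: Array<ThinkingBlock> = [
  { type: "thinking", thinking: "a", signature: "s1" },
  { type: "thinking", thinking: "", signature: "s2" },
  { type: "thinking", thinking: "b", signature: "x@y" },
  { type: "thinking", thinking: "c" },
]

const forClaude = usableThinkingBlocks(sampleBlocks, "claude-sonnet-4")
const forOther = usableThinkingBlocks(sampleBlocks, "gpt-5")
```

The model IDs in the usage lines are placeholders; only the `claude` prefix matters to the predicate.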

zzb54321 · 2026-01-01T14:45:03Z

I tried your branch locally, and encountered the following error.

caozhiyuan · 2026-01-01T15:06:00Z

@zzb54321 Does the master branch work OK?

zzb54321 · 2026-01-01T15:36:39Z

yeah master is working well

caozhiyuan · 2026-01-01T15:38:38Z

@zzb54321 Are you not on the individual plan?
You can change this code in api-config.ts:

export const copilotBaseUrl = (state: State) =>
  state.accountType === "individual" ?
    "https://api.githubcopilot.com"
  : `https://api.${state.accountType}.githubcopilot.com`

If it works, I will commit a fix.

gonzalez962 · 2026-01-02T00:59:50Z

@caozhiyuan Hello, and happy new year. Where and how do you get API_VERSION and COPILOT_VERSION? Thanks.

caozhiyuan · 2026-01-02T01:29:13Z

@gonzalez962 https://www.npmjs.com/package/@vscode/copilot-api?activeTab=code and https://github.com/microsoft/vscode-copilot-chat/

zzb54321 · 2026-01-02T02:17:10Z

@caozhiyuan It's working now after applying the suggested fix.

BTW, how can I verify that gemini-3-pro thinking is working on the chat client side? Does it work with the OpenAI-compatible API, or only with the Anthropic-format API?

caozhiyuan · 2026-01-02T05:46:05Z

@zzb54321 Use the Messages API; this is not a standard OpenAI-compatible protocol. If you do not apply the suggested fix, you can use --account-type business instead.

When account type is not specified or set to 'individual', use the default api.githubcopilot.com URL instead of constructing a subdomain-based URL. This restores previous behavior where business users could work without explicitly specifying their account type, as the default URL works for both individual and business accounts. Only constructs account-type-specific URLs (api.business.githubcopilot.com, api.enterprise.githubcopilot.com) when those account types are explicitly specified.
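
The URL selection described in this commit message can be sketched as below. This is an illustration, not the code in src/lib/api-config.ts: `baseUrlFor` and the `AccountType` union are hypothetical, while the three host names are the ones named in the message.

```typescript
type AccountType = "individual" | "business" | "enterprise"

// Default host for unspecified or individual accounts; subdomain otherwise.
function baseUrlFor(accountType?: AccountType): string {
  if (!accountType || accountType === "individual") {
    return "https://api.githubcopilot.com"
  }
  return `https://api.${accountType}.githubcopilot.com`
}

const defaultUrl = baseUrlFor()
const individualUrl = baseUrlFor("individual")
const businessUrl = baseUrlFor("business")
const enterpriseUrl = baseUrlFor("enterprise")
```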

fix: use default API URL when account type is individual

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.

Copilot · 2026-01-04T12:56:44Z

+  if (reasoningText && reasoningText.length > 0) {
+    return [
+      {
+        type: "thinking",
+        thinking: reasoningText,
+        signature: reasoningOpaque || "",


The signature field in AnthropicThinkingBlock is now required (line 59 in anthropic-types.ts), but when reasoningOpaque is not provided, it defaults to an empty string (line 431). This is a breaking API change that could affect API consumers. Consider: 1) making the signature field optional to maintain backwards compatibility, 2) documenting that signature can be an empty string and what that means semantically, or 3) only including thinking blocks when both thinking and signature are non-empty to avoid exposing incomplete thinking blocks.

Suggested change

-  if (reasoningText && reasoningText.length > 0) {
-    return [
-      {
-        type: "thinking",
-        thinking: reasoningText,
-        signature: reasoningOpaque || "",
+  if (
+    reasoningText &&
+    reasoningText.length > 0 &&
+    reasoningOpaque &&
+    reasoningOpaque.length > 0
+  ) {
+    return [
+      {
+        type: "thinking",
+        thinking: reasoningText,
+        signature: reasoningOpaque,

Copilot · 2026-01-04T12:56:44Z

+        handleReasoningOpaque(choice.delta, events, state)
+      }
+    }
+


In the handleFinish function, when a finish_reason is received, the code checks if contentBlockOpen is true and closes it, but it doesn't check if thinkingBlockOpen is true. This means if a thinking block is still open when the message finishes (which could happen if reasoning_text arrives without a subsequent reasoning_opaque or content), the thinking block won't be properly closed, leaving the stream in an inconsistent state. Consider adding a check for state.thinkingBlockOpen and calling closeThinkingBlockIfOpen(state, events) before closing the message.

Suggested change

+    if (state.thinkingBlockOpen) {
+      closeThinkingBlockIfOpen(state, events)
+    }

Copilot · 2026-01-04T12:56:44Z

+  // handle for claude model
+  if (
+    delta.content === ""
+    && delta.reasoning_opaque
+    && delta.reasoning_opaque.length > 0
+    && state.thinkingBlockOpen
+  ) {


The comment on line 216 states "handle for claude model", but the code that follows (lines 217-222) doesn't actually check if the model is a Claude model. This logic will execute for any model that sends an empty content string with reasoning_opaque when a thinking block is open. Either add a model check (e.g., checking if the model ID starts with "claude") or update the comment to accurately reflect that this is a general handling for a specific streaming pattern, not Claude-specific behavior.

Copilot · 2026-01-04T12:56:45Z

+    extraPrompt = `
+<interleaved_thinking_protocol>
+ABSOLUTE REQUIREMENT - NON-NEGOTIABLE:
+The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block


There's a grammatical issue on line 135: "think carefully ," has an extra space before the comma. It should be "think carefully," without the space.

Suggested change

-The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block
+The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully, MUST output a thinking block

Copilot · 2026-01-04T12:56:45Z

+      delta.content = delta.reasoning_text
+      delta.reasoning_text = undefined
+      return


Direct mutation of the delta object is problematic here. The function modifies the incoming delta parameter by setting delta.content = delta.reasoning_text and delta.reasoning_text = undefined. This mutates shared state that may be used elsewhere in the call stack, potentially causing unexpected side effects or making debugging difficult. Consider creating a copy of the delta object or handling this case differently without mutation, such as by tracking the state separately or processing the reasoning_text as intended.
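
A non-mutating alternative could return a normalized copy and leave the caller's object untouched. The sketch below assumes a reduced `Delta` shape and a hypothetical helper name `normalizeDelta`; it reproduces the fallback from the hunk (treat late `reasoning_text` as content while a content block is open) without writing to the input.

```typescript
// Reduced shape for illustration; the PR's Delta interface has more fields.
interface Delta {
  content?: string | null
  reasoning_text?: string | null
}

// Return a copy with reasoning_text moved into content when a content block
// is already open; otherwise return the input as-is.
function normalizeDelta(delta: Delta, contentBlockOpen: boolean): Delta {
  if (contentBlockOpen && delta.reasoning_text) {
    return { ...delta, content: delta.reasoning_text, reasoning_text: undefined }
  }
  return delta
}

const incoming: Delta = { content: "", reasoning_text: "late reasoning" }
const normalized = normalizeDelta(incoming, true)
const untouched = normalizeDelta(incoming, false)
```

Downstream handlers would then read from `normalized`, while any other code holding `incoming` still sees the original fields.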

Copilot · 2026-01-04T12:56:46Z

+  if (modelId.startsWith("claude") && thinkingBudget) {
+    const reminder =
+      "<system-reminder>you MUST follow interleaved_thinking_protocol</system-reminder>"
+    const firstUserIndex = otherMessages.findIndex((m) => m.role === "user")
+    if (firstUserIndex !== -1) {
+      const userMessage = otherMessages[firstUserIndex]
+      if (typeof userMessage.content === "string") {
+        userMessage.content = reminder + "\n\n" + userMessage.content
+      } else if (Array.isArray(userMessage.content)) {
+        userMessage.content = [
+          { type: "text", text: reminder },
+          ...userMessage.content,
+        ] as Array<ContentPart>
+      }
+    }
+  }


The system prompt injection logic (lines 102-117 and 131-143) only activates when thinkingBudget is truthy. However, getThinkingBudget returns undefined in several cases: when model is not found, when payload.thinking is not provided, when thinking.budget_tokens is undefined, or when maxThinkingBudget is 0 or negative. This means the interleaved thinking protocol instructions won't be injected unless all these conditions are met. Consider whether the protocol instructions should be injected whenever payload.thinking exists, regardless of budget calculation success, or document this behavior clearly so users understand when thinking protocol is enabled.

Copilot · 2026-01-04T12:56:46Z

+
+  // handle for claude model
+  if (
+    delta.content === ""


The condition on line 218 checks if delta.content === "", which only matches exactly an empty string. However, this doesn't handle cases where delta.content is null or undefined. If the API can send delta.content as null or undefined along with reasoning_opaque, this condition won't match and the logic won't execute. Consider using !delta.content or explicitly checking for all falsy values: (delta.content === "" || delta.content === null || delta.content === undefined).

Suggested change

-    delta.content === ""
+    (delta.content === "" || delta.content == null)
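
For reference, the loose comparison `== null` matches both `null` and `undefined` but nothing else, which is what makes the suggested condition cover all three "no content" cases. A tiny sketch with a hypothetical helper name `isBlankContent`:

```typescript
// True for "", null, and undefined; false for any non-empty string.
const isBlankContent = (content: string | null | undefined): boolean =>
  content == null || content === ""

const blankCases = [isBlankContent(""), isBlankContent(null), isBlankContent(undefined)]
const nonBlankCases = [isBlankContent("x"), isBlankContent(" ")]
```

Note that a whitespace-only string is still treated as content here; whether that is desirable depends on what the upstream API actually sends.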

Copilot · 2026-01-04T12:56:46Z

+  if (modelId.startsWith("claude")) {
+    thinkingBlocks = thinkingBlocks.filter(
+      (b) =>
+        b.thinking
+        && b.thinking.length > 0
+        && b.signature
+        && b.signature.length > 0
+        // gpt signature has @ in it, so filter those out for claude models
+        && !b.signature.includes("@"),
+    )
+  }
+
+  const thinkingContents = thinkingBlocks
+    .filter((b) => b.thinking && b.thinking.length > 0)
+    .map((b) => b.thinking)


The filtering on line 231 checks if b.thinking && b.thinking.length > 0, which is redundant for Claude models because the same check was already done on lines 221-222. While this doesn't cause incorrect behavior, it adds unnecessary processing. Consider restructuring to avoid double filtering - for example, apply the thinking content filter before the Claude-specific signature filter, or ensure thinking blocks always have valid thinking content when they're created.

Suggested change

-  if (modelId.startsWith("claude")) {
-    thinkingBlocks = thinkingBlocks.filter(
-      (b) =>
-        b.thinking
-        && b.thinking.length > 0
-        && b.signature
-        && b.signature.length > 0
-        // gpt signature has @ in it, so filter those out for claude models
-        && !b.signature.includes("@"),
-    )
-  }
-
-  const thinkingContents = thinkingBlocks
-    .filter((b) => b.thinking && b.thinking.length > 0)
-    .map((b) => b.thinking)
+  // First, ensure all thinking blocks have non-empty thinking content
+  thinkingBlocks = thinkingBlocks.filter(
+    (b) => b.thinking && b.thinking.length > 0,
+  )
+
+  if (modelId.startsWith("claude")) {
+    thinkingBlocks = thinkingBlocks.filter(
+      (b) =>
+        b.signature
+        && b.signature.length > 0
+        // gpt signature has @ in it, so filter those out for claude models
+        && !b.signature.includes("@"),
+    )
+  }
+
+  const thinkingContents = thinkingBlocks.map((b) => b.thinking)

Copilot · 2026-01-04T12:56:46Z

+function getThinkingBudget(
+  payload: AnthropicMessagesPayload,
+  model: Model | undefined,
+): number | undefined {
+  const thinking = payload.thinking
+  if (model && thinking) {
+    const maxThinkingBudget = Math.min(
+      model.capabilities.supports.max_thinking_budget ?? 0,
+      (model.capabilities.limits.max_output_tokens ?? 0) - 1,
+    )
+    if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
+      const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
+      return Math.max(
+        budgetTokens,
+        model.capabilities.supports.min_thinking_budget ?? 1024,
+      )
+    }
+  }
+  return undefined
+}
+
 function translateModelName(model: string): string {
  // Subagent requests use a specific model number which Copilot doesn't support
  if (model.startsWith("claude-sonnet-4-")) {
    return model.replace(/^claude-sonnet-4-.*/, "claude-sonnet-4")
-  } else if (model.startsWith("claude-opus-")) {
+  } else if (model.startsWith("claude-opus-4-")) {
    return model.replace(/^claude-opus-4-.*/, "claude-opus-4")
  }
  return model
 }

 function translateAnthropicMessagesToOpenAI(
-  anthropicMessages: Array<AnthropicMessage>,
-  system: string | Array<AnthropicTextBlock> | undefined,
+  payload: AnthropicMessagesPayload,
+  modelId: string,
+  thinkingBudget: number | undefined,
 ): Array<Message> {
-  const systemMessages = handleSystemPrompt(system)
-
-  const otherMessages = anthropicMessages.flatMap((message) =>
+  const systemMessages = handleSystemPrompt(
+    payload.system,
+    modelId,
+    thinkingBudget,
+  )
+  const otherMessages = payload.messages.flatMap((message) =>
    message.role === "user" ?
      handleUserMessage(message)
-    : handleAssistantMessage(message),
+    : handleAssistantMessage(message, modelId),
  )
-
+  if (modelId.startsWith("claude") && thinkingBudget) {
+    const reminder =
+      "<system-reminder>you MUST follow interleaved_thinking_protocol</system-reminder>"
+    const firstUserIndex = otherMessages.findIndex((m) => m.role === "user")
+    if (firstUserIndex !== -1) {
+      const userMessage = otherMessages[firstUserIndex]
+      if (typeof userMessage.content === "string") {
+        userMessage.content = reminder + "\n\n" + userMessage.content
+      } else if (Array.isArray(userMessage.content)) {
+        userMessage.content = [
+          { type: "text", text: reminder },
+          ...userMessage.content,
+        ] as Array<ContentPart>
+      }
+    }
+  }
  return [...systemMessages, ...otherMessages]
 }

 function handleSystemPrompt(
  system: string | Array<AnthropicTextBlock> | undefined,
+  modelId: string,
+  thinkingBudget: number | undefined,
 ): Array<Message> {
  if (!system) {
    return []
  }

+  let extraPrompt = ""
+  if (modelId.startsWith("claude") && thinkingBudget) {
+    extraPrompt = `
+<interleaved_thinking_protocol>
+ABSOLUTE REQUIREMENT - NON-NEGOTIABLE:
+The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block
+RULES:
+Tool result → thinking block (ALWAYS, no exceptions)
+This is NOT optional - it is a hard requirement
+The thinking block must contain substantive reasoning (minimum 3-5 sentences)
+Think about: what the results mean, what to do next, how to answer the user
+NEVER skip this step, even if the result seems simple or obvious
+</interleaved_thinking_protocol>`
+  }


The new thinking budget calculation logic (lines 56-75) and system prompt injection logic (lines 102-117, 131-143) lack test coverage. These are critical features that manipulate model behavior and user inputs. Consider adding tests that verify: 1) thinking budget is correctly calculated when thinking.budget_tokens is provided, 2) thinking budget respects min/max boundaries from model capabilities, 3) system prompt injection happens only for Claude models with thinking budget, 4) the interleaved thinking protocol reminder is correctly prepended to the first user message.
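
Point 4 is easier to test if the reminder logic is a pure function. The sketch below is one possible shape, assuming simplified `Message` and `ContentPart` types and a hypothetical helper name `prependReminder`; the reminder string is the one from the hunk above. Unlike the PR's code, it returns new objects instead of mutating the first user message.

```typescript
// Simplified types for illustration only.
type ContentPart = { type: "text"; text: string }
interface Message {
  role: "system" | "user" | "assistant"
  content: string | Array<ContentPart>
}

const REMINDER =
  "<system-reminder>you MUST follow interleaved_thinking_protocol</system-reminder>"

// Prepend the reminder to the first user message, leaving the input untouched.
function prependReminder(messages: Array<Message>): Array<Message> {
  const firstUserIndex = messages.findIndex((m) => m.role === "user")
  if (firstUserIndex === -1) return messages
  return messages.map((message, index): Message => {
    if (index !== firstUserIndex) return message
    const content: string | Array<ContentPart> =
      typeof message.content === "string"
        ? `${REMINDER}\n\n${message.content}`
        : [{ type: "text", text: REMINDER }, ...message.content]
    return { ...message, content }
  })
}

const conversation: Array<Message> = [
  { role: "system", content: "s" },
  { role: "user", content: "hi" },
  { role: "user", content: "again" },
]
const withReminder = prependReminder(conversation)

const partsConversation: Array<Message> = [
  { role: "user", content: [{ type: "text", text: "hi" }] },
]
const withReminderParts = prependReminder(partsConversation)
```

A test can then assert that only the first user message changes, for both string and array content, and that a conversation with no user message is returned unchanged.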

Copilot · 2026-01-04T12:56:47Z

+function handleReasoningOpaque(
+  delta: Delta,
+  events: Array<AnthropicStreamEventData>,
+  state: AnthropicStreamState,
+) {
+  if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
+    events.push(
      {
-        type: "message_stop",
+        type: "content_block_start",
+        index: state.contentBlockIndex,
+        content_block: {
+          type: "thinking",
+          thinking: "",
+        },
+      },
+      {
+        type: "content_block_delta",
+        index: state.contentBlockIndex,
+        delta: {
+          type: "thinking_delta",
+          thinking: "",
+        },
+      },
+      {
+        type: "content_block_delta",
+        index: state.contentBlockIndex,
+        delta: {
+          type: "signature_delta",
+          signature: delta.reasoning_opaque,
+        },
+      },
+      {
+        type: "content_block_stop",
+        index: state.contentBlockIndex,
      },
    )
+    state.contentBlockIndex++
  }
+}

-  return events
+function handleThinkingText(
+  delta: Delta,
+  state: AnthropicStreamState,
+  events: Array<AnthropicStreamEventData>,
+) {
+  if (delta.reasoning_text && delta.reasoning_text.length > 0) {
+    // compatible with copilot API returning content->reasoning_text->reasoning_opaque in different deltas
+    // this is an extremely abnormal situation, probably a server-side bug
+    // only occurs in the claude model, with a very low probability of occurrence
+    if (state.contentBlockOpen) {
+      delta.content = delta.reasoning_text
+      delta.reasoning_text = undefined
+      return
+    }
+
+    if (!state.thinkingBlockOpen) {
+      events.push({
+        type: "content_block_start",
+        index: state.contentBlockIndex,
+        content_block: {
+          type: "thinking",
+          thinking: "",
+        },
+      })
+      state.thinkingBlockOpen = true
+    }
+
+    events.push({
+      type: "content_block_delta",
+      index: state.contentBlockIndex,
+      delta: {
+        type: "thinking_delta",
+        thinking: delta.reasoning_text,
+      },
+    })
+  }
+}
+
+function closeThinkingBlockIfOpen(
+  state: AnthropicStreamState,
+  events: Array<AnthropicStreamEventData>,
+): void {
+  if (state.thinkingBlockOpen) {
+    events.push(
+      {
+        type: "content_block_delta",
+        index: state.contentBlockIndex,
+        delta: {
+          type: "signature_delta",
+          signature: "",
+        },
+      },
+      {
+        type: "content_block_stop",
+        index: state.contentBlockIndex,
+      },
+    )
+    state.contentBlockIndex++
+    state.thinkingBlockOpen = false
+  }
 }


The new streaming translation logic for handling thinking blocks and reasoning_opaque (lines 275-313, 315-375) lacks test coverage. This is critical functionality that manages complex state transitions during streaming, including thinking block opening/closing and signature handling. Consider adding tests that verify: 1) reasoning_text is correctly translated to thinking_delta events, 2) reasoning_opaque creates appropriate signature_delta events, 3) thinking blocks are properly closed before content or tool call blocks, 4) the state.thinkingBlockOpen flag is managed correctly throughout the streaming lifecycle.
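As a starting point for the tests requested above, here is a minimal, self-contained sketch. It assumes simplified stand-ins for `Delta`, `AnthropicStreamState` (here `StreamState`) and `AnthropicStreamEventData` (here `StreamEvent`); the real types in the repository carry more fields. The two handlers are re-typed from the diff rather than imported, and `summarizeThinkingStream` is a hypothetical helper that exists only for this illustration.

```typescript
// Simplified, hypothetical stand-ins for the PR's types (assumption: the real
// definitions in the repository have additional fields and event variants).
interface Delta {
  content?: string | null
  reasoning_text?: string | null
  reasoning_opaque?: string | null
}

interface StreamState {
  contentBlockIndex: number
  contentBlockOpen: boolean
  thinkingBlockOpen: boolean
}

type BlockDelta =
  | { type: "thinking_delta"; thinking: string }
  | { type: "signature_delta"; signature: string }

type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: "thinking"; thinking: string } }
  | { type: "content_block_delta"; index: number; delta: BlockDelta }
  | { type: "content_block_stop"; index: number }

// Re-typed from the diff: turns reasoning_text into thinking_delta events.
function handleThinkingText(delta: Delta, state: StreamState, events: StreamEvent[]): void {
  if (!delta.reasoning_text) return
  // If a text block is already open, the reasoning text is folded into content.
  if (state.contentBlockOpen) {
    delta.content = delta.reasoning_text
    delta.reasoning_text = undefined
    return
  }
  if (!state.thinkingBlockOpen) {
    events.push({
      type: "content_block_start",
      index: state.contentBlockIndex,
      content_block: { type: "thinking", thinking: "" },
    })
    state.thinkingBlockOpen = true
  }
  events.push({
    type: "content_block_delta",
    index: state.contentBlockIndex,
    delta: { type: "thinking_delta", thinking: delta.reasoning_text },
  })
}

// Re-typed from the diff: closes an open thinking block with an empty signature.
function closeThinkingBlockIfOpen(state: StreamState, events: StreamEvent[]): void {
  if (!state.thinkingBlockOpen) return
  events.push(
    {
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: { type: "signature_delta", signature: "" },
    },
    { type: "content_block_stop", index: state.contentBlockIndex },
  )
  state.contentBlockIndex++
  state.thinkingBlockOpen = false
}

// Hypothetical helper: feeds two reasoning deltas, closes the block, and
// returns the ordered event kinds together with the final state.
function summarizeThinkingStream(): { kinds: string[]; state: StreamState } {
  const state: StreamState = { contentBlockIndex: 0, contentBlockOpen: false, thinkingBlockOpen: false }
  const events: StreamEvent[] = []
  handleThinkingText({ reasoning_text: "step 1" }, state, events)
  handleThinkingText({ reasoning_text: "step 2" }, state, events)
  closeThinkingBlockIfOpen(state, events)
  const kinds = events.map((e) => (e.type === "content_block_delta" ? e.delta.type : e.type))
  return { kinds, state }
}
```

A real test would import the handlers from the translation module and assert on the same things: the event order (one `content_block_start`, the `thinking_delta` events, a `signature_delta`, then `content_block_stop`), the incremented `contentBlockIndex`, and `thinkingBlockOpen` being reset to `false`. A separate case with `contentBlockOpen: true` covers the fallback path that moves `reasoning_text` into `content`.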

caozhiyuan and others added 13 commits November 19, 2025 21:32

feat: support copilot reasoning_opaque and reasoning_text

29668ce

feat: add signature field to AnthropicThinkingBlock

a2467d3

feat: add idleTimeout configuration for bun server

58f7a45

feat: enhance reasoning handling in tool calls and change the thinking order when stream=false and exclude reasoning_opaque from token calculation in calculateMessageTokens

3fa5519

feat: conditionally handle reasoningOpaque in handleFinish based on tool block state

dfb40d2

fix: handleReasoningOpaqueInToolCalls add isToolBlockOpen judge

7657d87

feat: support claude model thinking block

7f8187b

feat: enhance thinking budget calculation and rename variables for clarity

cbe12eb

feat: update Copilot version and API version in api-config; adjust fallback VSCode version

ebcacb2

feat: update Copilot version to 0.35.0 and fallback VSCode version to 1.107.0

0d6f7aa

fix: simplify copilotBaseUrl logic and correct openai-intent header value

dcafbe1

feat: interleaved thinking support

5175245

feat: enhance system prompt handling for interleaved thinking with thinking budget integration

dd80c8d

Copilot AI review requested due to automatic review settings December 31, 2025 07:29

Copilot started reviewing on behalf of caozhiyuan December 31, 2025 07:30

Copilot AI reviewed Dec 31, 2025


caozhiyuan and others added 4 commits January 2, 2026 22:21

feat: compatible with copilot API returning content->reasoning_text->reasoning_opaque in different deltas

e45c6db

Merge pull request #65 from HyunggyuJang/fix-individual-account-url

8191930

fix: use default API URL when account type is individual

feat: enforce interleaved thinking protocol in message handling

67b357a

caozhiyuan requested a review from Copilot January 4, 2026 12:51

Copilot started reviewing on behalf of caozhiyuan January 4, 2026 12:51

Copilot AI reviewed Jan 4, 2026


caozhiyuan closed this Jan 4, 2026

-	assistantContentBlocks.push(...thinkBlocks, ...textBlocks, ...toolUseBlocks)
+	assistantContentBlocks.push(...textBlocks, ...thinkBlocks, ...toolUseBlocks)

-	if (!toolBlockOpen) {
+	if (!toolBlockOpen && choice.delta?.reasoning_opaque) {

-	return [...systemMessages, thinkingMessage, ...otherMessages]
+	return [...systemMessages, ...otherMessages, thinkingMessage]

-	The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block
+	The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully, MUST output a thinking block

-	delta.content === ""
+	(delta.content === "" || delta.content == null)

Conversation

caozhiyuan commented Dec 31, 2025 • edited

Copilot AI left a comment

Copilot AI Dec 31, 2025

zzb54321 commented Jan 1, 2026

caozhiyuan commented Jan 1, 2026

zzb54321 commented Jan 1, 2026

caozhiyuan commented Jan 1, 2026

gonzalez962 commented Jan 2, 2026

caozhiyuan commented Jan 2, 2026

zzb54321 commented Jan 2, 2026 • edited

caozhiyuan commented Jan 2, 2026 • edited

Copilot AI left a comment

Copilot AI Jan 4, 2026