feat: enable Responses API context management by default, replacing the per-model list with a boolean flag

caozhiyuan · caozhiyuan · commit fef57a3edb50 · 2026-05-31T10:15:23.000+08:00
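
The change in default behaviour can be sketched as below. This is a minimal, self-contained illustration under stated assumptions, not the project's code: `SketchConfig`, `wasEnabledBefore`, and `isEnabledNow` are hypothetical names introduced here. Only the two config keys, the old `[]` default, and the new `?? true` default are taken from the diff.

```typescript
// Hypothetical, trimmed-down config shape; only the two keys come from the diff.
interface SketchConfig {
  responsesApiContextManagementModels?: Array<string> // removed by this commit
  useResponsesApiContextManagement?: boolean // added by this commit
}

// Before: opt-in per model. A missing list fell back to [], so nothing was enabled.
function wasEnabledBefore(config: SketchConfig, model: string): boolean {
  return (config.responsesApiContextManagementModels ?? []).includes(model)
}

// After: one global switch. A missing key falls back to true.
function isEnabledNow(config: SketchConfig): boolean {
  return config.useResponsesApiContextManagement ?? true
}

const untouched: SketchConfig = {}
console.log(wasEnabledBefore(untouched, "gpt-5.4")) // false
console.log(isEnabledNow(untouched)) // true
console.log(isEnabledNow({ useResponsesApiContextManagement: false })) // false
```

If the real getter behaves like this sketch, a config file that carries only the old `responsesApiContextManagementModels` key would be treated as enabled, because that key is no longer read; setting `useResponsesApiContextManagement` to `false` appears to be the only way to opt out.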
diff --git a/README.md b/README.md
@@ -228,7 +228,7 @@ The following command line options are available for the `start` command:
       "gpt-5.4": "<built-in commentary prompt>"
     },
     "smallModel": "gpt-5-mini",
-    "responsesApiContextManagementModels": [],
+    "useResponsesApiContextManagement": true,
     "modelReasoningEfforts": {
       "gpt-5-mini": "low",
       "gpt-5.3-codex": "xhigh",
@@ -259,7 +259,7 @@ The following command line options are available for the `start` command:
     - `supportPdf` (optional): Controls whether the model supports PDF/document content. Defaults to `false`; unsupported PDFs are converted to a text notice. Set it to `true` to send PDF/document blocks as OpenAI Chat Completions file parts.
     - `toolContentSupportType` (optional): Tool result content capabilities for that model, as an array of `array`, `image`, and `pdf`. Provider routes default to string-only tool content when omitted. If `supportPdf` is `true` but this list does not include `pdf`, file parts in tool results are moved to user role messages. This provider default does not change the Copilot main flow, which continues to support array + image and not PDF.
 - **smallModel:** Fallback model used for tool-less warmup messages (e.g., Claude Code probe requests); defaults to gpt-5-mini.
-- **responsesApiContextManagementModels:** List of GPT model IDs that should receive Responses API `context_management` compaction instructions. This defaults to `[]`, so you need to opt in explicitly. A good starting point is `["gpt-5-mini", "gpt-5.3-codex", "gpt-5.4-mini", "gpt-5.4"]`. When enabled, the request includes `context_management` in the body and keeps only the latest compaction carrier on follow-up turns. The actual compaction is handled server-side and appears to begin when usage approaches roughly 90% of the model's `maxPromptTokens`, which makes it especially useful for long-running tasks. In practice, the effective `compact_threshold` also appears to be fixed on the server side, so changing it in this project does not currently alter compaction behavior. At the moment, this optimization is intended for GPT-family models only.
+- **useResponsesApiContextManagement:** When `true`, the proxy adds Responses API `context_management` compaction instructions. Defaults to `true`. Set it to `false` to disable this globally. When enabled, the request includes `context_management` in the body and keeps only the latest compaction carrier on follow-up turns. This is especially useful for long-running tasks.
 - **modelReasoningEfforts:** Per-model `reasoning.effort` sent to the Copilot Responses API. Allowed values are `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. If a model isn’t listed, `high` is used by default.
 - **useMessagesApi:** When `true`, Claude-family models that support Copilot's native `/v1/messages` endpoint will use the Messages API; otherwise they fall back to `/chat/completions`. Set to `false` to disable Messages API routing and always use `/chat/completions`. Defaults to `true`.
 - **useResponsesApiWebSocket:** When `true`, Responses API requests use Copilot's websocket transport for models that advertise `ws:/responses`; models that only advertise `/responses` continue to use HTTP. Set to `false` to disable websocket routing and use HTTP `/responses` whenever the selected model supports it. Defaults to `true`.
diff --git a/README.zh-CN.md b/README.zh-CN.md
@@ -230,7 +230,7 @@ Copilot API 现在使用子命令结构，主要命令包括：
       "gpt-5.4": "<built-in commentary prompt>"
     },
     "smallModel": "gpt-5-mini",
-    "responsesApiContextManagementModels": [],
+    "useResponsesApiContextManagement": true,
     "modelReasoningEfforts": {
       "gpt-5-mini": "low",
       "gpt-5.3-codex": "xhigh",
@@ -261,7 +261,7 @@ Copilot API 现在使用子命令结构，主要命令包括：
     - `supportPdf`：可选，控制该模型是否支持 PDF/document content。默认 `false`，不支持时会把 PDF 转成提示文本；设为 `true` 时会把 PDF/document 转成 OpenAI Chat Completions 的 file part。
     - `toolContentSupportType`：可选，配置该模型的 tool result content 支持能力，值为 `array`、`image`、`pdf` 的数组。provider 侧未配置时默认只发送 string tool content。若 `supportPdf` 为 `true` 但这里不包含 `pdf`，tool result 里的 file part 会被转成 user role 消息。Copilot 主链路不使用这个 provider 默认，仍按 array + image 且不支持 PDF 的能力处理。
 - **smallModel：** 无工具预热消息的回退模型（例如 Claude Code 的探测请求）；默认是 `gpt-5-mini`。
-- **responsesApiContextManagementModels：** 需要启用 Responses API `context_management` 压缩指令的 GPT 模型 ID 列表。默认是 `[]`，需要你显式开启。一个不错的起点是 `["gpt-5-mini", "gpt-5.3-codex", "gpt-5.4-mini", "gpt-5.4"]`。启用后，请求体会带上 `context_management`，并在后续轮次中仅保留最新的压缩承载内容。实际压缩由服务端完成，看起来会在 usage 接近模型 `maxPromptTokens` 的约 90% 时开始，因此特别适合长任务场景。实践中 `compact_threshold` 似乎也是服务端固定的，所以在本项目中修改它目前不会改变压缩行为。当前该优化仅面向 GPT 系模型。
+- **useResponsesApiContextManagement：** 当为 `true` 时，代理会为 Responses API 附加 `context_management` 压缩指令。默认值为 `true`。如需全局关闭，可设为 `false`。启用后，请求体会带上 `context_management`，并在后续轮次中仅保留最新的压缩承载内容，因此特别适合长任务场景。
 - **modelReasoningEfforts：** 按模型配置发送到 Copilot Responses API 的 `reasoning.effort`。可选值包括 `none`、`minimal`、`low`、`medium`、`high` 和 `xhigh`。若某模型未配置，则默认使用 `high`。
 - **useMessagesApi：** 当为 `true` 时，支持 Copilot 原生 `/v1/messages` 的 Claude 系模型会走 Messages API；否则回退到 `/chat/completions`。设为 `false` 可禁用 Messages API 路由，始终使用 `/chat/completions`。默认值为 `true`。
 - **useResponsesApiWebSocket：** 当为 `true` 时，Responses API 请求会优先对声明了 `ws:/responses` 的模型使用 Copilot websocket transport；仅声明 `/responses` 的模型仍走 HTTP。设为 `false` 可禁用 websocket 路由，并在模型支持 `/responses` 时使用 HTTP `/responses`。默认值为 `true`。
diff --git a/src/lib/config.ts b/src/lib/config.ts
@@ -13,7 +13,7 @@ export interface AppConfig {
   modelMappings?: Record<string, string>
   extraPrompts?: Record<string, string>
   smallModel?: string
-  responsesApiContextManagementModels?: Array<string>
+  useResponsesApiContextManagement?: boolean
   modelReasoningEfforts?: Record<
     string,
     "none" | "minimal" | "low" | "medium" | "high" | "xhigh"
@@ -103,7 +103,7 @@ const defaultConfig: AppConfig = {
     "gpt-5.5": gpt5CommentaryPrompt,
   },
   smallModel: "gpt-5-mini",
-  responsesApiContextManagementModels: [],
+  useResponsesApiContextManagement: true,
   modelReasoningEfforts: {
     "gpt-5-mini": "low",
     "gpt-5.3-codex": "xhigh",
@@ -387,17 +387,9 @@ export function getSmallModel(): string {
   return config.smallModel ?? "gpt-5-mini"
 }
 
-export function getResponsesApiContextManagementModels(): Array<string> {
+export function isResponsesApiContextManagementEnabled(): boolean {
   const config = getConfig()
-  return (
-    config.responsesApiContextManagementModels
-    ?? defaultConfig.responsesApiContextManagementModels
-    ?? []
-  )
-}
-
-export function isResponsesApiContextManagementModel(model: string): boolean {
-  return getResponsesApiContextManagementModels().includes(model)
+  return config.useResponsesApiContextManagement ?? true
 }
 
 export function getReasoningEffortForModel(
diff --git a/src/routes/provider/responses/handler.ts b/src/routes/provider/responses/handler.ts
@@ -13,7 +13,10 @@ import {
   normalizeResponsesUsage,
   type UsageTokens,
 } from "~/lib/token-usage"
-import { applyResponsesApiContextManagement } from "~/routes/responses/utils"
+import {
+  applyResponsesApiContextManagement,
+  compactInputByLatestCompaction,
+} from "~/routes/responses/utils"
 import type {
   ResponsesPayload,
   ResponsesResult,
@@ -69,6 +72,8 @@ export async function handleProviderResponsesForProvider(
     provider,
   })
 
+  compactInputByLatestCompaction(payload)
+
   if (providerConfig.name === "codex") {
     const upstreamResponse = await forwardCodexResponses(
       payload,
diff --git a/src/routes/responses/utils.ts b/src/routes/responses/utils.ts
@@ -11,7 +11,7 @@ import type {
 
 import { COMPACT_REQUEST, type CompactType } from "~/lib/compact"
 import {
-  isResponsesApiContextManagementModel as isConfiguredResponsesApiContextManagementModel,
+  isResponsesApiContextManagementEnabled as isConfiguredResponsesApiContextManagementEnabled,
   isResponsesApiWebSocketEnabled as isConfiguredResponsesApiWebSocketEnabled,
 } from "~/lib/config"
 
@@ -20,8 +20,8 @@ export const RESPONSES_WS_ENDPOINT = "ws:/responses"
 export const DEFAULT_RESPONSES_COMPACT_THRESHOLD_RATIO = 0.9
 
 export const responsesUtilsDependencies = {
-  isResponsesApiContextManagementModel:
-    isConfiguredResponsesApiContextManagementModel,
+  isResponsesApiContextManagementEnabled:
+    isConfiguredResponsesApiContextManagementEnabled,
   isResponsesApiWebSocketEnabled: isConfiguredResponsesApiWebSocketEnabled,
 }
 
@@ -284,11 +284,7 @@ export const applyResponsesApiContextManagement = (
     return
   }
 
-  if (
-    !responsesUtilsDependencies.isResponsesApiContextManagementModel(
-      payload.model,
-    )
-  ) {
+  if (!responsesUtilsDependencies.isResponsesApiContextManagementEnabled()) {
     return
   }
 
diff --git a/tests/builtin-provider-config.test.ts b/tests/builtin-provider-config.test.ts
@@ -6,6 +6,7 @@ import { fileURLToPath } from "node:url"
 
 interface ConfigFileShape {
   builtinProviders?: Record<string, unknown>
+  useResponsesApiContextManagement?: boolean
   providers?: Record<
     string,
     {
@@ -81,6 +82,35 @@ describe("builtin provider config", () => {
     expect(readConfigFile(configPath).builtinProviders).toBeUndefined()
   })
 
+  test("enables Responses API context management by default", () => {
+    const tempDir = createTempConfigDir()
+    const configPath = path.join(tempDir, "config.json")
+
+    const output = runScript(
+      tempDir,
+      'const { isResponsesApiContextManagementEnabled } = await import("./src/lib/config"); console.log(JSON.stringify({ enabled: isResponsesApiContextManagementEnabled() }));',
+    )
+
+    expect(JSON.parse(output)).toEqual({ enabled: true })
+    expect(readConfigFile(configPath).useResponsesApiContextManagement).toBe(
+      true,
+    )
+  })
+
+  test("allows disabling Responses API context management", () => {
+    const tempDir = createTempConfigDir()
+    writeConfigFile(tempDir, {
+      useResponsesApiContextManagement: false,
+    })
+
+    const output = runScript(
+      tempDir,
+      'const { isResponsesApiContextManagementEnabled } = await import("./src/lib/config"); console.log(JSON.stringify({ enabled: isResponsesApiContextManagementEnabled() }));',
+    )
+
+    expect(JSON.parse(output)).toEqual({ enabled: false })
+  })
+
   test("allows codex to be configured in config.providers", () => {
     const tempDir = createTempConfigDir()
     writeConfigFile(tempDir, {