Commit fef57a3

feat: enable Responses API context management by default and update related configurations
1 parent 47eba33 commit fef57a3

6 files changed

Lines changed: 48 additions & 25 deletions

README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -228,7 +228,7 @@ The following command line options are available for the `start` command:
     "gpt-5.4": "<built-in commentary prompt>"
   },
   "smallModel": "gpt-5-mini",
-  "responsesApiContextManagementModels": [],
+  "useResponsesApiContextManagement": true,
   "modelReasoningEfforts": {
     "gpt-5-mini": "low",
     "gpt-5.3-codex": "xhigh",
@@ -259,7 +259,7 @@ The following command line options are available for the `start` command:
 - `supportPdf` (optional): Controls whether the model supports PDF/document content. Defaults to `false`; unsupported PDFs are converted to a text notice. Set it to `true` to send PDF/document blocks as OpenAI Chat Completions file parts.
 - `toolContentSupportType` (optional): Tool result content capabilities for that model, as an array of `array`, `image`, and `pdf`. Provider routes default to string-only tool content when omitted. If `supportPdf` is `true` but this list does not include `pdf`, file parts in tool results are moved to user role messages. This provider default does not change the Copilot main flow, which continues to support array + image and not PDF.
 - **smallModel:** Fallback model used for tool-less warmup messages (e.g., Claude Code probe requests); defaults to gpt-5-mini.
-- **responsesApiContextManagementModels:** List of GPT model IDs that should receive Responses API `context_management` compaction instructions. This defaults to `[]`, so you need to opt in explicitly. A good starting point is `["gpt-5-mini", "gpt-5.3-codex", "gpt-5.4-mini", "gpt-5.4"]`. When enabled, the request includes `context_management` in the body and keeps only the latest compaction carrier on follow-up turns. The actual compaction is handled server-side and appears to begin when usage approaches roughly 90% of the model's `maxPromptTokens`, which makes it especially useful for long-running tasks. In practice, the effective `compact_threshold` also appears to be fixed on the server side, so changing it in this project does not currently alter compaction behavior. At the moment, this optimization is intended for GPT-family models only.
+- **useResponsesApiContextManagement:** When `true`, the proxy adds Responses API `context_management` compaction instructions. Defaults to `true`. Set it to `false` to disable this globally. When enabled, the request includes `context_management` in the body and keeps only the latest compaction carrier on follow-up turns. This is especially useful for long-running tasks.
 - **modelReasoningEfforts:** Per-model `reasoning.effort` sent to the Copilot Responses API. Allowed values are `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. If a model isn’t listed, `high` is used by default.
 - **useMessagesApi:** When `true`, Claude-family models that support Copilot's native `/v1/messages` endpoint will use the Messages API; otherwise they fall back to `/chat/completions`. Set to `false` to disable Messages API routing and always use `/chat/completions`. Defaults to `true`.
 - **useResponsesApiWebSocket:** When `true`, Responses API requests use Copilot's websocket transport for models that advertise `ws:/responses`; models that only advertise `/responses` continue to use HTTP. Set to `false` to disable websocket routing and use HTTP `/responses` whenever the selected model supports it. Defaults to `true`.
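
The README text for `useResponsesApiContextManagement` describes a single global switch that decides whether `context_management` is added to the request body. A minimal sketch of that gating follows. It is not the project's implementation: the `SketchPayload` and `SketchConfig` types, the function name, and the exact shape of the `context_management` entry are assumptions made for illustration, and the 0.9 ratio simply mirrors the `DEFAULT_RESPONSES_COMPACT_THRESHOLD_RATIO` constant visible in `src/routes/responses/utils.ts`.

```typescript
// Hypothetical payload shape; the project's real ResponsesPayload type may differ.
interface SketchPayload {
  model: string
  input: Array<unknown>
  // Assumed entry shape for the compaction instruction.
  context_management?: Array<{ type: "compaction"; compact_threshold: number }>
}

// Only the one config key relevant to this commit.
interface SketchConfig {
  useResponsesApiContextManagement?: boolean
}

// Assumed ratio, mirroring the 0.9 constant shown in the diff.
const SKETCH_THRESHOLD_RATIO = 0.9

function applyContextManagementSketch(
  payload: SketchPayload,
  config: SketchConfig,
  maxPromptTokens: number,
): SketchPayload {
  // An omitted key counts as enabled, matching the new default of `true`.
  const enabled = config.useResponsesApiContextManagement ?? true
  if (!enabled) return payload

  return {
    ...payload,
    context_management: [
      {
        type: "compaction",
        compact_threshold: Math.floor(maxPromptTokens * SKETCH_THRESHOLD_RATIO),
      },
    ],
  }
}

const sketchRequest: SketchPayload = { model: "gpt-5-mini", input: [] }

// Key omitted: the compaction instruction is attached.
console.log(applyContextManagementSketch(sketchRequest, {}, 1000))

// Explicit opt-out: the payload is returned unchanged.
console.log(
  applyContextManagementSketch(
    sketchRequest,
    { useResponsesApiContextManagement: false },
    1000,
  ),
)
```

The sketch returns a new object for clarity; the diff suggests the real helpers mutate the payload in place.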

README.zh-CN.md

Lines changed: 2 additions & 2 deletions
@@ -230,7 +230,7 @@ Copilot API now uses a subcommand structure; the main commands include:
     "gpt-5.4": "<built-in commentary prompt>"
   },
   "smallModel": "gpt-5-mini",
-  "responsesApiContextManagementModels": [],
+  "useResponsesApiContextManagement": true,
   "modelReasoningEfforts": {
     "gpt-5-mini": "low",
     "gpt-5.3-codex": "xhigh",
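@@ -261,7 +261,7 @@ Copilot API now uses a subcommand structure; the main commands include:
 - `supportPdf`: Optional. Controls whether the model supports PDF/document content. Defaults to `false`, in which case PDFs are converted to a text notice; when set to `true`, PDF/document blocks are converted to OpenAI Chat Completions file parts.
 - `toolContentSupportType`: Optional. Configures the model's tool result content capabilities as an array of `array`, `image`, and `pdf`. When not configured on the provider side, only string tool content is sent by default. If `supportPdf` is `true` but this list does not include `pdf`, file parts in tool results are converted to user role messages. The Copilot main flow does not use this provider default and continues to support array + image without PDF.
 - **smallModel:** Fallback model for tool-less warmup messages (e.g., Claude Code probe requests); defaults to `gpt-5-mini`.
-- **responsesApiContextManagementModels:** List of GPT model IDs for which Responses API `context_management` compaction instructions should be enabled. Defaults to `[]`, so you need to opt in explicitly. A good starting point is `["gpt-5-mini", "gpt-5.3-codex", "gpt-5.4-mini", "gpt-5.4"]`. When enabled, the request body includes `context_management`, and only the latest compaction carrier is kept on follow-up turns. The actual compaction is performed server-side and appears to begin when usage approaches roughly 90% of the model's `maxPromptTokens`, which makes it especially suitable for long-running tasks. In practice, `compact_threshold` also appears to be fixed server-side, so changing it in this project does not currently alter compaction behavior. At present, this optimization targets GPT-family models only.
+- **useResponsesApiContextManagement:** When `true`, the proxy attaches `context_management` compaction instructions to Responses API requests. Defaults to `true`. Set it to `false` to disable this globally. When enabled, the request body includes `context_management`, and only the latest compaction carrier is kept on follow-up turns, which makes it especially suitable for long-running tasks.
 - **modelReasoningEfforts:** Per-model `reasoning.effort` sent to the Copilot Responses API. Allowed values are `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. If a model is not configured, `high` is used by default.
 - **useMessagesApi:** When `true`, Claude-family models that support Copilot's native `/v1/messages` use the Messages API; otherwise they fall back to `/chat/completions`. Set to `false` to disable Messages API routing and always use `/chat/completions`. Defaults to `true`.
 - **useResponsesApiWebSocket:** When `true`, Responses API requests prefer Copilot's websocket transport for models that advertise `ws:/responses`; models that only advertise `/responses` still use HTTP. Set to `false` to disable websocket routing and use HTTP `/responses` when the model supports it. Defaults to `true`.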

src/lib/config.ts

Lines changed: 4 additions & 12 deletions
@@ -13,7 +13,7 @@ export interface AppConfig {
   modelMappings?: Record<string, string>
   extraPrompts?: Record<string, string>
   smallModel?: string
-  responsesApiContextManagementModels?: Array<string>
+  useResponsesApiContextManagement?: boolean
   modelReasoningEfforts?: Record<
     string,
     "none" | "minimal" | "low" | "medium" | "high" | "xhigh"
@@ -103,7 +103,7 @@ const defaultConfig: AppConfig = {
     "gpt-5.5": gpt5CommentaryPrompt,
   },
   smallModel: "gpt-5-mini",
-  responsesApiContextManagementModels: [],
+  useResponsesApiContextManagement: true,
   modelReasoningEfforts: {
     "gpt-5-mini": "low",
     "gpt-5.3-codex": "xhigh",
@@ -387,17 +387,9 @@ export function getSmallModel(): string {
   return config.smallModel ?? "gpt-5-mini"
 }

-export function getResponsesApiContextManagementModels(): Array<string> {
+export function isResponsesApiContextManagementEnabled(): boolean {
   const config = getConfig()
-  return (
-    config.responsesApiContextManagementModels
-    ?? defaultConfig.responsesApiContextManagementModels
-    ?? []
-  )
-}
-
-export function isResponsesApiContextManagementModel(model: string): boolean {
-  return getResponsesApiContextManagementModels().includes(model)
+  return config.useResponsesApiContextManagement ?? true
 }

 export function getReasoningEffortForModel(
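
The new getter relies on `??` rather than `||` to supply the default, which matters for a boolean that defaults to `true`. The sketch below isolates that one choice. `FlagConfig`, `withNullish`, and `withOr` are illustrative names that do not exist in the project.

```typescript
// Illustrative config type with only the flag touched by this commit.
interface FlagConfig {
  useResponsesApiContextManagement?: boolean
}

// Nullish coalescing falls back only when the value is undefined or null.
const withNullish = (config: FlagConfig): boolean =>
  config.useResponsesApiContextManagement ?? true

// Logical OR falls back on any falsy value, including an explicit `false`.
const withOr = (config: FlagConfig): boolean =>
  config.useResponsesApiContextManagement || true

console.log(withNullish({})) // true
console.log(withNullish({ useResponsesApiContextManagement: false })) // false
console.log(withOr({ useResponsesApiContextManagement: false })) // true
```

With `||`, a user who sets the key to `false` could never turn the feature off, so `??` is the operator that makes the opt-out described in the README work.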

src/routes/provider/responses/handler.ts

Lines changed: 6 additions & 1 deletion
@@ -13,7 +13,10 @@ import {
   normalizeResponsesUsage,
   type UsageTokens,
 } from "~/lib/token-usage"
-import { applyResponsesApiContextManagement } from "~/routes/responses/utils"
+import {
+  applyResponsesApiContextManagement,
+  compactInputByLatestCompaction,
+} from "~/routes/responses/utils"
 import type {
   ResponsesPayload,
   ResponsesResult,
@@ -69,6 +72,8 @@ export async function handleProviderResponsesForProvider(
     provider,
   })

+  compactInputByLatestCompaction(payload)
+
   if (providerConfig.name === "codex") {
     const upstreamResponse = await forwardCodexResponses(
       payload,
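
The body of `compactInputByLatestCompaction` is not part of this diff. The README only says that the proxy "keeps only the latest compaction carrier on follow-up turns", and the call site shows the function is invoked for its side effect on `payload`. One plausible reading, sketched below under loud assumptions, is that everything before the most recent compaction item is dropped, since that item already stands in for the earlier context. The item type name `"compaction"`, the `SketchItem` and `SketchInputPayload` types, and the function name `keepFromLatestCompaction` are all hypothetical.

```typescript
// Hypothetical input item; real Responses API items carry more fields.
type SketchItem = { type: string; [key: string]: unknown }

// Hypothetical payload: input may be a plain string or a list of items.
interface SketchInputPayload {
  input: string | Array<SketchItem>
}

// Mutates the payload in place, as the call site in the diff suggests.
function keepFromLatestCompaction(payload: SketchInputPayload): void {
  const input = payload.input
  if (!Array.isArray(input)) return

  // Index of the most recent compaction item, or -1 if there is none.
  const latest = input.map((item) => item.type).lastIndexOf("compaction")

  // Drop everything before that item; it is assumed to summarize it.
  if (latest > 0) payload.input = input.slice(latest)
}

const followUp: SketchInputPayload = {
  input: [
    { type: "message", text: "turn 1" },
    { type: "compaction", id: "c1" },
    { type: "message", text: "turn 2" },
    { type: "compaction", id: "c2" },
    { type: "message", text: "turn 3" },
  ],
}

keepFromLatestCompaction(followUp)
console.log(
  JSON.stringify((followUp.input as Array<SketchItem>).map((item) => item.type)),
) // ["compaction","message"]
```

If the real function instead removes only the older compaction items and keeps the surrounding messages, the filtering step would differ, so treat this as an illustration of the idea rather than a description of the project's behavior.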

src/routes/responses/utils.ts

Lines changed: 4 additions & 8 deletions
@@ -11,7 +11,7 @@ import type {

 import { COMPACT_REQUEST, type CompactType } from "~/lib/compact"
 import {
-  isResponsesApiContextManagementModel as isConfiguredResponsesApiContextManagementModel,
+  isResponsesApiContextManagementEnabled as isConfiguredResponsesApiContextManagementEnabled,
   isResponsesApiWebSocketEnabled as isConfiguredResponsesApiWebSocketEnabled,
 } from "~/lib/config"

@@ -20,8 +20,8 @@ export const RESPONSES_WS_ENDPOINT = "ws:/responses"
 export const DEFAULT_RESPONSES_COMPACT_THRESHOLD_RATIO = 0.9

 export const responsesUtilsDependencies = {
-  isResponsesApiContextManagementModel:
-    isConfiguredResponsesApiContextManagementModel,
+  isResponsesApiContextManagementEnabled:
+    isConfiguredResponsesApiContextManagementEnabled,
   isResponsesApiWebSocketEnabled: isConfiguredResponsesApiWebSocketEnabled,
 }

@@ -284,11 +284,7 @@ export const applyResponsesApiContextManagement = (
     return
   }

-  if (
-    !responsesUtilsDependencies.isResponsesApiContextManagementModel(
-      payload.model,
-    )
-  ) {
+  if (!responsesUtilsDependencies.isResponsesApiContextManagementEnabled()) {
     return
   }
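
`responsesUtilsDependencies` routes the config check through a mutable object instead of calling the imported getter directly. The commit does not state why, but a common reason for this pattern is to let tests swap the check without writing a config file. The sketch below shows that pattern in isolation; `sketchDependencies`, `readFlagFromConfig`, and `shouldApplySketch` are made-up names.

```typescript
// Stand-in for the real config getter; always reports "enabled" here.
const readFlagFromConfig = (): boolean => true

// Mutable seam: callers go through this object, so tests can replace entries.
const sketchDependencies = {
  isEnabled: readFlagFromConfig,
}

// Code under test reads the flag through the seam, not the import.
function shouldApplySketch(): boolean {
  return sketchDependencies.isEnabled()
}

// In a test: override, exercise, and always restore the original.
const originalIsEnabled = sketchDependencies.isEnabled
sketchDependencies.isEnabled = () => false
try {
  console.log(shouldApplySketch()) // false
} finally {
  sketchDependencies.isEnabled = originalIsEnabled
}
console.log(shouldApplySketch()) // true
```

Because the new check takes no arguments, a stub for it is a zero-argument function, whereas the removed per-model check needed the model ID.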

tests/builtin-provider-config.test.ts

Lines changed: 30 additions & 0 deletions
@@ -6,6 +6,7 @@ import { fileURLToPath } from "node:url"

 interface ConfigFileShape {
   builtinProviders?: Record<string, unknown>
+  useResponsesApiContextManagement?: boolean
   providers?: Record<
     string,
     {
@@ -81,6 +82,35 @@ describe("builtin provider config", () => {
     expect(readConfigFile(configPath).builtinProviders).toBeUndefined()
   })

+  test("enables Responses API context management by default", () => {
+    const tempDir = createTempConfigDir()
+    const configPath = path.join(tempDir, "config.json")
+
+    const output = runScript(
+      tempDir,
+      'const { isResponsesApiContextManagementEnabled } = await import("./src/lib/config"); console.log(JSON.stringify({ enabled: isResponsesApiContextManagementEnabled() }));',
+    )
+
+    expect(JSON.parse(output)).toEqual({ enabled: true })
+    expect(readConfigFile(configPath).useResponsesApiContextManagement).toBe(
+      true,
+    )
+  })
+
+  test("allows disabling Responses API context management", () => {
+    const tempDir = createTempConfigDir()
+    writeConfigFile(tempDir, {
+      useResponsesApiContextManagement: false,
+    })
+
+    const output = runScript(
+      tempDir,
+      'const { isResponsesApiContextManagementEnabled } = await import("./src/lib/config"); console.log(JSON.stringify({ enabled: isResponsesApiContextManagementEnabled() }));',
+    )
+
+    expect(JSON.parse(output)).toEqual({ enabled: false })
+  })
+
   test("allows codex to be configured in config.providers", () => {
     const tempDir = createTempConfigDir()
     writeConfigFile(tempDir, {
