Commit 18eda94

Refine quota exhaustion handling for versioned Copilot responses
Versioned API now returns 402 quota_exceeded; router treats 402 like 429 for cooldown, docs reflect versioned behavior.

Constraint: upstream quota semantics vary by X-GitHub-Api-Version.
Rejected: keep 429-only cooldown | misses 402 quota_exceeded from 2026-01-09.
Confidence: high
Scope-risk: narrow
Directive: keep 402/429 parity in router cooldown checks when new quota signals appear.
Tested: bun run lint:all --fix; bun run build; bun test; bun run typecheck
1 parent 16a445e commit 18eda94

3 files changed

Lines changed: 82 additions & 5 deletions

docs/copilot-ai-credits-june-retest-handoff.md

Lines changed: 32 additions & 0 deletions
@@ -73,6 +73,38 @@
 - usage API top-level cumulative `total_nano_aiu` / `nano_aiu` / `aiu`
 - AIU fields in a successful model response from 4145, because that account returned 402 quota exceeded in this round.
 
+### 2026-06-02 second-round observation: actual behavior after overage is exhausted
+
+This round ran a second usage API query and real-request test on the same day, capturing a complete state snapshot after overage exhaustion.
+
+**Latest usage API snapshot:**
+
+| Port | entitlement | remaining | percent_remaining | has_quota | overage_permitted | overage_count |
+|---:|---:|---:|---:|---|---|---:|
+| 4142 | 10000 | -92 | 0.0 | false | false | 91 |
+| 4143 | 10000 | -140 | 0.0 | false | false | 139 |
+| 4144 | 10000 | 786 | 7.8 | true | true | 0 |
+
+**Real request results (4142 / 4143):**
+
+- `gpt-5.5` → HTTP 429, body `quota exceeded`
+- `gpt-5.4-mini` → HTTP 429, body `quota exceeded`
+- `gpt-4o` / `gpt-4.1` → HTTP 400, `unsupported_api_for_model` (unrelated to quota; the Responses API does not support these models)
+- `x-quota-snapshot-premium_interactions` header: absent this round (NOT PRESENT)
+
+**New conclusions:**
+
+- **`overage_permitted` flips from `true` to `false`**: the previous observation had overage\_permitted=true and overage\_count=0; this round, after both 4142 and 4143 went through overage, overage\_permitted=false and has\_quota=false. So overage is not unlimited; once it is used up, it is switched off.
+- **Whether the response is 402 or 429 is determined by the `X-GitHub-Api-Version` request header, not by backend randomness or proxy rewriting**. Measured with the same account, same model, and quota exhausted:
+  - With `X-GitHub-Api-Version: 2026-01-09` → **HTTP 402**, body `{"code":"quota_exceeded","message":"You have exceeded your monthly quota"}`
+  - Without the version header (legacy behavior) → **HTTP 429**, plain-text body `quota exceeded`
+  - The host (`business` / `githubcopilot`) does not affect this result; the paths `/responses` and `/v1/responses` give the same result.
+  - copilot-api hardcodes `API_VERSION = "2026-01-09"` in `src/lib/api-config.ts`, so requests through the local proxy always see 402; the 429 recorded in earlier docs came from manual tests that omitted the version header.
+  - `forwardError()` in `src/lib/error.ts` passes the status through unchanged, with no rewriting. The 402 genuinely comes from the backend.
+  - Semantics: the new API version redefines quota exhaustion from "rate limited (429)" to "402 Payment Required", consistent with the AI Credits / token billing rollout.
+- **The overage cap is still unknown**: 4142 was cut off at 91 and 4143 at 139. The two values differ, so it is not yet possible to infer whether overage is a fixed allowance, a per-account setting, or a timing effect. The overage hard cap cannot be read from the `copilot_internal/user` endpoint.
+- **Reset date**: `2026-07-01` for 4142/4143/4144, at which point remaining/overage\_count should reset.
+
 ### Focus for the next retest
 
 - Wait until `premium_interactions.remaining` reaches zero on any of 4142 / 4143 / 4144, then keep sending requests to confirm whether 200 is still returned.
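
The version-dependent quota signals recorded in the handoff notes could be normalized by a small classifier. The following is a minimal sketch, not code from this commit: `classifyQuotaSignal` and `QuotaSignal` are hypothetical names. It accepts both the flat body shape quoted in the notes and the nested `error` shape used by the test fixture in this commit, on the assumption that either might reach a caller.

```typescript
// Hypothetical helper for illustration only; not part of the commit.
type QuotaSignal = "quota_402" | "quota_429" | "none"

function classifyQuotaSignal(status: number, bodyText: string): QuotaSignal {
  if (status === 402) {
    try {
      const parsed: unknown = JSON.parse(bodyText)
      const obj = parsed as
        | { code?: unknown; error?: { code?: unknown } }
        | null
      // Assumption: the code may sit at the top level or under `error`.
      const code = obj?.code ?? obj?.error?.code
      if (code === "quota_exceeded") {
        return "quota_402"
      }
    } catch {
      // A non-JSON 402 body is not classified as a quota signal here.
    }
    return "none"
  }

  // Legacy behavior (no version header): plain-text body `quota exceeded`.
  if (status === 429 && bodyText.trim().toLowerCase() === "quota exceeded") {
    return "quota_429"
  }

  return "none"
}
```

The router change in this commit checks only the status code, so inspecting the body as sketched here would be an optional extra step.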

router/state.ts

Lines changed: 11 additions & 5 deletions
@@ -710,7 +710,12 @@ function buildRequestContext(params: {
   }
 }
 
-function applyCooldownOn429(
+// 429 = upstream rate-limit, 402 = quota/credit exhausted (new
+// X-GitHub-Api-Version quota_exceeded semantics). Both mean this instance
+// cannot serve now, so cool it down and stop routing here.
+const COOLDOWN_STATUSES = new Set([429, 402])
+
+function applyCooldownOnExhaustion(
   runtime: RouterRuntime,
   proxied: Response,
   params: {
@@ -720,10 +725,11 @@ function applyCooldownOn429(
     requestNowMs: number
   },
 ) {
-  if (proxied.status !== 429) {
+  if (!COOLDOWN_STATUSES.has(proxied.status)) {
     return
   }
 
+  // 402 has no Retry-After; falls back to defaultCooldownMs below.
   const retryAfter = proxied.headers.get("Retry-After")
   const retryAfterMs = parseRetryAfterMs(retryAfter, params.requestNowMs)
   const cooldownMs = retryAfterMs ?? runtime.defaultCooldownMs
@@ -732,7 +738,7 @@ function applyCooldownOn429(
   runtime.state.portCooldownUntil.set(params.port, cooldownUntilMs)
   runtime.state.portCooldownRetryAfter.set(params.port, retryAfter)
   runtime.logger(
-    `cooldown set instance=${params.instanceName}:${params.port} model=${params.model} until=${new Date(cooldownUntilMs).toISOString()} retry-after=${retryAfter || "_"}`,
+    `cooldown set instance=${params.instanceName}:${params.port} model=${params.model} status=${proxied.status} until=${new Date(cooldownUntilMs).toISOString()} retry-after=${retryAfter || "_"}`,
   )
 }
 
@@ -825,7 +831,7 @@ async function handleNoModelRequest(
     onQuotaSnapshots: (quotaSnapshots) =>
       updateUpstreamQuotaSnapshot(runtime.state, port, quotaSnapshots),
   })
-  applyCooldownOn429(runtime, proxied, {
+  applyCooldownOnExhaustion(runtime, proxied, {
     port,
     instanceName,
     model: "_",
@@ -895,7 +901,7 @@ async function handleModelRequest(
     onQuotaSnapshots: (quotaSnapshots) =>
       updateUpstreamQuotaSnapshot(runtime.state, result.port, quotaSnapshots),
   })
-  applyCooldownOn429(runtime, proxied, {
+  applyCooldownOnExhaustion(runtime, proxied, {
     port: result.port,
     instanceName,
     model: request.model,
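
The cooldown decision in `applyCooldownOnExhaustion` can be illustrated in isolation. This is a sketch under stated assumptions, not the repository's code: the real `parseRetryAfterMs` is not shown in this diff, so `parseRetryAfterMsSketch` is a hypothetical stand-in that accepts the two `Retry-After` forms HTTP defines (a delay in seconds, or an HTTP date). `cooldownUntilMsSketch` further assumes the deadline is the request time plus the cooldown, which the elided lines of the diff do not confirm.

```typescript
// Hypothetical stand-ins for illustration; names ending in "Sketch" are not
// from the repository.
const COOLDOWN_STATUSES_SKETCH = new Set([429, 402])

function parseRetryAfterMsSketch(
  value: string | null,
  nowMs: number,
): number | undefined {
  if (value === null) {
    return undefined
  }
  const trimmed = value.trim()
  // Form 1: non-negative integer number of seconds.
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000
  }
  // Form 2: an HTTP date; convert to a delay relative to the request time.
  const dateMs = Date.parse(trimmed)
  if (Number.isNaN(dateMs)) {
    return undefined
  }
  return Math.max(0, dateMs - nowMs)
}

function cooldownUntilMsSketch(
  status: number,
  retryAfter: string | null,
  nowMs: number,
  defaultCooldownMs: number,
): number | undefined {
  if (!COOLDOWN_STATUSES_SKETCH.has(status)) {
    return undefined // Not an exhaustion signal: no cooldown.
  }
  // A 402 without Retry-After falls back to the default cooldown.
  const retryAfterMs = parseRetryAfterMsSketch(retryAfter, nowMs)
  return nowMs + (retryAfterMs ?? defaultCooldownMs)
}
```

Under these assumptions, a 429 carrying `Retry-After: 120` cools the instance for 120 seconds, while a 402 with no header cools it for the configured default.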

tests/router/proxy.test.ts

Lines changed: 39 additions & 0 deletions
@@ -234,6 +234,45 @@ describe("router discovery and proxy helpers", () => {
 
 // eslint-disable-next-line max-lines-per-function
 describe("router handler cooldown semantics", () => {
+  test("router handler cools down instance on upstream 402 quota_exceeded", async () => {
+    const state = createState()
+    state.modelToPorts.set("gpt-4.1", [4141, 4142])
+    state.sessionBindings.set("session-1:atlas:gpt-4.1", 4141)
+    const fixedNowMs = new Date("2026-03-13T00:00:00.000Z").getTime()
+
+    const fetchImpl = createFetchStub((input) => {
+      const port = new URL(toInputUrl(input)).port
+      if (port === "4141") {
+        return Promise.resolve(
+          new Response(
+            '{"error":{"message":"You have exceeded your monthly quota","code":"quota_exceeded"}}',
+            { status: 402 },
+          ),
+        )
+      }
+      return Promise.resolve(new Response("ok", { status: 200 }))
+    })
+
+    const handler = createRouterHandlerForTest({ state, fetchImpl, fixedNowMs })
+
+    const res = await handler(
+      new Request("http://localhost/v1/messages", {
+        method: "POST",
+        headers: {
+          "content-type": "application/json",
+          "x-session-id": "session-1",
+          "x-oc-agent": "atlas",
+          "x-oc-provider": "openai",
+        },
+        body: '{"model":"gpt-4.1"}',
+      }),
+    )
+
+    // 402 has no Retry-After → default cooldown applied, instance cooled down.
+    expect(res.status).toBe(402)
+    expect(state.portCooldownUntil.get(4141)).toBeGreaterThan(fixedNowMs)
+  })
+
   test("router handler sets cooldown on upstream 429 using Retry-After seconds", async () => {
     const state = createState()
     state.modelToPorts.set("gpt-4.1", [4141, 4142])
