Skip to content

Commit 387651c

Browse files
authored
fix(runtime): surface skip reason in pinned 503 (#486) (#487)
* fix(runtime): surface skip reason in pinned 503 (#486) The pinned-account 503 response previously omitted the runtime skip reason, forcing users to consult `codex-multi-auth status` out of band and making remote diagnosis impossible. The response body now carries a structured `reason` field and an `account_skip_reasons` map, mirroring the existing `writePoolExhausted` shape, and the human-readable message appends the same reason in parentheses. A missing reason maps to explicit `null` and is captured in `status.lastError` so a forecast vs. runtime state desync is detectable instead of silently masked. Updates the error-contract reference doc to match. Adds end-to-end coverage for the rate-limited, cooling-down, and disabled pinned-503 paths and extends the chooseAccount unit tests to assert skipReasons map population for every pinned unavailability case (rate-limited, cooling-down, disabled, policy-blocked, missing, already-attempted). Closes #486 partial: the diagnostic surface lands now; the underlying state desync that prompted the report still needs logs from the reporter to root-cause. * fix(runtime): address review feedback on pinned 503 (#486) - Extract `buildPinnedUnavailableErrorBody` helper so the null-reason state-desync branch can be unit-tested directly without standing up a full proxy. The helper is exported alongside a typed `PinnedUnavailableErrorBody` interface so external consumers can rely on a stable shape. - Tighten the end-to-end cooling-down assertion from a regex prefix to an exact equality check against `cooling-down:auth-failure`, the string set by `markAccountCoolingDown` in the test setup. Prevents silent contract drift on the cooldown reason format. - Add chooseAccount unit cases for the remaining pinned skip reasons flagged by review: `workspace-disabled` (all workspaces disabled) and `circuit-open` (failure threshold tripped via `recordFailure`). The suite now mirrors the full enumeration in `AccountManager.getAccountRuntimeSkipReason`. - Add direct unit coverage for `buildPinnedUnavailableErrorBody` over four shapes: empty map yields `reason: null` with no parenthetical in the message, populated map yields the reason plus the parenthetical, null `pinnedIndex` resolves to `pinnedAccountIndex: null` without throwing, and the full `account_skip_reasons` map is mirrored even when the pinned index has no entry of its own. Closes #486 partial: review feedback addressed; underlying state desync still needs reporter logs to root-cause.
1 parent 42648d6 commit 387651c

4 files changed

Lines changed: 484 additions & 8 deletions

File tree

docs/reference/error-contracts.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -77,7 +77,7 @@ The default-on localhost Responses proxy returns JSON error payloads with a stab
7777
| `codex_pinned_account_unavailable` | `503` | A manual pin is set (via `codex-multi-auth switch`) but the pinned account is rate-limited, cooling down, disabled, or blocked by policy. Run `codex-multi-auth status` for details, or `codex-multi-auth unpin` to allow rotation |
7878
| `codex_runtime_rotation_proxy_error` | `500` | Proxy failed before forwarding the request |
7979

80-
Pool exhaustion includes a `reason`, `retry_after_ms`, and a hint to run `codex-multi-auth rotation status`. Pinned-account-unavailable responses include a `pinnedAccountIndex` field identifying the pinned account.
80+
Pool exhaustion includes a `reason`, `retry_after_ms`, and a hint to run `codex-multi-auth rotation status`. Pinned-account-unavailable responses include a `pinnedAccountIndex` field identifying the pinned account, a structured `reason` field carrying the runtime skip reason (for example `rate-limited`, `cooling-down:auth-failure`, `circuit-open`, `disabled`, `workspace-disabled`, `policy-blocked`, `missing`, `already-attempted`) or `null` when no reason was recorded, and an `account_skip_reasons` map keyed by account index that mirrors the pool-exhausted response shape. The human-readable `message` appends the same reason in parentheses when present (see issue #486).
8181

8282
---
8383

lib/runtime-rotation-proxy.ts

Lines changed: 54 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1044,6 +1044,47 @@ function writePoolExhausted(params: {
10441044
});
10451045
}
10461046

1047+
/**
1048+
* Build the JSON `error` body for a pinned-account 503 response. Extracted so
1049+
* the null-reason desync path (`reason: null`, no parenthetical in `message`)
1050+
* can be unit-tested without standing up a full proxy. The shape mirrors
1051+
* `writePoolExhausted` so consumers can handle both 503 codes uniformly. See
1052+
* issue #486.
1053+
*/
1054+
export interface PinnedUnavailableErrorBody {
1055+
message: string;
1056+
code: "codex_pinned_account_unavailable";
1057+
pinnedAccountIndex: number | null;
1058+
reason: string | null;
1059+
account_skip_reasons: Record<string, string>;
1060+
}
1061+
1062+
export function buildPinnedUnavailableErrorBody(
1063+
pinnedIndex: number | null | undefined,
1064+
accountSkipReasons: ReadonlyMap<number, string>,
1065+
): PinnedUnavailableErrorBody {
1066+
const normalizedPinnedIndex =
1067+
typeof pinnedIndex === "number" ? pinnedIndex : null;
1068+
const skipReason =
1069+
normalizedPinnedIndex !== null
1070+
? accountSkipReasons.get(normalizedPinnedIndex) ?? null
1071+
: null;
1072+
const reasonSuffix = skipReason ? ` (${skipReason})` : "";
1073+
const displayIndex = (normalizedPinnedIndex ?? 0) + 1;
1074+
return {
1075+
message: `Pinned account ${displayIndex} is currently unavailable${reasonSuffix}; run \`codex-multi-auth status\` for details, or \`codex-multi-auth unpin\` to allow rotation.`,
1076+
code: "codex_pinned_account_unavailable",
1077+
pinnedAccountIndex: normalizedPinnedIndex,
1078+
reason: skipReason,
1079+
account_skip_reasons: Object.fromEntries(
1080+
[...accountSkipReasons.entries()].map(([index, reason]) => [
1081+
String(index),
1082+
reason,
1083+
]),
1084+
),
1085+
};
1086+
}
1087+
10471088
async function withTimeout<T>(
10481089
promise: Promise<T>,
10491090
timeoutMs: number,
@@ -1738,19 +1779,25 @@ export async function startRuntimeRotationProxy(
17381779
// When a manual pin is set and the pinned account is unavailable, do
17391780
// NOT silently fall through to rotation. Hard-fail with a 503 so the
17401781
// user is informed the pin cannot be honored. See issue #474.
1782+
//
1783+
// Surface the runtime skip reason in both the human-readable message
1784+
// and a structured `reason` field, mirroring `writePoolExhausted`. A
1785+
// null reason indicates a forecast/runtime state desync (the pinned
1786+
// account was selected but no skip reason was recorded) — see #486.
17411787
if (isPinned) {
1788+
const errorBody = buildPinnedUnavailableErrorBody(
1789+
pinnedIndex,
1790+
accountSkipReasons,
1791+
);
1792+
if (errorBody.reason === null) {
1793+
status.lastError = `pinned-503 missing skip reason (pinnedIndex=${pinnedIndex})`;
1794+
}
17421795
await usageRecorder?.record({
17431796
outcome: "failure",
17441797
statusCode: HTTP_STATUS.SERVICE_UNAVAILABLE,
17451798
errorCode: "codex_pinned_account_unavailable",
17461799
});
1747-
writeJson(res, HTTP_STATUS.SERVICE_UNAVAILABLE, {
1748-
error: {
1749-
message: `Pinned account ${(pinnedIndex ?? 0) + 1} is currently unavailable; run \`codex-multi-auth status\` for details, or \`codex-multi-auth unpin\` to allow rotation.`,
1750-
code: "codex_pinned_account_unavailable",
1751-
pinnedAccountIndex: pinnedIndex,
1752-
},
1753-
});
1800+
writeJson(res, HTTP_STATUS.SERVICE_UNAVAILABLE, { error: errorBody });
17541801
return;
17551802
}
17561803

test/issue-474-pin-end-to-end.test.ts

Lines changed: 155 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -279,8 +279,163 @@ describe("issue #474 — end-to-end pin honored over real HTTP proxy", () => {
279279
expect(thirdResult.bodyText).toContain(
280280
"codex_pinned_account_unavailable",
281281
);
282+
// Issue #486: the 503 body must surface the runtime skip reason so
283+
// users can diagnose why the pin cannot be honored without scraping
284+
// `codex-multi-auth status` logs out-of-band.
285+
const thirdBody = JSON.parse(thirdResult.bodyText) as {
286+
error: {
287+
code: string;
288+
pinnedAccountIndex: number | null;
289+
reason: string | null;
290+
account_skip_reasons: Record<string, string>;
291+
message: string;
292+
};
293+
};
294+
expect(thirdBody.error.reason).toBe("disabled");
295+
expect(thirdBody.error.message).toContain("(disabled)");
296+
expect(thirdBody.error.pinnedAccountIndex).toBe(otherAccountIndex);
297+
expect(
298+
thirdBody.error.account_skip_reasons[String(otherAccountIndex)],
299+
).toBe("disabled");
282300
// No additional upstream call — the proxy refused before issuing one.
283301
expect(upstreamCalls).toHaveLength(2);
284302
},
285303
);
304+
305+
it(
306+
"surfaces 'rate-limited' skip reason in pinned 503 body (issue #486)",
307+
async () => {
308+
const storagePath = makeTmpStoragePath();
309+
const now = Date.now();
310+
const initialStorage = createStorage(now);
311+
const pinnedIndex = 1;
312+
writeStorageFile(storagePath, {
313+
...initialStorage,
314+
pinnedAccountIndex: pinnedIndex,
315+
affinityGeneration: 1,
316+
});
317+
setStoragePathDirect(storagePath);
318+
319+
const accountManager = new AccountManager(undefined, initialStorage);
320+
openManagers.push(accountManager);
321+
322+
const pinned = accountManager.getAccountByIndex(pinnedIndex);
323+
expect(pinned).not.toBeNull();
324+
if (!pinned) throw new Error("setup failed");
325+
// Match the family the proxy will resolve from `model: "gpt-5-codex"`.
326+
// `getModelFamily("gpt-5-codex")` returns "gpt-5-codex", not "codex",
327+
// so the rate-limit must be keyed under that family for the runtime
328+
// skip-reason check to detect it.
329+
accountManager.markRateLimitedWithReason(
330+
pinned,
331+
60_000,
332+
"gpt-5-codex",
333+
"quota",
334+
);
335+
336+
const upstreamCalls: number[] = [];
337+
const fetchImpl: typeof fetch = async (_input, init) => {
338+
const headers = new Headers(init?.headers);
339+
const auth = headers.get("authorization") ?? "";
340+
const token = auth.replace(/^Bearer\s+/i, "");
341+
const index = initialStorage.accounts.findIndex(
342+
(a) => a.accessToken === token,
343+
);
344+
upstreamCalls.push(index);
345+
return new Response(JSON.stringify({ ok: true, account: index }), {
346+
status: HTTP_STATUS.OK,
347+
headers: { "content-type": "application/json" },
348+
});
349+
};
350+
351+
const proxy = await startRuntimeRotationProxy({
352+
accountManager,
353+
fetchImpl,
354+
upstreamBaseUrl: "https://example.test/backend-api",
355+
clientApiKey: CLIENT_API_KEY,
356+
});
357+
openServers.push(proxy);
358+
359+
const result = await postViaHttp(
360+
proxy,
361+
{ model: "gpt-5-codex", stream: false },
362+
"/v1/responses",
363+
);
364+
expect(result.status).toBe(HTTP_STATUS.SERVICE_UNAVAILABLE);
365+
const body = JSON.parse(result.bodyText) as {
366+
error: {
367+
code: string;
368+
reason: string | null;
369+
account_skip_reasons: Record<string, string>;
370+
message: string;
371+
};
372+
};
373+
expect(body.error.code).toBe("codex_pinned_account_unavailable");
374+
expect(body.error.reason).toBe("rate-limited");
375+
expect(body.error.message).toContain("(rate-limited)");
376+
expect(body.error.account_skip_reasons[String(pinnedIndex)]).toBe(
377+
"rate-limited",
378+
);
379+
expect(upstreamCalls).toHaveLength(0);
380+
},
381+
);
382+
383+
it(
384+
"surfaces a cooling-down skip reason in pinned 503 body (issue #486)",
385+
async () => {
386+
const storagePath = makeTmpStoragePath();
387+
const now = Date.now();
388+
const initialStorage = createStorage(now);
389+
const pinnedIndex = 0;
390+
writeStorageFile(storagePath, {
391+
...initialStorage,
392+
pinnedAccountIndex: pinnedIndex,
393+
affinityGeneration: 1,
394+
});
395+
setStoragePathDirect(storagePath);
396+
397+
const accountManager = new AccountManager(undefined, initialStorage);
398+
openManagers.push(accountManager);
399+
400+
const pinned = accountManager.getAccountByIndex(pinnedIndex);
401+
if (!pinned) throw new Error("setup failed");
402+
accountManager.markAccountCoolingDown(pinned, 60_000, "auth-failure");
403+
404+
const upstreamCalls: number[] = [];
405+
const fetchImpl: typeof fetch = async () => {
406+
upstreamCalls.push(-1);
407+
return new Response("{}", { status: HTTP_STATUS.OK });
408+
};
409+
410+
const proxy = await startRuntimeRotationProxy({
411+
accountManager,
412+
fetchImpl,
413+
upstreamBaseUrl: "https://example.test/backend-api",
414+
clientApiKey: CLIENT_API_KEY,
415+
});
416+
openServers.push(proxy);
417+
418+
const result = await postViaHttp(
419+
proxy,
420+
{ model: "gpt-5-codex", stream: false },
421+
"/v1/responses",
422+
);
423+
expect(result.status).toBe(HTTP_STATUS.SERVICE_UNAVAILABLE);
424+
const body = JSON.parse(result.bodyText) as {
425+
error: {
426+
code: string;
427+
reason: string | null;
428+
account_skip_reasons: Record<string, string>;
429+
message: string;
430+
};
431+
};
432+
expect(body.error.code).toBe("codex_pinned_account_unavailable");
433+
expect(body.error.reason).toBe("cooling-down:auth-failure");
434+
expect(body.error.message).toContain("(cooling-down:auth-failure)");
435+
expect(body.error.account_skip_reasons[String(pinnedIndex)]).toBe(
436+
"cooling-down:auth-failure",
437+
);
438+
expect(upstreamCalls).toHaveLength(0);
439+
},
440+
);
286441
});

0 commit comments

Comments
 (0)