
Commit d072244

feat(rotation): sequential drain-first account scheduling mode (#510)
Adds an optional schedulingStrategy setting (hybrid | sequential). In sequential / drain-first mode the runtime proxy routes all new requests to one active account until it is fully exhausted (rate-limited / cooling down / circuit-open), then advances to the next available account; earlier accounts reclaim the active slot once their quota window recovers, staggering recovery across the pool. Default stays hybrid, so existing behavior is unchanged.

- accounts.ts: getCurrentOrNextForFamilySequential sticky selector (per ModelFamily), policy-aware so it never anchors on a blocked account
- runtime-rotation-proxy.ts: chooseAccount branches on strategy; sequential skips per-session affinity, manual pin still wins; shared linear-scan fallback does not advance the drain-first primary
- config/schemas: schedulingStrategy enum (default hybrid), getSchedulingStrategy accessor, CODEX_AUTH_SCHEDULING_STRATEGY env override, config-explain parity
- docs: configuration, settings reference, config-field inventory
- tests: selector stickiness/advance/wrap/recovery, per-family isolation, cooldown/circuit-open/disabled paths, policy-block guard, affinity-override and pin-precedence, config accessor

Closes #509
1 parent f3d4e7c commit d072244

11 files changed

Lines changed: 771 additions & 4 deletions

docs/configuration.md

Lines changed: 10 additions & 0 deletions
@@ -74,6 +74,7 @@ These are safe for most operators and frequently used in day-to-day workflows.
 | `CODEX_AUTH_FETCH_TIMEOUT_MS=<ms>` | HTTP request timeout override |
 | `CODEX_AUTH_STREAM_STALL_TIMEOUT_MS=<ms>` | Stream stall timeout override |
 | `CODEX_AUTH_MIN_ROTATION_INTERVAL_MS=<ms>` | Minimum time between global account switches (default `60000`). The proxy biases selection toward the last-served account within this window to reduce the rate at which different OAuth tokens appear from the same IP. Set to `0` to disable. |
+| `CODEX_AUTH_SCHEDULING_STRATEGY=hybrid/sequential` | Account scheduling strategy (default `hybrid`). `sequential` (drain-first) keeps one active account until it is fully exhausted before advancing to the next; see [Sequential / drain-first scheduling](#sequential--drain-first-scheduling). |
 | `CODEX_AUTH_TOKEN_INVALIDATION_COOLDOWN_MS=<ms>` | Cooldown applied to an account when the upstream or token-refresh endpoint explicitly revokes its OAuth token (default `300000`, 5 minutes). Raise this if accounts continue to be re-invalidated after re-login. |

 ---
@@ -117,6 +118,15 @@ The proxy preserves request bodies and streaming responses, replaces outbound au
 - **Token-invalidation detection**: when the upstream or the token-refresh endpoint returns an explicit OAuth revocation message, the proxy returns the error directly to the client instead of rotating to the next account. The affected account receives a 5-minute cooldown (`tokenInvalidationCooldownMs`, default `300000`) instead of the generic 30-second auth-failure cooldown. Configure via `CODEX_AUTH_TOKEN_INVALIDATION_COOLDOWN_MS`.
 - **Rotation-rate throttle**: the proxy biases account selection toward the last-served account for a configurable window (default 60 seconds, `minRotationIntervalMs`). Accounts that are rate-limited or cooling down are still rotated around. Configure via `CODEX_AUTH_MIN_ROTATION_INTERVAL_MS` or set to `0` to disable.

+### Sequential / drain-first scheduling
+
+`schedulingStrategy` controls how the proxy picks an account for each request:
+
+- `hybrid` (default) spreads load across all available accounts using a weighted health/token/freshness score. Accounts therefore tend to consume quota at a similar pace.
+- `sequential` (drain-first) routes every new request to one active account and only advances to the next available account once the current one is fully exhausted (rate-limited, cooling down, or circuit-open). Because the scan wraps the pool, an earlier account that has recovered its quota window is reclaimed as soon as the current account drains. This staggers quota recovery across accounts for longer uninterrupted sessions.
+
+In `sequential` mode a manual pin (`codex-multi-auth switch <index>`) still takes precedence and is never overridden. Sequential mode intentionally ignores per-session affinity: once the active account changes, all subsequent requests follow the new active account regardless of which account originally handled a conversation. Enable it with `schedulingStrategy: "sequential"` in settings or `CODEX_AUTH_SCHEDULING_STRATEGY=sequential` for a per-process trial.
+
 Microsoft/Outlook SSO accounts may be more sensitive to proxy-mediated token use. If an Outlook-linked account is invalidated on every first request through the proxy but works normally on ChatGPT web, the root cause is likely IP or device binding on the Microsoft side. Raising `CODEX_AUTH_TOKEN_INVALIDATION_COOLDOWN_MS` and re-logging in the affected account typically resolves the cascade. If the problem persists, consider excluding the Microsoft account from the rotation pool via `codex-multi-auth switch`.

 For `codex app` launches that go through the wrapper, the wrapper automatically starts a small internal helper so rotation can keep working if the desktop app launcher detaches. The helper stores only local runtime status, uses the same per-session proxy client key as the CLI path, and exits after an idle timeout.
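To make the difference between the two strategies concrete, the following TypeScript sketch routes a fixed number of requests over a pool where each account can serve a fixed number of requests before it is exhausted. It is an illustration under stated assumptions, not the proxy's code: `simulate`, `Strategy`, and the per-account budgets are hypothetical, `hybrid` is approximated by plain round-robin (the real selector uses a weighted health/token/freshness score), and quota recovery is not modeled.

```typescript
// Hypothetical simulation: account i can serve budgets[i] requests before it is exhausted.
// "spread" stands in for hybrid (simplified to round-robin); "drain-first" mirrors sequential.
type Strategy = "spread" | "drain-first";

function simulate(budgets: number[], requests: number, strategy: Strategy): number[] {
  const remaining = [...budgets];
  const count = remaining.length;
  const served: number[] = [];
  let active = 0; // drain-first: current active account
  let cursor = 0; // spread: next round-robin position

  for (let r = 0; r < requests; r++) {
    let picked = -1;
    if (strategy === "drain-first") {
      // Stay on the active account while it has budget; otherwise scan forward, wrapping.
      for (let i = 0; i < count; i++) {
        const idx = (active + i) % count;
        if (remaining[idx] > 0) {
          picked = idx;
          break;
        }
      }
      if (picked >= 0) active = picked;
    } else {
      // Round-robin: advance the cursor on every pick, skipping exhausted accounts.
      for (let i = 0; i < count; i++) {
        const idx = (cursor + i) % count;
        if (remaining[idx] > 0) {
          picked = idx;
          cursor = (idx + 1) % count;
          break;
        }
      }
    }
    if (picked < 0) break; // every account is exhausted
    remaining[picked]--;
    served.push(picked);
  }
  return served;
}

console.log(simulate([3, 3, 3], 6, "drain-first").join(",")); // 0,0,0,1,1,1
console.log(simulate([3, 3, 3], 6, "spread").join(","));      // 0,1,2,0,1,2
```

After six requests, drain-first has used all of accounts 0 and 1 and none of account 2, so their quota windows start at different times; the spread run has taken two requests from every account.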

docs/development/CONFIG_FIELDS.md

Lines changed: 4 additions & 0 deletions
@@ -65,6 +65,7 @@ Used only for host plugin mode through the host runtime config file.

 | Key | Default |
 | --- | --- |
+| `schedulingStrategy` | `hybrid` |
 | `retryAllAccountsRateLimited` | `true` |
 | `retryAllAccountsMaxWaitMs` | `0` |
 | `retryAllAccountsMaxRetries` | `Infinity` |
@@ -73,6 +74,8 @@ Used only for host plugin mode through the host runtime config file.
 | `fallbackToGpt52OnUnsupportedGpt53` | `true` |
 | `unsupportedCodexFallbackChain` | `{}` |

+`schedulingStrategy` selects how the runtime proxy picks an account per request. `hybrid` (default) keeps the weighted health/token/freshness selection that spreads load across all available accounts. `sequential` (drain-first) sticks to one active account and only advances to the next available account once the current one is fully exhausted (rate-limited / cooling down / circuit-open); earlier accounts become eligible again as soon as their quota window recovers, staggering recovery across the pool. A manual pin still overrides this, and sequential mode intentionally ignores per-session affinity so all new requests follow the single active account. Overridable per-process via `CODEX_AUTH_SCHEDULING_STRATEGY`.
+
 ### Token / Recovery

 | Key | Default |
@@ -220,6 +223,7 @@ Upgrade note:
 | `CODEX_TUI_GLYPHS` | TUI glyph mode |
 | `CODEX_AUTH_FETCH_TIMEOUT_MS` | Request timeout override |
 | `CODEX_AUTH_STREAM_STALL_TIMEOUT_MS` | Stream stall timeout override |
+| `CODEX_AUTH_SCHEDULING_STRATEGY` | Account scheduling strategy override (`hybrid` or `sequential`/drain-first) |
 | `CODEX_MULTI_AUTH_SYNC_CODEX_CLI` | Toggle Codex CLI state sync |
 | `CODEX_MULTI_AUTH_REAL_CODEX_BIN` | Force official Codex binary path |
 | `CODEX_MULTI_AUTH_BYPASS` | Bypass local auth handling |

docs/reference/settings.md

Lines changed: 2 additions & 0 deletions
@@ -143,6 +143,7 @@ Named backup behavior:
 | Key | Default | Effect |
 | --- | --- | --- |
 | `codexRuntimeRotationProxy` | `true` | Enable the default-on localhost Responses proxy for forwarded official Codex CLI/app sessions |
+| `schedulingStrategy` | `hybrid` | Account scheduling: `hybrid` spreads load across all available accounts; `sequential` (drain-first) keeps one active account until it is fully exhausted, then advances to the next |
 | `preemptiveQuotaEnabled` | `true` | Defer requests before remaining quota is critically low |
 | `preemptiveQuotaRemainingPercent5h` | `5` | 5-hour quota threshold |
 | `preemptiveQuotaRemainingPercent7d` | `5` | 7-day quota threshold |
@@ -207,6 +208,7 @@ Maintainer/debug-focused overrides include:
 - `CODEX_MULTI_AUTH_SYNC_CODEX_CLI`
 - `CODEX_MULTI_AUTH_REAL_CODEX_BIN`
 - `CODEX_MULTI_AUTH_BYPASS`
+- `CODEX_AUTH_SCHEDULING_STRATEGY` (`hybrid` | `sequential`; opt-in drain-first scheduling without editing settings)
 - `CODEX_CLI_ACCOUNTS_PATH`
 - `CODEX_CLI_AUTH_PATH`
 - refresh lease controls (`CODEX_AUTH_REFRESH_LEASE*`)

lib/accounts.ts

Lines changed: 79 additions & 0 deletions
@@ -789,6 +789,85 @@ export class AccountManager {
 		return null;
 	}

+	/**
+	 * Sequential / drain-first selection (issue #509).
+	 *
+	 * Unlike the round-robin and hybrid selectors, this does NOT advance on
+	 * every pick. It sticks to the current active account for the family and
+	 * keeps returning it while it is available; only when the active account is
+	 * fully exhausted (rate-limited / cooling down / circuit-open / disabled)
+	 * does it scan forward — wrapping around the pool — for the next available
+	 * account and make THAT the new active account. Because the scan starts from
+	 * the active index and wraps, an earlier account that has recovered its
+	 * quota window becomes eligible again as soon as the current account drains,
+	 * producing the staggered-recovery pattern requested in #509.
+	 *
+	 * Concurrency: mutates `currentAccountIndexByFamily` / `cursorByFamily` like
+	 * the sibling selectors; callers that need serialization wrap the call in the
+	 * routing mutex (see the proxy hot path). No I/O.
+	 *
+	 * `blockedIndexes` (optional) is the request's policy block set
+	 * (`RuntimePolicyDecision.blockedAccountIndexes`: paused / drained / lacking
+	 * capability for the model). Blocked accounts are treated as NOT usable so the
+	 * selector never commits the active pointer to an account that `chooseAccount`
+	 * would immediately reject — otherwise the drain-first primary could anchor on
+	 * a permanently-blocked account and degrade every future request to the linear
+	 * scan fallback (#509 review P1).
+	 */
+	getCurrentOrNextForFamilySequential(
+		family: ModelFamily,
+		model?: string | null,
+		blockedIndexes?: ReadonlySet<number>,
+	): ManagedAccount | null {
+		const count = this.accounts.length;
+		if (count === 0) return null;
+
+		const isUsable = (
+			account: ManagedAccount | undefined,
+		): account is ManagedAccount => {
+			if (!account) return false;
+			if (account.enabled === false) return false;
+			if (blockedIndexes?.has(account.index)) return false;
+			if (!this.hasEnabledWorkspaces(account)) return false;
+			clearExpiredRateLimits(account);
+			return (
+				!isRateLimitedForFamily(account, family, model) &&
+				!this.isAccountCoolingDown(account) &&
+				this.isCircuitAvailable(account)
+			);
+		};
+
+		// Sticky: stay on the current active account while it is still usable so
+		// we drain it fully before moving on. A negative/unset active index falls
+		// through to the forward scan below.
+		const activeIndex = this.currentAccountIndexByFamily[family];
+		if (activeIndex >= 0 && activeIndex < count) {
+			const active = this.accounts[activeIndex];
+			if (isUsable(active)) {
+				active.lastUsed = nowMs();
+				return active;
+			}
+		}
+
+		// Active account is exhausted (or none set): scan forward from the active
+		// index, wrapping, for the next usable account and pin it as the new
+		// active account. Starting at the active index (not active+1) lets a
+		// recovered earlier account reclaim the active slot on wrap-around.
+		const start = activeIndex >= 0 && activeIndex < count ? activeIndex : 0;
+		for (let i = 0; i < count; i++) {
+			const idx = (start + i) % count;
+			const account = this.accounts[idx];
+			if (!isUsable(account)) continue;
+
+			this.currentAccountIndexByFamily[family] = idx;
+			this.cursorByFamily[family] = (idx + 1) % count;
+			account.lastUsed = nowMs();
+			return account;
+		}
+
+		return null;
+	}
+
 	getCurrentOrNextForFamilyHybrid(
 		family: ModelFamily,
 		model?: string | null,
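The selector's sticky/advance/wrap behavior can be exercised in isolation with the simplified model below. It is a sketch, not the `AccountManager` code: `SketchAccount`, `SequentialSelector`, and the `available` flag are hypothetical stand-ins that collapse the real rate-limit, cooldown, circuit-breaker, and workspace checks into one boolean, while keeping the per-family active index, the sticky check, the policy block set, and the wrapping forward scan. Tracing it also shows a detail of the scan order: accounts after the exhausted one are visited before the scan wraps, so a recovered earlier account reclaims the active slot only when no later account is usable.

```typescript
// Hypothetical, simplified model of the drain-first selector (not the real AccountManager).
interface SketchAccount {
  index: number;
  enabled: boolean;
  available: boolean; // stands in for: not rate-limited, not cooling down, circuit closed
}

class SequentialSelector {
  // One active index per model family, analogous to currentAccountIndexByFamily.
  private activeByFamily = new Map<string, number>();

  constructor(private readonly accounts: SketchAccount[]) {}

  pick(family: string, blocked: ReadonlySet<number> = new Set<number>()): SketchAccount | null {
    const count = this.accounts.length;
    if (count === 0) return null;

    // Policy-blocked accounts count as unusable, so the pointer never anchors on them.
    const isUsable = (a: SketchAccount | undefined): a is SketchAccount =>
      !!a && a.enabled && !blocked.has(a.index) && a.available;

    // Sticky: keep returning the active account while it is usable.
    const activeIndex = this.activeByFamily.get(family) ?? -1;
    const hasActive = activeIndex >= 0 && activeIndex < count;
    if (hasActive) {
      const active = this.accounts[activeIndex];
      if (isUsable(active)) return active;
    }

    // Exhausted or unset: scan forward from the active index, wrapping once.
    const start = hasActive ? activeIndex : 0;
    for (let i = 0; i < count; i++) {
      const idx = (start + i) % count;
      const account = this.accounts[idx];
      if (!isUsable(account)) continue;
      this.activeByFamily.set(family, idx); // only exhaustion (or a block) moves the pointer
      return account;
    }
    return null;
  }
}

const pool: SketchAccount[] = [0, 1, 2].map((index) => ({ index, enabled: true, available: true }));
const selector = new SequentialSelector(pool);

console.log(selector.pick("codex")?.index); // 0 (first pick anchors on account 0)
console.log(selector.pick("codex")?.index); // 0 (sticky while usable)
pool[0].available = false; // account 0 drains
console.log(selector.pick("codex")?.index); // 1 (advance)
pool[0].available = true; // account 0's window recovers
console.log(selector.pick("codex")?.index); // 1 (recovery alone does not move the pointer)
pool[1].available = false;
console.log(selector.pick("codex")?.index); // 2 (later account is tried before wrapping)
pool[2].available = false;
console.log(selector.pick("codex")?.index); // 0 (wrap-around reclaims the recovered account)
```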

lib/config.ts

Lines changed: 36 additions & 0 deletions
@@ -252,6 +252,7 @@ export const DEFAULT_PLUGIN_CONFIG: PluginConfig = {
 	preemptiveQuotaRemainingPercent7d: 5,
 	preemptiveQuotaMaxDeferralMs: 2 * 60 * 60_000,
 	routingMutex: "legacy",
+	schedulingStrategy: "hybrid",
 };

 const PLUGIN_CONFIG_FIELD_SCHEMAS = PluginConfigSchema.shape;
@@ -1887,6 +1888,36 @@ export function getRoutingMutexMode(
 	);
 }

+const SCHEDULING_STRATEGY_MODES = new Set<string>(["hybrid", "sequential"]);
+
+/**
+ * Resolve the account scheduling strategy (issue #509).
+ *
+ * - `"hybrid"` (default) keeps the existing weighted health/token/freshness
+ * selection that spreads load across all available accounts.
+ * - `"sequential"` (a.k.a. drain-first) sticks to one active account and only
+ * advances to the next available account once the current one is fully
+ * exhausted (rate-limited / cooling down / circuit-open). Earlier accounts
+ * become eligible again as soon as their quota window recovers, producing the
+ * staggered-recovery pattern requested in #509. A manual pin still overrides
+ * this; sequential mode ignores per-session affinity so all new requests
+ * follow the single active account. The `CODEX_AUTH_SCHEDULING_STRATEGY` env
+ * var accepts the same two values for opt-in trials without editing settings.
+ *
+ * Concurrency: pure read; safe for concurrent callers. Performs no I/O and is
+ * unaffected by Windows filesystem semantics. Contains no secrets.
+ */
+export function getSchedulingStrategy(
+	pluginConfig: PluginConfig,
+): "hybrid" | "sequential" {
+	return resolveStringSetting(
+		"CODEX_AUTH_SCHEDULING_STRATEGY",
+		pluginConfig.schedulingStrategy,
+		"hybrid",
+		SCHEDULING_STRATEGY_MODES,
+	);
+}
+
 type ConfigExplainMeta = {
 	key: keyof PluginConfig;
 	envNames: string[];
@@ -2237,6 +2268,11 @@ const CONFIG_EXPLAIN_ENTRIES: ConfigExplainMeta[] = [
 		envNames: ["CODEX_AUTH_ROUTING_MUTEX"],
 		getValue: getRoutingMutexMode,
 	},
+	{
+		key: "schedulingStrategy",
+		envNames: ["CODEX_AUTH_SCHEDULING_STRATEGY"],
+		getValue: getSchedulingStrategy,
+	},
 ];

 export function getPluginConfigExplainReport(): ConfigExplainReport {

lib/runtime-rotation-proxy.ts

Lines changed: 113 additions & 4 deletions
@@ -23,6 +23,7 @@ import {
 	getTokenRefreshSkewMs,
 	getPidOffsetEnabled,
 	getRoutingMutexMode,
+	getSchedulingStrategy,
 	loadPluginConfig,
 } from "./config.js";
 import {
@@ -1162,6 +1163,7 @@ export function chooseAccount(params: {
 	skipReasons?: Map<number, string>;
 	stickyBoostByAccount?: Record<number, number>;
 	pidOffsetEnabled?: boolean;
+	schedulingStrategy?: "hybrid" | "sequential";
 }): ManagedAccount | null {
 	const {
 		accountManager,
@@ -1176,6 +1178,7 @@ export function chooseAccount(params: {
 		skipReasons,
 		stickyBoostByAccount,
 		pidOffsetEnabled,
+		schedulingStrategy,
 	} = params;

 	// Manual pin (from `codex-multi-auth switch <n>`) overrides every other
@@ -1211,6 +1214,50 @@ export function chooseAccount(params: {
 		return pinned;
 	}

+	// Sequential / drain-first mode (issue #509): the active account governs ALL
+	// new requests, so we deliberately SKIP the per-session affinity tier — there
+	// is no per-chat stickiness, every request follows the single active account.
+	// The manual pin above still wins (handled first). When the active account is
+	// exhausted the selector advances to the next available account and earlier
+	// accounts reclaim the slot once their quota recovers.
+	if (schedulingStrategy === "sequential") {
+		const selected = accountManager.getCurrentOrNextForFamilySequential(
+			family,
+			model,
+			policy?.blockedAccountIndexes,
+		);
+		if (
+			selected &&
+			!attemptedIndexes.has(selected.index) &&
+			!policy?.blockedAccountIndexes.has(selected.index)
+		) {
+			const reason = accountManager.getAccountRuntimeSkipReason(
+				selected.index,
+				family,
+				model,
+			);
+			if (!reason) return selected;
+			skipReasons?.set(selected.index, reason);
+		}
+
+		// Active account was attempted/blocked/skipped this request (e.g. it just
+		// rate-limited mid-loop): fall through to the shared linear scan to find
+		// the next eligible account to TRY. Pass advanceActivePointer=false so a
+		// transient, non-exhausting failure on the active account does not
+		// permanently move the drain-first primary — only
+		// getCurrentOrNextForFamilySequential advances it, and only on true
+		// exhaustion.
+		return chooseLinearScanFallback({
+			accountManager,
+			family,
+			model,
+			attemptedIndexes,
+			policy,
+			skipReasons,
+			advanceActivePointer: false,
+		});
+	}
+
 	const preferredIndex = sessionAffinityStore?.getPreferredAccountIndex(sessionKey, now);
 	if (
 		typeof preferredIndex === "number" &&
@@ -1259,6 +1306,54 @@ export function chooseAccount(params: {
 		skipReasons?.set(selected.index, reason);
 	}

+	return chooseLinearScanFallback({
+		accountManager,
+		family,
+		model,
+		attemptedIndexes,
+		policy,
+		skipReasons,
+	});
+}
+
+/**
+ * Shared linear-scan fallback used by both the hybrid and sequential selection
+ * paths in `chooseAccount`. Walks every account in pool order and returns the
+ * first one that is not already attempted, not policy-blocked, and has no
+ * runtime skip reason (rate-limited / cooling down / circuit-open), recording a
+ * skip reason for each rejected candidate. Returns null when no eligible
+ * account remains.
+ *
+ * `advanceActivePointer` (default `true`) controls whether the winner is
+ * committed as the new active/cursor position via `markSwitched`. The hybrid
+ * path wants this (round-robin advance). The sequential path passes `false`:
+ * its within-request fallback only needs an account to TRY this request, and
+ * must NOT move `currentAccountIndexByFamily` — otherwise a transient,
+ * non-exhausting failure on the active account (which leaves it `isUsable` but
+ * present in `attemptedIndexes`) would permanently switch the drain-first
+ * primary even though it was never exhausted (issue #509 regression caught in
+ * review). In sequential mode only `getCurrentOrNextForFamilySequential` is
+ * allowed to advance the active pointer, and only on true exhaustion.
+ */
+function chooseLinearScanFallback(params: {
+	accountManager: AccountManager;
+	family: ModelFamily;
+	model: string | null;
+	attemptedIndexes: ReadonlySet<number>;
+	policy: RuntimePolicyDecision | null;
+	skipReasons?: Map<number, string>;
+	advanceActivePointer?: boolean;
+}): ManagedAccount | null {
+	const {
+		accountManager,
+		family,
+		model,
+		attemptedIndexes,
+		policy,
+		skipReasons,
+		advanceActivePointer = true,
+	} = params;
+
 	for (const account of accountManager.getAccountsSnapshot()) {
 		if (attemptedIndexes.has(account.index)) {
 			skipReasons?.set(account.index, "already-attempted");
@@ -1277,7 +1372,11 @@ export function chooseAccount(params: {
 			const live = accountManager.getAccountByIndex(account.index);
 			if (!live) continue;
 			// L4 (deferred): unlocked cursor mutation — see chooseAccount header.
-			accountManager.markSwitched(live, "rotation", family);
+			// Skipped in sequential mode (advanceActivePointer=false) so a
+			// within-request retry never reassigns the drain-first primary.
+			if (advanceActivePointer) {
+				accountManager.markSwitched(live, "rotation", family);
+			}
 			return live;
 		}
 		skipReasons?.set(account.index, reason);
@@ -1472,6 +1571,7 @@ export async function startRuntimeRotationProxy(
 	// mutations when routingMutex="enabled". Legacy mode keeps the inline fast path.
 	const routingMutexMode = getRoutingMutexMode(pluginConfig);
 	activeAccountManager.setRoutingMutexMode(routingMutexMode);
+	const schedulingStrategy = getSchedulingStrategy(pluginConfig);
 	const fetchImpl = options.fetchImpl ?? fetch;
 	const host = options.host ?? DEFAULT_HOST;
 	// Defense in depth (runtime-proxy-01): the proxy presents managed OAuth tokens
@@ -1771,17 +1871,26 @@ export async function startRuntimeRotationProxy(
 		skipReasons: accountSkipReasons,
 		stickyBoostByAccount: rotationStickyBoost,
 		pidOffsetEnabled,
+		schedulingStrategy,
 	});
 	const selected =
 		routingMutexMode === "enabled"
 			? await withRoutingMutex(routingMutexMode, async () => {
 				const candidate = selectAccount();
-				if (candidate && pinnedIndex === null) {
+				if (
+					candidate &&
+					pinnedIndex === null &&
+					schedulingStrategy !== "sequential"
+				) {
 					// Re-commit the cursor under the held mutex. Skipped when a
 					// manual pin is active so the proxy never clobbers the pin
 					// (see #474); pinned selections are deterministic and need no
-					// cursor advance. Runs inline via reentrancy — see comment
-					// above.
+					// cursor advance. Also skipped in sequential mode: the
+					// sequential selector already committed the correct active
+					// index inside this held mutex, and re-committing `candidate`
+					// would wrongly advance the drain-first primary when the pick
+					// came from the non-advancing linear-scan fallback (#509).
+					// Runs inline via reentrancy — see comment above.
 					await accountManager.markSwitchedLocked(
 						candidate,
 						"rotation",
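The precedence in the sequential branch, and the reason its fallback must not move the primary, can be modeled with plain indices. The sketch below is hypothetical: `PoolState`, `advancePrimary`, and `chooseSequential` are illustrative names, a single `usable` flag per account replaces the real runtime skip reasons and policy blocks, and a pinned index is returned without whatever availability checks the real pin path applies. It reproduces the three tiers described above: manual pin, then the drain-first primary, then a linear scan that picks an account for this request only.

```typescript
// Hypothetical model of the sequential branch of chooseAccount (not the real code).
interface PoolState {
  usable: boolean[]; // stands in for rate-limit / cooldown / circuit / policy checks
  activeIndex: number; // drain-first primary for one model family
}

// Sticky primary: only moves when the current primary is no longer usable.
function advancePrimary(state: PoolState): number | null {
  const count = state.usable.length;
  if (count === 0) return null;
  const hasActive = state.activeIndex >= 0 && state.activeIndex < count;
  if (hasActive && state.usable[state.activeIndex]) return state.activeIndex;
  const start = hasActive ? state.activeIndex : 0;
  for (let i = 0; i < count; i++) {
    const idx = (start + i) % count;
    if (state.usable[idx]) {
      state.activeIndex = idx; // true exhaustion moves the primary
      return idx;
    }
  }
  return null;
}

function chooseSequential(
  state: PoolState,
  attempted: ReadonlySet<number>,
  pinnedIndex: number | null,
): number | null {
  if (pinnedIndex !== null) return pinnedIndex; // manual pin always wins (checks omitted)
  const primary = advancePrimary(state);
  if (primary !== null && !attempted.has(primary)) return primary;
  // Within-request fallback: first usable, not-yet-attempted account in pool order.
  // Leaves state.activeIndex untouched, like advanceActivePointer: false.
  for (let idx = 0; idx < state.usable.length; idx++) {
    if (!attempted.has(idx) && state.usable[idx]) return idx;
  }
  return null;
}

const state: PoolState = { usable: [true, true, true], activeIndex: 0 };
const attempted = new Set<number>();

const first = chooseSequential(state, attempted, null); // the drain-first primary
if (first !== null) attempted.add(first); // suppose it fails transiently but stays usable
const retry = chooseSequential(state, attempted, null); // fallback for this request only
console.log(first, retry, state.activeIndex); // 0 1 0
console.log(chooseSequential(state, new Set<number>(), null)); // 0: next request returns to the primary
```

Committing the fallback pick as the new primary, which is what `markSwitched` / `markSwitchedLocked` do on the hybrid path, would turn `retry` into the permanent active account. The `advanceActivePointer: false` flag and the `schedulingStrategy !== "sequential"` guard around `markSwitchedLocked` exist to prevent exactly that.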

lib/schemas.ts

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ export const PluginConfigSchema = z.object({
 	preemptiveQuotaRemainingPercent7d: z.number().min(0).max(100).optional(),
 	preemptiveQuotaMaxDeferralMs: z.number().min(1_000).optional(),
 	routingMutex: z.enum(["enabled", "legacy"]).optional(),
+	schedulingStrategy: z.enum(["hybrid", "sequential"]).optional(),
 });

 export type PluginConfigFromSchema = z.infer<typeof PluginConfigSchema>;
