ndycode
diff --git a/docs/configuration.md b/docs/configuration.md
Lines changed: 10 additions & 0 deletions
diff --git a/docs/development/CONFIG_FIELDS.md b/docs/development/CONFIG_FIELDS.md
Lines changed: 4 additions & 0 deletions
diff --git a/docs/reference/settings.md b/docs/reference/settings.md
Lines changed: 2 additions & 0 deletions
diff --git a/lib/accounts.ts b/lib/accounts.ts
Lines changed: 79 additions & 0 deletions
diff --git a/lib/config.ts b/lib/config.ts
Lines changed: 36 additions & 0 deletions
diff --git a/lib/runtime-rotation-proxy.ts b/lib/runtime-rotation-proxy.ts
Lines changed: 113 additions & 4 deletions
diff --git a/lib/schemas.ts b/lib/schemas.ts
Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ These are safe for most operators and frequently used in day-to-day workflows.
 | `CODEX_AUTH_FETCH_TIMEOUT_MS=<ms>` | HTTP request timeout override |
 | `CODEX_AUTH_STREAM_STALL_TIMEOUT_MS=<ms>` | Stream stall timeout override |
 | `CODEX_AUTH_MIN_ROTATION_INTERVAL_MS=<ms>` | Minimum time between global account switches (default `60000`). The proxy biases selection toward the last-served account within this window to reduce the rate at which different OAuth tokens appear from the same IP. Set to `0` to disable. |
+| `CODEX_AUTH_SCHEDULING_STRATEGY=hybrid/sequential` | Account scheduling strategy (default `hybrid`). `sequential` (drain-first) keeps one active account until it is fully exhausted before advancing to the next; see [Sequential / drain-first scheduling](#sequential--drain-first-scheduling). |
 | `CODEX_AUTH_TOKEN_INVALIDATION_COOLDOWN_MS=<ms>` | Cooldown applied to an account when the upstream or token-refresh endpoint explicitly revokes its OAuth token (default `300000`, 5 minutes). Raise this if accounts continue to be re-invalidated after re-login. |
 
 ---
@@ -117,6 +118,15 @@ The proxy preserves request bodies and streaming responses, replaces outbound au
 - **Token-invalidation detection**: when the upstream or the token-refresh endpoint returns an explicit OAuth revocation message, the proxy returns the error directly to the client instead of rotating to the next account. The affected account receives a 5-minute cooldown (`tokenInvalidationCooldownMs`, default `300000`) instead of the generic 30-second auth-failure cooldown. Configure via `CODEX_AUTH_TOKEN_INVALIDATION_COOLDOWN_MS`.
 - **Rotation-rate throttle**: the proxy biases account selection toward the last-served account for a configurable window (default 60 seconds, `minRotationIntervalMs`). Accounts that are rate-limited or cooling down are still rotated around. Configure via `CODEX_AUTH_MIN_ROTATION_INTERVAL_MS` or set to `0` to disable.
 
+### Sequential / drain-first scheduling
+
+`schedulingStrategy` controls how the proxy picks an account for each request:
+
+- `hybrid` (default) spreads load across all available accounts using a weighted health/token/freshness score. All accounts tend to consume quota at a similar pace.
+- `sequential` (drain-first) routes every new request to one active account and only advances to the next available account once the current one is fully exhausted (rate-limited, cooling down, or circuit-open). Because the scan wraps the pool, an earlier account that has recovered its quota window is reclaimed as soon as the current account drains. This staggers quota recovery across accounts for longer uninterrupted sessions.
+
+In `sequential` mode a manual pin (`codex-multi-auth switch <index>`) still takes precedence and is never overridden. Sequential mode intentionally ignores per-session affinity: once the active account changes, all subsequent requests follow the new active account regardless of which account originally handled a conversation. Enable it with `schedulingStrategy: "sequential"` in settings or `CODEX_AUTH_SCHEDULING_STRATEGY=sequential` for a per-process trial.
+
 Microsoft/Outlook SSO accounts may be more sensitive to proxy-mediated token use. If an Outlook-linked account is invalidated on every first request through the proxy but works normally on ChatGPT web, the root cause is likely IP or device binding on the Microsoft side. Raising `CODEX_AUTH_TOKEN_INVALIDATION_COOLDOWN_MS` and re-logging in the affected account typically resolves the cascade. If the problem persists, consider excluding the Microsoft account from the rotation pool via `codex-multi-auth switch`.
 
 For `codex app` launches that go through the wrapper, the wrapper automatically starts a small internal helper so rotation can keep working if the desktop app launcher detaches. The helper stores only local runtime status, uses the same per-session proxy client key as the CLI path, and exits after an idle timeout.
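
The drain-first behaviour described under "Sequential / drain-first scheduling" above can be illustrated with a small standalone simulation. This is a minimal sketch, not the project's implementation: `SketchAccount`, `DrainFirstPool`, and the single `usable` flag are hypothetical stand-ins for the real account state (rate limits, cooldowns, circuit breakers, policy blocks).

```typescript
// Hypothetical, simplified account model; not the project's ManagedAccount.
interface SketchAccount {
  index: number;
  // Stands in for "not rate-limited, not cooling down, circuit closed".
  usable: boolean;
}

class DrainFirstPool {
  private active = -1; // -1 means no active account has been chosen yet

  constructor(private readonly accounts: readonly SketchAccount[]) {}

  pick(): SketchAccount | null {
    const count = this.accounts.length;
    if (count === 0) return null;

    // Sticky: keep returning the active account while it is usable.
    if (this.active >= 0 && this.active < count) {
      const current = this.accounts[this.active];
      if (current?.usable) return current;
    }

    // Active account is exhausted (or unset): scan the pool, wrapping around.
    const start = this.active >= 0 && this.active < count ? this.active : 0;
    for (let i = 0; i < count; i++) {
      const idx = (start + i) % count;
      const candidate = this.accounts[idx];
      if (!candidate || !candidate.usable) continue;
      this.active = idx; // commit the new active account
      return candidate;
    }
    return null; // every account is exhausted
  }
}

const a0: SketchAccount = { index: 0, usable: true };
const a1: SketchAccount = { index: 1, usable: true };
const a2: SketchAccount = { index: 2, usable: true };
const pool = new DrainFirstPool([a0, a1, a2]);

const picks: number[] = [];
const record = (): void => {
  picks.push(pool.pick()?.index ?? -1);
};

record();           // account 0 becomes active
record();           // still account 0 (sticky)
a0.usable = false;  // account 0 drains
record();           // advances to account 1
a0.usable = true;   // account 0 recovers, but account 1 is still usable
record();           // stays on account 1
a1.usable = false;  // account 1 drains
a2.usable = false;  // account 2 is also unavailable
record();           // wrap-around reclaims account 0
console.log(picks);
```

In this sketch the recovered account 0 is not reclaimed the moment it recovers; it is reclaimed only once the current active account drains, which matches the staggered-recovery description above.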
 
@@ -65,6 +65,7 @@ Used only for host plugin mode through the host runtime config file.
 
 | Key | Default |
 | --- | --- |
+| `schedulingStrategy` | `hybrid` |
 | `retryAllAccountsRateLimited` | `true` |
 | `retryAllAccountsMaxWaitMs` | `0` |
 | `retryAllAccountsMaxRetries` | `Infinity` |
@@ -73,6 +74,8 @@ Used only for host plugin mode through the host runtime config file.
 | `fallbackToGpt52OnUnsupportedGpt53` | `true` |
 | `unsupportedCodexFallbackChain` | `{}` |
 
+`schedulingStrategy` selects how the runtime proxy picks an account per request. `hybrid` (default) keeps the weighted health/token/freshness selection that spreads load across all available accounts. `sequential` (drain-first) sticks to one active account and only advances to the next available account once the current one is fully exhausted (rate-limited / cooling down / circuit-open); earlier accounts become eligible again as soon as their quota window recovers, staggering recovery across the pool. A manual pin still overrides this, and sequential mode intentionally ignores per-session affinity so all new requests follow the single active account. Overridable per-process via `CODEX_AUTH_SCHEDULING_STRATEGY`.
+
 ### Token / Recovery
 
 | Key | Default |
@@ -220,6 +223,7 @@ Upgrade note:
 | `CODEX_TUI_GLYPHS` | TUI glyph mode |
 | `CODEX_AUTH_FETCH_TIMEOUT_MS` | Request timeout override |
 | `CODEX_AUTH_STREAM_STALL_TIMEOUT_MS` | Stream stall timeout override |
+| `CODEX_AUTH_SCHEDULING_STRATEGY` | Account scheduling strategy override (`hybrid` or `sequential`/drain-first) |
 | `CODEX_MULTI_AUTH_SYNC_CODEX_CLI` | Toggle Codex CLI state sync |
 | `CODEX_MULTI_AUTH_REAL_CODEX_BIN` | Force official Codex binary path |
 | `CODEX_MULTI_AUTH_BYPASS` | Bypass local auth handling |
 
@@ -143,6 +143,7 @@ Named backup behavior:
 | Key | Default | Effect |
 | --- | --- | --- |
 | `codexRuntimeRotationProxy` | `true` | Enable the default-on localhost Responses proxy for forwarded official Codex CLI/app sessions |
+| `schedulingStrategy` | `hybrid` | Account scheduling: `hybrid` spreads load across all available accounts; `sequential` (drain-first) keeps one active account until it is fully exhausted, then advances to the next |
 | `preemptiveQuotaEnabled` | `true` | Defer requests before remaining quota is critically low |
 | `preemptiveQuotaRemainingPercent5h` | `5` | 5-hour quota threshold |
 | `preemptiveQuotaRemainingPercent7d` | `5` | 7-day quota threshold |
@@ -207,6 +208,7 @@ Maintainer/debug-focused overrides include:
 - `CODEX_MULTI_AUTH_SYNC_CODEX_CLI`
 - `CODEX_MULTI_AUTH_REAL_CODEX_BIN`
 - `CODEX_MULTI_AUTH_BYPASS`
+- `CODEX_AUTH_SCHEDULING_STRATEGY` (`hybrid` | `sequential`; opt-in drain-first scheduling without editing settings)
 - `CODEX_CLI_ACCOUNTS_PATH`
 - `CODEX_CLI_AUTH_PATH`
 - refresh lease controls (`CODEX_AUTH_REFRESH_LEASE*`)
 
@@ -789,6 +789,85 @@ export class AccountManager {
 		return null;
 	}
 
+	/**
+	 * Sequential / drain-first selection (issue #509).
+	 *
+	 * Unlike the round-robin and hybrid selectors, this does NOT advance on
+	 * every pick. It sticks to the current active account for the family and
+	 * keeps returning it while it is available; only when the active account is
+	 * fully exhausted (rate-limited / cooling down / circuit-open / disabled)
+	 * does it scan forward — wrapping around the pool — for the next available
+	 * account and make THAT the new active account. Because the scan starts from
+	 * the active index and wraps, an earlier account that has recovered its
+	 * quota window becomes eligible again as soon as the current account drains,
+	 * producing the staggered-recovery pattern requested in #509.
+	 *
+	 * Concurrency: mutates `currentAccountIndexByFamily` / `cursorByFamily` like
+	 * the sibling selectors; callers that need serialization wrap the call in the
+	 * routing mutex (see the proxy hot path). No I/O.
+	 *
+	 * `blockedIndexes` (optional) is the request's policy block set
+	 * (`RuntimePolicyDecision.blockedAccountIndexes`: paused / drained / lacking
+	 * capability for the model). Blocked accounts are treated as NOT usable so the
+	 * selector never commits the active pointer to an account that `chooseAccount`
+	 * would immediately reject — otherwise the drain-first primary could anchor on
+	 * a permanently-blocked account and degrade every future request to the linear
+	 * scan fallback (#509 review P1).
+	 */
+	getCurrentOrNextForFamilySequential(
+		family: ModelFamily,
+		model?: string | null,
+		blockedIndexes?: ReadonlySet<number>,
+	): ManagedAccount | null {
+		const count = this.accounts.length;
+		if (count === 0) return null;
+
+		const isUsable = (
+			account: ManagedAccount | undefined,
+		): account is ManagedAccount => {
+			if (!account) return false;
+			if (account.enabled === false) return false;
+			if (blockedIndexes?.has(account.index)) return false;
+			if (!this.hasEnabledWorkspaces(account)) return false;
+			clearExpiredRateLimits(account);
+			return (
+				!isRateLimitedForFamily(account, family, model) &&
+				!this.isAccountCoolingDown(account) &&
+				this.isCircuitAvailable(account)
+			);
+		};
+
+		// Sticky: stay on the current active account while it is still usable so
+		// we drain it fully before moving on. A negative/unset active index falls
+		// through to the forward scan below.
+		const activeIndex = this.currentAccountIndexByFamily[family];
+		if (activeIndex >= 0 && activeIndex < count) {
+			const active = this.accounts[activeIndex];
+			if (isUsable(active)) {
+				active.lastUsed = nowMs();
+				return active;
+			}
+		}
+
+		// Active account is exhausted (or none set): scan forward from the active
+		// index, wrapping, for the next usable account and pin it as the new
+		// active account. The scan covers the whole pool, so a recovered earlier
+		// account can reclaim the active slot on wrap-around.
+		const start = activeIndex >= 0 && activeIndex < count ? activeIndex : 0;
+		for (let i = 0; i < count; i++) {
+			const idx = (start + i) % count;
+			const account = this.accounts[idx];
+			if (!isUsable(account)) continue;
+
+			this.currentAccountIndexByFamily[family] = idx;
+			this.cursorByFamily[family] = (idx + 1) % count;
+			account.lastUsed = nowMs();
+			return account;
+		}
+
+		return null;
+	}
+
 	getCurrentOrNextForFamilyHybrid(
 		family: ModelFamily,
 		model?: string | null,
 
@@ -252,6 +252,7 @@ export const DEFAULT_PLUGIN_CONFIG: PluginConfig = {
 	preemptiveQuotaRemainingPercent7d: 5,
 	preemptiveQuotaMaxDeferralMs: 2 * 60 * 60_000,
 	routingMutex: "legacy",
+	schedulingStrategy: "hybrid",
 };
 
 const PLUGIN_CONFIG_FIELD_SCHEMAS = PluginConfigSchema.shape;
@@ -1887,6 +1888,36 @@ export function getRoutingMutexMode(
 	);
 }
 
+const SCHEDULING_STRATEGY_MODES = new Set<string>(["hybrid", "sequential"]);
+
+/**
+ * Resolve the account scheduling strategy (issue #509).
+ *
+ * - `"hybrid"` (default) keeps the existing weighted health/token/freshness
+ *   selection that spreads load across all available accounts.
+ * - `"sequential"` (a.k.a. drain-first) sticks to one active account and only
+ *   advances to the next available account once the current one is fully
+ *   exhausted (rate-limited / cooling down / circuit-open). Earlier accounts
+ *   become eligible again as soon as their quota window recovers, producing the
+ *   staggered-recovery pattern requested in #509. A manual pin still overrides
+ *   this; sequential mode ignores per-session affinity so all new requests
+ *   follow the single active account. The `CODEX_AUTH_SCHEDULING_STRATEGY` env
+ *   var accepts the same two values for opt-in trials without editing settings.
+ *
+ * Concurrency: pure read; safe for concurrent callers. Performs no I/O and is
+ * unaffected by Windows filesystem semantics. Contains no secrets.
+ */
+export function getSchedulingStrategy(
+	pluginConfig: PluginConfig,
+): "hybrid" | "sequential" {
+	return resolveStringSetting(
+		"CODEX_AUTH_SCHEDULING_STRATEGY",
+		pluginConfig.schedulingStrategy,
+		"hybrid",
+		SCHEDULING_STRATEGY_MODES,
+	);
+}
+
 type ConfigExplainMeta = {
 	key: keyof PluginConfig;
 	envNames: string[];
@@ -2237,6 +2268,11 @@ const CONFIG_EXPLAIN_ENTRIES: ConfigExplainMeta[] = [
 		envNames: ["CODEX_AUTH_ROUTING_MUTEX"],
 		getValue: getRoutingMutexMode,
 	},
+	{
+		key: "schedulingStrategy",
+		envNames: ["CODEX_AUTH_SCHEDULING_STRATEGY"],
+		getValue: getSchedulingStrategy,
+	},
 ];
 
 export function getPluginConfigExplainReport(): ConfigExplainReport {
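
As a rough illustration of how `getSchedulingStrategy` might resolve its value, the sketch below validates candidates against the allowed set and falls back to `hybrid`. The body of `resolveStringSetting` is not shown in this diff, so the precedence (env override, then settings value, then default), the trimming, and the handling of unrecognised values are all assumptions; `resolveStrategySketch` is a hypothetical name used only for this example.

```typescript
type SchedulingStrategy = "hybrid" | "sequential";

const ALLOWED_STRATEGIES: ReadonlySet<string> = new Set(["hybrid", "sequential"]);

// Hypothetical helper for illustration only; the project's own resolver is
// `resolveStringSetting`, whose exact behaviour is not visible in this diff.
function resolveStrategySketch(
  envValue: string | undefined,
  configured: string | undefined,
  fallback: SchedulingStrategy = "hybrid",
): SchedulingStrategy {
  // Assumed precedence: env override first, then the settings value.
  // Assumed validation: values are trimmed and unrecognised ones are ignored.
  for (const raw of [envValue, configured]) {
    const value = raw?.trim();
    if (value && ALLOWED_STRATEGIES.has(value)) {
      return value as SchedulingStrategy;
    }
  }
  return fallback;
}

// Env override wins over the settings value.
const fromEnv = resolveStrategySketch("sequential", "hybrid");
// No env override: the settings value is used.
const fromSettings = resolveStrategySketch(undefined, "sequential");
// Unrecognised env value and no settings value: falls back to the default.
const fromDefault = resolveStrategySketch("bogus", undefined);
console.log(fromEnv, fromSettings, fromDefault);
```

A per-process trial would then amount to launching with `CODEX_AUTH_SCHEDULING_STRATEGY=sequential` set in the environment, leaving the saved settings untouched.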
 
@@ -23,6 +23,7 @@ import {
 	getTokenRefreshSkewMs,
 	getPidOffsetEnabled,
 	getRoutingMutexMode,
+	getSchedulingStrategy,
 	loadPluginConfig,
 } from "./config.js";
 import {
@@ -1162,6 +1163,7 @@ export function chooseAccount(params: {
 	skipReasons?: Map<number, string>;
 	stickyBoostByAccount?: Record<number, number>;
 	pidOffsetEnabled?: boolean;
+	schedulingStrategy?: "hybrid" | "sequential";
 }): ManagedAccount | null {
 	const {
 		accountManager,
@@ -1176,6 +1178,7 @@ export function chooseAccount(params: {
 		skipReasons,
 		stickyBoostByAccount,
 		pidOffsetEnabled,
+		schedulingStrategy,
 	} = params;
 
 	// Manual pin (from `codex-multi-auth switch <n>`) overrides every other
@@ -1211,6 +1214,50 @@ export function chooseAccount(params: {
 		return pinned;
 	}
 
+	// Sequential / drain-first mode (issue #509): the active account governs ALL
+	// new requests, so we deliberately SKIP the per-session affinity tier — there
+	// is no per-chat stickiness, every request follows the single active account.
+	// The manual pin above still wins (handled first). When the active account is
+	// exhausted the selector advances to the next available account and earlier
+	// accounts reclaim the slot once their quota recovers.
+	if (schedulingStrategy === "sequential") {
+		const selected = accountManager.getCurrentOrNextForFamilySequential(
+			family,
+			model,
+			policy?.blockedAccountIndexes,
+		);
+		if (
+			selected &&
+			!attemptedIndexes.has(selected.index) &&
+			!policy?.blockedAccountIndexes.has(selected.index)
+		) {
+			const reason = accountManager.getAccountRuntimeSkipReason(
+				selected.index,
+				family,
+				model,
+			);
+			if (!reason) return selected;
+			skipReasons?.set(selected.index, reason);
+		}
+
+		// Active account was attempted/blocked/skipped this request (e.g. it just
+		// rate-limited mid-loop): fall through to the shared linear scan to find
+		// the next eligible account to TRY. Pass advanceActivePointer=false so a
+		// transient, non-exhausting failure on the active account does not
+		// permanently move the drain-first primary — only
+		// getCurrentOrNextForFamilySequential advances it, and only on true
+		// exhaustion.
+		return chooseLinearScanFallback({
+			accountManager,
+			family,
+			model,
+			attemptedIndexes,
+			policy,
+			skipReasons,
+			advanceActivePointer: false,
+		});
+	}
+
 	const preferredIndex = sessionAffinityStore?.getPreferredAccountIndex(sessionKey, now);
 	if (
 		typeof preferredIndex === "number" &&
@@ -1259,6 +1306,54 @@ export function chooseAccount(params: {
 		skipReasons?.set(selected.index, reason);
 	}
 
+	return chooseLinearScanFallback({
+		accountManager,
+		family,
+		model,
+		attemptedIndexes,
+		policy,
+		skipReasons,
+	});
+}
+
+/**
+ * Shared linear-scan fallback used by both the hybrid and sequential selection
+ * paths in `chooseAccount`. Walks every account in pool order and returns the
+ * first one that is not already attempted, not policy-blocked, and has no
+ * runtime skip reason (rate-limited / cooling down / circuit-open), recording a
+ * skip reason for each rejected candidate. Returns null when no eligible
+ * account remains.
+ *
+ * `advanceActivePointer` (default `true`) controls whether the winner is
+ * committed as the new active/cursor position via `markSwitched`. The hybrid
+ * path wants this (round-robin advance). The sequential path passes `false`:
+ * its within-request fallback only needs an account to TRY this request, and
+ * must NOT move `currentAccountIndexByFamily` — otherwise a transient,
+ * non-exhausting failure on the active account (which leaves it `isUsable` but
+ * present in `attemptedIndexes`) would permanently switch the drain-first
+ * primary even though it was never exhausted (issue #509 regression caught in
+ * review). In sequential mode only `getCurrentOrNextForFamilySequential` is
+ * allowed to advance the active pointer, and only on true exhaustion.
+ */
+function chooseLinearScanFallback(params: {
+	accountManager: AccountManager;
+	family: ModelFamily;
+	model: string | null;
+	attemptedIndexes: ReadonlySet<number>;
+	policy: RuntimePolicyDecision | null;
+	skipReasons?: Map<number, string>;
+	advanceActivePointer?: boolean;
+}): ManagedAccount | null {
+	const {
+		accountManager,
+		family,
+		model,
+		attemptedIndexes,
+		policy,
+		skipReasons,
+		advanceActivePointer = true,
+	} = params;
+
 	for (const account of accountManager.getAccountsSnapshot()) {
 		if (attemptedIndexes.has(account.index)) {
 			skipReasons?.set(account.index, "already-attempted");
@@ -1277,7 +1372,11 @@ export function chooseAccount(params: {
 			const live = accountManager.getAccountByIndex(account.index);
 			if (!live) continue;
 			// L4 (deferred): unlocked cursor mutation — see chooseAccount header.
-			accountManager.markSwitched(live, "rotation", family);
+			// Skipped in sequential mode (advanceActivePointer=false) so a
+			// within-request retry never reassigns the drain-first primary.
+			if (advanceActivePointer) {
+				accountManager.markSwitched(live, "rotation", family);
+			}
 			return live;
 		}
 		skipReasons?.set(account.index, reason);
@@ -1472,6 +1571,7 @@ export async function startRuntimeRotationProxy(
 	// mutations when routingMutex="enabled". Legacy mode keeps the inline fast path.
 	const routingMutexMode = getRoutingMutexMode(pluginConfig);
 	activeAccountManager.setRoutingMutexMode(routingMutexMode);
+	const schedulingStrategy = getSchedulingStrategy(pluginConfig);
 	const fetchImpl = options.fetchImpl ?? fetch;
 	const host = options.host ?? DEFAULT_HOST;
 	// Defense in depth (runtime-proxy-01): the proxy presents managed OAuth tokens
@@ -1771,17 +1871,26 @@ export async function startRuntimeRotationProxy(
 						skipReasons: accountSkipReasons,
 						stickyBoostByAccount: rotationStickyBoost,
 						pidOffsetEnabled,
+						schedulingStrategy,
 					});
 				const selected =
 					routingMutexMode === "enabled"
 						? await withRoutingMutex(routingMutexMode, async () => {
 								const candidate = selectAccount();
-								if (candidate && pinnedIndex === null) {
+								if (
+									candidate &&
+									pinnedIndex === null &&
+									schedulingStrategy !== "sequential"
+								) {
 									// Re-commit the cursor under the held mutex. Skipped when a
 									// manual pin is active so the proxy never clobbers the pin
 									// (see #474); pinned selections are deterministic and need no
-									// cursor advance. Runs inline via reentrancy — see comment
-									// above.
+									// cursor advance. Also skipped in sequential mode: the
+									// sequential selector already committed the correct active
+									// index inside this held mutex, and re-committing `candidate`
+									// would wrongly advance the drain-first primary when the pick
+									// came from the non-advancing linear-scan fallback (#509).
+									// Runs inline via reentrancy — see comment above.
 									await accountManager.markSwitchedLocked(
 										candidate,
 										"rotation",
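
The role of `advanceActivePointer` in `chooseLinearScanFallback` can be seen in a reduced model. This is a sketch under stated assumptions: `FallbackAccount`, `FallbackState`, and `linearScanSketch` are hypothetical, and the assignment to `state.activeIndex` stands in for the real `markSwitched` call.

```typescript
// Hypothetical, simplified shapes; not the project's types.
interface FallbackAccount {
  index: number;
  // null means the account has no runtime skip reason.
  skipReason: string | null;
}

interface FallbackState {
  activeIndex: number;
}

function linearScanSketch(
  accounts: readonly FallbackAccount[],
  attempted: ReadonlySet<number>,
  state: FallbackState,
  advanceActivePointer = true,
  skipReasons?: Map<number, string>,
): FallbackAccount | null {
  for (const account of accounts) {
    if (attempted.has(account.index)) {
      skipReasons?.set(account.index, "already-attempted");
      continue;
    }
    if (account.skipReason) {
      skipReasons?.set(account.index, account.skipReason);
      continue;
    }
    // Stand-in for markSwitched: only commit the winner when asked to.
    if (advanceActivePointer) state.activeIndex = account.index;
    return account;
  }
  return null;
}

const poolAccounts: FallbackAccount[] = [
  { index: 0, skipReason: null }, // still usable, but failed once this request
  { index: 1, skipReason: null },
];
const attemptedThisRequest = new Set<number>([0]);

// Sequential path: try account 1 for this request, keep account 0 as primary.
const sequentialState: FallbackState = { activeIndex: 0 };
const sequentialPick = linearScanSketch(
  poolAccounts,
  attemptedThisRequest,
  sequentialState,
  false,
);

// Hybrid path: the winner is also committed as the new active account.
const hybridState: FallbackState = { activeIndex: 0 };
const hybridPick = linearScanSketch(poolAccounts, attemptedThisRequest, hybridState);

console.log(sequentialPick?.index, sequentialState.activeIndex);
console.log(hybridPick?.index, hybridState.activeIndex);
```

Both calls pick account 1 for the retry, but only the hybrid call moves the active index, which is the distinction the doc comment on `chooseLinearScanFallback` describes.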
 
@@ -78,6 +78,7 @@ export const PluginConfigSchema = z.object({
 	preemptiveQuotaRemainingPercent7d: z.number().min(0).max(100).optional(),
 	preemptiveQuotaMaxDeferralMs: z.number().min(1_000).optional(),
 	routingMutex: z.enum(["enabled", "legacy"]).optional(),
+	schedulingStrategy: z.enum(["hybrid", "sequential"]).optional(),
 });
 
 export type PluginConfigFromSchema = z.infer<typeof PluginConfigSchema>;