
Commit 736eefb

committed
fix: surface real upstream cooldowns
1 parent 8a53cc8 commit 736eefb

5 files changed

Lines changed: 162 additions & 15 deletions

File tree

README.md

Lines changed: 9 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -406,6 +406,15 @@ A: It depends on the model. The Claude family is the most stable because its `<tool_use>` protocol training is solid (free accounts
406406
**Q: All 31 trial accounts become unavailable within a short time**
407407
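A: Most likely you are using a weekly-limited model. High reasoning effort variants such as `claude-opus-4-7-max` / `gpt-5.5-xhigh` / `claude-sonnet-4-7-thinking` get only 5 requests per account per week, so 31 accounts × 5 requests ≈ 150 requests and the pool is exhausted. Switch to `claude-sonnet-4.6` / `claude-haiku-4.5`, whose daily quotas are more generous. To verify, run `docker logs windsurfapi-windsurf-api-1 | grep rate_limit` and check the cooldown field for each account.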
408408

409+
**Q: "All accounts temporarily rate-limited" / IP-level cooldown: is my proxy broken?**
410+
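A: Usually not. Windsurf upstream applies a cooldown to dense requests from the same egress IP on the same model, so multiple accounts bound to the same egress IP get rate-limited together. WindsurfAPI stops burning through accounts and returns `429 + Retry-After`; since v2.0.140 this wait time follows real upstream values such as `Resets in: 27m12s` instead of a fixed 30-second hint. The fixes are to lower concurrency, switch to a model with looser limits, bind accounts to different egress IPs, or wait for the upstream cooldown to expire.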
411+
412+
**Q: Are free accounts limited locally to 1 request per minute?**
413+
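A: No. The local free-tier RPM defaults to 10/min. A rate of 1/min, or access that recovers after a while, is usually Windsurf upstream free-tier dynamic throttling or a model entitlement limit. Check account status and the list of available models in the Dashboard; when a request targets a model the account is not entitled to, `available_in_pool` in the error lists the models the current account pool can use.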
414+
415+
**Q: Can context deadline exceeded / Client.Timeout be fixed by raising the .env timeout?**
416+
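A: No. Long thinking / long output streams are cut off at about 236-243 seconds, which is the Windsurf provider/Cascade single-stream window. WindsurfAPI labels this as `upstream_deadline_exceeded` / `windsurf_provider_deadline` and discards the half-finished Cascade reuse trajectory so the next turn's context is not corrupted. The only practical workarounds are to split the task, lower reasoning/max output, or switch to a faster model.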
417+
409418
## Contributors
410419

411420
Special thanks to the following people, who submitted PRs or systematically reviewed the code and made this project more stable:
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
# v2.0.140 - truthful upstream cooldowns
2+
3+
## What changed
4+
5+
- IP-level rate-limit burst short-circuiting now carries the real upstream cooldown instead of always telling clients to wait 30 seconds.
6+
- When Windsurf returns messages like `Resets in: 27m12s`, the 429 response now uses the same value in `Retry-After`, `error.retry_after_ms`, and the user-facing message.
7+
- Non-stream chat handling now supports the same dependency injection hooks as the stream path, so rate-limit behavior can be covered by behavior tests without starting a real language server.
8+
- Non-stream Cascade reuse invalidation now also recognizes structured `upstream_deadline_exceeded` / `windsurf_provider_deadline` responses, not only the raw upstream error text.
9+
- README FAQ now separates local RPM limits, upstream free-tier throttling, IP cooldowns, and the upstream ~240s provider deadline.
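
The cooldown parsing described in the list above can be sketched in isolation. This is a minimal illustration, not the project's code: `parseResetsInMs` and `retryAfterHeader` are hypothetical names, and the unit-summing loop is an assumption because the diff elides that part of `parseRateLimitCooldownMs`. Only the outer regular expression is modelled on the one visible in `src/handlers/chat.js`.

```javascript
// Hypothetical helper: convert "Resets in: 27m12s" style text to milliseconds.
// Returns null when no duration is present, so the caller can pick a fallback.
function parseResetsInMs(message) {
  const match = String(message || '').match(/resets?\s+in\s*:?\s*((?:\d+\s*[hms]\s*)+)/i);
  if (!match) return null;
  const unitMs = { h: 3_600_000, m: 60_000, s: 1_000 };
  let total = 0;
  // Sum every "<number><unit>" pair found in the captured duration text.
  for (const [, amount, unit] of match[1].matchAll(/(\d+)\s*([hms])/gi)) {
    total += Number(amount) * unitMs[unit.toLowerCase()];
  }
  return total > 0 ? total : null;
}

// Retry-After carries whole seconds, so round up rather than under-report.
function retryAfterHeader(ms) {
  return String(Math.max(1, Math.ceil(ms / 1000)));
}

const parsedMs = parseResetsInMs('Please try again later. Resets in: 27m12s (trace ID: abc)');
console.log(parsedMs, retryAfterHeader(parsedMs)); // 1632000 1632
```

Returning `null` instead of a default keeps the "no upstream value" case distinguishable, which is what lets a caller apply its own floor.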
10+
11+
## Context
12+
13+
Issues #176 and #189 showed real upstream cooldowns around 26-30 minutes, but the IP-burst guard surfaced a fixed 30-second retry hint. The guard was already doing the right thing by stopping account burn; this release makes the operator/client-facing cooldown truthful.
14+
15+
This does not bypass Windsurf upstream rate limits. It prevents misleading retry timing and reduces repeated hammering during an upstream IP cooldown.
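
On the client side, honoring the surfaced cooldown might look like the sketch below. It assumes a response object shaped like the one the test in this commit asserts on (`status`, `headers['Retry-After']`, `body.error.retry_after_ms`); the function name `retryDelayMs` and the 30-second fallback are illustrative assumptions, and only the delta-seconds form of `Retry-After` is handled, not the HTTP-date form.

```javascript
// Hypothetical client-side helper: decide how long to wait after a 429.
// Prefers the millisecond body field, then the Retry-After header (seconds),
// then a caller-supplied fallback.
function retryDelayMs(response, fallbackMs = 30_000) {
  if (response.status !== 429) return 0;
  const fromBody = Number(response.body?.error?.retry_after_ms);
  if (Number.isFinite(fromBody) && fromBody > 0) return fromBody;
  const fromHeader = Number(response.headers?.['Retry-After']);
  if (Number.isFinite(fromHeader) && fromHeader > 0) return fromHeader * 1000;
  return fallbackMs;
}

const sample429 = {
  status: 429,
  headers: { 'Retry-After': '1632' },
  body: { error: { type: 'rate_limit_exceeded', retry_after_ms: 1_632_000 } },
};
console.log(retryDelayMs(sample429)); // 1632000
```

A client that sleeps for this long before retrying avoids hammering the proxy during an upstream IP cooldown.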
16+
17+
## Validation
18+
19+
- `node --test test/rate-limit.test.js`
20+
- `npm.cmd run test:release`
21+
- `node --test test/stream-error.test.js test/cascade-timeout-invalidation.test.js test/stream-pool-exhausted-error.test.js`
22+
- `npm.cmd run test:shard -- 0 4 --timeout-ms=90000`
23+
- `npm.cmd run test:shard -- 1 4 --timeout-ms=90000`
24+
- `npm.cmd run test:shard -- 2 4 --timeout-ms=90000`
25+
- `npm.cmd run test:shard -- 3 4 --timeout-ms=90000`

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
{
22
"name": "windsurf-api",
3-
"version": "2.0.139",
3+
"version": "2.0.140",
44
"description": "Windsurf to OpenAI + Anthropic compatible API proxy. Turns Windsurf's 107 AI models (Claude, GPT, Gemini, DeepSeek, Grok, Qwen, Kimi, GLM, SWE) into dual-protocol API endpoints. Zero npm deps.",
55
"type": "module",
66
"main": "src/index.js",

src/handlers/chat.js

Lines changed: 77 additions & 13 deletions
@@ -52,6 +52,7 @@ import {
5252
const HEARTBEAT_MS = 15_000;
5353
const QUEUE_RETRY_MS = 1_000;
5454
const QUEUE_MAX_WAIT_MS = 30_000;
55+
const IP_RATE_LIMIT_BURST_FLOOR_MS = 30_000;
5556

5657
// Build the option bag the v2.0.25 semantic key needs. tools / tool_choice /
5758
// preamble are baked into the digest so a tool schema change misses instead
@@ -826,7 +827,7 @@ export function repairToolCallArguments(tc, messages) {
826827
return tc;
827828
}
828829

829-
export function rateLimitCooldownMs(message = '') {
830+
export function parseRateLimitCooldownMs(message = '') {
830831
const reset = String(message || '').match(/resets?\s+in\s*:?\s*((?:(?:\d+)\s*[hms]\s*)+)/i);
831832
if (reset) {
832833
let total = 0;
@@ -848,7 +849,41 @@ export function rateLimitCooldownMs(message = '') {
848849
return n * 1000;
849850
}
850851
if (/about an hour|in an hour|try again in.*hour/i.test(message)) return 60 * 60 * 1000;
851-
return 60 * 1000;
852+
return null;
853+
}
854+
855+
export function rateLimitCooldownMs(message = '') {
856+
return parseRateLimitCooldownMs(message) || 60 * 1000;
857+
}
858+
859+
function formatRetryAfter(ms) {
860+
const seconds = Math.max(1, Math.ceil(Number(ms) / 1000));
861+
if (seconds >= 3600) {
862+
const h = Math.floor(seconds / 3600);
863+
const m = Math.ceil((seconds - h * 3600) / 60);
864+
return m > 0 ? `${h}h${m}m` : `${h}h`;
865+
}
866+
if (seconds >= 60) {
867+
const m = Math.floor(seconds / 60);
868+
const s = seconds - m * 60;
869+
return s > 0 ? `${m}m${s}s` : `${m}m`;
870+
}
871+
return `${seconds}s`;
872+
}
873+
874+
export function rateLimitBurstCooldownMs({ message = '', retryAfterMs = 0, apiKey = '', modelKey = '' } = {}) {
875+
const candidates = [IP_RATE_LIMIT_BURST_FLOOR_MS];
876+
const retry = Number(retryAfterMs);
877+
if (Number.isFinite(retry) && retry > 0) candidates.push(retry);
878+
const parsed = parseRateLimitCooldownMs(message);
879+
if (Number.isFinite(parsed) && parsed > 0) candidates.push(parsed);
880+
if (apiKey) {
881+
const availability = getAccountAvailability(apiKey, modelKey);
882+
if (!availability.available && Number.isFinite(availability.retryAfterMs) && availability.retryAfterMs > 0) {
883+
candidates.push(availability.retryAfterMs);
884+
}
885+
}
886+
return Math.max(...candidates);
852887
}
853888

854889
function genId() {
@@ -1492,6 +1527,9 @@ async function _handleChatCompletionsInner(body, context = {}) {
14921527
const cachePolicy = body.__cachePolicy || null;
14931528
const checkMessageRateLimitFn = context.checkMessageRateLimit || checkMessageRateLimit;
14941529
const waitForAccountFn = context.waitForAccount || waitForAccount;
1530+
const ensureLsFn = context.ensureLs || ensureLs;
1531+
const getLsForFn = context.getLsFor || getLsFor;
1532+
const WindsurfClientClass = context.WindsurfClient || WindsurfClient;
14951533

14961534
// Probe diagnostics: dump compact request shape for every call, plus a
14971535
// tail of the last user turn. Keeps us able to see how third-party
@@ -2179,11 +2217,11 @@ async function _handleChatCompletionsInner(body, context = {}) {
21792217
}
21802218
}
21812219

2182-
try { await ensureLs(acct.proxy); } catch (e) {
2220+
try { await ensureLsFn(acct.proxy); } catch (e) {
21832221
lastErr = isLsPoolExhausted(e) ? lsPoolExhaustedResponse(e) : { status: e.status || 503, body: { error: { message: e.message || String(e), type: e.type || 'ls_unavailable' } } };
21842222
break;
21852223
}
2186-
const ls = getLsFor(acct.proxy);
2224+
const ls = getLsForFn(acct.proxy);
21872225
if (!ls) { lastErr = { status: 503, body: { error: { message: 'No LS instance available', type: 'ls_unavailable' } } }; break; }
21882226
// Cascade pins cascade_id to a specific LS port too; if the LS it was
21892227
// born on has been replaced, the cascade_id is dead.
@@ -2197,7 +2235,7 @@ async function _handleChatCompletionsInner(body, context = {}) {
21972235
return n + (typeof c === 'string' ? c.length : Array.isArray(c) ? c.reduce((k, p) => k + (typeof p?.text === 'string' ? p.text.length : 0), 0) : 0);
21982236
}, 0);
21992237
log.info(`Chat[${reqId}]: model=${displayModel} flow=${useCascade ? 'cascade' : 'legacy'} attempt=${attempt + 1} account=${acct.email} ls=${ls.port} turns=${(messages||[]).length} chars=${_msgChars}${reuseEntry ? ' reuse=1' : ''}${emulateTools ? ' tools=emu' : ''}`);
2200-
const client = new WindsurfClient(acct.apiKey, ls.port, ls.csrfToken);
2238+
const client = new WindsurfClientClass(acct.apiKey, ls.port, ls.csrfToken);
22012239
const result = await nonStreamResponse(
22022240
client, chatId, created, displayModel, routingModelKey, messages, cascadeMessages, modelEnum, modelUid,
22032241
useCascade, acct.apiKey, ckey,
@@ -2224,8 +2262,14 @@ async function _handleChatCompletionsInner(body, context = {}) {
22242262
// see the matching catch block in streamResponse for the full
22252263
// rationale (cascade trajectory left half-broken, next reuse hits
22262264
// it and the model "loses" the prior conversation).
2227-
const _resultMsg = String(result.body?.error?.message || '');
2228-
if (isUpstreamDeadlineExceeded(_resultMsg)) {
2265+
const _resultError = result.body?.error || {};
2266+
const _resultMsg = String(_resultError.upstream_message || _resultError.message || '');
2267+
if (
2268+
result.status === 504
2269+
|| _resultError.type === 'upstream_deadline_exceeded'
2270+
|| _resultError.code === 'windsurf_provider_deadline'
2271+
|| isUpstreamDeadlineExceeded(_resultMsg)
2272+
) {
22292273
reuseEntryDead = true;
22302274
}
22312275
lastErr = result;
@@ -2245,22 +2289,32 @@ async function _handleChatCompletionsInner(body, context = {}) {
22452289
if (!context.__rateLimitEvents) context.__rateLimitEvents = [];
22462290
const RL_WINDOW_MS = 8_000;
22472291
const RL_BURST_THRESHOLD = 3;
2248-
context.__rateLimitEvents.push({ time: Date.now(), model: routingModelKey, account: acct.id });
2292+
context.__rateLimitEvents.push({
2293+
time: Date.now(),
2294+
model: routingModelKey,
2295+
account: acct.id,
2296+
cooldownMs: rateLimitBurstCooldownMs({
2297+
message: result.body?.error?.message || '',
2298+
retryAfterMs: result.body?.error?.retry_after_ms,
2299+
apiKey: acct.apiKey,
2300+
modelKey: routingModelKey,
2301+
}),
2302+
});
22492303
// Prune old events
22502304
const cutoff = Date.now() - RL_WINDOW_MS;
22512305
while (context.__rateLimitEvents.length && context.__rateLimitEvents[0].time < cutoff) {
22522306
context.__rateLimitEvents.shift();
22532307
}
22542308
const sameModelBurst = context.__rateLimitEvents.filter(e => e.model === routingModelKey);
22552309
if (sameModelBurst.length >= RL_BURST_THRESHOLD) {
2256-
const maxCooldown = Math.max(...sameModelBurst.map(() => 30_000));
2310+
const maxCooldown = Math.max(...sameModelBurst.map(e => e.cooldownMs || IP_RATE_LIMIT_BURST_FLOOR_MS));
22572311
log.warn(`Chat[${reqId}]: IP-rate-limit burst detected — ${sameModelBurst.length} accounts rate-limited on ${displayModel} within ${RL_WINDOW_MS}ms. Short-circuiting.`);
22582312
return {
22592313
status: 429,
22602314
headers: { 'Retry-After': String(Math.ceil(maxCooldown / 1000)) },
22612315
body: {
22622316
error: {
2263-
message: `All accounts temporarily rate-limited on ${displayModel}. Windsurf upstream is applying IP-level cooldown. Wait ${Math.ceil(maxCooldown / 1000)}s before retrying, or switch to a different model.`,
2317+
message: `All accounts temporarily rate-limited on ${displayModel}. Windsurf upstream is applying IP-level cooldown. Wait ${formatRetryAfter(maxCooldown)} before retrying, or switch to a different model.`,
22642318
type: 'rate_limit_exceeded',
22652319
retry_after_ms: maxCooldown,
22662320
},
@@ -3618,7 +3672,17 @@ function streamResponse(id, created, model, modelKey, provider, messages, cascad
36183672
const RL_WINDOW_MS = 8_000;
36193673
const RL_BURST_THRESHOLD = 3;
36203674
const now = Date.now();
3621-
ctx.__rateLimitEvents.push({ time: now, model: modelKey, account: acct?.id });
3675+
ctx.__rateLimitEvents.push({
3676+
time: now,
3677+
model: modelKey,
3678+
account: acct?.id,
3679+
cooldownMs: rateLimitBurstCooldownMs({
3680+
message: err.message || '',
3681+
retryAfterMs: err.retry_after_ms,
3682+
apiKey: currentApiKey,
3683+
modelKey,
3684+
}),
3685+
});
36223686
const cutoff = now - RL_WINDOW_MS;
36233687
while (ctx.__rateLimitEvents.length && ctx.__rateLimitEvents[0].time < cutoff) {
36243688
ctx.__rateLimitEvents.shift();
@@ -3627,8 +3691,8 @@ function streamResponse(id, created, model, modelKey, provider, messages, cascad
36273691
if (sameModelBurst.length >= RL_BURST_THRESHOLD) {
36283692
ctx.__rlAborted = true;
36293693
log.warn(`Chat[${reqId}] stream: IP-rate-limit burst — ${sameModelBurst.length} accounts rate-limited on ${model} within ${RL_WINDOW_MS}ms. Short-circuiting.`);
3630-
const cooldown = Math.max(...sameModelBurst.map(() => 30_000));
3631-
lastErr = Object.assign(new Error(`All accounts temporarily rate-limited on ${model}. Windsurf upstream is applying IP-level cooldown. Wait ~${Math.ceil(cooldown / 1000)}s before retrying.`), { type: 'rate_limit_exceeded', retry_after_ms: cooldown });
3694+
const cooldown = Math.max(...sameModelBurst.map(e => e.cooldownMs || IP_RATE_LIMIT_BURST_FLOOR_MS));
3695+
lastErr = Object.assign(new Error(`All accounts temporarily rate-limited on ${model}. Windsurf upstream is applying IP-level cooldown. Wait ~${formatRetryAfter(cooldown)} before retrying.`), { type: 'rate_limit_exceeded', retry_after_ms: cooldown });
36323696
break;
36333697
}
36343698
}
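
The burst short-circuit touched by the hunks above can be illustrated as a standalone sliding-window check. This is a simplified sketch under stated assumptions: `recordRateLimit` is a hypothetical function, the event list is passed in rather than stored on a request context, and the constants mirror the values visible in the diff (8-second window, threshold of 3, 30-second floor).

```javascript
const RL_WINDOW_MS = 8_000;
const RL_BURST_THRESHOLD = 3;
const FLOOR_MS = 30_000;

// Records one rate-limit event and returns the cooldown to surface once a
// burst is detected, or null while the threshold has not been reached.
function recordRateLimit(events, { model, account, cooldownMs }, now = Date.now()) {
  events.push({ time: now, model, account, cooldownMs });
  // Drop events that fell out of the sliding window.
  const cutoff = now - RL_WINDOW_MS;
  while (events.length && events[0].time < cutoff) events.shift();
  const sameModel = events.filter(e => e.model === model);
  if (sameModel.length < RL_BURST_THRESHOLD) return null;
  // Surface the longest cooldown seen in the burst, never less than the floor.
  return Math.max(...sameModel.map(e => e.cooldownMs || FLOOR_MS));
}

const burstEvents = [];
recordRateLimit(burstEvents, { model: 'm', account: 'a', cooldownMs: 30_000 }, 0);
recordRateLimit(burstEvents, { model: 'm', account: 'b', cooldownMs: 0 }, 1_000);
console.log(recordRateLimit(burstEvents, { model: 'm', account: 'c', cooldownMs: 1_632_000 }, 2_000)); // 1632000
```

Taking the maximum across the burst means one account reporting a long upstream reset is enough to raise the hint for the whole short-circuit response.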

test/rate-limit.test.js

Lines changed: 50 additions & 1 deletion
@@ -10,7 +10,7 @@ import {
1010
removeAccount,
1111
setAccountTier,
1212
} from '../src/auth.js';
13-
import { handleChatCompletions, rateLimitCooldownMs } from '../src/handlers/chat.js';
13+
import { handleChatCompletions, rateLimitBurstCooldownMs, rateLimitCooldownMs } from '../src/handlers/chat.js';
1414
import { getExperimental, setExperimental } from '../src/runtime-config.js';
1515

1616
const createdAccountIds = [];
@@ -73,6 +73,55 @@ describe('rate-limit handling', () => {
7373
assert.equal(rateLimitCooldownMs('resets in 3h'), 3 * 60 * 60 * 1000);
7474
});
7575

76+
it('keeps real upstream cooldowns for IP-level burst short-circuiting', () => {
77+
assert.equal(
78+
rateLimitBurstCooldownMs({
79+
message: 'Reached message rate limit for this model. Please try again later. Resets in: 27m12s (trace ID: abc)',
80+
retryAfterMs: 30000,
81+
}),
82+
(27 * 60 * 1000) + (12 * 1000)
83+
);
84+
assert.equal(rateLimitBurstCooldownMs({ message: 'rate limit exceeded' }), 30000);
85+
});
86+
87+
it('uses real upstream cooldowns in the non-stream IP burst response', async () => {
88+
const accounts = [
89+
addTestAccount('ip-burst-a'),
90+
addTestAccount('ip-burst-b'),
91+
addTestAccount('ip-burst-c'),
92+
];
93+
for (const account of accounts) setAccountTier(account.id, 'free');
94+
95+
class RateLimitedClient {
96+
async cascadeChat() {
97+
throw new Error('Reached message rate limit for this model. Please try again later. Resets in: 27m12s (trace ID: abc)');
98+
}
99+
async rawGetChatMessage() {
100+
throw new Error('Reached message rate limit for this model. Please try again later. Resets in: 27m12s (trace ID: abc)');
101+
}
102+
}
103+
104+
const result = await handleChatCompletions({
105+
model: 'gemini-2.5-flash',
106+
messages: [{ role: 'user', content: `hi ${Date.now()}` }],
107+
}, {
108+
async waitForAccount(tried, signal, maxWaitMs, modelKey) {
109+
return getApiKey(tried, modelKey);
110+
},
111+
async ensureLs() {},
112+
getLsFor() {
113+
return { port: 12345, csrfToken: 'csrf-test' };
114+
},
115+
WindsurfClient: RateLimitedClient,
116+
});
117+
118+
assert.equal(result.status, 429);
119+
assert.equal(result.body.error.type, 'rate_limit_exceeded');
120+
assert.equal(result.body.error.retry_after_ms, (27 * 60 * 1000) + (12 * 1000));
121+
assert.equal(result.headers['Retry-After'], '1632');
122+
assert.match(result.body.error.message, /27m12s/);
123+
});
124+
76125
it('does not extend an existing cooldown when a later 429 arrives for the same model', async () => {
77126
const account = addTestAccount('max-extend');
78127
const modelKey = 'gemini-2.5-flash';
