Commit 6de3216

Authored by lpcox and Copilot
fix(api-proxy): 403 for terminal caps; fix Anthropic/Copilot input credits (#5271)
* fix(api-proxy): 403 for terminal caps; fix Anthropic input credits

  Two related token-budget fixes:

  1. Terminal hard caps (effective_tokens, max_runs, max_cache_misses, ai_credits) now reject with HTTP 403 instead of 429. LLM SDK clients treat 429 as a transient rate-limit and retry-storm against a cap that never recovers, exhausting the run budget until the step times out. 403 is non-retryable, so the agent stops cleanly. The per-IP rate limiter keeps returning 429 (with Retry-After) since it is recoverable.

  2. AI-credit calculation is now provider-aware. Anthropic reports input_tokens as the NON-cached input only (cache_read/cache_creation are additive), whereas OpenAI reports it as the TOTAL with cache as a subset. The old code always subtracted cache from input, over-counting cache and under-counting fresh input for Anthropic. provider is now threaded through applyAiCreditsUsage -> calculateAiCredits.

  Provider string literals in the new code use centralized constants from the new provider-names module (named to avoid colliding with the providers/ adapter directory).

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(api-proxy): handle copilot fresh input in ai credits

* test(smoke-claude): raise max-turns/maxRuns from 2 to 5

  The maxRuns:2 cap was too tight for the smoke prompt: the agent routinely burns its 2 invocations on a planning turn plus a parallel capability-probe before emitting its safe output, then hits the cap and fails. Bump max-turns (which drives apiProxy.maxRuns) to 5 so the smoke test has headroom to complete. Recompiled the lock file and updated the workflow test assertions accordingly.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
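The retry-storm reasoning behind the first fix can be illustrated with a minimal sketch of a client retry policy. `isRetryable` and `callWithRetries` are hypothetical helpers, not taken from any SDK; the sketch assumes a client that retries 429 and 5xx responses and treats every other status as final, which is a common default but varies between SDKs.

```javascript
'use strict';

// Hypothetical retry policy: 429 and 5xx are treated as transient,
// every other status is final. Real SDKs differ in the details.
function isRetryable(statusCode) {
  return statusCode === 429 || statusCode >= 500;
}

// Calls `send` (a function returning a status code) until it gets a
// non-retryable status or runs out of attempts. Returns the number of
// attempts made and the last status seen.
function callWithRetries(send, maxAttempts) {
  let attempts = 0;
  let status;
  do {
    attempts += 1;
    status = send();
  } while (isRetryable(status) && attempts < maxAttempts);
  return { attempts, status };
}

// A terminal cap never recovers, so the proxy gives the same answer every time.
const cappedWith429 = callWithRetries(() => 429, 5);
const cappedWith403 = callWithRetries(() => 403, 5);

console.log(cappedWith429); // { attempts: 5, status: 429 }
console.log(cappedWith403); // { attempts: 1, status: 403 }
```

Under this assumed policy, the 429 variant spends every allowed attempt against a cap that cannot clear, while the 403 variant stops after one request.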
1 parent 1dc8335 commit 6de3216

15 files changed

Lines changed: 179 additions & 76 deletions

.github/workflows/smoke-claude.lock.yml

Lines changed: 4 additions & 4 deletions
Some generated files are not rendered by default.

.github/workflows/smoke-claude.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ permissions:
   pull-requests: read
 
 name: Smoke Claude
-max-turns: 2
+max-turns: 5
 engine:
   id: claude
   model: claude-haiku-4-5

containers/api-proxy/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ COPY server.js logging.js metrics.js rate-limiter.js \
     deprecated-header-tracker.js billing-headers.js upstream-response.js \
     anthropic-cache.js otel.js otel-exporters.js otel-serialization.js \
     token-budget-log.js blocked-request-diagnostics.js \
-    provider-env-constants.js ./
+    provider-env-constants.js provider-names.js ./
 COPY guards/ ./guards/
 COPY providers/ ./providers/
 COPY transforms/ ./transforms/

containers/api-proxy/guards/ai-credits-guard.js

Lines changed: 16 additions & 8 deletions
@@ -4,6 +4,7 @@ const { logRequest, sanitizeForLog } = require('../logging');
 const pricingByModel = require('../ai-credits-pricing');
 const { resolveCatalogModel } = require('../models-dev-catalog');
 const { parsePositiveNumber } = require('./guard-utils');
+const { PROVIDER_ANTHROPIC, PROVIDER_COPILOT } = require('../provider-names');
 
 const TOKENS_PER_MILLION = 1_000_000;
 const DOLLARS_PER_CREDIT = 0.01;
@@ -165,17 +166,24 @@ function checkUnknownModelRejection(model) {
   };
 }
 
-function calculateAiCredits(normalizedUsage, model, state = aiCreditsState) {
+function calculateAiCredits(normalizedUsage, model, state = aiCreditsState, provider = undefined) {
   const pricing = resolveModelPricing(model, state);
   if (!pricing) return null;
 
-  // Both Anthropic and OpenAI report input_tokens as the TOTAL input including
-  // cache_read and cache_creation tokens. To avoid double-counting, subtract
-  // cached portions before applying the full input rate.
-  const totalInput = normalizedUsage.input_tokens || 0;
+  // input_tokens semantics differ by provider:
+  // - Anthropic and Copilot report input_tokens as the NON-cached input only;
+  //   cache_read_input_tokens and cache_creation_input_tokens are reported
+  //   separately and are ADDITIVE to input_tokens. Subtracting them here would
+  //   over-subtract and undercount the genuinely-fresh input tokens.
+  // - OpenAI (and OpenAI-compatible providers) report prompt_tokens/input_tokens
+  //   as the TOTAL input, with cached tokens being a SUBSET. Those must be
+  //   subtracted before applying the full input rate to avoid double-counting.
+  const reportedInput = normalizedUsage.input_tokens || 0;
   const cacheReadTokens = normalizedUsage.cache_read_tokens || 0;
   const cacheWriteTokens = normalizedUsage.cache_write_tokens || 0;
-  const nonCachedInput = Math.max(0, totalInput - cacheReadTokens - cacheWriteTokens);
+  const nonCachedInput = provider === PROVIDER_ANTHROPIC || provider === PROVIDER_COPILOT
+    ? reportedInput
+    : Math.max(0, reportedInput - cacheReadTokens - cacheWriteTokens);
 
   const inputCredits = (nonCachedInput * pricing.input) / CREDIT_DENOMINATOR;
   const cachedInputCredits = (cacheReadTokens * pricing.cachedInput) / CREDIT_DENOMINATOR;
@@ -194,10 +202,10 @@ function calculateAiCredits(normalizedUsage, model, state = aiCreditsState) {
   };
 }
 
-function applyAiCreditsUsage(normalizedUsage, model) {
+function applyAiCreditsUsage(normalizedUsage, model, provider = undefined) {
   if (!normalizedUsage) return null;
   const safeModel = model || 'unknown';
-  const calc = calculateAiCredits(normalizedUsage, safeModel);
+  const calc = calculateAiCredits(normalizedUsage, safeModel, aiCreditsState, provider);
   if (!calc) return null;
 
   if (!Object.hasOwn(aiCreditsState.byModel, safeModel)) {
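The provider-dependent branch in `calculateAiCredits` can be isolated as a small standalone sketch. `nonCachedInputTokens` is a hypothetical helper name used only for illustration; the two constants are redeclared locally on the assumption that they match the values exported by the `provider-names` module.

```javascript
'use strict';

// Assumed to mirror the values exported by the provider-names module.
const PROVIDER_ANTHROPIC = 'anthropic';
const PROVIDER_COPILOT = 'copilot';

// Returns the number of input tokens to bill at the full input rate.
function nonCachedInputTokens(usage, provider) {
  const reported = usage.input_tokens || 0;
  const cacheRead = usage.cache_read_tokens || 0;
  const cacheWrite = usage.cache_write_tokens || 0;
  // Additive providers already exclude cache tokens from input_tokens.
  const additive = provider === PROVIDER_ANTHROPIC || provider === PROVIDER_COPILOT;
  // Total-inclusive providers need the cached subset removed, clamped at 0.
  return additive ? reported : Math.max(0, reported - cacheRead - cacheWrite);
}

const sampleUsage = { input_tokens: 2000, cache_read_tokens: 10000 };
console.log(nonCachedInputTokens(sampleUsage, 'anthropic')); // 2000
console.log(nonCachedInputTokens(sampleUsage, 'openai'));    // 0 (clamped)
console.log(nonCachedInputTokens(sampleUsage));              // 0 (no provider: total-inclusive)
```

When no provider is passed, the sketch falls back to the total-inclusive interpretation, matching the `provider = undefined` default in the diff.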

containers/api-proxy/guards/ai-credits-guard.test.js

Lines changed: 76 additions & 12 deletions
@@ -84,30 +84,94 @@ describe('ai-credits-guard', () => {
     expect(getAiCreditsReflectState().by_model['claude-sonnet-4-6-20260601'].total).toBeCloseTo(0.5175, 10);
   });
 
-  it('does not double-count cached tokens (cache_read included in input_tokens)', () => {
-    // Simulates: 3M total input, 2.9M from cache, 0.1M new input
-    // This is how Anthropic reports: input_tokens is the total (includes cache hits)
+  it('does not double-count cached tokens when input_tokens is total-inclusive (OpenAI-style)', () => {
+    // OpenAI (Chat Completions and Responses API) reports prompt_tokens/input_tokens
+    // as the TOTAL input, with cached tokens being a subset. When no provider is
+    // passed, the calculation defaults to this total-inclusive interpretation.
+    // Simulates: 3M total input, 2.9M from cache, 0.1M new input.
     const usage = applyAiCreditsUsage({
       input_tokens: 3_000_000,
       cache_read_tokens: 2_900_000,
       output_tokens: 50_000,
-    }, 'claude-sonnet-4-6');
+    }, 'gpt-5.4');
 
     // nonCached = 3M - 2.9M = 100K
-    // inputCredits = 100_000 × $3.00 / 10000 = 30
-    // cachedInputCredits = 2_900_000 × $0.30 / 10000 = 87
+    // inputCredits = 100_000 × $2.50 / 10000 = 25
+    // cachedInputCredits = 2_900_000 × $0.25 / 10000 = 72.5
     // outputCredits = 50_000 × $15.00 / 10000 = 75
-    // total = 192 AIC
-    expect(usage.inputCreditsThisResponse).toBeCloseTo(30, 5);
-    expect(usage.cachedInputCreditsThisResponse).toBeCloseTo(87, 5);
+    // total = 172.5 AIC
+    expect(usage.inputCreditsThisResponse).toBeCloseTo(25, 5);
+    expect(usage.cachedInputCreditsThisResponse).toBeCloseTo(72.5, 5);
     expect(usage.outputCreditsThisResponse).toBeCloseTo(75, 5);
-    expect(usage.aiCreditsThisResponse).toBeCloseTo(192, 5);
+    expect(usage.aiCreditsThisResponse).toBeCloseTo(172.5, 5);
 
-    // BUG (before fix): would have been 30 + 87 + 75 + (2.9M × $3 / 10000) = 192 + 870 = 1062
-    // i.e., cached tokens counted at full price AND cache rate
+    // BUG (before fix): would have been 25 + 72.5 + 75 + (2.9M × $2.50 / 10000) = 172.5 + 725
+    // i.e., cached tokens counted at full price AND cache rate.
     expect(usage.aiCreditsThisResponse).toBeLessThan(250);
   });
 
+  it('treats Anthropic input_tokens as non-cached (additive cache), not total-inclusive', () => {
+    // Anthropic reports input_tokens as the NON-cached input only;
+    // cache_read_input_tokens and cache_creation_input_tokens are reported
+    // separately and are ADDITIVE. The fresh input tokens must therefore be
+    // charged in full and NOT subtracted from cache totals.
+    const usage = applyAiCreditsUsage({
+      input_tokens: 2000,
+      cache_read_tokens: 10_000,
+      output_tokens: 100,
+    }, 'claude-sonnet-4-6', 'anthropic');
+
+    // nonCached = 2000 (NOT 2000 - 10000 clamped to 0)
+    // inputCredits = 2000 × $3.00 / 10000 = 0.6
+    // cachedInputCredits = 10_000 × $0.30 / 10000 = 0.3
+    // outputCredits = 100 × $15.00 / 10000 = 0.15
+    // total = 1.05 AIC
+    expect(usage.inputCreditsThisResponse).toBeCloseTo(0.6, 10);
+    expect(usage.cachedInputCreditsThisResponse).toBeCloseTo(0.3, 10);
+    expect(usage.outputCreditsThisResponse).toBeCloseTo(0.15, 10);
+    expect(usage.aiCreditsThisResponse).toBeCloseTo(1.05, 10);
+
+    // BUG (before fix): nonCached = max(0, 2000 - 10000) = 0, undercounting the
+    // 2000 fresh input tokens → total would have been 0.45 instead of 1.05.
+    expect(usage.aiCreditsThisResponse).toBeGreaterThan(1.0);
+  });
+
+  it('charges Anthropic fresh input even when cache totals exceed input_tokens', () => {
+    // Reproduces the observed smoke-claude record: tiny fresh input alongside
+    // large cache read/write. Previously nonCached clamped to 0, dropping the
+    // fresh input charge entirely.
+    const usage = applyAiCreditsUsage({
+      input_tokens: 5,
+      cache_read_tokens: 38_673,
+      cache_write_tokens: 21_060,
+      output_tokens: 205,
+    }, 'claude-opus-4-7', 'anthropic');
+
+    // nonCached = 5 (Anthropic: additive, not subtracted)
+    // inputCredits = 5 × $5.00 / 10000 = 0.0025
+    // cachedInput = 38_673 × $0.50 / 10000 = 1.93365
+    // cacheWrite = 21_060 × $6.25 / 10000 = 13.1625
+    // outputCredits = 205 × $25.00 / 10000 = 0.5125
+    // total = 15.6111
+    expect(usage.inputCreditsThisResponse).toBeCloseTo(0.0025, 10);
+    expect(usage.aiCreditsThisResponse).toBeCloseTo(15.6111, 4);
+  });
+
+  it('treats Copilot input_tokens as non-cached when provider is copilot', () => {
+    const usage = applyAiCreditsUsage({
+      input_tokens: 100,
+      cache_read_tokens: 10_000,
+      output_tokens: 0,
+    }, 'gpt-5.4', 'copilot');
+
+    // inputCredits = 100 × $2.50 / 10000 = 0.025
+    // cachedInputCredits = 10_000 × $0.25 / 10000 = 0.25
+    // total = 0.275
+    expect(usage.inputCreditsThisResponse).toBeCloseTo(0.025, 10);
+    expect(usage.cachedInputCreditsThisResponse).toBeCloseTo(0.25, 10);
+    expect(usage.aiCreditsThisResponse).toBeCloseTo(0.275, 10);
+  });
+
   it('warns and skips usage for unknown models', () => {
     const { lines } = collectLogOutput();
     const usage = applyAiCreditsUsage({ input_tokens: 100 }, 'unknown-model');
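The arithmetic in the smoke-claude regression test can be checked by hand with a short sketch. The rates and the 10000 divisor are copied from the test comments above; `totalCredits` and the `cacheWrite`/`output` field names on the rate object are assumptions made for illustration and may not match the real pricing table.

```javascript
'use strict';

// Divisor used in the test comments: (tokens × $ per million tokens) / 10000 = credits.
const CREDIT_DENOMINATOR = 10_000;

// Rates in dollars per million tokens, copied from the claude-opus-4-7 test
// comments. The cacheWrite and output field names are assumptions.
const opusRates = { input: 5.0, cachedInput: 0.5, cacheWrite: 6.25, output: 25.0 };

// Sums the four credit components for a given count of fresh input tokens.
function totalCredits(freshInput, cacheRead, cacheWrite, output, rates) {
  const weighted =
    freshInput * rates.input +
    cacheRead * rates.cachedInput +
    cacheWrite * rates.cacheWrite +
    output * rates.output;
  return weighted / CREDIT_DENOMINATOR;
}

// Observed record: 5 fresh, 38,673 cache-read, 21,060 cache-write, 205 output.
const afterFix = totalCredits(5, 38673, 21060, 205, opusRates);
// Old behavior: max(0, 5 - 38673 - 21060) = 0 fresh tokens billed.
const beforeFix = totalCredits(0, 38673, 21060, 205, opusRates);

console.log(afterFix.toFixed(5), beforeFix.toFixed(5)); // 15.61115 15.60865
```

The gap between the two totals is the 0.0025 credits for the five fresh input tokens that the old clamp dropped.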

containers/api-proxy/guards/common-guard-checks.js

Lines changed: 11 additions & 4 deletions
@@ -65,7 +65,11 @@ function buildCommonGuardChecks(deps, model) {
     {
       block: getEffectiveTokenBlockState(),
       isBlocked: block => block && block.maxExceeded,
-      statusCode: 429,
+      // Terminal hard cap: returning 429 would make LLM SDK clients treat this
+      // as a transient rate-limit and retry-storm against a limit that never
+      // recovers, burning the budget until the step times out. 403 is
+      // non-retryable, so the agent stops cleanly.
+      statusCode: 403,
       eventName: 'effective_tokens_limit_exceeded',
       buildError: buildEffectiveTokenLimitError,
       buildLogFields: block => ({
@@ -76,7 +80,8 @@ function buildCommonGuardChecks(deps, model) {
     {
       block: getMaxRunsBlockState(),
       isBlocked: block => block && block.maxExceeded,
-      statusCode: 429,
+      // Terminal hard cap — non-retryable (see effective-tokens guard above).
+      statusCode: 403,
       eventName: 'max_runs_exceeded',
       buildError: buildMaxRunsExceededError,
       buildLogFields: block => ({
@@ -87,7 +92,8 @@ function buildCommonGuardChecks(deps, model) {
     {
       block: getMaxCacheMissesBlockState(),
       isBlocked: block => block && block.maxExceeded,
-      statusCode: 429,
+      // Terminal hard cap — non-retryable (see effective-tokens guard above).
+      statusCode: 403,
       eventName: 'max_cache_misses_exceeded',
       buildError: buildMaxCacheMissesExceededError,
       buildLogFields: block => ({
@@ -109,7 +115,8 @@ function buildCommonGuardChecks(deps, model) {
     {
       block: getAiCreditsBlockState(),
       isBlocked: block => block && block.maxExceeded,
-      statusCode: 429,
+      // Terminal hard cap — non-retryable (see effective-tokens guard above).
+      statusCode: 403,
       eventName: 'ai_credits_limit_exceeded',
       buildError: buildAiCreditsLimitError,
       buildLogFields: block => ({
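The diff shows only the check descriptors, not the code that consumes them. As a rough sketch of how such a list might be evaluated, the hypothetical `findBlockingCheck` below returns the first descriptor whose block state is tripped. The evaluator, the sample block states, and the trimmed-down descriptor shape are all assumptions; only the `block`, `isBlocked`, `statusCode`, and `eventName` field names come from the diff.

```javascript
'use strict';

// Hypothetical evaluator: returns the first check whose block state is
// tripped, or null when the request may proceed upstream.
function findBlockingCheck(checks) {
  for (const check of checks) {
    if (check.isBlocked(check.block)) return check;
  }
  return null;
}

// Same predicate shape as in the diff, coerced to a strict boolean.
const isBlocked = block => Boolean(block && block.maxExceeded);

// Sample block states are made up for illustration.
const sampleChecks = [
  { block: null, isBlocked, statusCode: 403, eventName: 'effective_tokens_limit_exceeded' },
  { block: { maxExceeded: true }, isBlocked, statusCode: 403, eventName: 'max_runs_exceeded' },
  { block: { maxExceeded: false }, isBlocked, statusCode: 403, eventName: 'ai_credits_limit_exceeded' },
];

const hit = findBlockingCheck(sampleChecks);
console.log(hit.statusCode, hit.eventName); // 403 max_runs_exceeded
```

Keeping the status code on each descriptor means a future recoverable check could still declare 429 without touching the evaluator.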
containers/api-proxy/provider-names.js

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+'use strict';
+
+/**
+ * Centralized provider name constants.
+ *
+ * Use these instead of bare string literals when comparing provider names so
+ * that provider checks are spelling-safe and easy to find/refactor.
+ *
+ * NB: this module is intentionally named `provider-names` rather than
+ * `providers` to avoid colliding with the `providers/` directory (the upstream
+ * adapter registry resolved via `require('./providers')`).
+ */
+const PROVIDER_ANTHROPIC = 'anthropic';
+const PROVIDER_OPENAI = 'openai';
+const PROVIDER_COPILOT = 'copilot';
+const PROVIDER_GEMINI = 'gemini';
+
+module.exports = {
+  PROVIDER_ANTHROPIC,
+  PROVIDER_OPENAI,
+  PROVIDER_COPILOT,
+  PROVIDER_GEMINI,
+};

containers/api-proxy/server.token-guards.test.js

Lines changed: 13 additions & 12 deletions
@@ -1,6 +1,7 @@
 /**
- * Tests for proxyRequest guards: effective token limit (429) and
- * max-runs limit (429).
+ * Tests for proxyRequest token and permission guard behavior, including
+ * effective-token, max-runs, max-cache-misses, AI-credits, and
+ * permission-denied enforcement paths.
  *
  * Extracted from server.proxy.test.js.
  */
@@ -60,7 +61,7 @@ describe('proxyRequest effective token guard', () => {
     jest.restoreAllMocks();
   });
 
-  it('returns 429 with structured payload when effective token limit is reached', async () => {
+  it('returns 403 with structured payload when effective token limit is reached', async () => {
     const cycle = createMockUpstreamCycle(https);
 
     const req1 = makeReq();
@@ -81,7 +82,7 @@ describe('proxyRequest effective token guard', () => {
     await flushPromises();
 
     expect(cycle.spy).toHaveBeenCalledTimes(1);
-    expect(res2.writeHead).toHaveBeenCalledWith(429, expect.objectContaining({
+    expect(res2.writeHead).toHaveBeenCalledWith(403, expect.objectContaining({
       'Content-Type': 'application/json',
     }));
     const payload = JSON.parse(res2.end.mock.calls[0][0]);
@@ -148,7 +149,7 @@ describe('proxyRequest max-runs guard', () => {
     jest.restoreAllMocks();
   });
 
-  it('returns 429 after max consecutive cache misses with non-zero input tokens', async () => {
+  it('returns 403 after max consecutive cache misses with non-zero input tokens', async () => {
     const cycle = createMockUpstreamCycle(https);
 
     const req1 = makeReq();
@@ -178,7 +179,7 @@ describe('proxyRequest max-runs guard', () => {
     await flushPromises();
 
     expect(cycle.spy).toHaveBeenCalledTimes(2);
-    expect(res3.writeHead).toHaveBeenCalledWith(429, expect.objectContaining({
+    expect(res3.writeHead).toHaveBeenCalledWith(403, expect.objectContaining({
       'Content-Type': 'application/json',
     }));
     const payload = JSON.parse(res3.end.mock.calls[0][0]);
@@ -220,7 +221,7 @@ describe('proxyRequest max-runs guard', () => {
     await flushPromises();
 
     expect(cycle.spy).toHaveBeenCalledTimes(3);
-    expect(res3.writeHead).not.toHaveBeenCalledWith(429, expect.anything());
+    expect(res3.writeHead).not.toHaveBeenCalledWith(403, expect.anything());
   });
 });
 
@@ -230,7 +231,7 @@ describe('proxyRequest max-runs guard', () => {
     jest.restoreAllMocks();
   });
 
-  it('returns 429 with structured payload when max runs limit is exceeded', async () => {
+  it('returns 403 with structured payload when max runs limit is exceeded', async () => {
     const cycle = createMockUpstreamCycle(https);
 
     // First request completes successfully — consumes the single allowed run
@@ -250,7 +251,7 @@ describe('proxyRequest max-runs guard', () => {
     await flushPromises();
 
     expect(cycle.spy).toHaveBeenCalledTimes(1);
-    expect(res2.writeHead).toHaveBeenCalledWith(429, expect.objectContaining({
+    expect(res2.writeHead).toHaveBeenCalledWith(403, expect.objectContaining({
       'Content-Type': 'application/json',
     }));
     const payload = JSON.parse(res2.end.mock.calls[0][0]);
@@ -273,7 +274,7 @@ describe('proxyRequest max-runs guard', () => {
     await flushPromises();
 
     expect(httpsRequestSpy).toHaveBeenCalledTimes(1);
-    expect(res.writeHead).not.toHaveBeenCalledWith(429, expect.anything());
+    expect(res.writeHead).not.toHaveBeenCalledWith(403, expect.anything());
   });
 });
 
@@ -296,7 +297,7 @@ describe('proxyRequest max-ai-credits guard', () => {
     jest.restoreAllMocks();
   });
 
-  it('returns 429 with structured payload when ai credits limit is reached', async () => {
+  it('returns 403 with structured payload when ai credits limit is reached', async () => {
     const cycle = createMockUpstreamCycle(https);
 
     const req1 = makeReq();
@@ -317,7 +318,7 @@ describe('proxyRequest max-ai-credits guard', () => {
     await flushPromises();
 
     expect(cycle.spy).toHaveBeenCalledTimes(1);
-    expect(res2.writeHead).toHaveBeenCalledWith(403, expect.objectContaining({
+    expect(res2.writeHead).toHaveBeenCalledWith(403, expect.objectContaining({
       'Content-Type': 'application/json',
     }));
     const payload = JSON.parse(res2.end.mock.calls[0][0]);
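The distinction these tests pin down, 403 for terminal caps versus 429 with Retry-After for the recoverable rate limiter, can be sketched without the real server. `writeRejection` and `makeRes` are hypothetical helpers, and the error payload shape is invented for illustration; only the status codes, the `Content-Type` header, and the presence of `Retry-After` on the rate-limit path come from the commit.

```javascript
'use strict';

// Hypothetical helper: writes a JSON rejection. Terminal caps get 403;
// the recoverable rate limiter gets 429 plus a Retry-After header.
function writeRejection(res, { terminal, error, retryAfterSeconds }) {
  const headers = { 'Content-Type': 'application/json' };
  if (!terminal) headers['Retry-After'] = String(retryAfterSeconds);
  res.writeHead(terminal ? 403 : 429, headers);
  res.end(JSON.stringify({ error }));
}

// Minimal stand-in for http.ServerResponse that records what was written.
function makeRes() {
  const res = { status: null, headers: null, body: null };
  res.writeHead = (status, headers) => { res.status = status; res.headers = headers; };
  res.end = body => { res.body = body; };
  return res;
}

const capRes = makeRes();
writeRejection(capRes, { terminal: true, error: { type: 'max_runs_exceeded' } });

const rateRes = makeRes();
writeRejection(rateRes, { terminal: false, error: { type: 'rate_limited' }, retryAfterSeconds: 30 });

console.log(capRes.status, rateRes.status, rateRes.headers['Retry-After']); // 403 429 30
```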
