Commit 76f552c

ralyodio and claude committed
fix(finance): raise AI report token budget to avoid gpt-5.5 truncation
The AI research report intermittently failed with "Could not generate the report" (ledger: "Model returned an empty or unusable report").

gpt-5.5 spends hidden reasoning tokens out of the same max_completion_tokens budget before emitting output; the report JSON is ~1500-1900 tokens, and successful runs sat at 1903/1912 against the 2000 cap. When reasoning ran long the JSON truncated, JSON.parse failed, and the route surfaced a 502.

Raise MAX_COMPLETION_TOKENS 2000 -> 4000 (a ceiling, not added cost) and have the pipeline read finish_reason so a 'length' truncation throws a distinct, diagnosable error instead of a vague "empty report".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent e6458a4 commit 76f552c

2 files changed

Lines changed: 22 additions & 2 deletions

src/lib/finance/analysis/pipeline.ts

Lines changed: 11 additions & 0 deletions
@@ -17,6 +17,8 @@ export interface LLMCompletion {
   promptTokens: number;
   completionTokens: number;
   totalTokens: number;
+  /** OpenAI finish_reason — 'length' means the budget was exhausted (truncated). */
+  finishReason?: string;
 }

 export interface ReportLLM {
@@ -57,6 +59,7 @@ export function createOpenAIReportLLM(apiKey: string): ReportLLM {
         promptTokens: completion.usage?.prompt_tokens ?? 0,
         completionTokens: completion.usage?.completion_tokens ?? 0,
         totalTokens: completion.usage?.total_tokens ?? 0,
+        finishReason: completion.choices[0]?.finish_reason,
       };
     },
   };
@@ -87,6 +90,14 @@ export async function generateReport({

   const { sections, sources: parsedSources } = parseReportJson(completion.content);
   if (!isReportUsable(sections)) {
+    // 'length' means the model hit max_completion_tokens before finishing the
+    // JSON (often reasoning tokens eating the budget) — distinguish it so the
+    // failure is diagnosable rather than a vague "empty report".
+    if (completion.finishReason === 'length') {
+      throw new ReportGenerationError(
+        `Report truncated at the token budget (${MAX_COMPLETION_TOKENS}); raise MAX_COMPLETION_TOKENS`,
+      );
+    }
     throw new ReportGenerationError('Model returned an empty or unusable report');
   }
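The truncation check in this hunk can be exercised in isolation. The sketch below is a simplified stand-in, not the pipeline's actual code: `LLMCompletionSketch`, `tryParseJson`, and `checkReport` are hypothetical names, and the usability test is reduced to "parses as a JSON object" in place of the real `parseReportJson` / `isReportUsable` pair, whose bodies are not shown in the diff.

```typescript
// Hypothetical, trimmed-down version of the LLMCompletion interface.
interface LLMCompletionSketch {
  content: string;
  finishReason?: string;
}

class ReportGenerationError extends Error {}

// Mirrors the constant in prompt.ts after this commit.
const MAX_COMPLETION_TOKENS = 4000;

// Assumed helper: returns the parsed value, or null when the JSON is malformed
// (which is what a mid-object truncation produces).
function tryParseJson(content: string): unknown {
  try {
    return JSON.parse(content);
  } catch {
    return null;
  }
}

// Throws a distinct error when the unusable output coincides with
// finish_reason === 'length'; otherwise falls back to the generic error.
function checkReport(completion: LLMCompletionSketch): object {
  const parsed = tryParseJson(completion.content);
  if (parsed === null || typeof parsed !== "object") {
    if (completion.finishReason === "length") {
      throw new ReportGenerationError(
        `Report truncated at the token budget (${MAX_COMPLETION_TOKENS}); raise MAX_COMPLETION_TOKENS`,
      );
    }
    throw new ReportGenerationError("Model returned an empty or unusable report");
  }
  return parsed;
}
```

As in the diff, the `'length'` branch sits inside the "unusable" branch, so a run that hits the cap but still yields a usable report is not rejected.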

src/lib/finance/analysis/prompt.ts

Lines changed: 11 additions & 2 deletions
@@ -9,8 +9,17 @@ import type { ReportInputs } from './types';

 export const PROMPT_VERSION = 1;

-/** Hard token budget per report (cost control, PRD §3.3). */
-export const MAX_COMPLETION_TOKENS = 2000;
+/**
+ * Hard token budget per report (cost control, PRD §3.3).
+ *
+ * NB: gpt-5.x reasoning models spend *hidden reasoning tokens* out of this same
+ * `max_completion_tokens` budget before emitting any output. The JSON report
+ * itself runs ~1500-1900 tokens, so a 2000 cap left almost no headroom — when
+ * reasoning ran long the output got truncated, JSON.parse failed, and the route
+ * surfaced a misleading "empty or unusable report". 4000 gives reasoning room
+ * without changing per-run cost (you only pay for tokens actually produced).
+ */
+export const MAX_COMPLETION_TOKENS = 4000;

 export const SYSTEM_PROMPT = `You are a financial research analyst writing an informational, long-form narrative thesis about a publicly traded company or ETF, in the spirit of a community "narrative" — covering what the business does, recent catalysts, a bull case, a bear case, valuation framing, and key risks.
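The headroom argument in the comment above comes down to simple arithmetic on the figures from the commit message (successful runs at 1903 and 1912 completion tokens). The sketch below is illustrative only: `headroom`, `Headroom`, and the 0.9 `NEAR_CAP_THRESHOLD` are hypothetical and do not appear in the codebase.

```typescript
// Hypothetical threshold: flag runs that used 90% or more of the cap.
const NEAR_CAP_THRESHOLD = 0.9;

interface Headroom {
  remaining: number;   // tokens left before the cap would truncate output
  utilization: number; // fraction of the cap consumed
  nearCap: boolean;    // true when utilization >= NEAR_CAP_THRESHOLD
}

// completionTokens is assumed to be usage.completion_tokens, which for
// reasoning models counts hidden reasoning tokens as well as visible output.
function headroom(completionTokens: number, cap: number): Headroom {
  const utilization = completionTokens / cap;
  return {
    remaining: cap - completionTokens,
    utilization,
    nearCap: utilization >= NEAR_CAP_THRESHOLD,
  };
}

// Figures from the commit message, against the old and new caps.
console.log(headroom(1903, 2000)); // remaining: 97, nearCap: true
console.log(headroom(1912, 2000)); // remaining: 88, nearCap: true
console.log(headroom(1912, 4000)); // remaining: 2088, nearCap: false
```

Under the old cap, a successful run was fewer than 100 tokens away from truncation. The same run under the 4000 cap leaves a little over 2000 tokens for longer reasoning.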
