Commit d5c7c9d

fix: be patient with thinking models that output reasoning as plain text
llama-server with Qwen3.5/Claude-distilled models outputs thinking as 'Let me analyze...' plain text in delta.content (no <think> tags, no separate reasoning field). The JSON-expect abort was firing at 50 chars, killing the request after 8-10 tokens before the model could output actual JSON.

Changes:
- Raised JSON content check threshold from 50 to 200 chars
- Strip common plain-text reasoning prefixes before checking
- Only abort if 200+ chars of non-JSON, non-reasoning content
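To illustrate why the 50-char threshold fired too early, the check can be replayed over a simulated stream. This is a minimal sketch, not the code in run-benchmark.cjs: the helper `firstAbortIndex`, its parameters, and the sample chunks are hypothetical, and chunks are assumed to arrive as plain strings. Only the two regular expressions are copied from the diff.

```javascript
// Hypothetical replay of the early-abort check over streamed text chunks.
// The two regexes are taken from the diff; everything else is illustrative.
const THINK_BLOCK = /<think>[\s\S]*?<\/think>\s*/gi;
const REASONING_PREFIX = /^(?:Let me|I need to|I'll|I will|First,|Okay,|Sure,|Alright,|Here's|Looking at|Analyzing)[\s\S]*?(?=\s*[{\[])/i;

// Returns the index of the chunk at which the check would abort, or -1
// if the whole stream passes.
function firstAbortIndex(chunks, threshold, stripReasoning) {
  let content = '';
  for (let i = 0; i < chunks.length; i++) {
    content += chunks[i];
    if (content.length < threshold) continue; // not enough text to judge yet
    let stripped = content.replace(THINK_BLOCK, '').trimStart();
    if (stripReasoning) {
      stripped = stripped.replace(REASONING_PREFIX, '').trimStart();
    }
    if (stripped.length >= threshold && !/^\s*[{\[]/.test(stripped)) return i;
  }
  return -1;
}

// Made-up stream: a short plain-text preamble followed by the JSON answer.
const chunks = [
  'Let me analyze the ',
  'camera event and decide ',
  'which zone it belongs to. ',
  '{"zone": "front_door", ',
  '"threat": "low"}',
];

const oldBehavior = firstAbortIndex(chunks, 50, false); // pre-commit settings
const newBehavior = firstAbortIndex(chunks, 200, true); // post-commit settings
console.log({ oldBehavior, newBehavior });
```

With the pre-commit settings the replay aborts on the third chunk, before any JSON has arrived. With the post-commit settings the same stream is never aborted.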
1 parent d4b3550 commit d5c7c9d

1 file changed, +12 -4 lines changed


skills/analysis/home-security-benchmark/scripts/run-benchmark.cjs

Lines changed: 12 additions & 4 deletions
@@ -428,10 +428,18 @@ async function llmCall(messages, opts = {}) {
         controller.abort();
         break;
       }
-      // If content is arriving, check it starts with JSON
-      if (opts.expectJSON && isContent && content.length >= 50) {
-        const stripped = content.replace(/<think>[\s\S]*?<\/think>\s*/gi, '').trimStart();
-        if (stripped.length >= 50 && !/^\s*[{\[]/.test(stripped)) {
+      // If content is arriving, check it starts with JSON.
+      // Be patient with thinking models: llama-server sends reasoning
+      // as plain text in delta.content (no <think> tags or separate
+      // reasoning field). Wait for enough content before deciding.
+      if (opts.expectJSON && isContent && content.length >= 200) {
+        // Strip <think> blocks AND common plain-text reasoning prefixes
+        // that thinking models (Qwen3.5, etc.) emit before JSON output
+        let stripped = content.replace(/<think>[\s\S]*?<\/think>\s*/gi, '').trimStart();
+        // Strip leading plain-text reasoning (models often start with
+        // "Let me analyze...", "I need to...", followed by actual JSON)
+        stripped = stripped.replace(/^(?:Let me|I need to|I'll|I will|First,|Okay,|Sure,|Alright,|Here's|Looking at|Analyzing)[\s\S]*?(?=\s*[{\[])/i, '').trimStart();
+        if (stripped.length >= 200 && !/^\s*[{\[]/.test(stripped)) {
           log(` ⚠ Aborting: expected JSON but got: "${stripped.slice(0, 80)}…"`);
           controller.abort();
           break;
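One property of the prefix regex in this hunk may be worth checking when reviewing: because of the lookahead `(?=\s*[{\[])`, the reasoning prefix is only removed once a `{` or `[` has appeared in the buffer. The sketch below wraps the new check in a hypothetical helper, `wouldAbort` (not a function in run-benchmark.cjs), and feeds it made-up strings. It suggests that 200 or more characters of plain-text reasoning with no JSON yet would still trigger the abort.

```javascript
// Hypothetical standalone version of the post-commit check.
// The regexes are copied from the diff; wouldAbort and the inputs are not.
const REASONING_PREFIX = /^(?:Let me|I need to|I'll|I will|First,|Okay,|Sure,|Alright,|Here's|Looking at|Analyzing)[\s\S]*?(?=\s*[{\[])/i;

function wouldAbort(content, threshold = 200) {
  if (content.length < threshold) return false;
  let stripped = content.replace(/<think>[\s\S]*?<\/think>\s*/gi, '').trimStart();
  stripped = stripped.replace(REASONING_PREFIX, '').trimStart();
  return stripped.length >= threshold && !/^\s*[{\[]/.test(stripped);
}

// 256 chars of reasoning that starts with a recognized prefix, no JSON yet.
const reasoning = 'Let me analyze the event. ' + 'The camera saw motion. '.repeat(10);

// 240 chars that do not start with any recognized prefix.
const refusal = 'Sorry, I cannot help with that request. '.repeat(6);

const results = {
  reasoningOnly: wouldAbort(reasoning),                              // no bracket yet, nothing is stripped
  reasoningThenJSON: wouldAbort(reasoning + '{"zone":"front_door"}'), // prefix stripped up to the brace
  refusalOnly: wouldAbort(refusal),
  refusalThenJSON: wouldAbort(refusal + '{"a":1}'),                  // unrecognized prefix is kept
};
console.log(results);
```

If that reading is right, the practical patience window is about 200 characters of preamble. Models that reason at greater length before emitting JSON may need a larger threshold or a different signal.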
