Summary
Shannon's pre-recon agent fails at the final synthesis turn on any target whose consolidated code_analysis_deliverable.md exceeds 32,000 output tokens, due to an undocumented client-side cap in @anthropic-ai/claude-agent-sdk v0.1.25 (resolved via the current ^0.1.0 caret pin in package.json). The documented workaround (CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000 in Shannon's .env) is silently ignored — the SDK validator caps the value to 32000 regardless of what's set.
Reproduction
- Install Shannon at HEAD;
claude-agent-sdk resolves to v0.1.25 via the ^0.1.0 caret pin.
- Set
CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000 in Shannon's .env.
- Launch against a medium-sized target. Reproduced on a Next.js + Express + Supabase web app (~95 MB clone, ~50 source files reviewed by sub-agents).
- Wait ~22 minutes — Phase 1 + Phase 2 sub-agents complete successfully.
- Final synthesis turn fails with:
API Error: Claude's response exceeded the 32000 output token maximum.
To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.
- Validator rejects (no
deliverables/code_analysis_deliverable.md was written).
- Workspace rollback removes the per-category deliverables that DID get saved (e.g.,
XSS_ANALYSIS.md).
- Activity fails; Temporal queues retry; same outcome on retry.
Root Cause
/node_modules/@anthropic-ai/claude-agent-sdk/sdk.mjs (in v0.1.25) contains:
var maxOutputTokensValidator = {
name: "CLAUDE_CODE_MAX_OUTPUT_TOKENS",
default: 32000,
validate: (value) => {
if (!value) {
return { effective: 32000, status: "valid" };
}
const parsed = parseInt(value, 10);
if (isNaN(parsed) || parsed <= 0) {
return { effective: 32000, status: "invalid", message: \`Invalid value "\${value}" (using default: 32000)\` };
}
if (parsed > 32000) {
return { effective: 32000, status: "capped", message: \`Capped from \${parsed} to 32000\` };
}
return { effective: parsed, status: "valid" };
}
};
The "capped" status warning is debug-only; from the operator's perspective, setting CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000 (exactly as Shannon's docs and operator runbooks currently recommend) produces no visible warning that the value was overridden. We've not yet bisected which earlier claude-agent-sdk version introduced this validator — earlier engagements on the same Shannon codebase (running pre-rebuild SDK versions) completed pre-recon without issue.
Impact
For Shannon's typical use case (whitebox pentest of medium-to-large web applications), the consolidated code_analysis_deliverable.md routinely exceeds 32k tokens. Pre-recon completes the discovery + analysis sub-agents successfully, then fails at the very last write of the consolidated report — meaning all the LLM work is paid for but the deliverable is rolled back.
We hit this on a real customer engagement and burned ~$7 USD of Anthropic credits on a single failed attempt (22 min @ ~$0.30/min agent loop) before recognizing the regression. Temporal's automatic retry would have repeated the same failure for another ~$6.
Suggested Fixes (in order of preference)
- Pin
claude-agent-sdk in Shannon's package.json to a pre-cap version (the current ^0.1.0 caret currently floats to v0.1.25 with the cap).
- Modify the
pre-recon-code prompt to chunk the consolidated synthesis into multiple save_deliverable calls (one per analysis section: architecture, entry-points, security-patterns, xss, ssrf, data-security, etc.), then write a small index file as code_analysis_deliverable.md to satisfy the validator's existence check while keeping the synthesis turn well under the 32k cap. This also makes individual section files independently consumable in the final report.
- Coordinate with Anthropic to expose an override mechanism (e.g., a beta header for extended-output mode) for SDK consumers who legitimately need >32k. The cap appears to be a safe-default rather than an API-level limit.
Local Workaround We're Using
While awaiting an upstream fix, we maintain a local overlay in our own repo: SDK pin in Shannon's package.json, modified pre-recon-code.md prompt with chunked synthesis, and a pre-launch hook that applies both. Happy to share if useful for the fix design — it's the same approach as Suggested Fix 1+2 combined.
Environment
- Shannon: HEAD (current)
@anthropic-ai/claude-agent-sdk: 0.1.25 (resolved from ^0.1.0 caret pin)
- Docker: Desktop on macOS
- Temporal: as bundled in Shannon's docker-compose
- Target model:
claude-sonnet-4-5-20250929
- Host RAM: 16 GB (well above any memory pressure thresholds)
Why It's Worth Fixing Soon
Anyone running Shannon on a real-sized target post-2026-05-10 SDK rebuild is hitting this — the regression is silent and the failure mode (Phase 1+2 succeed, final synthesis dies) makes it look like a one-off rather than a systemic block. Most users will burn a $6+ run and a couple of hours diagnosing before realizing the documented workaround doesn't work in current SDK.
Thanks for Shannon — it's been the keystone whitebox tool in our PT service. Glad to provide more detail or testing on a fix candidate if useful.
Summary
Shannon's
pre-reconagent fails at the final synthesis turn on any target whose consolidatedcode_analysis_deliverable.mdexceeds 32,000 output tokens, due to an undocumented client-side cap in@anthropic-ai/claude-agent-sdkv0.1.25 (resolved via the current^0.1.0caret pin inpackage.json). The documented workaround (CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000in Shannon's.env) is silently ignored — the SDK validator caps the value to 32000 regardless of what's set.Reproduction
claude-agent-sdkresolves to v0.1.25 via the^0.1.0caret pin.CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000in Shannon's.env.deliverables/code_analysis_deliverable.mdwas written).XSS_ANALYSIS.md).Root Cause
/node_modules/@anthropic-ai/claude-agent-sdk/sdk.mjs(in v0.1.25) contains:The
"capped"status warning is debug-only; from the operator's perspective, settingCLAUDE_CODE_MAX_OUTPUT_TOKENS=200000(exactly as Shannon's docs and operator runbooks currently recommend) produces no visible warning that the value was overridden. We've not yet bisected which earlierclaude-agent-sdkversion introduced this validator — earlier engagements on the same Shannon codebase (running pre-rebuild SDK versions) completed pre-recon without issue.Impact
For Shannon's typical use case (whitebox pentest of medium-to-large web applications), the consolidated
code_analysis_deliverable.mdroutinely exceeds 32k tokens. Pre-recon completes the discovery + analysis sub-agents successfully, then fails at the very last write of the consolidated report — meaning all the LLM work is paid for but the deliverable is rolled back.We hit this on a real customer engagement and burned ~$7 USD of Anthropic credits on a single failed attempt (22 min @ ~$0.30/min agent loop) before recognizing the regression. Temporal's automatic retry would have repeated the same failure for another ~$6.
Suggested Fixes (in order of preference)
claude-agent-sdkin Shannon'spackage.jsonto a pre-cap version (the current^0.1.0caret currently floats to v0.1.25 with the cap).pre-recon-codeprompt to chunk the consolidated synthesis into multiplesave_deliverablecalls (one per analysis section: architecture, entry-points, security-patterns, xss, ssrf, data-security, etc.), then write a small index file ascode_analysis_deliverable.mdto satisfy the validator's existence check while keeping the synthesis turn well under the 32k cap. This also makes individual section files independently consumable in the final report.Local Workaround We're Using
While awaiting an upstream fix, we maintain a local overlay in our own repo: SDK pin in Shannon's
package.json, modifiedpre-recon-code.mdprompt with chunked synthesis, and a pre-launch hook that applies both. Happy to share if useful for the fix design — it's the same approach as Suggested Fix 1+2 combined.Environment
@anthropic-ai/claude-agent-sdk: 0.1.25 (resolved from^0.1.0caret pin)claude-sonnet-4-5-20250929Why It's Worth Fixing Soon
Anyone running Shannon on a real-sized target post-2026-05-10 SDK rebuild is hitting this — the regression is silent and the failure mode (Phase 1+2 succeed, final synthesis dies) makes it look like a one-off rather than a systemic block. Most users will burn a $6+ run and a couple of hours diagnosing before realizing the documented workaround doesn't work in current SDK.
Thanks for Shannon — it's been the keystone whitebox tool in our PT service. Glad to provide more detail or testing on a fix candidate if useful.