Skip to content

Pre-recon agent fails OutputValidationError on medium+ targets — silent CLAUDE_CODE_MAX_OUTPUT_TOKENS cap in claude-agent-sdk v0.1.25 #332

@MagVeTs

Description

@MagVeTs

Summary

Shannon's pre-recon agent fails at the final synthesis turn on any target whose consolidated code_analysis_deliverable.md exceeds 32,000 output tokens, due to an undocumented client-side cap in @anthropic-ai/claude-agent-sdk v0.1.25 (resolved via the current ^0.1.0 caret pin in package.json). The documented workaround (CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000 in Shannon's .env) is silently ignored — the SDK validator caps the value to 32000 regardless of what's set.

Reproduction

  1. Install Shannon at HEAD; claude-agent-sdk resolves to v0.1.25 via the ^0.1.0 caret pin.
  2. Set CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000 in Shannon's .env.
  3. Launch against a medium-sized target. Reproduced on a Next.js + Express + Supabase web app (~95 MB clone, ~50 source files reviewed by sub-agents).
  4. Wait ~22 minutes — Phase 1 + Phase 2 sub-agents complete successfully.
  5. Final synthesis turn fails with:
    API Error: Claude's response exceeded the 32000 output token maximum.
    To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.
    
  6. Validator rejects (no deliverables/code_analysis_deliverable.md was written).
  7. Workspace rollback removes the per-category deliverables that DID get saved (e.g., XSS_ANALYSIS.md).
  8. Activity fails; Temporal queues retry; same outcome on retry.

Root Cause

/node_modules/@anthropic-ai/claude-agent-sdk/sdk.mjs (in v0.1.25) contains:

var maxOutputTokensValidator = {
  name: "CLAUDE_CODE_MAX_OUTPUT_TOKENS",
  default: 32000,
  validate: (value) => {
    if (!value) {
      return { effective: 32000, status: "valid" };
    }
    const parsed = parseInt(value, 10);
    if (isNaN(parsed) || parsed <= 0) {
      return { effective: 32000, status: "invalid", message: \`Invalid value "\${value}" (using default: 32000)\` };
    }
    if (parsed > 32000) {
      return { effective: 32000, status: "capped", message: \`Capped from \${parsed} to 32000\` };
    }
    return { effective: parsed, status: "valid" };
  }
};

The "capped" status warning is debug-only; from the operator's perspective, setting CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000 (exactly as Shannon's docs and operator runbooks currently recommend) produces no visible warning that the value was overridden. We've not yet bisected which earlier claude-agent-sdk version introduced this validator — earlier engagements on the same Shannon codebase (running pre-rebuild SDK versions) completed pre-recon without issue.

Impact

For Shannon's typical use case (whitebox pentest of medium-to-large web applications), the consolidated code_analysis_deliverable.md routinely exceeds 32k tokens. Pre-recon completes the discovery + analysis sub-agents successfully, then fails at the very last write of the consolidated report — meaning all the LLM work is paid for but the deliverable is rolled back.

We hit this on a real customer engagement and burned ~$7 USD of Anthropic credits on a single failed attempt (22 min @ ~$0.30/min agent loop) before recognizing the regression. Temporal's automatic retry would have repeated the same failure for another ~$6.

Suggested Fixes (in order of preference)

  1. Pin claude-agent-sdk in Shannon's package.json to a pre-cap version (the current ^0.1.0 caret currently floats to v0.1.25 with the cap).
  2. Modify the pre-recon-code prompt to chunk the consolidated synthesis into multiple save_deliverable calls (one per analysis section: architecture, entry-points, security-patterns, xss, ssrf, data-security, etc.), then write a small index file as code_analysis_deliverable.md to satisfy the validator's existence check while keeping the synthesis turn well under the 32k cap. This also makes individual section files independently consumable in the final report.
  3. Coordinate with Anthropic to expose an override mechanism (e.g., a beta header for extended-output mode) for SDK consumers who legitimately need >32k. The cap appears to be a safe-default rather than an API-level limit.

Local Workaround We're Using

While awaiting an upstream fix, we maintain a local overlay in our own repo: SDK pin in Shannon's package.json, modified pre-recon-code.md prompt with chunked synthesis, and a pre-launch hook that applies both. Happy to share if useful for the fix design — it's the same approach as Suggested Fix 1+2 combined.

Environment

  • Shannon: HEAD (current)
  • @anthropic-ai/claude-agent-sdk: 0.1.25 (resolved from ^0.1.0 caret pin)
  • Docker: Desktop on macOS
  • Temporal: as bundled in Shannon's docker-compose
  • Target model: claude-sonnet-4-5-20250929
  • Host RAM: 16 GB (well above any memory pressure thresholds)

Why It's Worth Fixing Soon

Anyone running Shannon on a real-sized target post-2026-05-10 SDK rebuild is hitting this — the regression is silent and the failure mode (Phase 1+2 succeed, final synthesis dies) makes it look like a one-off rather than a systemic block. Most users will burn a $6+ run and a couple of hours diagnosing before realizing the documented workaround doesn't work in current SDK.

Thanks for Shannon — it's been the keystone whitebox tool in our PT service. Glad to provide more detail or testing on a fix candidate if useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions