Pre-recon agent fails with OutputValidationError on medium+ targets: silent CLAUDE_CODE_MAX_OUTPUT_TOKENS cap in claude-agent-sdk v0.1.25

## Summary

Shannon's `pre-recon` agent fails at the final synthesis turn on any target whose consolidated `code_analysis_deliverable.md` exceeds 32,000 output tokens. The cause is an undocumented client-side cap in `@anthropic-ai/claude-agent-sdk` v0.1.25, the version the current `^0.1.0` caret range in `package.json` resolves to. The documented workaround (`CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000` in Shannon's `.env`) is silently ignored: the SDK validator caps the value at 32000 regardless of what is set.

## Reproduction

1. Install Shannon at HEAD; `claude-agent-sdk` resolves to v0.1.25 via the `^0.1.0` caret pin.
2. Set `CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000` in Shannon's `.env`.
3. Launch against a medium-sized target. Reproduced on a Next.js + Express + Supabase web app (~95 MB clone, ~50 source files reviewed by sub-agents).
4. Wait ~22 minutes — Phase 1 + Phase 2 sub-agents complete successfully.
5. Final synthesis turn fails with:
   ```
   API Error: Claude's response exceeded the 32000 output token maximum.
   To configure this behavior, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable.
   ```
6. The output validator rejects the result because no `deliverables/code_analysis_deliverable.md` was written.
7. Workspace rollback removes the per-category deliverables that DID get saved (e.g., `XSS_ANALYSIS.md`).
8. Activity fails; Temporal queues retry; same outcome on retry.
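
Step 1 depends on how npm interprets a caret range on a `0.x` version: `^0.1.0` accepts any `0.1.z` at or above `0.1.0`, which is why v0.1.25 is picked up without any change to `package.json`. The sketch below is a simplified illustration of that rule, not npm's actual resolver. The helper name `caretAllows` is hypothetical, and pre-release tags are deliberately ignored.

```javascript
// Simplified caret-range check (illustrative only; ignores pre-release tags).
// `caretAllows` is a hypothetical helper, not an npm or semver API.
function parseVersion(version) {
  return version.split(".").map(Number);
}

function caretAllows(base, candidate) {
  const [bMajor, bMinor, bPatch] = parseVersion(base);
  const [cMajor, cMinor, cPatch] = parseVersion(candidate);

  // The candidate must not be lower than the base version.
  const order = (cMajor - bMajor) || (cMinor - bMinor) || (cPatch - bPatch);
  if (order < 0) return false;

  // Caret locks the left-most non-zero component.
  if (bMajor > 0) return cMajor === bMajor;
  if (bMinor > 0) return cMajor === 0 && cMinor === bMinor;
  return cMajor === 0 && cMinor === 0 && cPatch === bPatch;
}

console.log(caretAllows("0.1.0", "0.1.25")); // true: the capped release is in range
console.log(caretAllows("0.1.0", "0.2.0"));  // false: a minor bump on 0.x is out of range
```

An exact version string with no caret would stop this floating; which version to use is still open, since the release that introduced the cap has not been identified.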

## Root Cause

`node_modules/@anthropic-ai/claude-agent-sdk/sdk.mjs` (v0.1.25) contains:

```javascript
var maxOutputTokensValidator = {
  name: "CLAUDE_CODE_MAX_OUTPUT_TOKENS",
  default: 32000,
  validate: (value) => {
    if (!value) {
      return { effective: 32000, status: "valid" };
    }
    const parsed = parseInt(value, 10);
    if (isNaN(parsed) || parsed <= 0) {
      return { effective: 32000, status: "invalid", message: `Invalid value "${value}" (using default: 32000)` };
    }
    if (parsed > 32000) {
      return { effective: 32000, status: "capped", message: `Capped from ${parsed} to 32000` };
    }
    return { effective: parsed, status: "valid" };
  }
};
```

The `"capped"` status warning is debug-only; from the operator's perspective, setting `CLAUDE_CODE_MAX_OUTPUT_TOKENS=200000` (exactly as Shannon's docs and operator runbooks currently recommend) produces no visible warning that the value was overridden. We've not yet bisected which earlier `claude-agent-sdk` version introduced this validator — earlier engagements on the same Shannon codebase (running pre-rebuild SDK versions) completed pre-recon without issue.
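
Until the cap is lifted or made visible, a pre-launch check on the Shannon side could surface the override to the operator. The following is a minimal sketch, assuming the 32000 constant from the v0.1.25 validator quoted above. `checkMaxOutputTokens` and `SDK_OUTPUT_TOKEN_CAP` are hypothetical names, not part of Shannon or the SDK.

```javascript
// Hypothetical pre-launch check; not part of Shannon or claude-agent-sdk.
// Assumption: the cap value mirrors the constant in the v0.1.25 validator.
const SDK_OUTPUT_TOKEN_CAP = 32000;

function checkMaxOutputTokens(rawValue, cap = SDK_OUTPUT_TOKEN_CAP) {
  // Unset or empty: the SDK falls back to its default, nothing to warn about.
  if (!rawValue) {
    return { ok: true, effective: cap, warning: null };
  }
  const parsed = parseInt(rawValue, 10);
  if (isNaN(parsed) || parsed <= 0) {
    return {
      ok: false,
      effective: cap,
      warning: `CLAUDE_CODE_MAX_OUTPUT_TOKENS="${rawValue}" is invalid; the SDK will use ${cap}`,
    };
  }
  if (parsed > cap) {
    return {
      ok: false,
      effective: cap,
      warning: `CLAUDE_CODE_MAX_OUTPUT_TOKENS=${parsed} will be capped to ${cap} by the SDK`,
    };
  }
  return { ok: true, effective: parsed, warning: null };
}

// Print the warning before the agent run starts, when there is one.
const preflight = checkMaxOutputTokens(process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS);
if (preflight.warning) {
  console.warn(preflight.warning);
}
```

Running a check like this before the roughly 22 minutes of sub-agent work would make the override visible up front, though it does not remove the cap itself.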

## Impact

For Shannon's typical use case (whitebox pentest of medium-to-large web applications), the consolidated `code_analysis_deliverable.md` routinely exceeds 32k tokens. Pre-recon completes the discovery + analysis sub-agents successfully, then fails at the very last write of the consolidated report — meaning all the LLM work is paid for but the deliverable is rolled back.

We hit this on a real customer engagement and burned ~$7 USD of Anthropic credits on a single failed attempt (22 min @ ~$0.30/min agent loop) before recognizing the regression. Temporal's automatic retry would have repeated the same failure for another ~$6.

## Suggested Fixes (in order of preference)

1. **Pin `claude-agent-sdk`** in Shannon's `package.json` to an exact pre-cap version (the `^0.1.0` caret range currently floats to v0.1.25, which includes the cap).
2. **Modify the `pre-recon-code` prompt** to chunk the consolidated synthesis into multiple `save_deliverable` calls (one per analysis section: architecture, entry-points, security-patterns, xss, ssrf, data-security, etc.), then write a small index file as `code_analysis_deliverable.md` to satisfy the validator's existence check while keeping the synthesis turn well under the 32k cap. This also makes individual section files independently consumable in the final report.
3. **Coordinate with Anthropic** to expose an override mechanism (e.g., a beta header for extended-output mode) for SDK consumers who legitimately need >32k. The cap appears to be a safe default rather than an API-level limit.
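
To make Suggested Fix 2 more concrete, here is a rough sketch of the bookkeeping a chunked synthesis would need: one file per analysis section, plus a small index written as `code_analysis_deliverable.md`. This is not Shannon's implementation. `planChunkedSynthesis`, the per-section file names, and the four-characters-per-token estimate are all assumptions made for illustration.

```javascript
// Hypothetical planner for chunked synthesis; names and layout are assumptions.
// Rough heuristic only: real token counts depend on the model's tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

function planChunkedSynthesis(sections, cap = 32000) {
  // One deliverable per analysis section (architecture, xss, ssrf, ...).
  const files = sections.map(({ name, body }) => ({
    path: `deliverables/${name}.md`,
    body,
    estimatedTokens: estimateTokens(body),
  }));

  // Sections that would still hit the cap on their own need further splitting.
  const oversized = files
    .filter((file) => file.estimatedTokens >= cap)
    .map((file) => file.path);

  // Small index file that satisfies the validator's existence check.
  const indexLines = [
    "# Code Analysis Deliverable",
    "",
    ...sections.map(({ name }) => `- [${name}](${name}.md)`),
  ];

  return {
    files,
    oversized,
    index: {
      path: "deliverables/code_analysis_deliverable.md",
      body: indexLines.join("\n"),
    },
  };
}

const plan = planChunkedSynthesis([
  { name: "architecture", body: "Next.js front end, Express API, Supabase storage." },
  { name: "xss", body: "No unsanitized sinks found in the reviewed files." },
]);
console.log(plan.index.body);
```

In the real agent each planned file would correspond to a separate `save_deliverable` call, so no single turn has to emit the whole report.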

## Local Workaround We're Using

While awaiting an upstream fix, we maintain a local overlay in our own repo: SDK pin in Shannon's `package.json`, modified `pre-recon-code.md` prompt with chunked synthesis, and a pre-launch hook that applies both. Happy to share if useful for the fix design — it's the same approach as Suggested Fix 1+2 combined.

## Environment

- Shannon: HEAD (current)
- `@anthropic-ai/claude-agent-sdk`: 0.1.25 (resolved from `^0.1.0` caret pin)
- Docker: Desktop on macOS
- Temporal: as bundled in Shannon's docker-compose
- Target model: `claude-sonnet-4-5-20250929`
- Host RAM: 16 GB (well above any memory pressure thresholds)

## Why It's Worth Fixing Soon

Anyone running Shannon on a real-sized target after the 2026-05-10 SDK rebuild is hitting this. The regression is silent, and the failure mode (Phase 1+2 succeed, final synthesis dies) makes it look like a one-off rather than a systemic block. Most users will burn a $6+ run and a couple of hours diagnosing before realizing the documented workaround doesn't work in the current SDK.

Thanks for Shannon — it's been the keystone whitebox tool in our PT service. Glad to provide more detail or testing on a fix candidate if useful.
