feat(m3): smolagents + claude_agent_sdk bridge for CC subscription by suzuke · Pull Request #13 · suzuke/autocrucible

suzuke · 2026-04-26T03:17:55Z

Summary

Stacked on #12 (M3 PR 18 marketing audit). Adds provider: "claude-subscription" to the smolagents backend so it can drive Claude via CC OAuth credentials (no API key burn).

Best of both worlds: smolagents' strong ACL boundary (CheatResistancePolicy at tool forward()) + CC subscription's auth path.

ACL invariant — the load-bearing pin

claude_agent_sdk.query() is NOT a token completion API — it's a complete agent product that runs its own loop with its own tools. Naive use would re-create the §3.3 agent-loop-in-agent-loop problem and silently void smolagents' ACL.

Fix: configure SDK as a degenerate single-turn text generator:

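from claude_agent_sdk import ClaudeAgentOptions

ClaudeAgentOptions(
    allowed_tools=[],  # nothing is whitelisted
    disallowed_tools=["Read", "Edit", "Write", "Glob", "Grep", "Bash",
                      "WebFetch", "WebSearch", "Task", "TodoWrite",
                      "NotebookEdit", "MultiEdit", "BashOutput", "KillShell"],
    max_turns=1,  # single turn: no internal agent loop
    can_use_tool=_deny_all_tools,  # defense in depth
)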

This is locked in by test_sdk_is_invoked_with_no_internal_tools — patches claude_agent_sdk.query, captures the actual options arg, asserts all 4 invariant properties. If a future SDK update adds a default tool that bypasses disallowed_tools, this test breaks.
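The capture-and-assert pattern behind that test can be sketched as follows. This is a minimal illustration, not the test in the PR: `FakeOptions`, `build_options`, `generate`, and `check_acl_invariant` are hypothetical stand-ins so the sketch runs without `claude_agent_sdk` installed, and the real test patches `claude_agent_sdk.query` rather than passing a mock in.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
from unittest import mock


@dataclass
class FakeOptions:
    """Hypothetical stand-in for the SDK options object (assumption)."""
    allowed_tools: list = field(default_factory=list)
    disallowed_tools: list = field(default_factory=list)
    max_turns: Optional[int] = None
    can_use_tool: Optional[Callable[..., Any]] = None


# Tool names taken from the PR description's deny list.
REQUIRED_DENIED = {
    "Read", "Edit", "Write", "Glob", "Grep", "Bash", "WebFetch",
    "WebSearch", "Task", "TodoWrite", "NotebookEdit", "MultiEdit",
    "BashOutput", "KillShell",
}


def deny_all_tools(*args, **kwargs):
    """Deny every tool request, whatever the arguments."""
    return {"behavior": "deny"}


def build_options():
    """Build the locked-down, single-turn configuration."""
    return FakeOptions(
        allowed_tools=[],
        disallowed_tools=sorted(REQUIRED_DENIED),
        max_turns=1,
        can_use_tool=deny_all_tools,
    )


def generate(query_fn, prompt):
    """Hypothetical wrapper: forwards the prompt plus locked-down options."""
    return query_fn(prompt=prompt, options=build_options())


def check_acl_invariant(options):
    """Assert the four properties that keep the SDK from running tools."""
    assert options.allowed_tools == []
    assert REQUIRED_DENIED <= set(options.disallowed_tools)
    assert options.max_turns == 1
    assert options.can_use_tool("Bash", {})["behavior"] == "deny"


# Capture the options the wrapper actually passes, then check them.
fake_query = mock.Mock(return_value="ok")
generate(fake_query, "hello")
captured = fake_query.call_args.kwargs["options"]
check_acl_invariant(captured)
```

Asserting on the captured argument, rather than on the builder's return value, is what makes the check sensitive to a later refactor that bypasses the builder.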

Reviewer trail

Round	Verdict	Headline
1 (design)	NEEDS_TWEAK	Q3 critical: my original design treated `claude_agent_sdk.query()` as a completion API; reviewer corrected — it's a complete agent. Forced single-turn config + invariant test added. Plus 4 didn't-ask items: SDK version pin, cost framing, auth UX, default opt-in.
2 (impl)	VERIFIED	ACL invariant solid, defense-in-depth properly layered, suite numbers match reality. 2 non-blocking: (1) auth classification was string-match coincidence not type-based — fixed via `_classify_error_typed()` + e2e regression test; (2) docstring claimed `usage_source="oauth_estimated"` already exists but it's deferred to PR 19a — reworded as future-tense.

Stats

2 commits (55d788b + a15b606 R2 fixes)
5 files changed (~+725 LOC counting all commits)
13 new tests including the critical ACL invariant + auth-classification e2e
Full suite: 2775 passed + 1 pre-existing failure unchanged + 4 skipped. 0 regressions from PR 19.

Configuration

agent:
  type: smolagents
  smolagents:
    provider: claude-subscription   # NEW M3 PR 19 value
    model: claude-3-5-sonnet-20241022
    # api_key_env: ignored when provider="claude-subscription"

Existing provider: "anthropic" (default) and other LiteLLM providers unchanged.
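As a rough illustration of the opt-in behaviour, the provider switch could look like the sketch below. The field names mirror the YAML above; the dataclass layout, the `api_key_env` default, and the `select_model_kind` helper are assumptions made for this example, not the backend's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmolagentsConfig:
    """Hypothetical minimal mirror of the YAML keys shown above."""
    provider: str = "anthropic"  # default stays unchanged
    model: str = "claude-3-5-sonnet-20241022"
    api_key_env: Optional[str] = "ANTHROPIC_API_KEY"  # assumed default


def select_model_kind(cfg: SmolagentsConfig) -> str:
    """Return which model wrapper the backend would construct."""
    if cfg.provider == "claude-subscription":
        # OAuth path: api_key_env is ignored here.
        return "ClaudeAgentSDKModel"
    # Every other provider keeps the LiteLLM path.
    return "LiteLLMModel"


default_kind = select_model_kind(SmolagentsConfig())
subscription_kind = select_model_kind(
    SmolagentsConfig(provider="claude-subscription")
)
```

Because the default provider is untouched, existing configurations resolve to the LiteLLM path without edits.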

Known limitations / non-blockers

usage_source="oauth_estimated" not yet plumbed onto AttemptNode (spec §4.1 enum needs the new value + orchestrator change). Deferred to PR 19a; cost field falls back to orchestrator default for now. Module docstring uses future tense.
Real-world spike NOT done — the wrapper's correctness is verifiable by the invariant test (we configure correctly), but Claude's behavior when forbidden from using tools is unverified. If real testing surfaces "Claude won't respond when it can't use tools," fix is a system-prompt nudge, not architectural change.
Anthropic ToS has not publicly endorsed claude_agent_sdk use outside their first-party CC + Claude Code Skills products. Module docstring documents the risk; users should review their CC ToS before relying on this in production.
Transitional shim — module docstring documents remove-when conditions: smolagents native subscription auth, OR Anthropic publishes a token-completion API path with OAuth.

🤖 Generated with Claude Code

…3 PR 19)

Adds `provider: "claude-subscription"` to SmolagentsConfig. When set, the smolagents backend drives Claude via `claude_agent_sdk` (OAuth from `~/.claude/credentials.json`) instead of LiteLLM + API key. No API token burn.

**ACL invariant: the load-bearing design pin** (reviewer round 1 Q3):

`claude_agent_sdk.query()` is NOT a token completion API; it is a complete agent product that runs its own loop with its own tools. Naive use would re-create the §3.3 agent-loop-in-agent-loop problem (same as cli-subscription) and silently void smolagents' ACL.

Fix: configure the SDK as a degenerate single-turn text generator:
- `allowed_tools=[]`
- `disallowed_tools=[Read, Edit, Write, Glob, Grep, Bash, ...]` (exhaustive)
- `max_turns=1`
- `can_use_tool=lambda *a, **kw: {"behavior": "deny"}` (defense in depth)

This forces the SDK to return a single text response with NO internal tool execution. smolagents parses the text for tool calls and dispatches via its OWN tools, where the `CheatResistancePolicy` ACL fires.

The invariant is locked in by `test_sdk_is_invoked_with_no_internal_tools`. If a future SDK update adds a default tool that bypasses `disallowed_tools`, that test must trip; re-verify against `claude_agent_sdk.__version__` before merge.

**Reviewer round 1 Q1-Q7 + 4 didn't-ask items**:
- Q1 flat module placement (`smolagents_claude_sdk_model.py`) ✓
- Q2 provider name `claude-subscription` (discoverable) ✓
- Q3 ACL invariant locked + invariant test (this commit's headline)
- Q4 marketing wording: medium framing + explicit ACL invariant clause
- Q5 no separate compliance gate (smolagents tool surface is the gate)
- Q6 asyncio.run() with loud failure inside running loop
- Q7 transitional shim: module docstring marks remove-when condition
- Cost framing: comment in module docstring noting `total_cost_usd` is an API-equivalent estimate, not the actual subscription bill
- Auth-failure UX: `ClaudeAgentSDKAuthError` with "run claude login" hint
- Default opt-in: `provider="anthropic"` default unchanged; users must explicitly set `claude-subscription`
- SDK version pin: `claude-agent-sdk` is already a base dep

**Tests** (12 new in `test_smolagents_claude_sdk_model.py`):
- ACL invariant (THE critical test)
- can_use_tool deny callback
- _SDK_DISALLOWED_TOOLS exhaustive set
- Message format conversion (dict + list-content)
- Stream draining
- generate() returns ChatMessage with text
- Auth failure → ClaudeAgentSDKAuthError
- OSError variants → AuthError
- asyncio.run() loud failure in running loop
- SmolagentsBackend dispatches to ClaudeAgentSDKModel for "claude-subscription"
- SmolagentsBackend keeps LiteLLMModel for other providers (back-compat)

Stats: 4 files changed, +459 / -16 LOC. 12 new tests. Full suite 2774 passed + 1 pre-existing failure + 4 skipped, 0 regressions.

**ToS notice**: Anthropic has not publicly endorsed `claude_agent_sdk` use outside their first-party CC + Claude Code Skills products. Module docstring + future user doc must say so. Operators should review their CC ToS before relying on this in production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
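Two items in the commit message above, stream draining and the Q6 "loud failure inside running loop" rule, can be illustrated together. This is a sketch under stated assumptions: `_fake_stream` is a hypothetical stand-in for the SDK's async message stream, and `generate_sync` is an illustrative name rather than the module's real entry point.

```python
import asyncio


async def _fake_stream(prompt):
    """Hypothetical stand-in for an async stream of text chunks."""
    for chunk in ("Hello", ", ", "world"):
        yield chunk


async def _drain(prompt):
    """Consume the whole stream and join the chunks into one string."""
    parts = []
    async for chunk in _fake_stream(prompt):
        parts.append(chunk)
    return "".join(parts)


def generate_sync(prompt):
    """Synchronous bridge that refuses to run inside an active event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(_drain(prompt))
    # A loop is already running: asyncio.run() cannot nest, so fail loudly
    # with an actionable message instead of a confusing low-level error.
    raise RuntimeError(
        "generate_sync() was called from inside a running event loop; "
        "call it from synchronous code or a worker thread instead."
    )


text = generate_sync("hi")
```

Checking for a running loop up front keeps the failure explicit and close to the caller's mistake.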

Reviewer round 2 was VERIFIED with 2 non-blocking issues. Both folded in.

R2 #1 (suggested before merge): typed auth classification

`ClaudeAgentSDKAuthError` was mapping to `AgentErrorType.AUTH` only because the error message happened to contain "api key" (in the "switch to provider: anthropic with an explicit API key" sentence). This was a string-match coincidence; rewording the message would silently break the classification.

New `_classify_error_typed(exc)` does an isinstance check on `ClaudeAgentSDKAuthError` BEFORE falling through to the string-match path. Generic exceptions still go through `_classify_error` for backward compat.

End-to-end regression test added: `test_auth_error_classification_end_to_end` constructs a real SmolagentsBackend, simulates the SDK raising ClaudeAgentSDKAuthError via `agent.run`, and asserts the resulting AgentResult.error_type == AgentErrorType.AUTH. Catches future docstring/message rewording that would silently demote to UNKNOWN.

R2 #2 (docs-only): future tense for `usage_source` plumbing

Module docstring previously claimed AttemptNode "records this as `usage_source=\"oauth_estimated\"`", but the new enum value isn't yet in spec §4.1's `Literal[...]` and the orchestrator doesn't read `backend_metadata` for this field. Reworded as future-tense: "WILL record once PR 19a lands the orchestrator plumbing; today, the field is unset and falls back to default."

Stats: 2 files, +52/-7 LOC. 13 tests in test_smolagents_claude_sdk_model.py (was 12). Full suite: 2775 passed + 1 pre-existing failure unchanged + 4 skipped. 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

suzuke and others added 2 commits April 26, 2026 11:09

suzuke mentioned this pull request Apr 26, 2026

feat(m3): usage_source=oauth_estimated cost plumbing #14

Open

7 tasks

suzuke wants to merge 2 commits into feat/m3-marketing-audit from feat/m3-smolagents-cc-bridge
