Skip to content

fix(claude-code): make subscription mode a true pass-through#749

Merged
edenreich merged 3 commits into
mainfrom
fix/claude-code-passthrough
Jul 4, 2026
Merged

fix(claude-code): make subscription mode a true pass-through#749
edenreich merged 3 commits into
mainfrom
fix/claude-code-passthrough

Conversation

@edenreich

Copy link
Copy Markdown
Contributor

Summary

Claude Code mode (claude_code.enabled: true) is now a true pass-through: infer executes claude headless and streams its output without modifying the conversation.

  • No injections: infer's system prompt, context blocks (git/tools/skills/etc.), and system reminders are no longer added in claude_code mode — claude uses its own system prompt and native tools. infer debug agent system_prompt reports the pass-through status in this mode.
  • No double execution: claude executes its tools internally; their results are now captured from the stream (domain.ToolCallResultProvider / ChatSyncResponse.ToolResults) and used verbatim instead of re-running every tool call through infer's registry and approval gate. This was the root cause of the "denied writes" / broken-todo behavior in [FEATURE] Team/department attribution for gateway and pushed metrics inference-gateway#412 — blocked fake results were being replayed to claude.
  • TodoWrite mapping moved to the output layer: the stored/replayed conversation keeps TaskCreate/TaskUpdate verbatim; headless stdout renders one synthesized TodoWrite call with the full accumulated todo list (statuses included), so infer-action mirroring keeps working. TaskUpdate is now mapped too.
  • Configurable append prompt: new prompts.agent.system_prompt_claude_code (empty default = pure pass-through), passed via --append-system-prompt (never --system-prompt), settable with INFER_PROMPTS_AGENT_SYSTEM_PROMPT_CLAUDE_CODE.
  • Extra CLI args: new claude_code.extra_args (config), --claude-code-extra-args flag, and INFER_CLAUDE_CODE_EXTRA_ARGS env (env > flag), appended before the trailing -p.
  • Robust parsing: tool_result.content now accepts both string and content-block-array shapes (fixes unmarshal errors in logs).
  • Debug logging: the claude invocation is logged with individually quoted args; -p is always the last argument so no flag can be swallowed as its optional prompt value.

Note: the headless continuation-check <system-reminder> in cmd/agent.go is intentionally kept — it is loop control, not the reminders feature.

Verification

  • task test, task lint, task build all pass; mocks regenerated.
  • E2E via a stdin-capturing claude shim: first-request stdin contains only the user message (no system messages/reminders); replay carries TaskCreate/TaskUpdate (never TodoWrite) with claude's real tool results; stdout shows one TodoWrite call with live statuses; --append-system-prompt and extra args appear in argv only when configured.

Cross-repo impact

Fixes the behavior reported in inference-gateway/inference-gateway#412.

- No infer system prompt, context blocks, or system reminders are injected
  in claude_code mode; claude uses its own system prompt and native tools
- Stop double-executing claude's already-executed tool calls: their results
  are captured from the stream (domain.ToolCallResultProvider) and replayed
  verbatim instead of re-running them through infer's registry/approval gate
- Move the TaskCreate/TaskUpdate -> TodoWrite mapping to the output layer
  only: headless stdout shows one synthesized TodoWrite with the full
  accumulated todo list; the stored/replayed conversation stays verbatim
- Add prompts.agent.system_prompt_claude_code (empty default) passed via
  --append-system-prompt, settable with
  INFER_PROMPTS_AGENT_SYSTEM_PROMPT_CLAUDE_CODE
- Add claude_code.extra_args (config, --claude-code-extra-args flag,
  INFER_CLAUDE_CODE_EXTRA_ARGS env) appended before the trailing -p
- Accept tool_result content as string or content-block array (fixes
  unmarshal errors in logs)
- Quote claude CLI args in the debug log and keep -p as the last argument

Fixes the todo-mirroring and denied-writes behavior seen in
inference-gateway/inference-gateway#412
@edenreich edenreich merged commit 10117b4 into main Jul 4, 2026
9 checks passed
@edenreich edenreich deleted the fix/claude-code-passthrough branch July 4, 2026 23:08
inference-gateway-releaser Bot added a commit that referenced this pull request Jul 4, 2026
## [0.132.1](v0.132.0...v0.132.1) (2026-07-04)

### 🐛 Bug Fixes

* allow agent to read plans from userspace config dir ([#748](#748)) ([d1c1be7](d1c1be7)), closes [#746](#746)
* **config:** correct GLM 5.2 context window to 1M tokens ([#747](#747)) ([164ffaf](164ffaf)), closes [#745](#745)
* **claude-code:** make subscription mode a true pass-through ([#749](#749)) ([10117b4](10117b4)), closes [inference-gateway/inference-gateway#412](inference-gateway/inference-gateway#412)

### 👷 CI/CD

* **deps:** update inference workflow to version 0.15.0 ([c8e1522](c8e1522))
@inference-gateway-releaser

Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 0.132.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants