fix(claude-code): make subscription mode a true pass-through#749
Merged
Conversation
- No infer system prompt, context blocks, or system reminders are injected in claude_code mode; claude uses its own system prompt and native tools - Stop double-executing claude's already-executed tool calls: their results are captured from the stream (domain.ToolCallResultProvider) and replayed verbatim instead of re-running them through infer's registry/approval gate - Move the TaskCreate/TaskUpdate -> TodoWrite mapping to the output layer only: headless stdout shows one synthesized TodoWrite with the full accumulated todo list; the stored/replayed conversation stays verbatim - Add prompts.agent.system_prompt_claude_code (empty default) passed via --append-system-prompt, settable with INFER_PROMPTS_AGENT_SYSTEM_PROMPT_CLAUDE_CODE - Add claude_code.extra_args (config, --claude-code-extra-args flag, INFER_CLAUDE_CODE_EXTRA_ARGS env) appended before the trailing -p - Accept tool_result content as string or content-block array (fixes unmarshal errors in logs) - Quote claude CLI args in the debug log and keep -p as the last argument Fixes the todo-mirroring and denied-writes behavior seen in inference-gateway/inference-gateway#412
3 tasks
…pt_claude_code and extra_args
4 tasks
4 tasks
inference-gateway-releaser Bot
added a commit
that referenced
this pull request
Jul 4, 2026
## [0.132.1](v0.132.0...v0.132.1) (2026-07-04) ### 🐛 Bug Fixes * allow agent to read plans from userspace config dir ([#748](#748)) ([d1c1be7](d1c1be7)), closes [#746](#746) * **config:** correct GLM 5.2 context window to 1M tokens ([#747](#747)) ([164ffaf](164ffaf)), closes [#745](#745) * **claude-code:** make subscription mode a true pass-through ([#749](#749)) ([10117b4](10117b4)), closes [inference-gateway/inference-gateway#412](inference-gateway/inference-gateway#412) ### 👷 CI/CD * **deps:** update inference workflow to version 0.15.0 ([c8e1522](c8e1522))
Contributor
|
🎉 This PR is included in version 0.132.1 🎉 The release is available on: Your semantic-release bot 📦🚀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Claude Code mode (
claude_code.enabled: true) is now a true pass-through: infer executesclaudeheadless and streams its output without modifying the conversation.infer debug agent system_promptreports the pass-through status in this mode.domain.ToolCallResultProvider/ChatSyncResponse.ToolResults) and used verbatim instead of re-running every tool call through infer's registry and approval gate. This was the root cause of the "denied writes" / broken-todo behavior in [FEATURE] Team/department attribution for gateway and pushed metrics inference-gateway#412 — blocked fake results were being replayed to claude.TaskCreate/TaskUpdateverbatim; headless stdout renders one synthesizedTodoWritecall with the full accumulated todo list (statuses included), so infer-action mirroring keeps working.TaskUpdateis now mapped too.prompts.agent.system_prompt_claude_code(empty default = pure pass-through), passed via--append-system-prompt(never--system-prompt), settable withINFER_PROMPTS_AGENT_SYSTEM_PROMPT_CLAUDE_CODE.claude_code.extra_args(config),--claude-code-extra-argsflag, andINFER_CLAUDE_CODE_EXTRA_ARGSenv (env > flag), appended before the trailing-p.tool_result.contentnow accepts both string and content-block-array shapes (fixes unmarshal errors in logs).-pis always the last argument so no flag can be swallowed as its optional prompt value.Note: the headless continuation-check
<system-reminder>incmd/agent.gois intentionally kept — it is loop control, not the reminders feature.Verification
task test,task lint,task buildall pass; mocks regenerated.TaskCreate/TaskUpdate(neverTodoWrite) with claude's real tool results; stdout shows one TodoWrite call with live statuses;--append-system-promptand extra args appear in argv only when configured.Cross-repo impact
[DOCS]ticket (claude_code mode is now documented pass-through; new config keyssystem_prompt_claude_code,extra_args).Fixes the behavior reported in inference-gateway/inference-gateway#412.