Python: Enforce approval_mode in Claude and GitHub Copilot agents#5562
Open
eavanvalkenburg wants to merge 2 commits intomicrosoft:mainfrom
Open
Python: Enforce approval_mode in Claude and GitHub Copilot agents#5562eavanvalkenburg wants to merge 2 commits intomicrosoft:mainfrom
eavanvalkenburg wants to merge 2 commits intomicrosoft:mainfrom
Conversation
Tools declared with approval_mode="always_require" were bypassed by the ClaudeAgent and GitHubCopilotAgent because their SDK-managed tool-calling loops invoke FunctionTool.invoke() directly via package-supplied handlers, skipping the standard _try_execute_function_calls approval gate. Per discussion on microsoft#5494, the fix lives in the agents (not in FunctionTool): any flag added to the tool itself can be spoofed by code with the same level of access, so the security boundary is the agent that owns the tool-calling loop. - Add on_function_approval option to ClaudeAgentOptions and GitHubCopilotOptions. Callback receives a FunctionCallContent describing the pending call and returns bool (sync or async). - Gate FunctionTool.invoke() inside each agent's existing tool-handler closure when approval_mode == "always_require". Default policy is deny; callbacks that raise also deny safely. - Deny path returns a tool-error to the model (Claude: text content; Copilot: ToolResult(result_type="failure", error="approval_denied")) so the LLM can react gracefully instead of silently failing. - Tests for both agents covering: deny by default, sync False, sync True, async True, callback-raises -> deny, no-op for never_require tools. - Samples demonstrating sync, async, and deny-by-default flows for both agents. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||||||||||||
Contributor
There was a problem hiding this comment.
Pull request overview
This PR closes a security gap in SDK-managed tool-calling loops by enforcing approval_mode="always_require" at the agent boundary for the Python ClaudeAgent and GitHubCopilotAgent, introducing an explicit approval callback and default-deny behavior when approval is required.
Changes:
- Add
on_function_approvalcallback option toClaudeAgentOptionsandGitHubCopilotOptions, and gateFunctionTool.invoke()behind it forapproval_mode="always_require". - Add unit tests covering deny-by-default, sync/async approval, callback failure, and no-op behavior for non-approval tools.
- Add provider samples demonstrating sync/async approval callbacks and default-deny behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| python/packages/claude/agent_framework_claude/_agent.py | Adds on_function_approval option + approval gating in the Claude SDK MCP tool handler. |
| python/packages/claude/tests/test_claude_agent.py | Adds tests validating Claude-side approval enforcement behavior. |
| python/packages/github_copilot/agent_framework_github_copilot/_agent.py | Adds on_function_approval option + approval gating in the Copilot SDK tool handler. |
| python/packages/github_copilot/tests/test_github_copilot_agent.py | Adds tests validating Copilot-side approval enforcement behavior. |
| python/samples/02-agents/providers/anthropic/anthropic_claude_with_function_approval.py | New Claude sample showing function-approval callback usage. |
| python/samples/02-agents/providers/github_copilot/github_copilot_with_function_approval.py | New Copilot sample showing function-approval callback usage. |
…override
- _resolve_function_approval no longer collapses {} into None when building
the FunctionCallContent passed to the callback (Claude + Copilot).
- Claude _apply_runtime_options and Copilot _run_impl/_stream_updates now
raise ValueError if on_function_approval is supplied via per-run options,
instead of silently ignoring it. Approval policy must be set at agent
construction time.
- Drop unnecessary # type: ignore[attr-defined] on Content.name/.arguments
in samples (Content is a unified class with both attributes defined).
- Add regression tests for the new runtime-options validation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation and Context
Tools declared with
approval_mode="always_require"are silently executed byClaudeAgentandGitHubCopilotAgentbecause their SDK-managed tool-calling loops invokeFunctionTool.invoke()directly inside per-package handlers, bypassing the_try_execute_function_callsapproval gate that protects the standard chat-client pipeline.PR #5494 proposed a fix inside
FunctionTool.invoke()itself (an_approved=Trueparameter), but as discussed on that PR any flag on the tool can be spoofed by any code with the same level of access. The security boundary must be the agent that owns the tool-calling loop, not the tool.Description
Enforce
approval_mode="always_require"at the agent boundary for the two affected packages:on_function_approvalonClaudeAgentOptionsandGitHubCopilotOptions. The callback receives aFunctionCallContentdescribing the pending call (name + arguments) and returnsbool(sync or async).func_tool.approval_mode == "always_require"before callingFunctionTool.invoke(). Approval flow:logger.exception) so user code crashes don't accidentally allow execution.ToolResult(result_type="failure", error="approval_denied"). This lets the LLM react gracefully (apologise, try a different path) instead of silently failing.FunctionTooland core APIs are untouched, so the rejected_approved/ToolApprovalRequiredExceptionsurface is not introduced.Coverage
packages/claude/tests/test_claude_agent.py— 6 tests inTestClaudeAgentFunctionApproval.packages/github_copilot/tests/test_github_copilot_agent.py— 6 tests inTestGitHubCopilotAgentFunctionApproval.Each suite exercises: deny-by-default, sync
False, syncTrue, asyncTrue, callback-raises → deny, and no-op fornever_requiretools (with payload-shape assertions on theFunctionCallContentpassed to the callback).Samples
samples/02-agents/providers/anthropic/anthropic_claude_with_function_approval.pysamples/02-agents/providers/github_copilot/github_copilot_with_function_approval.pyBoth demonstrate sync callback, async callback, and the deny-by-default fallback.
Audit of other packages
grepforFunctionTool.invoke(...)call sites acrosspackages/: onlycore/_agents.py(Agent-as-tool, uses the standard pipeline transitively) andhyperlight/_execute_code_tool.py(internal sandbox tool, not a user-supplied always-require target) remain. CopilotStudio doesn't acceptFunctionTool. No other production fixes required.Contribution Checklist