Skip to content

feat(middlewares): add permission middleware for tool call gating#966

Open
shentongmartin wants to merge 59 commits into
alpha/09from
feat/permission
Open

feat(middlewares): add permission middleware for tool call gating#966
shentongmartin wants to merge 59 commits into
alpha/09from
feat/permission

Conversation

@shentongmartin
Copy link
Copy Markdown
Contributor

Summary

Problem Solution
No built-in way to gate tool execution behind permission checks in the ADK middleware layer Implement a permission.Middleware that intercepts tool calls with a user-defined BeforeToolCall evaluator, supporting Allow/Deny/Ask decisions with interrupt/resume for interactive approval

Key Insight: Framework Skeleton vs. Strategy Pattern

Inspired by Claude Code's permission system (7 modes, 8 rule layers, 5 parallel racers), but deliberately scoped as a framework skeleton rather than an opinionated strategy engine. The middleware provides the interrupt/resume plumbing; users supply the evaluation logic. This matches Eino's position as a general-purpose framework that must serve diverse product forms.

The critical design choice is the three-outcome model:

  • Allow: Execute the tool (optionally with modified input via UpdatedInput)
  • Deny: Return a formatted deny message as the tool result—crucially, this does NOT interrupt the agent loop. The LLM sees "Permission denied" and self-adapts, matching CC's battle-tested pattern.
  • Ask: Trigger StatefulInterrupt to pause the agent loop, checkpoint state, and await external approval via ResumeWithParams.

Design Decisions

Why fail-closed for unknown decisions?

Unknown Decision values (e.g., a typo like "alow") produce a deny result rather than an error. This prevents fail-open security vulnerabilities—a permission system that accidentally allows unknown states is worse than one that blocks them.

Why error only for infrastructure failures?

BeforeToolCall returning error is reserved for infrastructure failures (rule store unreachable, network timeout). This aborts the agent loop. Permission denials use Decision: Deny instead, which keeps the agent loop running. This separation prevents conflating policy decisions with system failures.

Why re-interrupt on non-target resume?

When the runtime resumes a different tool but this tool was previously interrupted, it must re-interrupt to preserve its pending state. This supports parallel tool call scenarios where only one tool is approved at a time.

What Changed

File Change
adk/middlewares/permission/permission.go New package: Middleware struct, permissionGate logic, WrapInvokableToolCall/WrapStreamableToolCall, all supporting types (Decision, ToolCallDecision, AskInfo, AskState, ResumeResponse)
adk/middlewares/permission/permission_test.go 22 unit tests covering Allow/Deny/Ask/Unknown/Nil/Error paths for both invokable and streamable tool calls

How to Verify

go test -race -count=1 ./adk/middlewares/permission/...
golangci-lint run ./adk/middlewares/permission/...

摘要

问题 解决方案
ADK 中间件层缺少内置的工具调用权限控制机制 实现 permission.Middleware,通过用户定义的 BeforeToolCall 评估函数拦截工具调用,支持 Allow/Deny/Ask 三种决策,Ask 决策通过中断/恢复机制实现交互式审批

核心洞察:框架骨架 vs. 策略引擎

受 Claude Code 权限系统(7 种模式、8 层规则、5 路并行竞争)启发,但刻意定位为框架骨架而非固化的策略引擎。中间件提供中断/恢复的管道设施,用户自行实现评估逻辑。这符合 Eino 作为通用框架需服务多样产品形态的定位。

三态结果模型的关键设计:

  • Allow:执行工具(可选通过 UpdatedInput 修改输入)
  • Deny:返回格式化的拒绝消息作为工具结果——不中断 agent 循环。LLM 看到 "Permission denied" 后自行适应,沿用 CC 经过实战验证的模式。
  • Ask:触发 StatefulInterrupt 暂停 agent 循环、持久化状态、等待外部通过 ResumeWithParams 回注审批结果。

设计决策

为何对未知决策采用 fail-closed?

未知的 Decision 值产生 deny 结果而非 error,防止 fail-open 安全漏洞。

为何 error 仅用于基础设施故障?

BeforeToolCall 返回 error 仅表示基础设施故障(如规则存储不可达),会中断 agent 循环。权限拒绝使用 Decision: Deny,保持 agent 循环正常运行,避免将策略决策与系统故障混淆。

为何非目标恢复时重新中断?

当运行时恢复其他工具而当前工具之前已中断时,必须重新中断以保留挂起状态,支持并行工具调用场景下的逐个审批。

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 95.00000% with 5 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (alpha/09@d7161f7). Learn more about missing BASE report.

Files with missing lines Patch % Lines
adk/middlewares/permission/permission.go 95.00% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             alpha/09     #966   +/-   ##
===========================================
  Coverage            ?   82.23%           
===========================================
  Files               ?      163           
  Lines               ?    20405           
  Branches            ?        0           
===========================================
  Hits                ?    16781           
  Misses              ?     2447           
  Partials            ?     1177           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@shentongmartin shentongmartin changed the base branch from main to alpha/09 April 15, 2026 01:25
Comment thread adk/middlewares/permission/permission.go Outdated
Comment thread adk/middlewares/permission/permission.go
Comment thread adk/middlewares/permission/permission.go
Comment thread adk/middlewares/permission/permission.go Outdated
Comment thread adk/middlewares/permission/permission.go Outdated
Comment thread adk/middlewares/permission/permission.go Outdated
Comment thread adk/middlewares/permission/permission_test.go
mrh997 and others added 24 commits April 15, 2026 14:14
feat(agentic_model):
- format print
- support agentic chat template
- support to compose agentic odel&agentic tools node
- support agentic tool node
- support agentic message concat
meguminnnnnnnnn and others added 18 commits April 15, 2026 14:14
Change-Id: If20fa78dba82a1c177c8ec47090050ea8c1354ed
* fix(adk): skip saving checkpoint when TurnLoop is idle

When Stop() is called on an idle TurnLoop (no active agent run, no
unhandled items, no canceled items), the resulting checkpoint contains
no meaningful state. Skip saving such checkpoints to avoid unnecessary
store writes.

- Add isIdle check in cleanup() before checkpoint save decision
- Add TestTurnLoop_StopWhileIdle_SkipsCheckpoint test

Change-Id: I6aeaff5ed5833a971cb95298193fdb96d904baf8

* fix(internal): merge id2State in PopulateInterruptState instead of replacing

PopulateInterruptState merged id2Addr entries one by one but replaced
id2State wholesale. In a parallel workflow resume, two goroutines share
the same globalResumeInfo. If one goroutine's compose graph called
PopulateInterruptState (replacing id2State with compose-only entries)
before the other goroutine looked up its outer-level entry, the lookup
returned a zero-value InterruptState with State=nil, triggering the
'has no state' panic in ChatModelAgent.Resume.

Change id2State handling to merge entry by entry, consistent with
id2Addr.

Change-Id: Ia21f65289bff7beb2bc383fb033926ad9c92d7e7

* fix(adk): keep watching for cancel escalation after stopSig.done

When watchStopSignal entered the stopSig.done branch, it processed the
initial cancel and then blocked on <-done (turn completion), never
looping back to check notify. This meant a subsequent Stop() call with
a higher cancel mode (e.g. CancelImmediate) was never forwarded to the
agent, causing TestTurnLoop_Stop_EscalatesCancelMode to time out.

Replace the blocking <-done with an inner loop that selects on both
done and notify, so escalation signals are always delivered. Also apply
the generation-based dedup check consistent with the notify branch.

Change-Id: Ia6a04d00a2b44625ffbcb625ff0e559c12ed145f
…r agent cancellation (#929)

* fix(adk): prevent panic when orphaned tool goroutine sends event after agent cancellation

When CancelAfterChatModel times out and escalates to CancelImmediate,
GraphInterrupt fires with timeout=0. The compose graph returns immediately,
orphaning parallel tool goroutines. When an orphaned tool completes,
eventSenderToolWrapper tries to send an event via the AsyncGenerator which
is already closed, causing 'send on closed channel' panic.

- Add isImmediateCancelled() to cancelContext for checking immediateChan
- Make chatModelAgentExecCtx.send cancel-aware: skip send when immediate cancel is active
- Use trySend as safety net for the TOCTOU race window
- Route SendEvent() through execCtx.send() instead of direct generator.Send()

Change-Id: Ic7e0194c860e2692a3cddc559911ab379024f650

* test(adk): add test for orphaned tool goroutine panic after CancelImmediate

- unit_send_after_close: directly reproduces the panic by sending to a
  closed generator with isImmediateCancelled=true
- unit_send_after_close_without_cancel_ctx: verifies trySend safety net
  prevents panic even without cancelCtx
- integration_cancel_escalation_orphans_tool: end-to-end test with slow
  tool, CancelAfterChatModel timeout escalation, and orphaned goroutine

Change-Id: Ia82fa957b102ccc2ac42094d18d4b15db2a1701c

* test(adk): improve coverage for orphaned tool goroutine fix

Add test cases for:
- nil execCtx and nil generator defensive guards
- nil cancelContext in isImmediateCancelled
- TOCTOU race window (isImmediateCancelled=false but generator closed)
- SendEvent public API with closed generator
- SendEvent without exec context

Change-Id: I197c36f34675f5376cbe5f830b15db6ca873cd1f
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
* fix: rebase error

Change-Id: If20fa78dba82a1c177c8ec47090050ea8c1354ed

* feat(adk): add failover support for ChatModel

Change-Id: Ice1b513b4b509e7b540316da9119ff3d529c9bae

* feat(adk): add failover support for ChatModel

Change-Id: Ice1b513b4b509e7b540316da9119ff3d529c9bae

* feat(adk): add failover support for ChatModel

Change-Id: Id5483447b74322f6dd495bdd3b994c001094569d

* feat(adk): make Name and Description optional in ChatModelAgentConfig

* feat(adk): add callback lifecycle management to failoverProxyModel

- Extract prepareCallbacks method to reuse callback setup logic between
  Generate and Stream methods
- Add callbacks.ReuseHandlers with proper RunInfo (model type + component)
  before each failover model invocation so handlers receive correct identity
- Add explicit OnStart/OnEnd/OnError callback invocations in Generate and
  Stream since failoverProxyModel declares IsCallbacksEnabled() = true and
  the outer layer skips automatic callback injection

Change-Id: I0150529024125251828cf6f77c8247aa464b1f84

* fix(adk): preserve partial result in failoverProxyModel.Generate on error

Return result instead of nil when target.Generate fails, so that the
outer failoverModelWrapper can pass the partial output message to
ShouldFailover for inspection.

Change-Id: I32d86151a6e133f1a58d5e988bccf42d831a646c

* refactor(adk): use EnsureRunInfo in failoverProxyModel and separate ctx for callbacks

- Replace manual RunInfo construction + ReuseHandlers with
  callbacks.EnsureRunInfo for cleaner RunInfo setup
- Use nCtx (from EnsureRunInfo) for target model invocation and
  original ctx for OnStart/OnEnd/OnError callback lifecycle

Change-Id: I1d5982d0e1ceeaf8f6648b9c40c229b6a2b07ab8

---------

Co-authored-by: shentong.martin <shentong.martin@bytedance.com>
feat: tool search definition
…945)

- Add ToolAliases to prepareExecContext when building ToolsNodeConfig
- Add UnknownToolsHandler, ExecuteSequentially, ToolArgumentsHandler,
  and ToolAliases to applyBeforeAgent when rebuilding after BeforeAgent
  handlers modify tools
- Add tests covering argument alias remapping, name alias dispatch,
  alias preservation after handler rebuild, and handler-only tool
  registration with pre-configured aliases
Change-Id: I04e50d2736406cd2bed030715ec02959cb33dc68
Change-Id: I8b733829302674ff05b1d775f264a5f6f0f9657a
…leware

- Rename BeforeToolCall type to Checker; pass *adk.ToolContext instead of bare toolName (Thread #1)
- Rename NewMiddleware to New to match codebase convention (Thread #4)
- Improve error messages with tool name, call ID, and args for LLM consumption (Thread #5)
- Replace schema.Pipe with StreamReaderFromArray for stream deny path (Thread #6)
- Add 4 E2E ask→resume tests: approved, denied, re-interrupt non-target, resume with updated input (Thread #7)

Change-Id: If43ac495fa3a5fc1d71a27db9614cb996d4547b5
…olArgument

Forward-compatibility: when ToolArgument gains additional fields (e.g.
multimodal content), existing Checker implementations will receive them
without signature changes. The middleware constructs the ToolArgument
from the raw argumentsInJSON before invoking the checker, and extracts
.Text back for the endpoint. UpdatedInput stays string since the
endpoint signature remains string-based.

Change-Id: I19b8c808415e01969330466c034becbd74e25994
…sages

Cover EnhancedInvokableToolCall and EnhancedStreamableToolCall endpoints
in the permission gate to prevent bypass. Add denyToolResult helper for
structured ToolResult deny responses. Use internal.SelectPrompt for
English/Chinese deny message templates.

Change-Id: I7c95ee55526bb2109566e5cd6a60cb5242baaf8b
@shentongmartin shentongmartin added C-enhancement Category: This is a PR that adds a new feature or fixes a bug. D-adk Domain: this is an issue related to the adk package labels Apr 20, 2026
@shentongmartin shentongmartin force-pushed the alpha/09 branch 2 times, most recently from d4a9e3b to 531c324 Compare April 28, 2026 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-enhancement Category: This is a PR that adds a new feature or fixes a bug. D-adk Domain: this is an issue related to the adk package

Development

Successfully merging this pull request may close these issues.

7 participants