Centralize plan mode policy#5246
Conversation
Add read_only_task as an ephemeral subagent that exposes only read-only research tools plus plan-mode safe foreground bash. Wire it into token economy source loading and document the plan-mode boundary. Co-authored-by: SivanCola <32437197+SivanCola@users.noreply.github.com>
Exclude connect_tool_source from read_only_task child registries so token economy source loading cannot reopen writer-capable or installer tool surfaces. Extend the economy plan-mode test to execute read_only_task and verify the child request stays within the read-only boundary. Co-authored-by: SivanCola <32437197+SivanCola@users.noreply.github.com>
Co-authored-by: SivanCola <63166760+SivanCola@users.noreply.github.com>
Plan mode previously trusted any ReadOnly()==true tool while planning, so a pseudo-read-only tool (e.g. complete_step) could slip through unless it was manually denylisted, with no test pinning the built-in roster. - Add tri-state PlanSafety + tool.PlanModeClassifier so tools self-report their plan-mode stance; complete_step reports unsafe, read_only_task / read_only_skill report safe. - Make Decide fail-closed for non-read-only tools (plugin/MCP, writers) while still trusting in-process ReadOnly() tools; enforce PlanSafe => ReadOnly. plan_mode_allowed_tools is the escape valve for read-only MCP/plugin tools. - Add reconcile_test.go: walks tool.Builtins() and fails if any built-in lacks an explicit plan-mode stance (forcing function against drift). - Add marker round-trip test pinning the model-facing Marker to the gate. - Document the boundary in SPEC/GUIDE/GUIDE.zh-CN/MIGRATING. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@codex review |
SivanCola
left a comment
There was a problem hiding this comment.
Thanks, this is directionally right and the central policy shape is useful, but I think one plan-mode boundary needs fixing before merge.
Blocking issue:
- The new policy relies on the invariant that MCP/plugin tools are untrusted and therefore report
ReadOnly()==false, so they fail closed unless explicitly declared throughplan_mode_allowed_tools(internal/planmode/policy.go, aroundDecide). That invariant is not true in the current code: MCPremoteTool.ReadOnly()returns true when the server advertisesannotations.readOnlyHintor whenSpec.ReadOnlyToolNamesmarks the raw tool read-only (internal/plugin/plugin.go). BecausePolicy.Decideallows any call withReadOnly()==true, a connected MCP server can make a tool callable in plan mode without being declared inplan_mode_allowed_tools. The same trust leak also reachesread_only_task/read_only_skill, becauseReadOnlySubagentToolRegistryadmits every parent tool whoseReadOnly()returns true.
This reopens the external-tool fail-closed boundary this PR is trying to establish. Please either make MCP/plugin calls plan-opaque unless the visible mcp__... tool is explicitly declared for plan mode, and add a regression test for an MCP tool with ReadOnly()==true being blocked by default, or change the design/docs/tests to explicitly say plan mode trusts MCP readOnlyHint/known overrides.
What I checked:
- Plan-mode policy centralization and marker/gate alignment.
read_only_task/read_only_skillregistry boundaries and economy source gating.- Security/cache impact: no per-turn tool-schema mutation found, but this PR does add stable tool schemas and changes the user-turn plan marker.
- Related PRs: #5262 has a stack conflict in
internal/skill/tools_test.go; #5256 and #5161 merge-tree cleanly with this PR.
Verified:
go test -count=1 ./internal/planmode ./internal/agent ./internal/boot ./internal/skill ./internal/config ./internal/control ./internal/pluginpassedgit diff --check origin/main-v2...refs/remotes/origin/pr/5246passedgit merge-tree --write-tree origin/main-v2 refs/remotes/origin/pr/5246passed
Once the external-tool plan-mode boundary is aligned and verification stays green, I will consider merging this into main-v2.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 905977386e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The moderate fail-closed gate trusted any ReadOnly()==true tool, but an MCP remoteTool's ReadOnly() can be true purely from a server-reported readOnlyHint — an untrusted signal. Such a tool could enter plan mode (and read-only research sub-agents) without a plan_mode_allowed_tools declaration. - Add tool.PlanModeUntrustedReadOnly; plugin remoteTool/lazyTool implement it, returning true only when ReadOnly() came from a server readOnlyHint, not from a first-party Spec.ReadOnlyToolNames override (cache format unchanged). - planmode.Call gains Untrusted; Decide no longer trusts an untrusted ReadOnly() on its fast path — such tools are gated like writers and must be declared in plan_mode_allowed_tools. Built-ins and first-party overrides stay trusted. - ReadOnlySubagentToolRegistry / FilterReadOnlyRegistry exclude untrusted read-only tools from read-only research sub-agents and the planner registry. - Tests across planmode/plugin/agent; docs corrected (the earlier "plugin/MCP contractually non-read-only" claim was wrong). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Fixed in
Regression tests:
Docs corrected in SPEC/GUIDE/GUIDE.zh-CN/MIGRATING (the earlier "contractually non-read-only" wording was wrong).
|
SivanCola
left a comment
There was a problem hiding this comment.
Thanks for the quick update. The previous MCP readOnlyHint / ReadOnlyToolNames trust-boundary issue is fixed in e3b4d2a1a: the new PlanModeUntrustedReadOnly path is wired through the agent gate, planner/read-only subagent registries, lazy MCP tools, and the docs/tests now describe the trusted first-party override exception correctly.
I am still requesting changes before merge because the remaining bot findings expose real plan-mode contract holes:
- P1: the bash safety guard still tokenizes with
strings.Fields, so quoted dangerous arguments are missed. For example,find . '-delete'is accepted by the policy because the checked token is'-delete', but the shell removes the quotes before runningfind, making it the write-capable-deleteaction. Please parse shell words correctly or conservatively reject quoted guarded flags, and add regression coverage for quotedfind/git grep/goguarded arguments. - P2: in token economy plan mode,
connect_tool_source({"source":"mcp","name":"..."})is blocked before the server can register tools, so a documentedplan_mode_allowed_tools = ["mcp__server__tool"]escape hatch cannot ever take effect for MCP tools hidden behind the connector. Please thread the configured allow-list/server name into the source gate, or otherwise allow connecting MCP servers that have explicitly allowed plan-mode tools while still preserving the per-tool gate after registration.
The P3 legacy marker stripping comment is lower severity, but it is also valid as a compatibility cleanup: old persisted plan-mode turns can leak the previous marker into display/title fallback paths after this marker text change. Please either fix it in this PR or state why it is intentionally deferred.
What I checked on the updated head:
- The original external-tool fail-closed blocker is addressed by
PlanModeUntrustedReadOnlyand the new regression tests. - Cache impact remains bounded to the intended stable tool additions and user-turn plan marker change; I did not find per-turn schema churn.
- Security review: no new dependency, hidden network path, secret handling change, or command execution expansion beyond the plan-mode bash wrapper. The quoted-argument issue above is the remaining unsafe command-execution gap.
- Related PRs: #5262 still conflicts with this PR in
internal/skill/tools_test.go; #5217 now conflicts ininternal/agent/agent.go; #5256 and #5161 merge-tree cleanly with this PR.
Verified:
go test -count=1 ./internal/planmode ./internal/agent ./internal/boot ./internal/skill ./internal/config ./internal/control ./internal/pluginpassedgit diff --check origin/main-v2...refs/remotes/origin/pr/5246passedgit merge-tree --write-tree origin/main-v2 refs/remotes/origin/pr/5246passed
Once the P1/P2 issues are addressed and CI stays green, I will consider merging this into main-v2.
Honor shell quoting when screening plan-mode bash commands, allow explicitly declared MCP servers to connect in token economy plan mode, and strip the legacy plan marker from restored display text. Co-authored-by: SivanCola <32437197+SivanCola@users.noreply.github.com>
SivanCola
left a comment
There was a problem hiding this comment.
Approved. This is a well-scoped plan-mode policy improvement, and the follow-up fixes address the remaining blockers I raised.
Why this fits the architecture:
- Plan mode still rides in the user turn and keeps the stable provider-visible prefix intact.
- The MCP path remains fail-closed by default: token economy can connect an MCP server during planning only when
plan_mode_allowed_toolsdeclares a concrete visible tool name for that server. - The bash gate now screens shell fields after quote/escape handling, so read-only planning cannot bypass the argument denylist with quoted write-capable flags.
Additional confidence:
- Legacy plan markers are stripped for restored display text without changing the current marker injection path.
- The added regression tests cover the bash quoting bypass, the MCP allowlist source gate, and the legacy marker fallback.
Verified:
go test -count=1 ./internal/planmode ./internal/agent ./internal/boot ./internal/skill ./internal/config ./internal/control ./internal/pluginpassedgit diff --checkpassed- GitHub CI is green on b1326ef
背景
Plan mode 目前的规则分散在 control、agent、boot/config 和 economy source 连接逻辑中:提示文案、实际工具拦截、bash 安全判断、
plan_mode_allowed_toolsoverride、connect_tool_source各自维护一部分语义。这会让模型看到的能力说明和实际 gate 发生漂移。本轮工作不只是重构,也包含少量语义收紧和偏移修复。因此辛苦重点核查下:这些收紧是否符合项目作者对 plan mode 的设计意图。
做了什么
internal/planmode,集中维护 plan mode 的阶段 policy。plan_mode_allowed_tools、economyconnect_tool_source都消费同一套 policy。task可用但执行层会拒绝的漂移。complete_step、写工具、安装/记忆变更、进程控制、writer-capable delegation 等归入 plan mode 阻断范围。plan_mode_allowed_tools:它现在只表示额外只读 custom/external tools,不能解锁 known blocked tools 或 unsafe bash。设计理解与需要作者核查的点
我对 plan mode 的理解是:它是“计划阶段”的粗粒度 gate,核心语义是用户批准前只能研究、询问和组织计划,不能通过工具产生写入、持久状态变更、安装能力、危险 shell、后台进程或 writer-capable 子代理。
基于这个理解,本 PR 做了两类收紧:
plan_mode_allowed_tools = ["bash"]不再绕过 bash 安全检查;安全 bash 仍可运行,但rm、git commit、npm install、管道/重定向/后台等仍被阻断。complete_step虽然在 permission 层是 read-only,但它属于批准后执行阶段的 evidence sign-off,所以 plan mode 下会被阻断。这些改动能减少 plan mode 的逃生口,并让提示、执行 gate、economy mode 行为保持一致。但如果项目作者原本希望
plan_mode_allowed_tools能作为非常强的 escape hatch,或认为complete_step在计划阶段也应可用,请指出,我会按项目意图调整。后续可选方向
如果项目希望 plan mode 下仍能使用
task/ subagent 做更大范围的研究,我认为更合理的后续方向不是重新放开当前 writer-capabletask,而是单独设计“只读 subagent”能力:子 agent 继承 plan mode 阶段约束,只暴露 read-only research 工具,禁止 unsafe bash、后台进程、安装、记忆变更和 writer-capable delegation。这样可以保留 plan mode 的只读边界,同时让复杂计划仍能借助隔离上下文做更深入的探索。这不是本 PR 的实现范围;我把它列出来,是希望维护者评估是否符合项目后续设计方向。
预期结果与收益
Cache 影响说明
Cache-impact: low - plan-mode marker 仍作为 user turn 前缀注入,不进入 system prompt;plan mode toggle 仍不改变 provider-visible tool schema。
Cache-guard:
go test -count=1 ./internal/agent -run TestPlanModeDoesNotMutateSystemOrTools;另已运行go test ./...。System-prompt-review: self-reviewed; no stable system prompt change intended. The changed boot/config/token_profile files route policy/config behavior, and the model-facing plan-mode instruction remains user-turn context.
验证
go test -count=1 ./internal/planmode ./internal/agent ./internal/bootgo test ./...git merge-tree --write-tree HEAD origin/main-v2无冲突AGENTS.md、architecture/、context/、.gitignoreFollow-up commit:只读 subagent 落地 + plan-mode gate 加固
解决的问题
原 gate 在 plan mode 下盲信工具自报的
ReadOnly()(if call.ReadOnly { allow })。但ReadOnly()的语义是“可并行 / 无主机副作用”,不等于“计划阶段安全”:complete_step正是ReadOnly()==true的伪只读工具,仅靠手动列入knownBlockedTools才被拦,且无任何测试对账 builtin 全集——新增工具时容易静默漂移。做了什么
PlanSafety+tool.PlanModeClassifier可选接口:工具自报计划阶段立场。complete_step自报 unsafe(与knownBlockedTools双保险),read_only_task/read_only_skill自报 safe。Decide改为温和 fail-closed:ReadOnly()==false的工具(plugin/MCP、写工具)默认拒绝,除非自报 plan-safe 或在plan_mode_allowed_tools声明;ReadOnly()==true的内部工具仍受信任(除非自报 unsafe)。强制不变量PlanSafe ⇒ ReadOnly。reconcile_test.go(新):遍历tool.Builtins(),任何 builtin 缺少显式 plan-mode 立场即测试失败——把“新增工具忘登记”从静默漂移变成 CI 红(forcing function)。Marker文案点名的“可用 / 不可用”工具锁到Decide,防文案与 gate 漂移。plan_mode_allowed_tools现在也是只读 MCP/plugin 工具的逃生阀:它们按接口契约ReadOnly()==false,默认 fail-closed,需在此显式声明;文档(SPEC/GUIDE/GUIDE.zh-CN/MIGRATING)已更新。需要 reviewer 重点确认的设计取舍
最初实现的是完全 fail-closed(删掉
if call.ReadOnly),但实测会让 8+ 个项目内部只读工具回归被拦(connect_tool_source/recall/lsp/history/slash_command/ session 工具)。鉴于本 PR 的 Tool 接口契约已规定 plugin/MCP 工具必为ReadOnly()==false(其副作用无法静态推断),改为温和 fail-closed:信任内部ReadOnly()==true工具,对不可信工具(必为ReadOnly()==false)fail-closed——达到等价安全且零回归。请确认这个边界是否符合项目对 plan mode 的设计意图。验证
go test ./...:plan-mode 相关全部包(planmode / tool / agent / boot / skill / config 等)全绿;go vet ./...、go build ./...通过。internal/cli存在与本改动无关的预存 flaky(TUI 渲染测试,单独跑通过、全量并行失败)。