forge-v2: feat(core): refactor subagent tool to unified invoke_subagent tool #2

Open
kimjune01 wants to merge 1 commit into c-test-24489 from forge-v2-24489

Conversation

@kimjune01

Forge-v2 refactored version of google-gemini/gemini-cli#24489

Original PR

feat(core): refactor subagent tool to unified invoke_subagent tool

Summary

Refactors specialized subagent tools into a single, unified invoke_agent tool and updates the Policy Engine to support virtual tool aliases for subagents.
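To make the before/after concrete, here is a minimal sketch of the unified call shape. The argument names (`agent_name`, `task`) are illustrative assumptions, not the PR's exact schema:

```typescript
// Hypothetical sketch of the refactor's interface change.
// Before: each subagent was exposed as its own tool, e.g.
//   codebase_investigator({ task: "find usages of PolicyEngine" })
// After: one unified tool takes the target agent as an argument.

interface InvokeAgentArgs {
  agent_name: string; // which subagent to delegate to, e.g. "codebase_investigator"
  task: string;       // the delegated prompt
}

// Example unified call (argument names are assumptions for illustration):
const call: InvokeAgentArgs = {
  agent_name: 'codebase_investigator',
  task: 'find usages of PolicyEngine',
};
```

The main agent's toolset shrinks from N subagent tools to one, which is where the context-efficiency claim below comes from.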

Details

  • Unified Tooling: Introduced invoke_agent in packages/core as the standard mechanism for subagent delegation, replacing the previous 1:1 tool-to-agent mapping.
  • Policy Engine Enhancement: Updated PolicyEngine to automatically treat the agent_name argument of invoke_agent as a virtual tool name. This ensures that existing safety policies (e.g., denying codebase_investigator) remain functional without requiring rule updates.
  • Prompt Refactoring: Updated both modern and legacy prompt snippets to instruct the model to use invoke_agent.
  • Testing: Added unit tests in policy-engine.test.ts for virtual alias matching and updated subagents.eval.ts to verify successful delegation via the new unified tool.
  • Documentation: Updated subagents.md and policy-engine.md to reflect the new invocation pattern and policy syntax.
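The virtual-alias behavior in the Policy Engine bullet can be sketched as follows. This is not the actual `PolicyEngine` API, just a minimal illustration of the matching rule (the function name, `denyList` shape, and decision strings are assumptions):

```typescript
// Sketch: when the tool is invoke_agent, policy rules are also matched
// against the agent_name argument as if it were a tool name, so an existing
// rule denying "codebase_investigator" keeps working without rewriting it.

type Decision = 'ALLOW' | 'DENY';

function checkPolicy(
  denyList: Set<string>,
  toolName: string,
  args: Record<string, unknown>,
): Decision {
  // Ordinary tool-name match.
  if (denyList.has(toolName)) return 'DENY';
  // Virtual alias: treat the targeted subagent's name as a tool name.
  if (toolName === 'invoke_agent' && typeof args.agent_name === 'string') {
    if (denyList.has(args.agent_name)) return 'DENY';
  }
  return 'ALLOW';
}
```

Under this sketch, `checkPolicy(new Set(['codebase_investigator']), 'invoke_agent', { agent_name: 'codebase_investigator' })` yields `DENY`, while delegating to a non-denied agent is allowed.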

Related Issues

Related to the effort of simplifying the main agent's toolset and improving context efficiency.

How to Validate

  1. Unit Tests: Run npm test -w @google/gemini-cli-core to verify Policy Engine and prompt rendering logic.
  2. Evals: Run npm run test:evals -- subagents.eval.ts to ensure the model correctly uses the unified tool for delegation.
  3. Policy Verification:
    • Create a local policy denying a specific subagent (e.g., codebase_investigator).
    • Verify that calling invoke_agent with that agent_name results in a DENY decision.
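For step 3, a hypothetical local policy might look like the sketch below; the real rule syntax is documented in policy-engine.md, and the field names here are assumptions. The key point is that the rule names the subagent, not `invoke_agent` itself:

```typescript
// Hypothetical local policy denying one subagent by name.
// Field names (rules, toolName, decision) are illustrative only.
const localPolicy = {
  rules: [
    { toolName: 'codebase_investigator', decision: 'deny' },
  ],
};
// A call to invoke_agent with agent_name "codebase_investigator" should
// then resolve to a DENY decision via the virtual alias.
```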

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • macOS
      • npm run

Forge-v2 pipeline results

| Metric | Value |
| --- | --- |
| Blind-blind winner | codex (Opus 4.6 vs Codex GPT-5.4) |
| Build + tests | PASS |
| Complexity gate | Δ=-0.0096 (PASS) |
| Gemini reviewer | ✅ Approved ("No comments") |

Refactoring claims applied

  • C1 — Centralize Agent Input Key Selection
  • C2 — Reuse Hinted Child Invocation Construction

What is this?

This diff shows the output of a forge-wrapped LLM refactoring pipeline applied to PR google-gemini#24489's code at the point where tests first passed (C_test). The question: can an autonomous pipeline improve the implementation before human review?

Pipeline: goal-anchored volley → adversarial hunt-spec → blind-blind implementation (Opus 4.6 + Codex GPT-5.4, smaller-churn wins) → hunt-code with full build+tests → Gemini 3.1 Pro reviewer-loop → complexity gate (δ=0.05).

Experiment: refactor-equivalence v2. Does an LLM refactoring pass help or hurt brownfield PRs?

We'd love your take: would you approve this diff? 🙏

