forge-v2: feat(core): refactor subagent tool to unified invoke_subagent tool #2

Open
kimjune01 wants to merge 1 commit into c-test-24489 from forge-v2-24489

Conversation

@kimjune01

Forge-v2 refactored version of google-gemini/gemini-cli#24489

Original PR

feat(core): refactor subagent tool to unified invoke_subagent tool

Summary

Refactors specialized subagent tools into a single, unified invoke_agent tool and updates the Policy Engine to support virtual tool aliases for subagents.
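To make the before/after concrete, here is a minimal sketch of the unified call shape. The argument names (`agent_name`, `task`) are illustrative assumptions, not the PR's exact schema:

```typescript
// Hypothetical sketch of the refactor's interface change.
// Before: each subagent was exposed as its own tool, e.g.
//   codebase_investigator({ task: "find usages of PolicyEngine" })
// After: one unified tool takes the target agent as an argument.

interface InvokeAgentArgs {
  agent_name: string; // which subagent to delegate to, e.g. "codebase_investigator"
  task: string;       // the delegated prompt
}

// Example unified call (argument names are assumptions for illustration):
const call: InvokeAgentArgs = {
  agent_name: 'codebase_investigator',
  task: 'find usages of PolicyEngine',
};
```

The main agent's toolset shrinks from N subagent tools to one, which is where the context-efficiency claim below comes from.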

Details

  • Unified Tooling: Introduced invoke_agent in packages/core as the standard mechanism for subagent delegation, replacing the previous 1:1 tool-to-agent mapping.
  • Policy Engine Enhancement: Updated PolicyEngine to automatically treat the agent_name argument of invoke_agent as a virtual tool name. This ensures that existing safety policies (e.g., denying codebase_investigator) remain functional without requiring rule updates.
  • Prompt Refactoring: Updated both modern and legacy prompt snippets to instruct the model to use invoke_agent.
  • Testing: Added unit tests in policy-engine.test.ts for virtual alias matching and updated subagents.eval.ts to verify successful delegation via the new unified tool.
  • Documentation: Updated subagents.md and policy-engine.md to reflect the new invocation pattern and policy syntax.
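The virtual-alias behavior in the Policy Engine bullet can be sketched as follows. This is not the actual `PolicyEngine` API, just a minimal illustration of the matching rule (the function name, `denyList` shape, and decision strings are assumptions):

```typescript
// Sketch: when the tool is invoke_agent, policy rules are also matched
// against the agent_name argument as if it were a tool name, so an existing
// rule denying "codebase_investigator" keeps working without rewriting it.

type Decision = 'ALLOW' | 'DENY';

function checkPolicy(
  denyList: Set<string>,
  toolName: string,
  args: Record<string, unknown>,
): Decision {
  // Ordinary tool-name match.
  if (denyList.has(toolName)) return 'DENY';
  // Virtual alias: treat the targeted subagent's name as a tool name.
  if (toolName === 'invoke_agent' && typeof args.agent_name === 'string') {
    if (denyList.has(args.agent_name)) return 'DENY';
  }
  return 'ALLOW';
}
```

Under this sketch, `checkPolicy(new Set(['codebase_investigator']), 'invoke_agent', { agent_name: 'codebase_investigator' })` yields `DENY`, while delegating to a non-denied agent is allowed.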

Related Issues

Related to the effort of simplifying the main agent's toolset and improving context efficiency.

How to Validate

  1. Unit Tests: Run npm test -w @google/gemini-cli-core to verify Policy Engine and prompt rendering logic.
  2. Evals: Run npm run test:evals -- subagents.eval.ts to ensure the model correctly uses the unified tool for delegation.
  3. Policy Verification:
    • Create a local policy denying a specific subagent (e.g., codebase_investigator).
    • Verify that calling invoke_agent with that agent_name results in a DENY decision.
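For step 3, a hypothetical local policy might look like the sketch below; the real rule syntax is documented in policy-engine.md, and the field names here are assumptions. The key point is that the rule names the subagent, not `invoke_agent` itself:

```typescript
// Hypothetical local policy denying one subagent by name.
// Field names (rules, toolName, decision) are illustrative only.
const localPolicy = {
  rules: [
    { toolName: 'codebase_investigator', decision: 'deny' },
  ],
};
// A call to invoke_agent with agent_name "codebase_investigator" should
// then resolve to a DENY decision via the virtual alias.
```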

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • macOS
      • npm run

Forge-v2 pipeline results

| Metric | Value |
| --- | --- |
| Blind-blind winner | codex (Opus 4.6 vs Codex GPT-5.4) |
| Build + tests | PASS |
| Complexity gate | Δ=-0.0096 (PASS) |
| Gemini reviewer | ✅ Approved ("No comments") |

Refactoring claims applied

  • C1 — Centralize Agent Input Key Selection
  • C2 — Reuse Hinted Child Invocation Construction

What is this?

This diff shows the output of a forge-wrapped LLM refactoring pipeline applied to PR google-gemini#24489's code at the point where tests first passed (C_test). The question: can an autonomous pipeline improve the implementation before human review?

Pipeline: goal-anchored volley → adversarial hunt-spec → blind-blind implementation (Opus 4.6 + Codex GPT-5.4, smaller-churn wins) → hunt-code with full build+tests → Gemini 3.1 Pro reviewer-loop → complexity gate (δ=0.05).

Experiment: refactor-equivalence v2. Does an LLM refactoring pass help or hurt brownfield PRs?

We'd love your take: would you approve this diff? 🙏

