feat(adk): add agent teams middleware with mailbox-based multi-agent coordination#915

Open
fanlv wants to merge 73 commits into main from feat/eager_receive_agent_team2
Conversation

Contributor

@fanlv fanlv commented Mar 26, 2026

Summary

  • Add team middleware package implementing mailbox-based multi-agent coordination for ADK
  • Includes file-backed mailbox, message pump, source router, lifecycle management, teammate registry, and team runner
  • Provides tools for team creation/deletion, sending messages, and agent spawning
  • Enhances plantask middleware with task API, task reminder, and improved task management

Test plan

  • Unit tests added for all new team middleware components
  • Unit tests updated for plantask middleware changes

Comment thread adk/middlewares/plantask/backend_test.go
@fanlv fanlv force-pushed the feat/eager_receive_agent_team2 branch 7 times, most recently from 27b39a2 to c4cd61d Compare March 27, 2026 02:06

codecov Bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 86.60983% with 267 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (alpha/09@e5e2b18). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| adk/middlewares/plantask/task_update.go | 79.76% | 17 Missing and 17 partials ⚠️ |
| adk/middlewares/team/team_runner.go | 75.00% | 21 Missing and 8 partials ⚠️ |
| adk/middlewares/team/message_source.go | 76.92% | 16 Missing and 11 partials ⚠️ |
| adk/middlewares/team/tool_agent.go | 82.35% | 13 Missing and 11 partials ⚠️ |
| adk/middlewares/plantask/task_api.go | 80.21% | 10 Missing and 8 partials ⚠️ |
| adk/middlewares/team/lifecycle.go | 85.93% | 10 Missing and 8 partials ⚠️ |
| adk/middlewares/team/mailbox_pump.go | 83.01% | 13 Missing and 5 partials ⚠️ |
| adk/middlewares/team/team_config.go | 90.11% | 9 Missing and 8 partials ⚠️ |
| adk/middlewares/team/mailbox_file.go | 88.96% | 7 Missing and 9 partials ⚠️ |
| adk/middlewares/team/tool_send_message.go | 90.50% | 8 Missing and 7 partials ⚠️ |

... and 11 more
Additional details and impacted files
@@             Coverage Diff             @@
##             alpha/09     #915   +/-   ##
===========================================
  Coverage            ?   83.37%           
===========================================
  Files               ?      183           
  Lines               ?    23844           
  Branches            ?        0           
===========================================
  Hits                ?    19879           
  Misses              ?     2664           
  Partials            ?     1301           


@fanlv fanlv force-pushed the feat/eager_receive_agent_team2 branch from c4cd61d to c286f1c Compare March 27, 2026 02:23
@shentongmartin shentongmartin added C-feature-request Category: feature request issue. Implementations of feature requests use `C-enhancement` instead. D-adk Domain: this is an issue related to the adk package labels Mar 27, 2026
@fanlv fanlv force-pushed the feat/eager_receive_agent_team2 branch 17 times, most recently from 9e1898d to 4eeb2ab Compare April 4, 2026 10:33
shentongmartin and others added 22 commits May 6, 2026 09:37
…r agent cancellation (#929)

* fix(adk): prevent panic when orphaned tool goroutine sends event after agent cancellation

When CancelAfterChatModel times out and escalates to CancelImmediate,
GraphInterrupt fires with timeout=0. The compose graph returns immediately,
orphaning parallel tool goroutines. When an orphaned tool completes,
eventSenderToolWrapper tries to send an event via the AsyncGenerator which
is already closed, causing 'send on closed channel' panic.

- Add isImmediateCancelled() to cancelContext for checking immediateChan
- Make chatModelAgentExecCtx.send cancel-aware: skip send when immediate cancel is active
- Use trySend as safety net for the TOCTOU race window
- Route SendEvent() through execCtx.send() instead of direct generator.Send()
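
The two guards described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: only the names `isImmediateCancelled`, `immediateChan`, `trySend`, and `send` come from the message, while the struct layouts, the `string` event type, and the `bool` return values are assumptions made for the example.

```go
package main

import "fmt"

// cancelContext is a hypothetical stand-in for the real type.
type cancelContext struct {
	immediateChan chan struct{} // assumed to be closed when immediate cancel fires
}

// isImmediateCancelled reports whether immediate cancellation has fired.
// A nil receiver is treated as "not cancelled".
func (c *cancelContext) isImmediateCancelled() bool {
	if c == nil {
		return false
	}
	select {
	case <-c.immediateChan:
		return true
	default:
		return false
	}
}

// trySend is the safety net: a send on a closed channel panics, and the
// deferred recover turns that panic into a false return value.
func trySend[T any](ch chan T, v T) (sent bool) {
	defer func() {
		if recover() != nil {
			sent = false
		}
	}()
	ch <- v
	return true
}

// execCtx is a hypothetical, simplified execution context.
type execCtx struct {
	cancelCtx *cancelContext
	events    chan string
}

// send skips the event once immediate cancel is active and falls back to
// trySend for the window where the check passed but the channel was closed.
func (e *execCtx) send(ev string) bool {
	if e == nil || e.events == nil {
		return false
	}
	if e.cancelCtx.isImmediateCancelled() {
		return false
	}
	return trySend(e.events, ev)
}

func main() {
	e := &execCtx{
		cancelCtx: &cancelContext{immediateChan: make(chan struct{})},
		events:    make(chan string, 1),
	}
	fmt.Println(e.send("tool finished")) // channel open, not cancelled
	close(e.events)                      // generator closed before cancel is observed
	fmt.Println(e.send("late event"))    // dropped by trySend instead of panicking
}
```

The cancel check alone is not sufficient because the channel can be closed between the check and the send; the recover-based fallback covers that window at the cost of also swallowing any other panic raised by the send.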

Change-Id: Ic7e0194c860e2692a3cddc559911ab379024f650

* test(adk): add test for orphaned tool goroutine panic after CancelImmediate

- unit_send_after_close: directly reproduces the panic by sending to a
  closed generator with isImmediateCancelled=true
- unit_send_after_close_without_cancel_ctx: verifies trySend safety net
  prevents panic even without cancelCtx
- integration_cancel_escalation_orphans_tool: end-to-end test with slow
  tool, CancelAfterChatModel timeout escalation, and orphaned goroutine

Change-Id: Ia82fa957b102ccc2ac42094d18d4b15db2a1701c

* test(adk): improve coverage for orphaned tool goroutine fix

Add test cases for:
- nil execCtx and nil generator defensive guards
- nil cancelContext in isImmediateCancelled
- TOCTOU race window (isImmediateCancelled=false but generator closed)
- SendEvent public API with closed generator
- SendEvent without exec context

Change-Id: I197c36f34675f5376cbe5f830b15db6ca873cd1f
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.
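
As an illustration of the sticky flag and the first-non-empty-wins rule, here is a small sketch using the functional-options pattern. The option names `WithSkipCheckpoint` and `WithStopCause` and the `stopSignal` name come from the message above; their signatures, the `stopConfig` type, and the `apply`/`snapshot` helpers are assumptions for the example and may differ from the real code.

```go
package main

import (
	"fmt"
	"sync"
)

// stopConfig collects the options of a single Stop() call (hypothetical type).
type stopConfig struct {
	skipCheckpoint bool
	stopCause      string
}

// StopOption follows the functional-options pattern; the signature is assumed.
type StopOption func(*stopConfig)

func WithSkipCheckpoint() StopOption {
	return func(c *stopConfig) { c.skipCheckpoint = true }
}

func WithStopCause(cause string) StopOption {
	return func(c *stopConfig) { c.stopCause = cause }
}

// stopSignal accumulates state across repeated Stop() calls.
type stopSignal struct {
	mu             sync.Mutex
	skipCheckpoint bool
	stopCause      string
}

// apply merges one call's options into the shared state under the mutex.
func (s *stopSignal) apply(opts ...StopOption) {
	cfg := &stopConfig{}
	for _, opt := range opts {
		opt(cfg)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if cfg.skipCheckpoint {
		s.skipCheckpoint = true // sticky: later calls cannot clear it
	}
	if s.stopCause == "" && cfg.stopCause != "" {
		s.stopCause = cfg.stopCause // first non-empty cause wins
	}
}

// snapshot returns the merged state.
func (s *stopSignal) snapshot() (skipCheckpoint bool, stopCause string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.skipCheckpoint, s.stopCause
}

func main() {
	s := &stopSignal{}
	s.apply(WithSkipCheckpoint(), WithStopCause("user_cancelled"))
	s.apply(WithStopCause("timeout")) // escalation call without the skip flag
	skip, cause := s.snapshot()
	fmt.Println(skip, cause)
}
```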

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
* fix: rebase error

Change-Id: If20fa78dba82a1c177c8ec47090050ea8c1354ed

* feat(adk): add failover support for ChatModel

Change-Id: Ice1b513b4b509e7b540316da9119ff3d529c9bae

* feat(adk): add failover support for ChatModel

Change-Id: Ice1b513b4b509e7b540316da9119ff3d529c9bae

* feat(adk): add failover support for ChatModel

Change-Id: Id5483447b74322f6dd495bdd3b994c001094569d

* feat(adk): make Name and Description optional in ChatModelAgentConfig

* feat(adk): add callback lifecycle management to failoverProxyModel

- Extract prepareCallbacks method to reuse callback setup logic between
  Generate and Stream methods
- Add callbacks.ReuseHandlers with proper RunInfo (model type + component)
  before each failover model invocation so handlers receive correct identity
- Add explicit OnStart/OnEnd/OnError callback invocations in Generate and
  Stream since failoverProxyModel declares IsCallbacksEnabled() = true and
  the outer layer skips automatic callback injection

Change-Id: I0150529024125251828cf6f77c8247aa464b1f84

* fix(adk): preserve partial result in failoverProxyModel.Generate on error

Return result instead of nil when target.Generate fails, so that the
outer failoverModelWrapper can pass the partial output message to
ShouldFailover for inspection.

Change-Id: I32d86151a6e133f1a58d5e988bccf42d831a646c

* refactor(adk): use EnsureRunInfo in failoverProxyModel and separate ctx for callbacks

- Replace manual RunInfo construction + ReuseHandlers with
  callbacks.EnsureRunInfo for cleaner RunInfo setup
- Use nCtx (from EnsureRunInfo) for target model invocation and
  original ctx for OnStart/OnEnd/OnError callback lifecycle

Change-Id: I1d5982d0e1ceeaf8f6648b9c40c229b6a2b07ab8

---------

Co-authored-by: shentong.martin <shentong.martin@bytedance.com>
feat: tool search definition
…945)

- Add ToolAliases to prepareExecContext when building ToolsNodeConfig
- Add UnknownToolsHandler, ExecuteSequentially, ToolArgumentsHandler,
  and ToolAliases to applyBeforeAgent when rebuilding after BeforeAgent
  handlers modify tools
- Add tests covering argument alias remapping, name alias dispatch,
  alias preservation after handler rebuild, and handler-only tool
  registration with pre-configured aliases
feat(adk): add MultiModalRead with custom FileContentPart types

- Define FileContentPartType, FileContentPart in filesystem package
  to replace direct schema.ToolOutputPart dependency, supporting
  only Image (bytes) and File (bytes) types
- Add MultiModalReader interface and MultiModalReadRequest with Pages field
- Add multiModalReadFileArgs extending readFileArgs with PDF pages param
- Convert FileContentPart to schema.ToolOutputPart with base64
  encoding in middleware layer
- Guard against nil FileContent returned from Backend.Read and
  MultiModalRead; return human-readable fallback instead of panicking
- Reuse base64 encoding buffer across multimodal parts via base64Encoder
- Add tests for image, file, unsupported type, pages passthrough,
  schema fields, custom desc, empty data error, nil result, and routing
* feat(adk): validate pages parameter in MultiModalReadFileTool

- Add validatePages function to check format (must be "N" or "N-M")
- Reject invalid formats such as "1-", "-5", non-numeric values
- Enforce end >= start and max 20 pages per request
- Return validation error as ToolResult so the model can self-correct
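
The rules listed above could be implemented along these lines. This is a sketch under stated assumptions: the function name `validatePages` comes from the commit message, but its signature, the `parsePage` helper, the error wording, and the reading of "max 20 pages" as an inclusive page count are guesses made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxPagesPerRequest reflects the "max 20 pages per request" rule.
const maxPagesPerRequest = 20

// parsePage accepts only a plain decimal page number >= 1 (hypothetical helper).
func parsePage(s string) (int, error) {
	if s == "" {
		return 0, fmt.Errorf("missing page number")
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return 0, fmt.Errorf("page %q is not a positive integer", s)
		}
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("page %q is out of range", s)
	}
	if n < 1 {
		return 0, fmt.Errorf("page numbers start at 1, got %d", n)
	}
	return n, nil
}

// validatePages checks that pages is "N" or "N-M" with M >= N and at most
// maxPagesPerRequest pages in the range.
func validatePages(pages string) error {
	startStr, endStr, isRange := strings.Cut(pages, "-")
	start, err := parsePage(startStr)
	if err != nil {
		return err
	}
	if !isRange {
		return nil
	}
	end, err := parsePage(endStr)
	if err != nil {
		return err
	}
	if end < start {
		return fmt.Errorf("end page %d is before start page %d", end, start)
	}
	if end-start+1 > maxPagesPerRequest {
		return fmt.Errorf("range covers %d pages, limit is %d", end-start+1, maxPagesPerRequest)
	}
	return nil
}

func main() {
	for _, in := range []string{"3", "1-20", "1-", "-5", "5-3", "1-21"} {
		fmt.Printf("%q -> %v\n", in, validatePages(in))
	}
}
```

Returning a descriptive error rather than panicking matches the last bullet: the message can be handed back to the model as a tool result so it can retry with a corrected value.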

* test(adk): add unit tests for validatePages function

- Cover valid formats: single page, range, same start/end, max boundary
- Cover invalid formats: trailing dash, leading dash, non-numeric, zero
- Cover logic errors: end < start, range exceeds 20 pages
Replace FunctionToolResult.Result string field with Blocks
[]*FunctionToolResultBlock to uniformly represent all tool results
(text-only and multimodal) as structured content blocks.

- Add FunctionToolResultBlock type supporting text, image, audio,
  video, and file content with String() method
- Remove FunctionToolResult.Result field; text results are now
  wrapped as FunctionToolResultBlock{Text: ...}
- Update FunctionToolResultAgenticMessage to accept blocks parameter
- Convert MessageInputPart to FunctionToolResultBlock in compose layer
- Update concatFunctionToolResults to merge via Blocks append
- Add comprehensive tests for multimodal and streaming tool results
…xt, and DeepAgent for AgenticMessage support (#988)
…1004)

refactor(adk): build ToolsNodeConfig via shallow copy + field override

Replace explicit field-by-field struct literals with a shallow copy of
the source ToolsNodeConfig followed by overriding only the fields that
need per-run isolation (Tools and ToolCallMiddlewares). New fields added
to compose.ToolsNodeConfig in the future will be forwarded automatically
instead of being silently dropped.

- applyBeforeAgent: reuse the already-cloned toolsNodeConf local instead
  of rebuilding the struct
- prepareExecContext: shallow-copy a.toolsConfig.ToolsNodeConfig then
  cloneSlice the Tools/ToolCallMiddlewares that will be appended to

No behavior change: every field is assigned the same value as before.
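
A rough sketch of the copy-then-override pattern described above. The struct here is a simplified stand-in, not `compose.ToolsNodeConfig`: the field types and the `perRunConfig` helper are assumptions, and only the field names and the `cloneSlice` name come from the surrounding commit messages.

```go
package main

import "fmt"

// toolsNodeConfig is a simplified, hypothetical stand-in for the real config.
type toolsNodeConfig struct {
	Tools               []string
	ToolCallMiddlewares []string
	ExecuteSequentially bool
	ToolAliases         map[string]string
}

// cloneSlice returns a copy so appends never touch the source backing array.
func cloneSlice[T any](s []T) []T {
	out := make([]T, len(s))
	copy(out, s)
	return out
}

// perRunConfig shallow-copies src so every field is forwarded, then replaces
// only the slices that will be appended to during the run.
func perRunConfig(src *toolsNodeConfig, extraTools ...string) *toolsNodeConfig {
	cp := *src // shallow copy: fields added later are carried over automatically
	cp.Tools = append(cloneSlice(src.Tools), extraTools...)
	cp.ToolCallMiddlewares = cloneSlice(src.ToolCallMiddlewares)
	return &cp
}

func main() {
	base := &toolsNodeConfig{
		Tools:               []string{"read_file"},
		ExecuteSequentially: true,
		ToolAliases:         map[string]string{"cat": "read_file"},
	}
	run := perRunConfig(base, "send_message")
	fmt.Println(len(base.Tools), len(run.Tools), run.ExecuteSequentially)
}
```

Note that the map field is still shared between the source and the copy; that is fine as long as it is only read per run, which is an assumption of this sketch.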
…base

Swap handler positions in InnermostGetsOriginalOutput subtests to match
the forward-iteration semantics from #1000. The tests assumed the old
reverse-iteration order where handlers[0] was innermost.

Change-Id: Ib319b3ea687870db9f69c4c93e1ee69369ea2fe8
shentongmartin and others added 3 commits May 6, 2026 11:19
…eep-copy (#1007)

fix(serialization): ensure pointer-receiver MarshalJSON is invoked in InternalSerializer

When InternalSerializer marshals a struct value that implements
json.Marshaler via pointer receiver (e.g. *ToolInfo), rv.Interface()
produces a non-addressable copy. json.Marshal then cannot call the
pointer method and falls back to default struct encoding, which skips
unexported fields — causing ParamsOneOf data loss after deepCopyState
during interrupt/resume.

Fix: pass a pointer to json.Marshal by using rv.Addr() when addressable,
or copying into reflect.New() otherwise.
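
The failure mode and the fix can be reproduced in isolation with the standard library. In this sketch, `toolInfo` and `marshalAddressable` are hypothetical names chosen for the example, not the types or helpers from the repository; `encoding/json` only calls a pointer-receiver `MarshalJSON` when it can take the value's address.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// toolInfo is a hypothetical type with an unexported field that only its
// custom marshaler knows how to encode.
type toolInfo struct {
	Name   string
	params string
}

// MarshalJSON is declared on the pointer receiver, as in the bug report.
func (t *toolInfo) MarshalJSON() ([]byte, error) {
	return json.Marshal(struct {
		Name   string `json:"name"`
		Params string `json:"params"`
	}{t.Name, t.params})
}

// marshalAddressable hands json.Marshal a pointer so pointer-receiver
// marshalers are honoured: rv.Addr() when addressable, otherwise a copy
// placed into a freshly allocated value via reflect.New.
func marshalAddressable(v any) ([]byte, error) {
	rv := reflect.ValueOf(v)
	if !rv.IsValid() {
		return json.Marshal(v)
	}
	if rv.Kind() != reflect.Ptr {
		if rv.CanAddr() {
			rv = rv.Addr()
		} else {
			p := reflect.New(rv.Type())
			p.Elem().Set(rv)
			rv = p
		}
	}
	return json.Marshal(rv.Interface())
}

func main() {
	v := toolInfo{Name: "search", params: "q:string"}

	plain, _ := json.Marshal(v) // value copy: falls back to default struct encoding
	fixed, _ := marshalAddressable(v)

	fmt.Println(string(plain))
	fmt.Println(string(fixed))
}
```

With the value copy, the unexported `params` field is silently dropped; with the pointer, the custom marshaler runs and the field survives the round trip.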
@shentongmartin shentongmartin force-pushed the feat/eager_receive_agent_team2 branch from e1df247 to f89fc49 Compare May 7, 2026 13:46
…coordination

Add a new `team` middleware package that enables multiple agents to collaborate
within a team via file-backed mailbox message passing and shared task lists.

Key components:
- Team lifecycle management (leader/teammate creation, shutdown, registry)
- File-backed mailbox system with per-agent inboxes and poll-based pump
- Source router for dispatching messages to agent TurnLoops
- Protocol layer with structured message types (DM, broadcast, shutdown, idle, plan-approval)
- Team tools: Agent (spawn teammates), SendMessage, TeamCreate, TeamDelete
- Team config store for persistent team/member metadata
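
To make the mailbox idea concrete, here is a deliberately small sketch of a file-backed inbox with a drain step that a polling pump could call on a timer. Everything in it is hypothetical: the `message` fields, the one-JSON-line-per-message file layout, and the `fileMailbox`, `Send`, `Drain`, and `newTempMailbox` names are assumptions for illustration and do not describe the package's actual format or API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sync"
)

// message is a hypothetical envelope for a direct message between agents.
type message struct {
	From string `json:"from"`
	To   string `json:"to"`
	Body string `json:"body"`
}

// fileMailbox keeps one inbox file per agent under dir. The mutex only
// protects against concurrent use inside a single process.
type fileMailbox struct {
	dir string
	mu  sync.Mutex
}

func newTempMailbox() (*fileMailbox, error) {
	dir, err := os.MkdirTemp("", "mailbox-*")
	if err != nil {
		return nil, err
	}
	return &fileMailbox{dir: dir}, nil
}

func (m *fileMailbox) inboxPath(agent string) string {
	return filepath.Join(m.dir, agent+".jsonl")
}

// Send appends msg as one JSON line to the recipient's inbox file.
func (m *fileMailbox) Send(msg message) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	f, err := os.OpenFile(m.inboxPath(msg.To), os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	return json.NewEncoder(f).Encode(msg)
}

// Drain returns all pending messages for agent and empties the inbox.
// A poll-based pump would call this periodically and forward the results.
func (m *fileMailbox) Drain(agent string) ([]message, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	path := m.inboxPath(agent)
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil, nil // no inbox yet means no messages
	}
	if err != nil {
		return nil, err
	}
	var out []message
	dec := json.NewDecoder(bytes.NewReader(data))
	for {
		var msg message
		if err := dec.Decode(&msg); err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}
		out = append(out, msg)
	}
	return out, os.Truncate(path, 0)
}

func main() {
	mb, err := newTempMailbox()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(mb.dir)

	_ = mb.Send(message{From: "leader", To: "worker", Body: "start task 1"})
	msgs, err := mb.Drain("worker")
	fmt.Println(len(msgs), err)
}
```

A mailbox shared between processes would additionally need file locking or atomic rename, which this sketch leaves out.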

Also enhance the `plantask` middleware:
- Add programmatic task API (TaskInput/CreateTask/UpdateTask/ListTasks/GetTask)
- Add task reminder system that injects periodic reminders after N assistant turns
- Refactor task CRUD tools to use shared task API internally
- Expand test coverage for task operations

Change-Id: I41e0dd3be788da2a8a5c4b211106b4a26b0aa2e8
@shentongmartin shentongmartin force-pushed the feat/eager_receive_agent_team2 branch from f89fc49 to 96e8fbb Compare May 7, 2026 13:52
Base automatically changed from alpha/09 to main May 19, 2026 10:04
Labels

C-feature-request Category: feature request issue. Implementations of feature requests use `C-enhancement` instead. D-adk Domain: this is an issue related to the adk package

8 participants