Commit 0251d63

Merge pull request #1 from simonraj79/v2-migration
2 parents d1532b4 + 8f2da90 commit 0251d63

620 files changed

Lines changed: 67097 additions & 8533 deletions

Lines changed: 25 additions & 0 deletions
---
name: adk-architecture
description: ADK architectural knowledge — graph orchestration, resumption, execution flow, node contracts, observability, and LLM context orchestration. Use this skill whenever you need to understand the architecture, event flow, or state management of the ADK system, or when designing or modifying core components. Triggers on "how does X work", "design of", "architecture of", "event flow", "resumption state", "checkpoint", "BaseNode", "NodeRunner".
---

# ADK Architecture Guide

## Core Interfaces (references/interfaces/)
- [BaseNode](references/interfaces/base-node.md) — node contract, output/streaming, state/routing, HITL, configuration
- [Workflow](references/interfaces/workflow.md) — graph orchestration, dynamic nodes (tracking/dedup/resume), transitive dynamic nodes, interrupt propagation, design rules for node authors
- [Runner](references/interfaces/runner.md) — public interface for executing workflows and agents; documents the entry methods `run` and `run_async`
- [Agent](references/interfaces/agent.md) — blueprint defining identity, instructions, and tools; documents that `run` is the preferred entry point
- [BaseAgent](references/interfaces/base-agent.md) — base class for all agents; defines the subclassing contract with `_run_impl` as the primary override point
- [Event](references/interfaces/event.md) — core data structure for state reconstruction and communication; represents a conversation turn or action

## Key Principles (references/principles/)
- [API Principles](references/principles/api-principles.md) — stability, backward compatibility, and self-containment. Use when making design choices that affect the public API surface.

## Runtime Knowledge (references/architecture/)
- [Context](references/architecture/context.md) — 1:1 node-context mapping, InvocationContext singleton, property reference
- [NodeRunner](references/architecture/node-runner.md) — two communication channels, execution flow, output delegation. Internal runtime details.
- [Runner Roles](references/architecture/runner-roles.md) — Runner vs NodeRunner vs Workflow separation; explains why they are kept separate to avoid deadlocks
- [Checkpoint and Resume](references/architecture/checkpoint-resume.md) — HITL lifecycle, `rerun_on_resume`, `run_id`
- [Observability](references/architecture/observability.md) — span-on-Context design, NodeRunner integration, correlated logs, metrics
- [LLM Context Orchestration](references/architecture/llm-context-orchestration.md) — relationship between events and LLM context, task delegation translation, branch isolation. Use when modifying event processing, context preparation for LLMs, or debugging context pollution issues.
Lines changed: 69 additions & 0 deletions
# Checkpoint and Resume Lifecycle

HITL (Human-in-the-Loop) follows this pattern:

1. **Interrupt**: The node yields an event with `long_running_tool_ids`. Each ancestor propagates the interrupt upward via `ctx.interrupt_ids`.
2. **Persist**: Only the leaf node's interrupt event is persisted to the session. Workflow sets `ctx._interrupt_ids` directly (no internal event needed).
3. **Resume**: The user sends a `FunctionResponse` message. The Runner scans session events to find the matching `invocation_id`, then reconstructs node state from the persisted events.
4. **Continue**: The interrupted node receives the FunctionResponse and continues execution. Downstream nodes receive the resumed node's output.
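The interrupt propagation in step 1 can be sketched with stand-in classes. This is an illustrative model only: the `Event` and `Context` dataclasses below are simplified stand-ins, not the real ADK types.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    # IDs of long-running tools awaiting a human response
    long_running_tool_ids: set[str] = field(default_factory=set)


@dataclass
class Context:
    interrupt_ids: set[str] = field(default_factory=set)
    parent: "Context | None" = None


def track_interrupt(ctx: Context, event: Event) -> None:
    """Record the event's interrupt IDs on ctx and every ancestor."""
    if not event.long_running_tool_ids:
        return
    node: Context | None = ctx
    while node is not None:
        node.interrupt_ids |= event.long_running_tool_ids
        node = node.parent


root = Context()
child = Context(parent=root)
track_interrupt(child, Event(long_running_tool_ids={"fc-1"}))
assert root.interrupt_ids == {"fc-1"}
```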
## run_id on resume

Resumed nodes reuse the `run_id` from the original execution. From the node's perspective, the execution never paused — events before and after the resume share the same run_id.

Fresh dispatches (first run, loop re-trigger) get a new run_id.
## Resume behavior by `rerun_on_resume`

A node with multiple interrupt IDs may receive partial FunctionResponses (only some resolved). The behavior depends on `rerun_on_resume`:

**`rerun_on_resume=True`** (Workflow, orchestration nodes):

| FRs received | Status | Behavior |
|---|---|---|
| Partial | PENDING | Re-execute immediately with partial `resume_inputs`. The node handles the remaining interrupts internally (e.g., Workflow dispatches resolved children and keeps unresolved ones WAITING). |
| All | PENDING | Re-execute with all `resume_inputs`. |

This is critical for Workflow — when one child's FunctionResponse arrives, it re-runs immediately to dispatch that resolved child rather than waiting for all children's responses.

**`rerun_on_resume=False`** (leaf nodes, simple HITL):

| FRs received | Status | Behavior |
|---|---|---|
| Partial | WAITING | Stay waiting; all FunctionResponses are required. |
| All | COMPLETED | Auto-complete. Output = aggregated `resolved_responses`. No re-execution. |
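The two tables condense into one decision function. A minimal sketch, assuming the statuses above; the string statuses and parameter names are illustrative stand-ins for the real enums and fields:

```python
def resume_status(rerun_on_resume: bool,
                  interrupt_ids: set[str],
                  resolved: set[str]) -> str:
    """Sketch of the resume decision: which status a node enters
    once some (or all) of its FunctionResponses have arrived."""
    unresolved = interrupt_ids - resolved
    if rerun_on_resume:
        # Orchestration nodes re-execute as soon as any FR arrives.
        return "PENDING"
    # Leaf nodes wait until every interrupt has a FunctionResponse.
    return "WAITING" if unresolved else "COMPLETED"


assert resume_status(True, {"fc-1", "fc-2"}, {"fc-1"}) == "PENDING"
assert resume_status(False, {"fc-1", "fc-2"}, {"fc-1"}) == "WAITING"
assert resume_status(False, {"fc-1", "fc-2"}, {"fc-1", "fc-2"}) == "COMPLETED"
```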
## Resume with prior output and interrupts

A node can produce output AND interrupt in the same execution (e.g., a Workflow where child A completes with output and child B interrupts). On resume:

- Some interrupt IDs are resolved (provided in `resume_inputs`)
- Remaining interrupt IDs carry forward via `prior_interrupt_ids`
- Prior output carries forward via `prior_output`
- NodeRunner pre-populates ctx with these values before re-executing

```python
runner = NodeRunner(
    node=node,
    parent_ctx=ctx,
    run_id=prior_run_id,            # reuse the original run_id
    prior_output=cached_output,     # output from the earlier run
    prior_interrupt_ids={'fc-2'},   # still unresolved
)
child_ctx = await runner.run(
    node_input=input,
    resume_inputs={'fc-1': response},
)
```
Lines changed: 104 additions & 0 deletions
# Context

## Architecture

The runtime uses two scoping objects:

- **InvocationContext** — singleton per invocation. Holds shared state (session, services, event queue) accessible by all nodes. Pydantic model at `agents/invocation_context.py`.
- **Context** — one per node execution. Holds per-node results (output, route, interrupt_ids) and provides the API surface for node code. At `agents/context.py`.

Every Context holds a reference to the same InvocationContext (`_invocation_context`). Service access (artifacts, memory, auth) is delegated through it.

```
Root Context                      ← created by Runner from IC
└── Context [runner.node]         ← the root node (e.g., Workflow)
    ├── Context [child_a]         ← child node A
    └── Context [child_b]         ← child node B
        └── Context [grandchild]  ← nested child
```

The Runner creates `root_ctx = Context(ic)` as the tree root and passes it as `parent_ctx` to `NodeRunner(node=self.node)`. The root Context has no node_path or run_id — it exists solely as the parent for the Runner's root node. All Contexts in the tree share the same InvocationContext singleton.

InvocationContext contents:

- `session`, `agent`, `user_content`
- `invocation_id`, `app_name`, `user_id`
- Services: `artifact_service`, `memory_service`, `credential_service`
- `run_config`, `live_request_queue`
- `process_queue` — shared event queue consumed by the main loop
41+
42+
Every node execution gets its own Context instance. The relationship
43+
is strictly 1:1: one node, one Context. The Context tree mirrors the
44+
node execution tree.
45+
46+
**NodeRunner** creates the child Context from the parent's Context
47+
via `_create_child_context()`. The child inherits:
48+
49+
- `_invocation_context` — same singleton (shared session, services)
50+
- `node_path` — parent path + node name (e.g., `wf/child_a`)
51+
- `run_id` — unique per execution (reused on resume)
52+
- `event_author` — inherited from parent
53+
- `schedule_dynamic_node_internal` — inherited from parent
54+
55+
The child does NOT inherit output, route, or interrupt_ids — those
56+
are per-execution results, starting fresh (unless resume carries
57+
forward `prior_output` / `prior_interrupt_ids`).
58+
## Node result properties

These properties on Context are the primary mechanism for communicating results between nodes:

- **`ctx.output`** — the node's result value. Set once per execution, either via `yield value` (the framework sets it) or via `ctx.output = X` directly. A second write raises `ValueError`.
- **`ctx.route`** — routing value for conditional edges. Set independently of output. Workflow-specific.
- **`ctx.interrupt_ids`** — accumulated interrupt IDs. Read-only for user code; set by the framework when the node yields an Event with `long_running_tool_ids`.

Output and interrupts can coexist — the orchestrator's `_finalize` decides what to propagate. The orchestrator reads these properties after the child node finishes.
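The set-once semantics of `ctx.output` can be modeled as a write-once property. An illustrative sketch only, not the real implementation:

```python
class NodeResult:
    """Write-once output holder: a second assignment raises ValueError."""

    _UNSET = object()  # sentinel so that None is a legal output value

    def __init__(self) -> None:
        self._output = NodeResult._UNSET

    @property
    def output(self):
        return None if self._output is NodeResult._UNSET else self._output

    @output.setter
    def output(self, value) -> None:
        if self._output is not NodeResult._UNSET:
            raise ValueError("output already set for this execution")
        self._output = value


ctx = NodeResult()
ctx.output = "result"
assert ctx.output == "result"
```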
## Class hierarchy

```
ReadonlyContext (agents/readonly_context.py)
└── Context (agents/context.py)
```

**ReadonlyContext** — read-only view used in callbacks and plugins:

- `user_content`, `invocation_id`, `agent_name`
- `state` (returns `MappingProxyType` — immutable view)
- `session`, `user_id`, `run_config`

**Context(ReadonlyContext)** — full read-write context for node execution. Extends ReadonlyContext with mutable state, node results, workflow metadata, and service methods. See the property reference below.

## Property reference

| Category | Properties |
|---|---|
| State & actions | `state` (mutable `State`), `actions` (EventActions) |
| Node results | `output`, `route`, `interrupt_ids` (read-only) |
| Workflow | `node_path`, `run_id`, `triggered_by`, `in_nodes`, `resume_inputs`, `retry_count`, `event_author` |
| Methods | `run_node()`, `get_next_child_run_id()` |
| Artifacts | `load_artifact()`, `save_artifact()`, `list_artifacts()` |
| Memory | `search_memory()`, `add_session_to_memory()`, `add_events_to_memory()`, `add_memory()` |
| Auth | `request_credential()`, `load_credential()`, `save_credential()` |
| Tools | `request_confirmation()`, `function_call_id` |
Lines changed: 42 additions & 0 deletions
# LLM Context Orchestration from Events

## Core Principle

In ADK, there is a clear distinction between the **Event Stream** and the **LLM Context**:

- **Events are the ground truth**: immutable records of what has happened in a session (user messages, model responses, tool calls, results). They serve as the audit log and persistence state.
- **LLM Context is an orchestrated view**: the context passed to an LLM is not merely a dump of the raw event log. It is a carefully orchestrated view, filtered and transformed to match the specific role, task, and branch of the agent currently executing.

## Orchestration Strategies

The framework translates events into LLM context using several strategies:

### 1. Task Delegation Translation

When a coordinator agent delegates a task to a sub-agent (Task Agent) via a tool call:

- **Source event**: The coordinator calls a tool like `request_task_<sub_agent_name>(args...)`.
- **Orchestrated context**:
  - The arguments of the `request_task_<sub_agent_name>` tool call are extracted and placed in the **System Instruction (SI)**, or treated as the core instruction for the sub-agent.
  - The first user message presented to the sub-agent is synthesized to represent the goal (e.g., "Finish task of [sub_agent_name] with arguments [args]").
- **Goal**: Isolate the sub-agent from the coordinator's full history and give it a crisp, clear starting point.
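The translation can be sketched with a hypothetical helper. The message template is quoted from the text above; the function name, signature, and SI wording are illustrative assumptions, not the framework's API:

```python
def synthesize_task_context(sub_agent_name: str, args: dict) -> tuple[str, str]:
    """Build (system_instruction, first_user_message) for a sub-agent
    from the coordinator's request_task_<sub_agent_name> tool call."""
    # Hypothetical SI wording; only the delegation idea comes from the text.
    system_instruction = f"You are {sub_agent_name}. Task arguments: {args}"
    # First user message template as described in the text.
    first_user_message = f"Finish task of [{sub_agent_name}] with arguments [{args}]"
    return system_instruction, first_user_message


si, msg = synthesize_task_context("researcher", {"topic": "ADK"})
assert "researcher" in si and "researcher" in msg
```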
### 2. Branch Isolation

In complex workflows with parallel execution:

- **Source events**: Events from all nodes and branches are stored in the same session chronologically.
- **Orchestrated context**: The framework filters events by `branch` (e.g., `node:path.name`). An agent only sees events that belong to its own execution path.
- **Goal**: Prevent cross-node event pollution and ensure deterministic behavior in isolated tasks.
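A sketch of branch filtering, assuming the `node:path.name` format above. The rule that an agent also sees its ancestors' events is an assumption added for illustration, not stated in the text:

```python
from dataclasses import dataclass


@dataclass
class Event:
    branch: str  # e.g., "node:wf.a"
    text: str


def events_for_branch(events: list[Event], branch: str) -> list[Event]:
    """Keep only events on the agent's own execution path:
    its exact branch, plus (assumed) ancestor branches."""
    return [
        e for e in events
        if e.branch == branch or branch.startswith(e.branch + ".")
    ]


log = [
    Event("node:wf", "setup"),
    Event("node:wf.a", "A's turn"),
    Event("node:wf.b", "B's turn"),
]
visible = events_for_branch(log, "node:wf.a")
assert [e.text for e in visible] == ["setup", "A's turn"]
```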
### 3. History Trimming and Compaction

To prevent context window overflow and stale instruction loops:

- **Source events**: A long history of retries, tool calls, and interactions.
- **Orchestrated context**: The framework may trim older events or summarize them (event compaction). In task mode, it might keep only the essential setup events, ignoring stale retry loops that would otherwise confuse the LLM.
- **Goal**: Maintain a focused and efficient context window for the LLM.

## Summary

The relationship is one of **source vs. view**. Events are the source of truth for the session, while LLM context is a highly orchestrated view of that truth, tailored for the active agent.
Lines changed: 76 additions & 0 deletions
# NodeRunner

NodeRunner is the per-node executor. It drives `BaseNode.run()`, creates the child Context, enriches events, and writes results to ctx.

## Two communication channels

The runtime has two distinct channels for data flow:

- **Context** — parent ↔ child communication. Output, route, state, resume_inputs, and interrupt_ids flow through ctx. The orchestrator reads ctx after the child completes to decide what to do next.
- **Event** — persistence and streaming. Events are appended to the session and streamed to the caller. They carry messages, state deltas, function calls, and interrupt markers.

A node writes to **ctx** to communicate with its parent. A node yields **Events** to persist data and stream messages to the user.
## Execution flow

```
Orchestrator
│
├─ NodeRunner(node=child, parent_ctx=ctx)
│   │
│   ├─ _create_child_context()        → child Context
│   ├─ _execute_node()                → iterate node.run()
│   │   ├─ _track_event_in_context()  → write to ctx
│   │   └─ _enqueue_event()           → enrich + persist
│   ├─ _flush_output_and_deltas()     → emit deferred output
│   └─ return child ctx
│
└─ reads ctx.output, ctx.route, ctx.interrupt_ids
```

1. **Create child Context** — inherits `_invocation_context` (shared singleton), builds `node_path` from the parent, assigns `run_id`.

2. **Iterate `node.run()`** — for each yielded Event:

   - **Track in context** — `_track_event_in_context` writes output, route, and interrupt_ids from the event to ctx (the source of truth).
   - **Enrich** — `_enrich_event` stamps metadata before persistence:
     - `event.author` — node name (or `event_author` override)
     - `event.invocation_id` — from InvocationContext
     - `event.node_info.path` — full path (e.g., `wf/child_a`)
     - `event.node_info.run_id` — unique per execution
     - `event.node_info.output_for` — ancestor paths when `use_as_output=True`
   - **Flush deltas** — for non-partial events, `_flush_deltas` moves pending state/artifact deltas from `ctx.actions` onto the event before enqueueing.
   - **Enqueue** — `ic.enqueue_event` puts the event on the shared process queue for session persistence.

3. **Flush deferred output** — if `ctx.output` was set directly (not via yield), `_flush_output_and_deltas` emits the output Event after `_run_impl` returns, bundling any remaining state/artifact deltas onto the same Event.

4. **Return child ctx** — the orchestrator reads `ctx.output`, `ctx.route`, and `ctx.interrupt_ids`.
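The enrichment step above can be sketched as follows. The `Event` and `NodeInfo` shapes mirror the bullet list but are simplified stand-ins, not the real classes:

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    path: str = ""
    run_id: str = ""
    output_for: list[str] = field(default_factory=list)


@dataclass
class Event:
    author: str = ""
    invocation_id: str = ""
    node_info: NodeInfo = field(default_factory=NodeInfo)


def enrich_event(event: Event, node_name: str, invocation_id: str,
                 node_path: str, run_id: str) -> Event:
    """Stamp identity/routing metadata before the event is persisted."""
    # Keep a pre-set author (the event_author override); else use the node name.
    event.author = event.author or node_name
    event.invocation_id = invocation_id
    event.node_info.path = node_path
    event.node_info.run_id = run_id
    return event


e = enrich_event(Event(), "child_a", "inv-1", "wf/child_a", "run-7")
assert e.node_info.path == "wf/child_a"
assert e.author == "child_a"
```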
## Output delegation (`use_as_output`)

When a child is scheduled with `use_as_output=True`, its output Event also counts as the parent's output. NodeRunner:

- Sets `ctx._output_delegated = True` on the parent
- Skips emitting the parent's own output Event
- Stamps `event.node_info.output_for` with the ancestor paths
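The delegation rule can be sketched as below; the names mirror the bullets above but are stand-ins for NodeRunner internals, not the real implementation:

```python
from dataclasses import dataclass


@dataclass
class ParentCtx:
    node_path: str
    output_delegated: bool = False


def delegate_output(output_for: list[str], parent: ParentCtx,
                    use_as_output: bool) -> list[str]:
    """When the child was scheduled with use_as_output=True, mark the
    parent as delegated and stamp its path onto the child's output Event."""
    if use_as_output:
        parent.output_delegated = True
        output_for = [*output_for, parent.node_path]
    return output_for


parent = ParentCtx(node_path="wf")
stamped = delegate_output([], parent, use_as_output=True)
assert stamped == ["wf"] and parent.output_delegated
```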
