Commit 1e2b983
🤖 feat: add advisor same-step context handoff (#3168)
## Summary
Adds same-step context handoff to the advisor tool so the nested advisor
model call understands **why it was invoked at that exact moment**, not
just the prior transcript.
## Background
The advisor tool previously sent only the full conversation transcript
to the nested model. While this gave the advisor the overall context,
it carried no information about:
- **What question** the caller model wants answered
- **What reasoning/commentary** the caller was producing in the current
step that led to the advisor call
- **What the pending tool call looks like**
This gap meant the advisor couldn't distinguish "why was I called now?"
from "what happened in the conversation so far?"
## Implementation
Three layered changes, each building on the previous:
### Phase 1: Advisor tool contract (`question` field)
- Extended `AdvisorToolInputSchema` to accept an optional `question:
z.string().min(1).nullish()` field
- Updated `ADVISOR_TOOL_DESCRIPTION` to instruct the caller to pass a
brief question
- Added normalization in `execute()` (trim, empty→undefined)
- Renders the question in `AdvisorToolCall.tsx` when present
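The trim / empty→undefined normalization can be sketched roughly as follows (a hedged illustration; `normalizeQuestion` is a hypothetical name, and the real `execute()` code may differ):

```typescript
// Hypothetical sketch of the normalization in execute():
// trim the incoming value and treat empty/whitespace-only strings as absent.
function normalizeQuestion(
  question: string | null | undefined,
): string | undefined {
  if (question == null) return undefined;
  const trimmed = question.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}
```

Collapsing all "absent" shapes (`null`, `undefined`, `""`, whitespace) to `undefined` keeps the downstream handoff logic to a single check.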
### Phase 2: Same-step context capture
- Added `AdvisorToolCallSnapshot` and `AdvisorStepCaptureRef` types
- Extended `StreamRequestConfig` with an `onChunk` callback, threaded
through to `streamText()`
- In `aiService.ts`, wired `onChunk` to accumulate `text-delta` →
`currentStepText` and `reasoning-delta` → `currentStepReasoning`
- When a `tool-call` chunk with `toolName === "advisor"` arrives,
freezes a snapshot keyed by `toolCallId`
- `prepareStep` resets per-step buffers to prevent cross-step leakage
- Exposed `takeToolCallSnapshot(toolCallId)` with consume-on-read
semantics
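A minimal sketch of the capture mechanics described above (class, type, and chunk-shape names here are illustrative; the real implementation is spread across `aiService.ts` and `streamManager.ts`):

```typescript
// Illustrative chunk shape; the actual AI SDK stream parts carry more fields.
type StreamChunk =
  | { type: "text-delta"; text: string }
  | { type: "reasoning-delta"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; input: unknown };

type Snapshot = {
  toolCallId: string;
  toolName: string;
  input: unknown;
  stepText: string;
  stepReasoning: string;
};

class StepCapture {
  private text = "";
  private reasoning = "";
  private frozen = new Map<string, Snapshot>();

  // prepareStep: reset per-step buffers so nothing leaks across steps.
  resetStep(): void {
    this.text = "";
    this.reasoning = "";
  }

  // onChunk: accumulate visible same-step output; freeze advisor calls.
  onChunk(chunk: StreamChunk): void {
    switch (chunk.type) {
      case "text-delta":
        this.text += chunk.text;
        break;
      case "reasoning-delta":
        this.reasoning += chunk.text;
        break;
      case "tool-call":
        if (chunk.toolName === "advisor") {
          // Strings are copied by value, so the frozen snapshot cannot be
          // mutated by later deltas in the same step.
          this.frozen.set(chunk.toolCallId, {
            toolCallId: chunk.toolCallId,
            toolName: chunk.toolName,
            input: chunk.input,
            stepText: this.text,
            stepReasoning: this.reasoning,
          });
        }
        break;
    }
  }

  // Consume-on-read: a snapshot can be taken at most once.
  takeToolCallSnapshot(toolCallId: string): Snapshot | undefined {
    const snap = this.frozen.get(toolCallId);
    this.frozen.delete(toolCallId);
    return snap;
  }
}
```

Keying the `Map` by `toolCallId` is what lets two advisor calls in the same step keep distinct snapshots.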
### Phase 3: Handoff composition
- In `advisor.ts execute()`, retrieves the frozen snapshot and builds a
synthetic `user` handoff message
- Appended as the final message: `[...transcript, handoffMessage]`
- Sections included only when content exists: `Question`, `Current-step
commentary`, `Current-step reasoning`, `Pending tool call`
- Tail-biased truncation for same-step text/reasoning (configurable via
`ADVISOR_HANDOFF_MAX_TEXT_CHARS` /
`ADVISOR_HANDOFF_MAX_REASONING_CHARS`)
- Falls back to raw transcript when no question/snapshot exists
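The append-or-fallback composition can be illustrated with a small sketch (message shape simplified; `composeAdvisorMessages` is a hypothetical name):

```typescript
type HandoffMessage = { role: "user" | "assistant" | "system"; content: string };

// Append the synthetic handoff only when there is something to hand off;
// otherwise return the raw transcript unchanged.
function composeAdvisorMessages(
  transcript: HandoffMessage[],
  handoff: string | undefined,
): HandoffMessage[] {
  if (handoff === undefined) return transcript;
  return [...transcript, { role: "user", content: handoff }];
}
```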
## Validation
- `make typecheck` ✅
- `make lint` ✅
- All targeted tests pass: `advisor.test.ts`, `streamManager.test.ts`,
`aiService.test.ts`, `toolDefinitions.test.ts`,
`AdvisorToolCall.test.tsx`
- **Live dogfooding** via dev-server sandbox:
- Triggered a real advisor tool call with the experiment flag enabled
- Verified in the Debug tab that the nested advisor request contains the
synthetic `## Advisor Handoff` user message with `**Question:**` and
`**Pending tool call:**` sections
- Captured screenshots and video recordings as evidence
## Risks
- **Context bloat**: Mitigated by tail-biased truncation with
configurable max char limits
- **Multiple advisor calls per step**: Mitigated by per-`toolCallId`
snapshot keying in a `Map`
- **Rollout safety**: `question` field is optional; existing `{}` calls
still work
---
<details>
<summary>📋 Implementation Plan</summary>
# Advisor same-step context handoff implementation plan
## Goal
Make the nested `advisor` model call understand **why it was invoked at
that exact moment** without dropping the existing full-transcript
context.
The implementation should keep the current full transcript, then add a
**structured advisor handoff** that captures the immediate same-step
rationale:
1. the explicit question the caller wants answered
2. any visible same-step commentary/text emitted before the tool call
3. any visible same-step reasoning emitted before the tool call
4. the exact pending `advisor` tool call snapshot keyed by `toolCallId`
## Non-goals
- Do **not** attempt to forward raw hidden reasoning / chain-of-thought
tokens.
- Do **not** patch or fork the AI SDK's `ToolExecutionOptions` shape.
- Do **not** replace the full transcript with a compact handoff; the
handoff is additive.
## Verified evidence driving this plan
- `src/node/services/tools/advisor.ts` sends `messages: transcript` into
`generateText()` and currently rejects all tool args.
- `src/node/services/aiService.ts` populates
`advisorTranscriptRef.messages` from the live stream via `prepareStep`.
- `src/node/services/streamManager.ts` forwards only the **pre-step**
message transcript today (`prepareStep({ messages })`).
- `node_modules/@ai-sdk/provider-utils/src/types/tool.ts` documents that
tool execution `messages` exclude the assistant response that contained
the tool call.
- `node_modules/ai/src/generate-text/stream-text.ts` and
`run-tools-transformation.ts` show the AI SDK already exposes the right
primitives for a better handoff:
- `prepareStep` for per-step reset / input transcript
- `onChunk` for same-step `text-delta`, `reasoning-delta`, and
`tool-call`
- parsed `tool-call` chunks with `toolCallId`, `toolName`, and `input`
- `src/browser/features/RightSidebar/DevToolsTab/DevToolsStepCard.tsx`
already renders the raw AI SDK request JSON, so if we append a synthetic
handoff message to the advisor request it should become visible in Debug
Logs without extra devtools plumbing.
## Decision summary
### Recommended approach: full transcript + frozen same-step snapshot +
synthetic final handoff message
Keep the advisor request shape conceptually as:
```ts
generateText({
  model,
  system: ADVISOR_SYSTEM_PROMPT,
  messages: [
    ...fullTranscript,
    syntheticAdvisorHandoffMessage,
  ],
  tools: {},
});
```
Where `syntheticAdvisorHandoffMessage` is a final **`user`** message
derived from the current advisor invocation, for example:
```ts
{
  role: "user",
  content: [
    "Advisor handoff",
    `Question: ${question}`,
    `Current-step commentary: ${stepTextExcerpt}`,
    `Current-step reasoning: ${stepReasoningExcerpt}`,
    `Pending tool call: advisor(${serializedArgs})`,
  ].join("\n\n"),
}
```
This keeps the base transcript intact while making the immediate
consultation request explicit and inspectable.
## Approaches considered
| Approach | Summary | Product net LoC estimate* | Recommendation |
|---|---|---:|---|
| A | Add `question` only and append a simple handoff message | 40–80 | Good fallback / rollout step, but incomplete |
| B | **Recommended:** add `question`, capture same-step visible context, append structured handoff | 140–230 | **Primary path** |
| C | Inject custom step context by patching AI SDK tool execution options | 220–400 | Avoid |

\*Product-code estimate only; tests/docs/dogfooding artifacts excluded.
<details>
<summary>Why Approach B wins</summary>
- Solves the actual gap: the advisor learns not only the whole
transcript, but also the immediate rationale for the current
consultation.
- Uses supported AI SDK hooks instead of patching internal tool
execution contracts.
- Produces a handoff that is readable in Debug Logs and easy to reason
about in tests.
- Can fall back safely when same-step text/reasoning or `question` is
missing.
</details>
## Planned implementation
## Phase 1 — Extend the advisor tool contract (safe rollout)
### Changes
1. Update `src/common/utils/tools/toolDefinitions.ts`
- Change `AdvisorToolInputSchema` from empty-object-only to a staged
rollout schema:
- `question: z.string().min(1).nullish()`
- Keep the field optional for the first rollout so existing model
behavior does not hard-fail if the model still emits `{}`.
- Update `TOOL_DEFINITIONS.advisor.description` /
`ADVISOR_TOOL_DESCRIPTION` to explicitly instruct the caller to pass a
brief question summarizing the decision or ambiguity.
2. Update `src/node/services/tools/advisor.ts`
- Replace the current `assert(Object.keys(args).length === 0)` with
validation that accepts the staged `question` field.
- Normalize `question` defensively:
- trim whitespace
- treat empty / whitespace-only strings as absent
3. Optional but recommended UI polish in
`src/browser/features/Tools/AdvisorToolCall.tsx`
- Display `args.question` when present so advisor invocations are
self-describing in the chat transcript.
### Why this phase exists
This gives the parent model an explicit way to say what it wants help
with, even before same-step capture lands.
### Quality gate after Phase 1
- Targeted unit tests around the advisor schema / tool execution pass.
- Existing empty-args advisor calls still work.
- Tool description clearly nudges the model to supply `question`.
## Phase 2 — Capture same-step visible context at the tool-call boundary
### Snapshot model
Add an internal advisor-only capture shape along these lines:
```ts
type AdvisorToolCallSnapshot = {
  toolCallId: string;
  toolName: "advisor";
  input: { question?: string | null };
  stepText: string;
  stepReasoning: string;
};
```
And a mutable per-stream capture ref in `AIService`:
```ts
type AdvisorStepCaptureRef = {
  currentStepText: string;
  currentStepReasoning: string;
  frozenSnapshotsByToolCallId: Map<string, AdvisorToolCallSnapshot>;
};
```
### Changes
1. Extend `src/common/utils/tools/tools.ts`
- Add advisor runtime accessors that let the tool read and consume a
frozen snapshot by `toolCallId`, for example:
- `takeToolCallSnapshot(toolCallId: string): AdvisorToolCallSnapshot |
undefined`
2. Update `src/node/services/streamManager.ts`
- Extend `StreamRequestConfig` with an optional chunk callback (e.g.
`onChunk?: (part) => void`).
- In `createStreamResult()`, pass the callback through to `streamText({
onChunk })`.
- Leave existing `prepareStep` behavior intact.
3. Update `src/node/services/aiService.ts`
- Add a new mutable ref beside `advisorTranscriptRef` for same-step
capture.
- Reset `currentStepText`, `currentStepReasoning`, and any stale frozen
snapshots when `prepareStep` starts a new step.
- In the new chunk callback:
- append `text-delta` to `currentStepText`
- append `reasoning-delta` to `currentStepReasoning`
- when a parsed `tool-call` chunk for `toolName === "advisor"` arrives,
freeze a snapshot into `frozenSnapshotsByToolCallId` keyed by
`toolCallId`
### Defensive programming requirements
- Freeze snapshots **per `toolCallId`**, not globally, so multiple
advisor calls in one step cannot overwrite each other.
- Clone the current strings when freezing; do not keep shared mutable
references inside the frozen snapshot.
- Consume and delete the frozen snapshot once `execute()` reads it.
- Reset per-step buffers in `prepareStep` so data cannot leak across
steps.
- Assert that any snapshot returned for a tool call either matches
`toolName === "advisor"` or is discarded.
### Why this phase exists
This is the key fix for the current gap. It captures the same-step
rationale **at the moment the advisor tool call is parsed**, which is
the closest supported boundary to “why did the parent model decide to
call advisor?”
### Quality gate after Phase 2
- New unit/integration coverage proves:
- same-step text and reasoning accumulate during a step
- advisor snapshots are frozen when the advisor tool call is parsed
- two advisor tool calls in the same step do not clobber each other
- step resets clear stale buffers
## Phase 3 — Compose the advisor handoff request
### Changes
1. Update `src/node/services/tools/advisor.ts`
- Keep `const transcript = runtime.getTranscriptSnapshot()` as the base
context.
- Retrieve the frozen snapshot with
`runtime.takeToolCallSnapshot(toolCallId)`.
- Build a final synthetic `user` handoff message using:
- normalized `question` (preferred explicit ask)
- same-step commentary text excerpt
- same-step reasoning excerpt
- pending tool-call summary (serialized advisor args)
- Call the nested advisor model with:
- `messages: [...transcript, syntheticHandoffMessage]` when any handoff
content exists
- otherwise fall back to `messages: transcript`
2. Add truncation / formatting constants (prefer colocated shared
constants rather than magic numbers)
- Example constants to add near advisor constants:
- `ADVISOR_HANDOFF_MAX_TEXT_CHARS`
- `ADVISOR_HANDOFF_MAX_REASONING_CHARS`
- Use **tail-biased** truncation so the advisor sees the most recent
commentary/reasoning closest to the tool call.
3. Keep `ADVISOR_SYSTEM_PROMPT` mostly static
- Do not overload the system prompt with runtime question/context.
- If needed, add a small static instruction explaining that a final user
handoff message may summarize the immediate consultation request.
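The tail-biased truncation named in step 2 can be sketched roughly like this (the limit value and marker string are illustrative, not the real constants):

```typescript
const ADVISOR_HANDOFF_MAX_TEXT_CHARS = 2000; // illustrative value only

// Keep the tail of the string: the most recent commentary/reasoning is the
// part closest to the advisor call, so it is the part worth keeping.
function truncateTailBiased(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return "[…truncated…]" + text.slice(text.length - maxChars);
}
```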
### Formatting rules for the synthetic handoff
- Prefer a deterministic labeled block so tests and Debug Logs stay
legible.
- Include only sections that have content.
- Suggested section order:
1. `Question`
2. `Current-step commentary`
3. `Current-step reasoning`
4. `Pending tool call`
- If both question and same-step context are missing, skip the synthetic
handoff entirely.
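These rules can be rendered deterministically along the following lines (a hedged sketch; the helper name is hypothetical, and the `## Advisor Handoff` header mirrors the marker described in the summary's validation notes):

```typescript
// Emit only sections that have content, in the suggested order; return
// undefined when everything is empty so the handoff is skipped entirely.
function renderHandoff(
  sections: Array<[string, string | undefined]>,
): string | undefined {
  const parts = sections
    .filter((pair): pair is [string, string] => !!pair[1] && pair[1].trim().length > 0)
    .map(([label, body]) => `**${label}:** ${body}`);
  if (parts.length === 0) return undefined;
  return ["## Advisor Handoff", ...parts].join("\n\n");
}
```

Returning `undefined` rather than an empty block keeps the caller's fallback check trivial.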
### Why `user` message over dynamic system prompt augmentation
- Keeps runtime context visible in raw request logs.
- Preserves a stable system prompt.
- Avoids mixing permanent instructions with per-invocation consultation
context.
- Makes replay/debugging easier because the “why now?” handoff is
explicit in the message list.
### Quality gate after Phase 3
- Advisor tests verify the nested `generateText()` call still receives
the full transcript.
- When a question/snapshot exists, the final request contains one
appended handoff message.
- When no question/snapshot exists, behavior falls back cleanly to
today’s transcript-only request.
## Phase 4 — Test coverage and regression protection
### Targeted tests
1. `src/node/services/tools/advisor.test.ts`
- transcript-only fallback still works
- question-only handoff appends one final message
- snapshot + question handoff appends the expected labeled sections
- snapshot is consumed once and not reused accidentally
2. `src/node/services/streamManager.test.ts` and/or nearby focused tests
- new chunk callback plumbing fires for `text-delta`, `reasoning-delta`,
and `tool-call`
- advisor snapshots freeze at the tool-call boundary
- multiple advisor tool calls in one step keep separate snapshots
3. If needed, add a focused `AIService`-level test around the new
advisor runtime refs
- prove `prepareStep` reset + chunk accumulation + snapshot freeze all
work together
4. `src/browser/features/Tools/AdvisorToolCall.test.tsx` (if Phase 1 UI
polish lands)
- renders `question` when present
### Validation commands
Run at minimum:
- targeted advisor tests
- targeted stream manager / AI service tests covering the new capture
path
- targeted browser test if `AdvisorToolCall.tsx` changes
- `make typecheck`
- `make lint`
If the touched test surface is small, keep unit coverage targeted first,
then run the broader static checks.
## Acceptance criteria
- The nested advisor request still includes the full transcript captured
for the parent step.
- The advisor request includes a final explicit handoff message whenever
there is a question and/or same-step context to hand off.
- The handoff message is visible in Debug Logs’ raw AI SDK request view.
- The handoff explains why the advisor was called in terms of:
- explicit question when available
- visible same-step commentary when available
- visible same-step reasoning when available
- pending advisor tool call summary
- Snapshot capture is keyed by `toolCallId`, not a single global mutable
snapshot.
- Missing question or missing same-step context does not break the
advisor tool; it falls back gracefully.
- No other tool execution path regresses.
## Risks and mitigations
- **Risk: context bloat**
- Mitigation: cap same-step text/reasoning with explicit constants and
tail-biased truncation.
- **Risk: multiple advisor calls in one step**
- Mitigation: freeze snapshots into a `Map<toolCallId, snapshot>` and
consume per tool call.
- **Risk: same-step commentary is often empty**
- Mitigation: make the handoff resilient; question + pending tool-call
summary are still valuable.
- **Risk: stale snapshot leakage across steps or retries**
- Mitigation: reset buffers at `prepareStep`, consume snapshots on
execute, and clear stale maps defensively on step resets.
- **Risk: rollout breakage if the model still emits `{}`**
- Mitigation: keep `question` optional in the first rollout, then
consider tightening later once dogfooding shows consistent usage.
## Deferred follow-ups (not part of the first implementation)
- Tighten `question` from optional to required after rollout confidence.
- Add prompt-side guidance encouraging a short preamble before advisor
calls if dogfooding shows that current-step commentary is usually empty.
- Add richer devtools affordances only if the raw request view still
feels too opaque after the explicit handoff lands.
## Dogfooding and proof plan
### Setup
1. Run the desktop app locally (`make dev` or the project’s normal
Electron dev workflow).
2. Open a scratch workspace where the model is likely to call `advisor`
(plan mode or an ambiguous architecture task works well).
3. Use the `electron` skill (preferred) or equivalent browser automation
against the running app to drive the scenario reproducibly.
### Dogfood scenario
1. Trigger a prompt likely to cause an advisor consultation.
2. Wait for the parent step to emit reasoning / commentary and then call
`advisor`.
3. Open Debug Logs and inspect:
- the parent step output
- the nested advisor step request in raw AI SDK view
4. Verify the advisor request contains:
- the prior transcript
- the final handoff message with the expected sections
5. Verify the advisor response is more clearly targeted to the explicit
question / current dilemma than before.
### Required reviewer evidence
Capture and attach:
1. **Screenshot A** — parent step showing the reasoning/commentary that
led to the advisor call
2. **Screenshot B** — advisor nested request JSON showing the appended
handoff message
3. **Screenshot C** — rendered advisor tool call card showing the
`question` (if UI polish lands)
4. **Short video recording** — end-to-end flow from prompt → advisor
invocation → Debug Logs inspection
### Dogfood quality gates
- After Phase 2: verify the advisor request still lacks the final
handoff (expected intermediate state) but internal tests prove the
snapshot is frozen correctly.
- After Phase 3: repeat the same scenario and confirm the raw advisor
request now contains the explicit handoff.
- Before sign-off: collect screenshots/video and confirm they match the
acceptance criteria above.
</details>
---
_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$39.11`_
<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh costs=39.11 -->
1 parent 7b1f94c commit 1e2b983
14 files changed: 848 additions & 15 deletions
File tree
- docs/hooks
- src
  - browser/features/Tools
  - common
    - constants
    - utils/tools
  - node/services
    - agentSkills
    - tools