Commit 18c8909

fix: user feedback issues (#1378)
* fix: skip splash when main window appears quickly
* fix(agent): batch fit tool outputs
* fix(core): harden splash and tool guards
1 parent d287563 commit 18c8909

23 files changed

Lines changed: 1103 additions & 92 deletions

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Tool Output Guardrails Plan
+
+## Summary
+
+- Keep the existing single-tool offload behavior.
+- Add batch fitting for tool results in the new session agent path only.
+- Preserve the largest prefix of tool results that can still fit the next model call.
+- Downgrade overflow tail results to the fixed context-window failure message before continuing.
+- Keep terminal error fallback when even the fully downgraded batch cannot fit.
+
+## Implementation
+
+- Extend `ToolOutputGuard` with a batch fitting helper that:
+  - evaluates the full staged batch against the context budget
+  - downgrades tail items one by one to the fixed failure message
+  - cleans up offload files for downgraded items
+  - returns terminal fallback if the fully downgraded batch still does not fit
+- Refactor `executeTools()` in `deepchatAgentPresenter/dispatch.ts` into two phases:
+  - execute tools and stage candidate outputs plus side effects
+  - fit the staged batch, then commit final tool messages, blocks, hooks, and search persistence once
+- Keep `question` and `permission` pauses on the immediate path; they are not part of staged batch fitting.
+- Keep deferred permission-resume behavior unchanged.
+
+## Test Plan
+
+- Multi-`read` batch: keep prefix, downgrade overflow tail, continue next provider turn.
+- Mixed `exec`/`read`: downgraded offloaded results must delete their `.offload` files.
+- Search resource result in downgraded tail: no search block and no persisted search rows.
+- Fully downgraded batch still too large: return terminal error.
+- Preserve existing deferred single-tool resume regressions.
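
The tail-downgrade step described in the plan can be sketched roughly as follows. This is a minimal illustration, not the actual `ToolOutputGuard` helper: the names `fitBatch`, `BatchItem`, `FittedItem`, and `estimateTokens` are hypothetical, the token estimate of about 4 characters per token is an assumption, and the terminal error text is a placeholder. Only the fixed failure message wording is taken from the spec.

```typescript
// Hypothetical shapes; the real ToolOutputGuard types are not shown in the plan.
type BatchItem = { toolCallId: string; toolName: string; responseText: string }
type FittedItem = BatchItem & { downgraded: boolean }
type FitResult =
  | { kind: 'ok'; results: FittedItem[] }
  | { kind: 'terminal_error'; results: FittedItem[]; message: string }

// Assumption: a crude ~4 characters per token estimate stands in for a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

// Fixed failure message, worded as in the spec.
const failureMessage = (id: string, name: string): string =>
  `The tool call with ID ${id} and name ${name} failed because the remaining context window is too small to continue this turn.`

function fitBatch(items: BatchItem[], budgetTokens: number): FitResult {
  const results: FittedItem[] = items.map((item) => ({ ...item, downgraded: false }))
  const total = (): number =>
    results.reduce((sum, r) => sum + estimateTokens(r.responseText), 0)

  // Walk from the tail and downgrade one item at a time until the batch fits.
  for (let i = results.length - 1; i >= 0 && total() > budgetTokens; i -= 1) {
    const r = results[i]
    if (!r) continue
    results[i] = {
      ...r,
      responseText: failureMessage(r.toolCallId, r.toolName),
      downgraded: true
    }
  }

  // Even the fully downgraded batch may not fit: fall back to a terminal error.
  if (total() > budgetTokens) {
    return { kind: 'terminal_error', results, message: 'Tool batch does not fit the context window.' }
  }
  return { kind: 'ok', results }
}
```

Because the loop stops as soon as the total fits, the surviving items always form a prefix of the batch. Whether that prefix is strictly the largest possible one depends on how the real helper accounts for the size of the failure messages themselves.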

docs/specs/tool-output-guardrails/spec.md

Lines changed: 18 additions & 0 deletions
@@ -10,23 +10,27 @@
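 - Provider errors appear in the main-process log, but the UI may not show the error message.
 - `directory_tree` has no depth limit, so it can produce huge output and hit the 10MB limit.
 - Oversized tool results are injected directly into the LLM context, which easily causes request failures.
+- Multiple tool calls within a single loop iteration may each be small yet still overflow the context window in aggregate, especially when `read` reads many files at once.

 ## Goals

 - Make error messages visible and traceable when generation fails.
 - Add depth control to `directory_tree`, with a maximum depth of 3.
 - Offload oversized tool outputs and put a small stub into the context instead.
+- When multiple tool results in the same round cumulatively exceed the window, keep the prefix of results that fits, downgrade the tail results to the fixed failure message, and continue with the next model call.

 ## Non-goals

 - Do not modify or replace `ToolRegistry`/`toolRouter` under `agentPresenter/tool`.
 - Do not change the parsing logic for MCP UI resources and search results.
+- Do not change the legacy `AgentPresenter` path; this change covers only the new session agent.

 ## User stories

 1. As a user, I want to see the raw error text directly in the UI when generation fails.
 2. As the model, I want to specify the directory tree depth so that a single call does not produce oversized output.
 3. As the system, I want oversized tool output to be offloaded automatically while the full content stays readable on demand.
+4. As the model, when a batch of tool results cumulatively exceeds the window, I want to know exactly which tail tools failed for lack of context so that I can adjust my next step.

 ## Acceptance criteria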

@@ -57,3 +61,17 @@
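 - The model can read the paths above through file tools.
 - File-read tools only allow the directory that corresponds to the current session's `conversationId`.
 - `tool_call_response_raw` is not rewritten, to avoid affecting MCP UI and search-result handling.
+
+### Same-round batch tail downgrade
+
+- Enabled only on the new session agent path.
+- Before entering the next context, all completed tool calls from the same round must be budget-fitted together as one batch.
+- If all results fit, they enter the context unchanged.
+- If the total exceeds the window, the system downgrades items one by one, starting from the tail of the batch, to the fixed failure message:
+  - `The tool call with ID <id> and name <name> failed because the remaining context window is too small to continue this turn.`
+- A downgraded tool is treated as failed:
+  - the assistant tool_call block shows the fixed failure message
+  - no search block or search-result persistence is kept
+  - no success-type hooks are kept
+- As long as the batch fits into the context after tail downgrading, the next model call proceeds.
+- If the batch still cannot fit after every tool in it has been downgraded to the fixed failure message, keep the terminal error fallback and end the turn.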
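
The spec requires the batch to be fitted against a budget but does not spell out how that budget is derived. One plausible reading, sketched below under stated assumptions, is that the room left for tool results is the context window minus the reserved completion tokens, the existing conversation, and the tool schemas. The function `remainingBudget`, the `Msg` shape, and the 4-characters-per-token estimate are all hypothetical; the real guard may compute this differently.

```typescript
// Hypothetical message shape; only the field needed for estimation.
type Msg = { role: string; content: string }

// Assumption: ~4 characters per token; a real implementation would use a tokenizer.
const approxTokens = (text: string): number => Math.ceil(text.length / 4)

// Room left for the staged tool batch, clamped at zero.
function remainingBudget(params: {
  contextLength: number // total context window, in tokens
  maxTokens: number // tokens reserved for the model's reply
  conversation: Msg[] // messages already in the context
  toolDefinitionsJson: string // serialized tool schemas sent with the request
}): number {
  const used =
    params.conversation.reduce((sum, m) => sum + approxTokens(m.content), 0) +
    approxTokens(params.toolDefinitionsJson)
  return Math.max(0, params.contextLength - params.maxTokens - used)
}
```

A result of zero corresponds to the case where even a fully downgraded batch cannot fit, which the spec handles with the terminal error fallback.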

src/main/presenter/deepchatAgentPresenter/dispatch.ts

Lines changed: 163 additions & 80 deletions
@@ -16,11 +16,27 @@ import type {
 } from './types'
 import type { ChatMessage } from '@shared/types/core/chat-message'
 import { nanoid } from 'nanoid'
-import type { ToolOutputGuard } from './toolOutputGuard'
+import type { ToolBatchOutputFitItem, ToolOutputGuard } from './toolOutputGuard'
 import { buildTerminalErrorBlocks } from './messageStore'

 type PermissionType = 'read' | 'write' | 'all' | 'command'

+type ExtractedSearchPayload = ReturnType<typeof extractSearchPayload>
+
+type StagedToolResult = {
+  toolCallId: string
+  toolName: string
+  toolArgs: string
+  responseText: string
+  isError: boolean
+  offloadPath?: string
+  searchPayload: ExtractedSearchPayload
+  rtkApplied?: boolean
+  rtkMode?: 'rewrite' | 'direct' | 'bypass'
+  rtkFallbackReason?: string
+  postHookKind: 'success' | 'failure'
+}
+
 type PermissionRequestLike = {
   toolName?: string
   serverName?: string
@@ -189,6 +205,90 @@ function updateToolCallBlock(
   }
 }

+function persistToolExecutionState(io: IoParams, state: StreamState): void {
+  if (!state.dirty) {
+    return
+  }
+
+  flushBlocksToRenderer(io, state.blocks)
+  io.messageStore.updateAssistantContent(io.messageId, state.blocks)
+  state.dirty = false
+}
+
+function applyFinalizedToolResults(params: {
+  stagedResults: StagedToolResult[]
+  fittedResults: ToolBatchOutputFitItem[]
+  conversation: ChatMessage[]
+  state: StreamState
+  io: IoParams
+  hooks?: ProcessHooks
+  appendToConversation: boolean
+}): void {
+  const { stagedResults, fittedResults, conversation, state, io, hooks, appendToConversation } =
+    params
+
+  for (let index = 0; index < stagedResults.length; index += 1) {
+    const stagedResult = stagedResults[index]
+    const fittedResult = fittedResults[index]
+    if (!fittedResult) {
+      continue
+    }
+
+    if (appendToConversation) {
+      conversation.push({
+        role: 'tool',
+        tool_call_id: fittedResult.toolCallId,
+        content: fittedResult.contextResponseText
+      })
+    }
+
+    if (!fittedResult.downgraded && stagedResult.searchPayload) {
+      state.blocks.push(stagedResult.searchPayload.block)
+      for (const result of stagedResult.searchPayload.results) {
+        io.messageStore.addSearchResult({
+          sessionId: io.sessionId,
+          messageId: io.messageId,
+          searchId: result.searchId,
+          rank: typeof result.rank === 'number' ? result.rank : null,
+          result
+        })
+      }
+    }
+
+    updateToolCallBlock(
+      state.blocks,
+      fittedResult.toolCallId,
+      fittedResult.responseText,
+      fittedResult.isError,
+      fittedResult.downgraded
+        ? undefined
+        : {
+            rtkApplied: stagedResult.rtkApplied,
+            rtkMode: stagedResult.rtkMode,
+            rtkFallbackReason: stagedResult.rtkFallbackReason
+          }
+    )
+
+    if (fittedResult.isError) {
+      hooks?.onPostToolUseFailure?.({
+        callId: stagedResult.toolCallId,
+        name: stagedResult.toolName,
+        params: stagedResult.toolArgs,
+        error: fittedResult.responseText
+      })
+    } else if (stagedResult.postHookKind === 'success') {
+      hooks?.onPostToolUse?.({
+        callId: stagedResult.toolCallId,
+        name: stagedResult.toolName,
+        params: stagedResult.toolArgs,
+        response: fittedResult.responseText
+      })
+    }
+  }
+
+  state.dirty = true
+}
+
 function isPermissionType(value: unknown): value is PermissionType {
   return value === 'read' || value === 'write' || value === 'all' || value === 'command'
 }
@@ -450,6 +550,7 @@ export async function executeTools(

   let executed = 0
   const pendingInteractions: PendingToolInteraction[] = []
+  const stagedResults: StagedToolResult[] = []

   for (const tc of state.completedToolCalls) {
     if (io.abortSignal.aborted) break
@@ -486,8 +587,7 @@ export async function executeTools(
       updateToolCallBlock(state.blocks, tc.id, errorText, true)
       state.dirty = true
       executed += 1
-      flushBlocksToRenderer(io, state.blocks)
-      io.messageStore.updateAssistantContent(io.messageId, state.blocks)
+      persistToolExecutionState(io, state)
       continue
     }

@@ -584,100 +684,83 @@ export async function executeTools(
         toolContext.name,
         toolContext.serverName
       )
-      if (searchPayload) {
-        state.blocks.push(searchPayload.block)
-        for (const result of searchPayload.results) {
-          io.messageStore.addSearchResult({
-            sessionId: io.sessionId,
-            messageId: io.messageId,
-            searchId: result.searchId,
-            rank: typeof result.rank === 'number' ? result.rank : null,
-            result
-          })
-        }
-      }

       const responseText = toolResponseToText(toolRawData.content)
-      const guardedResult = await toolOutputGuard.guardToolOutput({
+      const preparedResult = await toolOutputGuard.prepareToolOutput({
         sessionId: io.sessionId,
         toolCallId: tc.id,
         toolName: toolContext.name,
-        rawContent: responseText,
-        conversationMessages: conversation,
-        toolDefinitions: tools,
-        contextLength,
-        maxTokens
+        rawContent: responseText
       })
+      const stagedResponseText =
+        preparedResult.kind === 'tool_error' ? preparedResult.message : preparedResult.content
+      const stagedIsError = preparedResult.kind === 'tool_error' || toolRawData.isError === true

-      if (guardedResult.kind === 'terminal_error') {
-        updateToolCallBlock(state.blocks, tc.id, guardedResult.message, true)
-        hooks?.onPostToolUseFailure?.({
-          callId: tc.id,
-          name: tc.name,
-          params: tc.arguments,
-          error: guardedResult.message
-        })
-        state.dirty = true
-        executed += 1
-        flushBlocksToRenderer(io, state.blocks)
-        io.messageStore.updateAssistantContent(io.messageId, state.blocks)
-        return {
-          executed,
-          pendingInteractions,
-          terminalError: guardedResult.message
-        }
-      }
-
-      const isToolError = guardedResult.kind === 'tool_error' || toolRawData.isError === true
-      const toolMessageContent =
-        guardedResult.kind === 'tool_error' ? guardedResult.message : guardedResult.content
-      conversation.push({
-        role: 'tool',
-        tool_call_id: tc.id,
-        content: toolMessageContent
-      })
-      updateToolCallBlock(state.blocks, tc.id, toolMessageContent, isToolError, {
+      stagedResults.push({
+        toolCallId: tc.id,
+        toolName: tc.name,
+        toolArgs: tc.arguments,
+        responseText: stagedResponseText,
+        isError: stagedIsError,
+        offloadPath: preparedResult.kind === 'ok' ? preparedResult.offloadPath : undefined,
+        searchPayload,
         rtkApplied: toolRawData.rtkApplied,
         rtkMode: toolRawData.rtkMode,
-        rtkFallbackReason: toolRawData.rtkFallbackReason
+        rtkFallbackReason: toolRawData.rtkFallbackReason,
+        postHookKind: stagedIsError ? 'failure' : 'success'
       })
-      if (isToolError) {
-        hooks?.onPostToolUseFailure?.({
-          callId: tc.id,
-          name: tc.name,
-          params: tc.arguments,
-          error: toolMessageContent
-        })
-      } else {
-        hooks?.onPostToolUse?.({
-          callId: tc.id,
-          name: tc.name,
-          params: tc.arguments,
-          response: toolMessageContent
-        })
-      }
+      executed += 1
     } catch (err) {
       const errorText = err instanceof Error ? err.message : String(err)
-      conversation.push({
-        role: 'tool',
-        tool_call_id: tc.id,
-        content: `Error: ${errorText}`
-      })
-      updateToolCallBlock(state.blocks, tc.id, `Error: ${errorText}`, true)
-      hooks?.onPostToolUseFailure?.({
-        callId: tc.id,
-        name: tc.name,
-        params: tc.arguments,
-        error: `Error: ${errorText}`
+      stagedResults.push({
+        toolCallId: tc.id,
+        toolName: tc.name,
+        toolArgs: tc.arguments,
+        responseText: `Error: ${errorText}`,
+        isError: true,
+        searchPayload: null,
+        postHookKind: 'failure'
       })
+      executed += 1
     }
+  }
+
+  if (stagedResults.length > 0) {
+    const fittedResults = await toolOutputGuard.fitToolBatchOutputs({
+      conversationMessages: conversation,
+      results: stagedResults.map((result) => ({
+        toolCallId: result.toolCallId,
+        toolName: result.toolName,
+        responseText: result.responseText,
+        isError: result.isError,
+        offloadPath: result.offloadPath
+      })),
+      toolDefinitions: tools,
+      contextLength,
+      maxTokens
+    })

-    state.dirty = true
-    executed += 1
-    flushBlocksToRenderer(io, state.blocks)
-    io.messageStore.updateAssistantContent(io.messageId, state.blocks)
+    applyFinalizedToolResults({
+      stagedResults,
+      fittedResults: fittedResults.results,
+      conversation,
+      state,
+      io,
+      hooks,
+      appendToConversation: fittedResults.kind === 'ok'
+    })
+    persistToolExecutionState(io, state)
+
+    if (fittedResults.kind === 'terminal_error') {
+      return {
+        executed,
+        pendingInteractions,
+        terminalError: fittedResults.message
+      }
+    }
   }

+  persistToolExecutionState(io, state)
   return { executed, pendingInteractions }
 }
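
The diff passes each staged result's `offloadPath` into `fitToolBatchOutputs`, and the plan states that offload files of downgraded items are cleaned up, but the guard's cleanup code is not part of this file. The sketch below shows one way that step might look. `cleanupDowngradedOffloads` and `FittedOffloadItem` are hypothetical names, and the injectable `remove` callback is an assumption made here so the logic can be exercised without touching the filesystem; the default uses Node's `rmSync` with `force: true`, which treats a missing file as a no-op.

```typescript
import { rmSync } from 'node:fs'

// Hypothetical subset of a fitted batch item.
type FittedOffloadItem = { toolCallId: string; downgraded: boolean; offloadPath?: string }

// Deletes offload files that belong to downgraded items and returns the paths removed.
// Items that were kept, or that never produced an offload file, are left alone.
function cleanupDowngradedOffloads(
  items: FittedOffloadItem[],
  remove: (path: string) => void = (path) => rmSync(path, { force: true })
): string[] {
  const removed: string[] = []
  for (const item of items) {
    if (!item.downgraded || !item.offloadPath) continue
    remove(item.offloadPath)
    removed.push(item.offloadPath)
  }
  return removed
}
```

Keeping the offload file for non-downgraded items matters because the stub placed in the context still points at that file for later reads.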
