
Commit 34ce3cc

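✨ Tool call guard, slash command menu, and sub-agent detail persistence
- Add tool_call_guard: detects repeated/looping tool calls and injects a system warning
- ChatInput: add a slash command popup menu (triggered by /, with keyboard navigation)
- Persist sub-agent execution details (message history, usage) to ToolCall and show them in the UI
- SubAgentBlock / MessageItem UI improvements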
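The commit message describes the new ChatInput menu only as opening on `/` with keyboard navigation. The component itself is not part of this view. As a rough illustration of that behavior, here is a sketch of the two pieces of logic such a menu needs, written as pure functions. The names `SlashCommand`, `matchSlashCommands` and `moveSelection`, along with the prefix matching and wrap-around navigation, are assumptions and are not taken from the actual component.

```typescript
// Hypothetical command shape; the real ChatInput component may differ.
interface SlashCommand {
  name: string; // command name without the leading "/"
  description: string;
}

type NavKey = "ArrowUp" | "ArrowDown";

// Returns the commands to list, or null when the menu should stay closed.
// Assumption: the menu is open only while the whole input is a single "/word" token.
function matchSlashCommands(input: string, commands: SlashCommand[]): SlashCommand[] | null {
  const match = /^\/(\S*)$/.exec(input);
  if (match === null) return null;
  const query = match[1].toLowerCase();
  return commands.filter((c) => c.name.toLowerCase().startsWith(query));
}

// Moves the highlighted row, wrapping around at both ends; -1 means nothing is highlighted.
function moveSelection(index: number, key: NavKey, count: number): number {
  if (count === 0) return -1;
  if (index < 0) return key === "ArrowDown" ? 0 : count - 1;
  const step = key === "ArrowDown" ? 1 : -1;
  return (index + step + count) % count;
}
```

Suppose the hypothetical commands are `clear`, `compact` and `help`. Typing `/c` lists `clear` and `compact`, and pressing ArrowUp on the first row wraps to the last. Keeping this logic outside the component lets it be unit-tested separately from key-event handling.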
1 parent debfc57 commit 34ce3cc

22 files changed

Lines changed: 1010 additions & 147 deletions

src/app/service/agent/system_prompt.ts

Lines changed: 18 additions & 16 deletions
@@ -55,7 +55,9 @@ When stopped due to failures:
 2. **Suggest next steps** — ask if the user can help (e.g., provide correct selectors, try manually).
 3. **Never silently retry** — the user must know when something isn't working.
 
-**Default to asking**: When in doubt between trying another approach and asking the user, always ask.`;
+**Default to asking**: When in doubt between trying another approach and asking the user, always ask.
+
+**System guard**: The system automatically detects repetitive tool call patterns and will warn you with a \`[System Warning]\` message. If you receive one, follow its guidance immediately — do not ignore it.`;
 
 const SECTION_SAFETY = `## Safety
 
@@ -96,19 +98,19 @@ Any task that involves 2+ tool calls (web searching, page reading, page interact
 
 ### Delegation Examples
 
-**Example 1: "Help me write a WeChat Official Account article about X"**
+**Example 1: "Write an article about X and publish it on the blog platform"**
 1. Spawn \`researcher\` sub-agent → "Research X: find key features, advantages, use cases. Return structured notes."
 2. Use the research result to draft the article content yourself (or delegate to another sub-agent).
-3. Spawn \`page_operator\` sub-agent → "Open mp.weixin.qq.com, navigate to article editor, write this HTML content into the editor: [content]"
+3. Spawn \`page_operator\` sub-agent → "Open the blog editor, navigate to new post, write this HTML content into the editor: [content]"
 
-**Example 2: "Help me compare prices across 3 websites"**
+**Example 2: "Compare prices for product X across 3 websites"**
 Spawn 3 \`page_operator\` sub-agents in the same response (parallel):
 - "Go to site A, find the price of product X, return price and URL"
 - "Go to site B, find the price of product X, return price and URL"
 - "Go to site C, find the price of product X, return price and URL"
 Then summarize results in a comparison table.
 
-**Example 3: "Help me fill out the form on this page"**
+**Example 3: "Fill out the form on this page"**
 This is a single-scope page task → spawn one \`page_operator\` sub-agent with the form data.
 
 ### Writing Sub-Agent Prompts
@@ -144,26 +146,24 @@ Sub-agents cannot ask the user questions, cannot spawn nested sub-agents, and ha
 
 const SECTION_TASK_MANAGEMENT = `## Task Management
 
-Use task tools to create a structured task list that tracks your progress. This helps the user understand what you're doing and how much work remains.
+Use task tools **only** when tracking progress genuinely helps the user understand a complex workflow.
 
 **When to use:**
-- Complex tasks requiring 3+ distinct steps (e.g., navigating multiple pages, multi-stage data processing)
-- The user provides multiple things to do at once
-- After receiving new instructions — immediately capture requirements as tasks
+- The task requires 3+ distinct steps AND benefits from visible progress tracking
+- The user provides multiple independent things to do at once
 
 **When NOT to use:**
-- Single, straightforward tasks that complete in 1-2 steps
+- Tasks with 1-2 steps — just execute directly
+- Tasks you will complete in the same or next tool call — creating a task just to immediately complete it wastes tool calls
+- Tasks already delegated to sub-agents — sub-agents handle their own execution
 - Purely conversational or informational requests
 
 **Workflow:**
 1. **Plan** — Call \`list_tasks\` to check for existing tasks, then \`create_task\` for each step with a clear imperative subject and enough description for context.
 2. **Execute** — Before starting each task, call \`update_task\` with \`status: "in_progress"\`. When done, set \`status: "completed"\`.
-3. **Adapt** — If a completed task reveals follow-up work, create new tasks. If a task becomes irrelevant, use \`delete_task\` to clean up. Use \`get_task\` to review a task's full description before starting it.
+3. **Adapt** — If a completed task reveals follow-up work, create new tasks. If a task becomes irrelevant, use \`delete_task\` to clean up.
 
-**Tips:**
-- Write subjects as brief imperatives: "Extract product prices", not "I will extract prices".
-- Include acceptance criteria in the description so progress is unambiguous.
-- Do not create tasks you intend to complete in the same tool call — tasks are for tracking multi-step progress, not logging what you already did.`;
+**Important:** Do not create tasks just to log what you already did or are about to do in the same response.`;
 
 const SECTION_OPFS = `## OPFS Workspace
 
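The **Workflow** steps in this hunk give each task a small lifecycle: create it, mark it `in_progress` before starting, mark it `completed` when done, and delete it if it becomes irrelevant. Below is a minimal in-memory sketch of that lifecycle. It assumes a `"pending"` initial status and numeric ids, neither of which the prompt mentions. It also uses a hypothetical `TaskStore` class whose methods mirror `list_tasks`, `create_task`, `update_task` and `delete_task`; the real tool schemas are not in this diff.

```typescript
// Assumption: "pending" as the initial status is not stated in the prompt.
type TaskStatus = "pending" | "in_progress" | "completed";

interface Task {
  id: number; // hypothetical numeric id
  subject: string; // brief imperative, e.g. "Extract product prices"
  description: string;
  status: TaskStatus;
}

// Hypothetical in-memory store mirroring list_tasks / create_task / update_task / delete_task.
class TaskStore {
  private tasks = new Map<number, Task>();
  private nextId = 1;

  list(): Task[] {
    return Array.from(this.tasks.values());
  }

  create(subject: string, description: string): Task {
    const task: Task = { id: this.nextId++, subject, description, status: "pending" };
    this.tasks.set(task.id, task);
    return task;
  }

  // Design choice of this sketch: a task must be in_progress before it can be completed,
  // which encodes the "Execute" step; the status is left unchanged if the check fails.
  update(id: number, status: TaskStatus): Task {
    const task = this.tasks.get(id);
    if (!task) throw new Error(`task ${id} not found`);
    if (status === "completed" && task.status !== "in_progress") {
      throw new Error(`task ${id} must be in_progress before it is completed`);
    }
    task.status = status;
    return task;
  }

  delete(id: number): boolean {
    return this.tasks.delete(id);
  }
}
```

Rejecting a direct jump from `pending` to `completed` also reflects the prompt's point that a task created only to be completed immediately is wasted tool calls.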
@@ -238,7 +238,9 @@ Read each tool's description before calling — it defines behavior, parameters,
 When stopped, describe clearly in your final response:
 1. What you tried and what happened.
 2. Your best guess at the root cause.
-Never silently keep trying — fail fast and report.`;
+Never silently keep trying — fail fast and report.
+
+**System guard**: The system automatically detects repetitive tool call patterns and will warn you with a \`[System Warning]\` message. If you receive one, follow its guidance immediately.`;
 
 // Page interaction workflow guide (included only when tab tools are available)
 const SUB_AGENT_SECTION_PAGE_INTERACTION = `### Page Interaction Workflow
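Both prompt hunks tell the model to expect a `[System Warning]` message, and the commit message says the guard injects a system warning. The agent-loop code that performs the injection is not in this view, though. Here is a minimal sketch of that step. It assumes hypothetical `ChatMessage` and `Detector` types and appends the warning as a `system`-role message after the latest tool results. Providers that reject mid-conversation system messages would need the text folded into the next user or tool-result turn instead.

```typescript
// Minimal message and record shapes; hypothetical, not the project's types.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ToolCallRecord {
  name: string;
  args: string;
  result: string;
  iteration: number;
}

// Any function with the detector's signature (e.g. detectToolCallIssues) can be passed in.
type Detector = (history: ToolCallRecord[]) => string | null;

const WARNING_PREFIX = "[System Warning]";

// Returns a new message list with the guard's warning appended, or the input unchanged.
function appendGuardWarning(
  messages: ChatMessage[],
  history: ToolCallRecord[],
  detect: Detector,
): ChatMessage[] {
  const warning = detect(history);
  if (warning === null) return messages;
  // Ensure the model sees the exact marker the system prompt tells it to look for.
  const content = warning.startsWith(WARNING_PREFIX) ? warning : `${WARNING_PREFIX} ${warning}`;
  const warningMessage: ChatMessage = { role: "system", content };
  return [...messages, warningMessage];
}
```

Because the function returns a new array instead of mutating its input, the caller decides whether the warning is persisted to the conversation history or only sent for the next model turn.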
Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
+import { describe, it, expect } from "vitest";
+import { detectToolCallIssues, type ToolCallRecord } from "./tool_call_guard";
+
+describe("detectToolCallIssues", () => {
+  it("does not warn when the history is too short", () => {
+    expect(detectToolCallIssues([])).toBeNull();
+    expect(
+      detectToolCallIssues([{ name: "web_search", args: '{"query":"test"}', result: "...", iteration: 1 }])
+    ).toBeNull();
+  });
+
+  describe("identical tool + args detection", () => {
+    it("warns when the same tool is called twice with the same args", () => {
+      const history: ToolCallRecord[] = [
+        { name: "web_fetch", args: '{"url":"https://example.com"}', result: "...", iteration: 1 },
+        { name: "web_fetch", args: '{"url":"https://example.com"}', result: "...", iteration: 2 },
+      ];
+      const warning = detectToolCallIssues(history);
+      expect(warning).not.toBeNull();
+      expect(warning).toContain("web_fetch");
+    });
+
+    it("also triggers when the JSON formatting differs but the content is the same", () => {
+      const history: ToolCallRecord[] = [
+        { name: "web_fetch", args: '{"url": "https://example.com"}', result: "...", iteration: 1 },
+        { name: "web_fetch", args: '{"url":"https://example.com"}', result: "...", iteration: 2 },
+      ];
+      const warning = detectToolCallIssues(history);
+      expect(warning).not.toBeNull();
+    });
+
+    it("does not warn for different args", () => {
+      const history: ToolCallRecord[] = [
+        { name: "web_fetch", args: '{"url":"https://a.com"}', result: "...", iteration: 1 },
+        { name: "web_fetch", args: '{"url":"https://b.com"}', result: "...", iteration: 2 },
+      ];
+      expect(detectToolCallIssues(history)).toBeNull();
+    });
+
+    it("does not trigger for a repeat outside the last 10 calls", () => {
+      const history: ToolCallRecord[] = [
+        { name: "web_fetch", args: '{"url":"https://old.com"}', result: "...", iteration: 1 },
+      ];
+      // Insert 11 distinct calls, rotating through different tools so the generic repetition check does not fire
+      const tools = ["web_search", "web_fetch", "execute_script"];
+      for (let i = 0; i < 11; i++) {
+        history.push({
+          name: tools[i % 3],
+          args: `{"q":"pad${i}"}`,
+          result: '{"result":"ok"}',
+          iteration: i + 2,
+        });
+      }
+      // Add one call identical to the first, which is now outside the last-10 window
+      history.push({ name: "web_fetch", args: '{"url":"https://old.com"}', result: "...", iteration: 13 });
+      expect(detectToolCallIssues(history)).toBeNull();
+    });
+  });
+
+  describe("execute_script null-return detection", () => {
+    it("warns after 3 consecutive null returns", () => {
+      const history: ToolCallRecord[] = [
+        {
+          name: "execute_script",
+          args: '{"code":"a.click()","target":"page"}',
+          result: '{"result":null,"target":"page","tab_id":123}',
+          iteration: 1,
+        },
+        {
+          name: "execute_script",
+          args: '{"code":"b.click()","target":"page"}',
+          result: '{"result":null,"target":"page","tab_id":123}',
+          iteration: 2,
+        },
+        {
+          name: "execute_script",
+          args: '{"code":"c.click()","target":"page"}',
+          result: '{"result":null,"target":"page","tab_id":123}',
+          iteration: 3,
+        },
+      ];
+      const warning = detectToolCallIssues(history);
+      expect(warning).not.toBeNull();
+      expect(warning).toContain("execute_script");
+      expect(warning).toContain("return");
+    });
+
+    it("triggers when other tools are interleaved but execute_script keeps returning null", () => {
+      const history: ToolCallRecord[] = [
+        { name: "execute_script", args: '{"code":"a()"}', result: '{"result":null}', iteration: 1 },
+        { name: "get_tab_content", args: '{"tab_id":1,"prompt":"find buttons"}', result: "page content...", iteration: 2 },
+        { name: "execute_script", args: '{"code":"b()"}', result: '{"result":null}', iteration: 3 },
+        { name: "get_tab_content", args: '{"tab_id":1,"prompt":"check state"}', result: "page content...", iteration: 4 },
+        { name: "execute_script", args: '{"code":"c()"}', result: '{"result":null}', iteration: 5 },
+      ];
+      const warning = detectToolCallIssues(history);
+      expect(warning).not.toBeNull();
+      expect(warning).toContain("execute_script");
+    });
+
+    it("does not trigger after only 2 null returns", () => {
+      const history: ToolCallRecord[] = [
+        { name: "execute_script", args: '{"code":"a()"}', result: '{"result":null}', iteration: 1 },
+        { name: "execute_script", args: '{"code":"b()"}', result: '{"result":null}', iteration: 2 },
+      ];
+      expect(detectToolCallIssues(history)).toBeNull();
+    });
+
+    it("a non-null result in between breaks the consecutive count", () => {
+      const history: ToolCallRecord[] = [
+        { name: "execute_script", args: '{"code":"a()"}', result: '{"result":null}', iteration: 1 },
+        { name: "execute_script", args: '{"code":"b()"}', result: '{"result":"ok"}', iteration: 2 },
+        { name: "execute_script", args: '{"code":"c()"}', result: '{"result":null}', iteration: 3 },
+        { name: "execute_script", args: '{"code":"d()"}', result: '{"result":null}', iteration: 4 },
+      ];
+      // Counting back from the newest call there are only 2 consecutive nulls, fewer than 3
+      expect(detectToolCallIssues(history)).toBeNull();
+    });
+  });
+
+  describe("get_tab_content repeated-call detection", () => {
+    it("warns when the same tab is read 3 times", () => {
+      const history: ToolCallRecord[] = [
+        { name: "get_tab_content", args: '{"tab_id":123,"prompt":"find buttons"}', result: "...", iteration: 1 },
+        { name: "execute_script", args: '{"code":"click()"}', result: '{"result":"ok"}', iteration: 2 },
+        { name: "get_tab_content", args: '{"tab_id":123,"prompt":"find the button"}', result: "...", iteration: 3 },
+        { name: "execute_script", args: '{"code":"click2()"}', result: '{"result":"ok"}', iteration: 4 },
+        { name: "get_tab_content", args: '{"tab_id":123,"prompt":"detailed info"}', result: "...", iteration: 5 },
+      ];
+      const warning = detectToolCallIssues(history);
+      expect(warning).not.toBeNull();
+      expect(warning).toContain("get_tab_content");
+    });
+
+    it("does not trigger for different tabs", () => {
+      const history: ToolCallRecord[] = [
+        { name: "get_tab_content", args: '{"tab_id":123}', result: "...", iteration: 1 },
+        { name: "get_tab_content", args: '{"tab_id":456}', result: "...", iteration: 2 },
+        { name: "get_tab_content", args: '{"tab_id":789}', result: "...", iteration: 3 },
+      ];
+      expect(detectToolCallIssues(history)).toBeNull();
+    });
+  });
+
+  describe("generic repeated-call detection", () => {
+    it("warns when the same tool appears 5 times in the last 8 calls", () => {
+      const history: ToolCallRecord[] = [];
+      for (let i = 1; i <= 5; i++) {
+        history.push({
+          name: "web_search",
+          args: `{"query":"search ${i}"}`,
+          result: "...",
+          iteration: i,
+        });
+      }
+      const warning = detectToolCallIssues(history);
+      expect(warning).not.toBeNull();
+      expect(warning).toContain("web_search");
+    });
+
+    it("excludes query tools from the generic count", () => {
+      const history: ToolCallRecord[] = [];
+      for (let i = 1; i <= 6; i++) {
+        history.push({ name: "list_tasks", args: "{}", result: "[]", iteration: i });
+      }
+      expect(detectToolCallIssues(history)).toBeNull();
+    });
+
+    it("counts different tools separately", () => {
+      const history: ToolCallRecord[] = [
+        { name: "web_search", args: '{"query":"a"}', result: "...", iteration: 1 },
+        { name: "web_fetch", args: '{"url":"b"}', result: "...", iteration: 2 },
+        { name: "web_search", args: '{"query":"c"}', result: "...", iteration: 3 },
+        { name: "web_fetch", args: '{"url":"d"}', result: "...", iteration: 4 },
+        { name: "web_search", args: '{"query":"e"}', result: "...", iteration: 5 },
+        { name: "web_fetch", args: '{"url":"f"}', result: "...", iteration: 6 },
+      ];
+      expect(detectToolCallIssues(history)).toBeNull();
+    });
+  });
+
+  describe("priority", () => {
+    it("execute_script with identical args triggers the duplicate check rather than the null check", () => {
+      const history: ToolCallRecord[] = [
+        { name: "execute_script", args: '{"code":"a()"}', result: '{"result":null}', iteration: 1 },
+        { name: "execute_script", args: '{"code":"b()"}', result: '{"result":null}', iteration: 2 },
+        { name: "execute_script", args: '{"code":"a()"}', result: '{"result":null}', iteration: 3 },
+      ];
+      const warning = detectToolCallIssues(history);
+      expect(warning).not.toBeNull();
+      // Should trigger the duplicate check (rule 1), not the null check (rule 2)
+      expect(warning).toContain("identical arguments");
+    });
+  });
+});
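The test file defines four rules and their priority, but the `tool_call_guard` module itself is not shown in this view. The following is a minimal sketch that passes these tests, built on several assumptions. Every rule is evaluated against the most recent call. Read-only query tools are exempt from every rule; the exempt set here, containing `list_tasks` and `get_task`, is hypothetical. The `list_tasks` test requires this exemption, because six calls with identical `{}` arguments would otherwise trip the duplicate rule. The `get_tab_content` window size and the warning wording are also invented, apart from the substrings the tests check.

```typescript
// Mirrors the fields the tests above construct.
export interface ToolCallRecord {
  name: string;
  args: string; // raw JSON arguments
  result: string; // raw tool result
  iteration: number;
}

// Thresholds inferred from the tests; the real module may use other values.
const DUPLICATE_WINDOW = 10; // an identical call must be within the last 10 calls
const NULL_STREAK_LIMIT = 3; // consecutive null execute_script results
const TAB_READ_WINDOW = 10; // hypothetical: not pinned down by the tests
const TAB_READ_LIMIT = 3; // get_tab_content reads of the same tab
const GENERIC_WINDOW = 8;
const GENERIC_LIMIT = 5; // same tool 5 times in the last 8 calls
// Hypothetical exempt set: the tests only show that list_tasks is exempt.
const QUERY_TOOLS = new Set(["list_tasks", "get_task"]);

function parseJson(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined; // not valid JSON
  }
}

// Recursively sorts object keys so whitespace and key order do not affect comparison.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) sorted[key] = canonicalize(obj[key]);
    return sorted;
  }
  return value;
}

function normalizeArgs(args: string): string {
  const parsed = parseJson(args);
  return parsed === undefined ? args.trim() : JSON.stringify(canonicalize(parsed));
}

function returnsNull(call: ToolCallRecord): boolean {
  const parsed = parseJson(call.result);
  return typeof parsed === "object" && parsed !== null && (parsed as { result?: unknown }).result === null;
}

function tabIdOf(call: ToolCallRecord): unknown {
  const parsed = parseJson(call.args);
  return typeof parsed === "object" && parsed !== null ? (parsed as { tab_id?: unknown }).tab_id : undefined;
}

export function detectToolCallIssues(history: ToolCallRecord[]): string | null {
  if (history.length < 2) return null;
  const latest = history[history.length - 1];
  if (QUERY_TOOLS.has(latest.name)) return null;

  // Rule 1: the latest call repeats an earlier call in the window with identical (normalized) args.
  const key = normalizeArgs(latest.args);
  const earlier = history.slice(-DUPLICATE_WINDOW, -1);
  if (earlier.some((c) => c.name === latest.name && normalizeArgs(c.args) === key)) {
    return `[System Warning] You called ${latest.name} with identical arguments to a recent call. Reuse that result or change your approach.`;
  }

  // Rule 2: execute_script keeps returning null; calls to other tools do not reset the streak.
  if (latest.name === "execute_script") {
    let streak = 0;
    for (let i = history.length - 1; i >= 0; i--) {
      if (history[i].name !== "execute_script") continue;
      if (!returnsNull(history[i])) break;
      streak++;
    }
    if (streak >= NULL_STREAK_LIMIT) {
      return `[System Warning] execute_script returned null ${streak} times in a row. End the script with an explicit return value so you can tell whether it worked.`;
    }
  }

  // Rule 3: the same tab is read repeatedly.
  if (latest.name === "get_tab_content") {
    const tabId = tabIdOf(latest);
    const reads = history
      .slice(-TAB_READ_WINDOW)
      .filter((c) => c.name === "get_tab_content" && tabIdOf(c) === tabId).length;
    if (tabId !== undefined && reads >= TAB_READ_LIMIT) {
      return `[System Warning] get_tab_content was called ${reads} times on tab ${String(tabId)}. Work from the content you already have.`;
    }
  }

  // Rule 4: generic overuse of one tool within a short window.
  const uses = history.slice(-GENERIC_WINDOW).filter((c) => c.name === latest.name).length;
  if (uses >= GENERIC_LIMIT) {
    return `[System Warning] ${latest.name} was called ${uses} times in the last ${GENERIC_WINDOW} calls. Stop and reconsider your approach, or report the problem to the user.`;
  }
  return null;
}
```

Checking duplicates before the null streak is what makes the priority test pass: the repeated `a()` call matches rule 1 before rule 2 is consulted. Because every rule is keyed on the latest call, a warning fires only when the offending tool has just been called again. This sketch does not otherwise suppress repeated warnings.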
