Commit 618ecae

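feat: implement progressive tool disclosure, auto-verify sensors, and a built-in verification agent
- Add the ToolSearch tool to load deferred tool schemas on demand
- Add AutoVerifyStage to run type checking automatically after Edit/Write and inject the errors
- Add a built-in verification subagent for independent code quality assessment
- Add PromptHook to support LLM-reasoning sensors (e.g. code review)
- Add ConfigTool and the update-config skill for configuration management
- Implement the Ralph Loop mechanism to continue automatically while the Spec is unfinished
- Improve context compaction so recently accessed file contents are restored automatically afterwards
- Enhance tool result budget management with message-level aggregate limits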
1 parent d4e5ad3 commit 618ecae

27 files changed

Lines changed: 2343 additions & 39 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -85,3 +85,4 @@ packages/vscode/*.vsix
 # Web build cache
 packages/web/.vite/
 packages/cli/web/.vite/
+.gstack/

packages/cli/src/agent/loop/completionPolicy.ts

Lines changed: 83 additions & 0 deletions
@@ -5,6 +5,7 @@
  * 1. checkOutputRecovery: recover-or-truncate decision when finishReason === 'length'
  * 2. checkIncompleteIntent: detect the pattern where the LLM "said it would act but did not"
  * 3. checkStopHook: run the stop hook with timeout protection
+ * 4. checkRalphLoop: auto-continue while the Spec has unfinished tasks (Ralph Loop mode)
  *
  * All functions return action descriptors and perform no side effects.
  */
@@ -181,3 +182,85 @@ export async function checkStopHook(context: {
     return { action: 'stop' };
   }
 }
+
+// ===== Ralph Loop (Spec-Aware Auto-Continue) =====
+
+/** Ralph Loop safety threshold: stop once the turn count reaches 90% of the maximum, to prevent infinite loops */
+const RALPH_LOOP_SAFETY_RATIO = 0.9;
+
+export type RalphLoopAction =
+  | { action: 'continue'; reason: string }
+  | { action: 'none' };
+
+/**
+ * Ralph Loop: when the Spec is in the implementation phase and has unfinished tasks,
+ * continue automatically instead of stopping.
+ *
+ * Trigger conditions (all must hold):
+ * 1. Spec mode is active and in the implementation phase
+ * 2. Unfinished tasks exist
+ * 3. The turn count is below the safety threshold (prevents infinite loops)
+ */
+export async function checkRalphLoop(context: {
+  turnsCount: number;
+  maxTurns: number;
+}): Promise<RalphLoopAction> {
+  try {
+    // Lazy import to avoid a circular dependency
+    const { SpecManager } = await import('../../spec/SpecManager.js');
+    const specManager = SpecManager.getInstance();
+
+    if (!specManager.isActive()) {
+      return { action: 'none' };
+    }
+
+    const spec = specManager.getCurrentSpec();
+    if (!spec || spec.phase !== 'implementation') {
+      return { action: 'none' };
+    }
+
+    // Safety threshold check
+    if (
+      context.maxTurns > 0 &&
+      context.turnsCount >= context.maxTurns * RALPH_LOOP_SAFETY_RATIO
+    ) {
+      logger.info(
+        `[RalphLoop] Turn count is near the limit (${context.turnsCount}/${context.maxTurns}); stopping auto-continue`,
+      );
+      return { action: 'none' };
+    }
+
+    const tasks = spec.tasks ?? [];
+    const completed = tasks.filter(
+      (t: { status: string }) =>
+        t.status === 'completed' || t.status === 'skipped',
+    ).length;
+    const total = tasks.length;
+
+    if (completed >= total) {
+      return { action: 'none' };
+    }
+
+    // Find the next task to run
+    const nextTask = tasks.find(
+      (t: { status: string }) =>
+        t.status === 'pending' || t.status === 'in_progress',
+    );
+
+    const reason =
+      `[Ralph Loop] Spec "${spec.name}" still has unfinished tasks.\n` +
+      `Progress: ${completed}/${total} tasks completed.\n` +
+      (nextTask
+        ? `Next task: ${nextTask.title}${nextTask.description ? ` — ${nextTask.description}` : ''}\n`
+        : '') +
+      'Continue with the next unfinished task; do not stop.';
+
+    logger.info(
+      `[RalphLoop] Spec "${spec.name}" progress ${completed}/${total}; auto-continuing`,
+    );
+    return { action: 'continue', reason };
+  } catch {
+    // Skip silently when SpecManager is unavailable (e.g. not initialized)
+    return { action: 'none' };
+  }
+}
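The decision made by `checkRalphLoop` can be exercised without `SpecManager` or the logger. The following is a minimal sketch, assuming simplified `SpecLike` and `TaskLike` shapes and a `decideRalphLoop` helper; all three names are hypothetical and do not appear in the commit. It keeps the same three gates (phase, safety ratio, unfinished tasks) as a pure function.

```typescript
// Hypothetical, simplified shapes. The real Spec type comes from SpecManager
// and is not shown in this diff.
type TaskLike = { title: string; status: string };
type SpecLike = { name: string; phase: string; tasks?: TaskLike[] };

type Decision =
  | { action: 'continue'; nextTitle: string | null }
  | { action: 'none' };

// Mirrors RALPH_LOOP_SAFETY_RATIO from the diff.
const SAFETY_RATIO = 0.9;

function decideRalphLoop(
  spec: SpecLike | null,
  turnsCount: number,
  maxTurns: number,
): Decision {
  // Gate 1: a spec must exist and be in the implementation phase.
  if (!spec || spec.phase !== 'implementation') return { action: 'none' };
  // Gate 2: stop auto-continuing once 90% of the turn budget is used.
  if (maxTurns > 0 && turnsCount >= maxTurns * SAFETY_RATIO) return { action: 'none' };

  // Gate 3: at least one task is neither completed nor skipped.
  const tasks = spec.tasks ?? [];
  const done = tasks.filter(
    (t) => t.status === 'completed' || t.status === 'skipped',
  ).length;
  if (done >= tasks.length) return { action: 'none' };

  const next = tasks.find(
    (t) => t.status === 'pending' || t.status === 'in_progress',
  );
  return { action: 'continue', nextTitle: next ? next.title : null };
}

const demoSpec: SpecLike = {
  name: 'demo',
  phase: 'implementation',
  tasks: [
    { title: 'A', status: 'completed' },
    { title: 'B', status: 'pending' },
  ],
};

console.log(decideRalphLoop(demoSpec, 10, 100)); // early in the run: continue with task B
console.log(decideRalphLoop(demoSpec, 95, 100)); // past 90% of maxTurns: none
```

One property worth noticing in both the sketch and the committed function: a task whose status is outside the four handled values counts as unfinished but is never selected as the next task, so the result is still `continue`, only without a named next task.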

packages/cli/src/agent/loop/executeLoopGenerator.ts

Lines changed: 60 additions & 5 deletions
@@ -12,7 +12,10 @@ import { CompactionService } from '../../context/CompactionService.js';
 import { ReactiveCompaction } from '../../context/ReactiveCompaction.js';
 import { snipCompact } from '../../context/SnipCompaction.js';
 import { createBudgetTracker, recordOutput } from '../../context/TokenBudget.js';
-import { applyToolResultBudget } from '../../context/ToolResultBudget.js';
+import {
+  applyToolResultBudget,
+  MessageBudgetTracker,
+} from '../../context/ToolResultBudget.js';
 import { createLogger, LogCategory } from '../../logging/Logger.js';
 import type {
   ChatResponse,
@@ -34,6 +37,7 @@ import {
   checkOutputRecovery,
   checkIncompleteIntent,
   checkStopHook,
+  checkRalphLoop,
 } from './completionPolicy.js';
 import {
   saveUserMessage,
@@ -415,8 +419,20 @@ export async function* executeLoopGenerator(
   rawTools = injectSkillsMetadata(rawTools);
   const tools = deps.applySkillToolRestrictions(rawTools);

+  // 1.5 Inject the deferred tools listing into the system prompt
+  let finalSystemPrompt = systemPrompt;
+  if (
+    typeof registry.getDeferredToolsListing === 'function'
+  ) {
+    const deferredListing = registry.getDeferredToolsListing();
+    if (deferredListing && finalSystemPrompt) {
+      finalSystemPrompt =
+        `${finalSystemPrompt}\n\n${deferredListing}`;
+    }
+  }
+
   // 2. Build the message history, using ConversationState as the single message source
-  const state = new ConversationState(context, systemPrompt);
+  const state = new ConversationState(context, finalSystemPrompt);
   state.appendUser({ role: 'user', content: message });

   // Save the user message to JSONL
@@ -511,6 +527,9 @@ export async function* executeLoopGenerator(
         signal: options?.signal,
         confirmationHandler: context.confirmationHandler,
         permissionMode: context.permissionMode,
+        toolRegistry: registry,
+        deferredToolManager:
+          registry.deferredToolManager,
       },
       deps.executionPipeline.getRegistry(),
       deps.executionEngine?.getContextManager(),
@@ -715,6 +734,37 @@ export async function* executeLoopGenerator(
     // Reset incompleteIntentRetryCount on normal completion
     incompleteIntentRetryCount = 0;

+    // Ralph Loop: auto-continue while the Spec has unfinished tasks
+    const ralphAction = await checkRalphLoop({
+      turnsCount,
+      maxTurns,
+    });
+    if (ralphAction.action === 'continue') {
+      state.appendAssistant({
+        role: 'assistant',
+        content: turnResult.content || '',
+        reasoningContent: turnResult.reasoningContent,
+      });
+
+      const ralphAssistantUuid = await saveAssistantMessage(
+        deps, context, turnResult.content || '', lastMessageUuid,
+      );
+      if (ralphAssistantUuid) lastMessageUuid = ralphAssistantUuid;
+
+      const ralphMsg: Message = {
+        role: 'user',
+        content: `\n\n<system-reminder>\n${ralphAction.reason}\n</system-reminder>`,
+      };
+      state.appendControl('user', ralphMsg);
+
+      const ralphUserUuid = await saveUserMessage(
+        deps, context, ralphMsg.content as string, lastMessageUuid,
+      );
+      if (ralphUserUuid) lastMessageUuid = ralphUserUuid;
+
+      continue;
+    }
+
     // Stop Hook (via completionPolicy, with timeout)
     const stopAction = await checkStopHook({
       sessionId: context.sessionId,
@@ -874,6 +924,9 @@ export async function* executeLoopGenerator(
           signal: options?.signal,
           confirmationHandler: context.confirmationHandler,
           permissionMode: context.permissionMode,
+          toolRegistry: registry,
+          deferredToolManager:
+            registry.deferredToolManager,
         }
       );
       return { toolCall, result, toolUseUuid };
@@ -906,6 +959,7 @@ export async function* executeLoopGenerator(
     }

     // 8. Process the execution results
+    const messageBudget = new MessageBudgetTracker();
     for (const { toolCall: rawToolCall, result, toolUseUuid } of executionResults) {
       // Safety assertion: every toolCall is of type function
       const toolCall = rawToolCall as {
@@ -991,11 +1045,12 @@ export async function* executeLoopGenerator(
         toolResultContent = JSON.stringify(toolResultContent, null, 2);
       }

-      // Apply tool result budget — truncate oversized results
-      if (typeof toolResultContent === 'string' && toolResultContent.length > 100_000) {
+      // Apply tool result budget: per-tool + per-message truncation
+      if (typeof toolResultContent === 'string') {
         toolResultContent = applyToolResultBudget(
           toolResultContent,
-          toolCall.function.name
+          toolCall.function.name,
+          { messageBudget },
         ) as string;
       }
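The last two hunks create one `MessageBudgetTracker` before the result loop and pass it to every `applyToolResultBudget` call, so all tool results in the same batch draw on a shared allowance. The implementation in `ToolResultBudget.ts` is not part of this diff. The following is only a sketch of how such a two-level budget could work; `BudgetTrackerSketch`, `applyBudgetSketch`, and both limits are hypothetical names and values chosen for the demo.

```typescript
// Hypothetical limits, kept tiny for the demo. The real values and the real
// MessageBudgetTracker API in ToolResultBudget.ts are not shown in this diff.
const PER_TOOL_LIMIT = 1_000;
const PER_MESSAGE_LIMIT = 1_500;

class BudgetTrackerSketch {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number = PER_MESSAGE_LIMIT) {
    this.limit = limit;
  }

  /** Reserve up to `wanted` characters and return how many were granted. */
  take(wanted: number): number {
    const granted = Math.max(0, Math.min(wanted, this.limit - this.used));
    this.used += granted;
    return granted;
  }
}

function applyBudgetSketch(content: string, tracker: BudgetTrackerSketch): string {
  // Level 1: cap a single result. Level 2: cap the running total for the batch.
  const perToolCap = Math.min(content.length, PER_TOOL_LIMIT);
  const granted = tracker.take(perToolCap);
  if (granted === content.length) return content;
  const dropped = content.length - granted;
  return `${content.slice(0, granted)}\n[truncated ${dropped} chars]`;
}

// One tracker per batch of tool results, created before the loop as in the diff.
const tracker = new BudgetTrackerSketch();
const first = applyBudgetSketch('x'.repeat(1_200), tracker); // per-tool cap keeps 1000 chars
const second = applyBudgetSketch('y'.repeat(800), tracker); // only 500 chars of budget remain
console.log(first.length, second.length);
```

Under this reading, dropping the old `length > 100_000` guard makes sense: an aggregate limit has to count small results too, presumably, since many medium-sized results can exceed the batch allowance even when none exceeds the per-tool cap.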

packages/cli/src/agent/subagents/builtinAgents.ts

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,7 @@
  */

 import type { SubagentConfig } from './types.js';
+import { verificationAgentConfig } from './builtinVerificationAgent.js';

 /**
  * Built-in subagent list (4 core agents)
@@ -105,6 +106,7 @@ Be thorough but concise. Focus on actionable steps.`,
       "Use this agent to configure the user's Claude Code status line setting.",
     tools: ['Read', 'Edit'],
   },
+  verificationAgentConfig,
 ];


packages/cli/src/agent/subagents/builtinVerificationAgent.ts

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+/**
+ * Built-in verification subagent configuration
+ *
+ * Independent verification agent for quality assessment after implementation is complete.
+ * Strictly read-only: it cannot modify code and can only run builds, tests, lint, and adversarial checks.
+ */
+
+import type { SubagentConfig } from './types.js';
+
+/**
+ * Verification agent system prompt
+ */
+const VERIFICATION_SYSTEM_PROMPT = `# Verification Agent
+
+You are an **independent verification engineer**. Your sole purpose \
+is to find problems — not to praise or reassure. You are the last \
+line of defense before code ships.
+
+## Constraints
+
+1. **READ-ONLY**: You have NO write tools (no Edit, Write, or \
+NotebookEdit). You cannot modify files. If you discover issues, \
+report them — do not attempt to fix them.
+2. **NO SUB-AGENTS**: You must not delegate to other agents or use \
+the Task tool. Execute all verification steps yourself using your \
+tools directly.
+3. **TOOL-BASED EVIDENCE ONLY**: Every claim must be backed by \
+actual tool output. Never say "looks correct" or "should work" — \
+run the command and prove it.
+4. **NO ASSUMPTIONS**: Do not assume tests pass. Do not assume types \
+are correct. Run the checks.
+
+## Verification Workflow
+
+Execute these phases in order. Do NOT skip any phase.
+
+### Phase 1: Project Setup Detection
+
+1. Use Glob to find project config files: \`package.json\`, \
+\`tsconfig.json\`, \`biome.json\`, \`.eslintrc.*\`, \
+\`vitest.config.*\`, \`jest.config.*\`, \`Makefile\`, \
+\`Cargo.toml\`, \`go.mod\`, etc.
+2. Use Read to examine them and determine:
+   - Package manager (bun/npm/pnpm/yarn)
+   - Available scripts (test, lint, type-check, build)
+   - Project language and framework
+3. Identify which checks are available for this project.
+
+### Phase 2: Automated Checks
+
+Run all applicable checks. Capture full output.
+
+| Check | Typical Command | Priority |
+|-------|----------------|----------|
+| **Type checking** | \`bun run type-check\` or \`npx tsc --noEmit\` | HIGH |
+| **Tests** | \`bun run test:all\` or \`npm test\` | HIGH |
+| **Linting** | \`bun run lint\` or \`npx biome check\` | HIGH |
+| **Build** | \`bun run build\` | MEDIUM |
+
+- If a command fails, record the exact error output.
+- If a command succeeds, record confirmation.
+- Set reasonable timeouts (use Bash timeout parameter).
+
+### Phase 3: Code Review of Changed Files
+
+1. Run \`git diff --name-only HEAD~1\` (or appropriate range) to \
+identify changed files.
+2. Read each changed file and review for:
+   - **Logic errors**: off-by-one, null/undefined handling, race \
+conditions
+   - **Type safety**: any casts, type assertions, missing null checks
+   - **Error handling**: uncaught exceptions, missing error paths
+   - **Edge cases**: empty arrays, empty strings, boundary values
+   - **Security**: injection risks, credential exposure, unsafe eval
+   - **Code style**: naming conventions, dead code, commented-out code
+
+### Phase 4: Adversarial Analysis
+
+Think like an attacker or a hostile user:
+
+1. **Input validation**: Are all inputs validated? What happens with \
+malformed data?
+2. **Boundary conditions**: What happens at limits? (max length, \
+zero, negative)
+3. **Concurrency**: Are there race conditions or shared mutable \
+state issues?
+4. **Dependency risks**: Are new dependencies trustworthy? Pinned \
+versions?
+5. **Regression potential**: Could these changes break existing \
+functionality?
+
+## Output Format
+
+You MUST end your response with a structured verification report:
+
+\`\`\`
+## Verification Result: PASS | FAIL | PARTIAL
+
+### Automated Checks
+- [ ] Type check: PASS/FAIL — [details]
+- [ ] Tests: PASS/FAIL — [details, including test count]
+- [ ] Lint: PASS/FAIL — [details]
+- [ ] Build: PASS/FAIL — [details]
+
+### Code Review Findings
+- [Issue severity: HIGH/MEDIUM/LOW] [file:line] Description
+  Evidence: [exact code or output]
+
+### Adversarial Analysis
+- [Risk level: HIGH/MEDIUM/LOW] Description
+  Impact: [what could go wrong]
+
+### Summary
+[1-3 sentence overall assessment with specific evidence]
+\`\`\`
+
+### Verdict Rules
+
+- **PASS**: All automated checks pass AND no HIGH severity issues \
+found.
+- **FAIL**: Any automated check fails OR any HIGH severity issue \
+found.
+- **PARTIAL**: All automated checks pass BUT MEDIUM severity issues \
+exist.
+
+Be thorough. Be skeptical. Find the bugs.`;
+
+/**
+ * Verification agent configuration
+ *
+ * Independent verification agent that runs builds, tests, lint, and adversarial analysis after implementation is complete.
+ * Strictly read-only: explicitly excludes tools such as Edit/Write/NotebookEdit/Task.
+ */
+export const verificationAgentConfig: SubagentConfig = {
+  name: 'verification',
+  description:
+    'Independent verification agent that validates implementation'
+    + ' by running builds, tests, linters, and adversarial'
+    + ' probes. Strictly read-only — cannot modify code. Use'
+    + ' after completing implementation to get an independent'
+    + ' quality assessment.',
+  tools: ['Read', 'Glob', 'Grep', 'Bash'],
+  systemPrompt: VERIFICATION_SYSTEM_PROMPT,
+  source: 'builtin',
+};
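The Verdict Rules in the prompt can be read as a small decision function. As written, the PASS and PARTIAL rules overlap when every check passes and only MEDIUM issues exist, and the prompt does not say which wins. The sketch below assumes the precedence FAIL, then PARTIAL, then PASS; that ordering and the `resolveVerdict` helper are assumptions for illustration, not something the commit defines.

```typescript
type Severity = 'HIGH' | 'MEDIUM' | 'LOW';
type Verdict = 'PASS' | 'FAIL' | 'PARTIAL';

// Assumed precedence: FAIL, then PARTIAL, then PASS. The prompt itself does not
// state an order, and its PASS and PARTIAL rules overlap when only MEDIUM issues exist.
function resolveVerdict(checksPassed: boolean[], findings: Severity[]): Verdict {
  const anyCheckFailed = checksPassed.some((ok) => !ok);
  if (anyCheckFailed || findings.some((s) => s === 'HIGH')) return 'FAIL';
  if (findings.some((s) => s === 'MEDIUM')) return 'PARTIAL';
  return 'PASS';
}

console.log(resolveVerdict([true, true], ['LOW'])); // PASS
console.log(resolveVerdict([true, true], ['MEDIUM'])); // PARTIAL
console.log(resolveVerdict([true, false], [])); // FAIL
```

A consumer that parses the agent's report could apply the same ordering to check that the stated verdict matches the listed findings.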
