
Commit 6f1c15f

feat: add plan mode for task execution — allows users to generate and review plans before execution, includes caching improvements for Anthropic prompts
1 parent b124fd1 commit 6f1c15f

13 files changed

Lines changed: 563 additions & 18 deletions


CHANGELOG.md

Lines changed: 60 additions & 0 deletions
@@ -11,6 +11,66 @@ For releases before v1.3.35, see [GitHub Releases](https://github.com/VladoIvank
 > as the social-share summary (IFTTT → X/Bluesky), capped at 220 chars.
 > If omitted, the feed falls back to the first paragraph.
 
+## [2.0.2] — 2026-05-19
+
+> Two big quality-of-life additions: Anthropic prompt caching is on by default (60–90% cheaper on cache-eligible input), and `/plan` lets you preview an agent's full plan before any file gets touched. Run `/go` to execute, or `/plan <revised task>` to refine.
+
+### Added — Anthropic prompt caching, automatic
+
+- **Two cache breakpoints per request**: the system prompt (and embedded
+  skills catalog / project intelligence) and the tools array. Cache hits
+  bill at 0.1× the input rate; cache writes at 1.25×. Net win after the
+  second same-shape request, which is every iteration in an agent loop.
+  Below 1024 input tokens Anthropic silently skips caching — no error
+  path. Applies to the agent chat path, the agent fallback path, and
+  the chat() path used by `/agent` and inline replies. Also propagates
+  through OpenRouter → Anthropic routes (caching headers honoured
+  upstream).
+- **`TokenUsage.cacheCreationTokens` + `cacheReadTokens`** fields
+  surfaced on every record. `getCacheStats()` aggregates per-session
+  cache hits, misses, and estimated USD savings vs running without
+  caching. `/cost` (and `/stats`) renders a new "Prompt caching"
+  section when at least one cached call landed.
+
+### Added — Plan mode (`/plan` + `/go`)
+
+- **`/plan <task>`** — generates a numbered plan for the task (no tool
+  calls, no file changes), surfaces it as a Markdown message so you can
+  review what the agent would do, which files it would touch, what
+  commands it would run, and the risk level it self-assesses. Holds
+  the (task, plan) pair as the *pending* plan, scoped to the current
+  process. Re-running `/plan <revised task>` replaces the pending plan
+  with a new one (you pay one extra LLM call but get readable revision
+  history in the chat).
+- **`/go`** — executes the pending plan: hands the task + approved plan
+  as a single prompt to the regular agent loop, so all MCP tools,
+  lifecycle hooks, verification, permissions, and skill bundles apply
+  unchanged. Includes an explicit anti-improvisation clause in the
+  injected prompt — if any step turns out to be wrong mid-execution
+  the agent must stop and report rather than silently rewriting the
+  plan.
+- Available in **both the TUI and ACP clients** (Zed, VS Code). ACP
+  `/plan` streams the plan back via `session/update`; ACP `/go` runs
+  the agent inline and streams iterations through onChunk.
+- Surfaced in `/help` ("Agent Mode" section) and `/` autocomplete.
+
+### Fixed
+
+- **Anthropic streaming usage extraction missed cache fields.** Both
+  the agent stream handler (`utils/agentStream.ts`) and the chat
+  stream handler (`api/index.ts`) now pick up
+  `cache_creation_input_tokens` and `cache_read_input_tokens` from the
+  `message_start` event, so cached requests no longer undercount
+  prompt tokens or display $0 savings.
+
+### Notes
+
+- OpenAI-format providers (OpenAI direct, Z.AI, DeepSeek, MiniMax,
+  Ollama) don't expose explicit cache markers — those providers
+  generally apply automatic prefix caching server-side. No code change
+  on our end needed; cost reports stay accurate via standard
+  `prompt_tokens` accounting.
+
 ## [2.0.1] — 2026-05-18
 
 > Patch: `/mcp` now works in the CLI TUI (was only wired into the ACP path

src/acp/commands.ts

Lines changed: 57 additions & 0 deletions
@@ -680,6 +680,63 @@ Anything else the agent should know — edge cases, gotchas, things to double-ch
 
     // ─── Export ────────────────────────────────────────────────────────────────
 
+    // ─── Plan mode (2.0.2) ────────────────────────────────────────────────────
+
+    case 'plan': {
+      // Identical contract to TUI /plan: generate a pre-execution plan,
+      // surface it, hold as pending so /go can execute it without re-planning.
+      if (!args.length) {
+        const { getPendingPlan } = await import('../utils/planMode.js');
+        const cur = getPendingPlan();
+        return {
+          handled: true,
+          response: cur
+            ? `**Pending plan for:** _${cur.task}_\n\n${cur.plan}\n\n---\nRun \`/go\` to execute, or \`/plan <revised task>\` to revise.`
+            : 'Usage: `/plan <task>` — generates a plan you can review, then `/go` to execute.',
+        };
+      }
+      const task = args.join(' ');
+      onChunk(`_Generating plan for: ${task.slice(0, 80)}${task.length > 80 ? '…' : ''}_\n\n`);
+      try {
+        const { generatePlan } = await import('../utils/planMode.js');
+        const plan = await generatePlan(task);
+        return {
+          handled: true,
+          response: `${plan}\n\n---\nRun \`/go\` to execute this plan, or \`/plan <revised task>\` to refine it.`,
+          streaming: true,
+        };
+      } catch (err) {
+        return { handled: true, response: `Plan generation failed: ${(err as Error).message}`, streaming: true };
+      }
+    }
+
+    case 'go': {
+      const { getPendingPlan, composeExecutionPrompt, clearPendingPlan } = await import('../utils/planMode.js');
+      const cur = getPendingPlan();
+      if (!cur) {
+        return { handled: true, response: 'No pending plan. Run `/plan <task>` first.' };
+      }
+      const prompt = composeExecutionPrompt(cur);
+      clearPendingPlan();
+      onChunk(`_Executing approved plan…_\n\n`);
+      try {
+        const { buildProjectContext } = await import('./session.js');
+        const ctx = buildProjectContext(session.workspaceRoot);
+        const agentResult = await runAgent(prompt, ctx, {
+          abortSignal,
+          onIteration: (_i: number, msg: string) => { onChunk(msg + '\n'); },
+          onThinking: (text: string) => { onChunk(text); },
+        });
+        return {
+          handled: true,
+          response: agentResult.finalResponse || '_(plan executed; no final summary)_',
+          streaming: true,
+        };
+      } catch (err) {
+        return { handled: true, response: `Plan execution failed: ${(err as Error).message}`, streaming: true };
+      }
+    }
+
     case 'export': {
       if (!session.history.length) return { handled: true, response: 'No messages to export.' };
       const format = (args[0] || 'md').toLowerCase();
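Both handlers above depend on `src/utils/planMode.ts`, which this excerpt does not show. The sketch below is one plausible shape for the process-scoped pending-plan store, given only as an illustration: the names `getPendingPlan`, `clearPendingPlan` and `composeExecutionPrompt` appear in the diff, while `PendingPlan`, `setPendingPlan` and the prompt wording are assumptions.

```typescript
// Hypothetical sketch only: the real planMode.ts is not part of this excerpt.
interface PendingPlan {
  task: string;
  plan: string;
}

// Module-level state, so the plan lives exactly as long as the process.
let pending: PendingPlan | null = null;

// Assumed helper; the real generatePlan() presumably stores its own result.
function setPendingPlan(task: string, plan: string): void {
  pending = { task, plan }; // a later /plan simply replaces the earlier one
}

function getPendingPlan(): PendingPlan | null {
  return pending;
}

function clearPendingPlan(): void {
  pending = null;
}

// Assumed wording; the actual anti-improvisation clause is not shown in the diff.
function composeExecutionPrompt(p: PendingPlan): string {
  return [
    `Task: ${p.task}`,
    '',
    'Approved plan:',
    p.plan,
    '',
    'Follow the plan step by step. If a step turns out to be wrong, stop and report instead of changing the plan.',
  ].join('\n');
}

// Same order as the /go handler: read, compose, clear, then hand off.
setPendingPlan('rename util', '1. Edit a.ts\n2. Run tests');
const pendingNow = getPendingPlan();
const executionPrompt = pendingNow ? composeExecutionPrompt(pendingNow) : '';
clearPendingPlan();
console.log(executionPrompt);
```

Clearing the plan before the agent runs, as the `/go` handler does, means a failed execution cannot be retried with a second `/go`; the user has to run `/plan` again.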

src/acp/server.ts

Lines changed: 3 additions & 0 deletions
@@ -74,6 +74,9 @@ const AVAILABLE_COMMANDS = [
   { name: 'mcp', description: 'Manage MCP servers, marketplace, resources, prompts', input: { hint: '[browse | install <id> | add | remove | reload | resources | read <uri> | prompts | prompt <server> <name>]' } },
   { name: 'openrouter', description: 'OpenRouter routing preferences (prefer/ignore/fallbacks/privacy/clear)', input: { hint: '[show | prefer <p,...> | ignore <p,...> | fallbacks on|off | privacy strict|allow | clear]' } },
   { name: 'export', description: 'Export conversation', input: { hint: 'json | md | txt' } },
+  // Plan mode (2.0.2)
+  { name: 'plan', description: 'Generate a numbered plan for a task — review before /go executes', input: { hint: '<task>' } },
+  { name: 'go', description: 'Execute the pending plan from /plan' },
   // Project intelligence
   { name: 'scan', description: 'Scan project structure and generate summary' },
   { name: 'review', description: 'Run code review on project or specific files', input: { hint: '[file…]' } },

src/api/index.ts

Lines changed: 26 additions & 6 deletions
@@ -669,6 +669,13 @@ async function chatAnthropic(
   }
 
   try {
+    // Anthropic prompt caching: wrap system as an array with a
+    // `cache_control` marker so the static system prompt (typically large
+    // and stable across a session) is cached. Below 1024 input tokens
+    // Anthropic silently skips caching — no error.
+    const cachedSystem = useNativeSystem
+      ? { system: [{ type: 'text' as const, text: systemPrompt, cache_control: { type: 'ephemeral' as const } }] }
+      : {};
     const response = await fetch(`${baseUrl}/v1/messages`, {
       method: 'POST',
       headers,
@@ -678,7 +685,7 @@ async function chatAnthropic(
         max_tokens: maxTokens,
         temperature,
         stream,
-        ...(useNativeSystem ? { system: systemPrompt } : {}),
+        ...cachedSystem,
       }),
       signal: controller.signal,
     });
@@ -727,6 +734,8 @@ async function handleAnthropicStream(
   let buffer = '';
   let inputTokens = 0;
   let outputTokens = 0;
+  let cacheCreationTokens = 0;
+  let cacheReadTokens = 0;
   let streamModel = '';
 
   while (true) {
@@ -740,7 +749,7 @@ async function handleAnthropicStream(
       for (const line of lines) {
         if (line.startsWith('data: ')) {
           const data = line.slice(6);
-
+
           try {
             const parsed = JSON.parse(data);
             if (parsed.type === 'content_block_delta') {
@@ -750,9 +759,13 @@ async function handleAnthropicStream(
                 onChunk(text);
               }
             }
-            // message_start contains input_tokens
+            // message_start contains input_tokens (and cache create/read
+            // when prompt caching is in play).
             if (parsed.type === 'message_start' && parsed.message?.usage) {
-              inputTokens = parsed.message.usage.input_tokens || 0;
+              const u = parsed.message.usage;
+              inputTokens = u.input_tokens || 0;
+              cacheCreationTokens = u.cache_creation_input_tokens || 0;
+              cacheReadTokens = u.cache_read_input_tokens || 0;
               streamModel = parsed.message.model || '';
             }
             // message_delta contains output_tokens
@@ -767,9 +780,16 @@ async function handleAnthropicStream(
   }
 
   // Record token usage
-  if (inputTokens > 0 || outputTokens > 0) {
+  if (inputTokens > 0 || outputTokens > 0 || cacheReadTokens > 0 || cacheCreationTokens > 0) {
+    const totalPrompt = inputTokens + cacheCreationTokens + cacheReadTokens;
     recordTokenUsage(
-      { promptTokens: inputTokens, completionTokens: outputTokens, totalTokens: inputTokens + outputTokens },
+      {
+        promptTokens: totalPrompt,
+        completionTokens: outputTokens,
+        totalTokens: totalPrompt + outputTokens,
+        cacheCreationTokens: cacheCreationTokens || undefined,
+        cacheReadTokens: cacheReadTokens || undefined,
+      },
       streamModel || 'unknown',
       config.get('provider')
     );
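The last hunk above adds the cache counters to the prompt total because Anthropic reports `input_tokens` as the uncached part of the prompt only, with cache writes and reads counted separately. A minimal sketch of that accounting follows; `AnthropicUsage`, `UsageRecord` and `toUsageRecord` are illustrative names rather than identifiers from the repository, and the record shape is modelled on the object passed to `recordTokenUsage()`.

```typescript
// Fields the diff reads from the `usage` object of a `message_start` event.
interface AnthropicUsage {
  input_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Assumed record shape, mirroring the literal passed to recordTokenUsage().
interface UsageRecord {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cacheCreationTokens?: number;
  cacheReadTokens?: number;
}

function toUsageRecord(u: AnthropicUsage, outputTokens: number): UsageRecord {
  const uncached = u.input_tokens || 0;
  const created = u.cache_creation_input_tokens || 0;
  const read = u.cache_read_input_tokens || 0;
  // The full prompt size is the sum of all three input counters.
  const promptTokens = uncached + created + read;
  return {
    promptTokens,
    completionTokens: outputTokens,
    totalTokens: promptTokens + outputTokens,
    // Zero counters are stored as undefined, as in the diff.
    cacheCreationTokens: created || undefined,
    cacheReadTokens: read || undefined,
  };
}

// A cache hit: only 12 tokens billed at the full rate, 4000 read from cache.
const usageRecord = toUsageRecord({ input_tokens: 12, cache_read_input_tokens: 4000 }, 250);
console.log(usageRecord.promptTokens, usageRecord.totalTokens);
```

Without the two extra terms, the example request would be recorded as a 12-token prompt, which is the undercount the changelog's "Fixed" entry describes.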

src/renderer/App.ts

Lines changed: 4 additions & 0 deletions
@@ -105,6 +105,8 @@ const COMMAND_DESCRIPTIONS: Record<string, string> = {
   'hooks': 'List installed lifecycle hooks (.codeep/hooks/<event>.sh)',
   'mcp': 'Manage MCP servers (browse, install, add, remove, resources, prompts)',
   'openrouter': 'Tune OpenRouter routing (preferred / ignore providers, fallbacks, privacy)',
+  'plan': 'Generate a numbered plan for a task — review before /go executes it',
+  'go': 'Execute the pending plan from /plan',
 };
 
 import { helpCategories, keyboardShortcuts } from './components/Help';
@@ -297,6 +299,8 @@ export class App {
     // Keep in lockstep with COMMAND_DESCRIPTIONS below and helpCategories.
     'compact', 'commands', 'checkpoint', 'checkpoints', 'rewind',
     'hooks', 'mcp', 'openrouter',
+    // 2.0.2 — plan mode.
+    'plan', 'go',
     'c', 't', 'd', 'r', 'f', 'e', 'o', 'b', 'p',
   ];
 

src/renderer/commands.ts

Lines changed: 53 additions & 0 deletions
@@ -248,6 +248,59 @@ export async function handleCommand(
       break;
     }
 
+    case 'plan': {
+      // Plan mode: ask the model for a plan, surface it, hold as pending.
+      // The user runs /go to execute or /plan <revised> to revise. See
+      // src/utils/planMode.ts for the rationale + system prompt.
+      if (!args.length) {
+        const { getPendingPlan } = await import('../utils/planMode');
+        const cur = getPendingPlan();
+        if (cur) {
+          ctx.app.addMessage({
+            role: 'system',
+            content: `**Pending plan for:** _${cur.task}_\n\n${cur.plan}\n\n---\nRun \`/go\` to execute, or \`/plan <revised task>\` to revise.`,
+          });
+        } else {
+          ctx.app.notify('Usage: /plan <task> — generates a plan you can review, then /go to execute.');
+        }
+        return;
+      }
+      if (ctx.isAgentRunning()) { ctx.app.notify('Agent already running. Use /stop first.'); return; }
+      const task = args.join(' ');
+      ctx.app.addMessage({ role: 'user', content: `/plan ${task}` });
+      ctx.app.notify('Generating plan…');
+      try {
+        const { generatePlan } = await import('../utils/planMode');
+        const plan = await generatePlan(task);
+        ctx.app.addMessage({
+          role: 'assistant',
+          content: `${plan}\n\n---\nRun \`/go\` to execute this plan, or \`/plan <revised task>\` to refine it.`,
+        });
+      } catch (err) {
+        ctx.app.notify(`Plan generation failed: ${(err as Error).message}`);
+      }
+      break;
+    }
+
+    case 'go': {
+      // Execute the pending plan from /plan. The agent loop sees the
+      // task + plan as a single prompt, so MCP tools, hooks, permissions,
+      // and verification all apply unchanged.
+      const { getPendingPlan, composeExecutionPrompt, clearPendingPlan } = await import('../utils/planMode');
+      const cur = getPendingPlan();
+      if (!cur) {
+        ctx.app.notify('No pending plan. Run `/plan <task>` first.');
+        return;
+      }
+      if (ctx.isAgentRunning()) { ctx.app.notify('Agent already running. Use /stop first.'); return; }
+      const prompt = composeExecutionPrompt(cur);
+      clearPendingPlan();
+      ctx.app.notify(`Executing plan for: ${cur.task.slice(0, 80)}${cur.task.length > 80 ? '…' : ''}`);
+      const { runAgentTask } = await import('./agentExecution');
+      runAgentTask(prompt, false, ctx, () => null, () => {});
+      break;
+    }
+
     case 'stop': {
       if (ctx.isAgentRunning() && ctx.abortController) {
         ctx.abortController.abort();

src/renderer/components/Help.ts

Lines changed: 2 additions & 0 deletions
@@ -56,6 +56,8 @@ export const helpCategories: HelpCategory[] = [
     items: [
       { key: '/agent <task>', description: 'Run agent with task' },
       { key: '/agent-dry <task>', description: 'Dry run (no changes)' },
+      { key: '/plan <task>', description: 'Generate a plan first — review before /go executes' },
+      { key: '/go', description: 'Execute the pending plan from /plan' },
       { key: '/stop', description: 'Stop running agent' },
       { key: '/undo', description: 'Undo last agent action' },
       { key: '/undo-all', description: 'Undo all agent actions' },

src/utils/agentChat.ts

Lines changed: 24 additions & 3 deletions
@@ -338,9 +338,25 @@ export async function agentChat(
     };
   } else {
     endpoint = `${baseUrl}/v1/messages`;
+    // Anthropic prompt caching. Two cache breakpoints:
+    //   1. `system` (largest stable block — system prompt + skills catalog)
+    //   2. last tool in `tools` (Anthropic caches everything up to and
+    //      including the marker, so this caches the entire tools array)
+    // Cache hits cost 0.1× input. Misses ("cache creation") cost 1.25×.
+    // Net win after the 2nd same-shape request. Below 1024 input tokens
+    // Anthropic silently skips caching — no error path to handle.
+    const anthropicTools = getAnthropicTools(additionalTools);
+    const cachedTools = anthropicTools.length > 0
+      ? [
+          ...anthropicTools.slice(0, -1),
+          { ...anthropicTools[anthropicTools.length - 1], cache_control: { type: 'ephemeral' as const } },
+        ]
+      : anthropicTools;
     body = {
-      model, system: systemPrompt, messages,
-      tools: getAnthropicTools(additionalTools), stream: useStreaming,
+      model,
+      system: [{ type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' as const } }],
+      messages,
+      tools: cachedTools, stream: useStreaming,
       ...tempParam, max_tokens: getEffectiveMaxTokens(providerId, Math.max(config.get('maxTokens'), 16384)),
     };
   }
}
@@ -477,10 +493,15 @@ export async function agentChatFallback(
477493
};
478494
} else {
479495
endpoint = `${baseUrl}/v1/messages`;
496+
// Fallback path injects system+tools as the first user message
497+
// (no native tool API). Cache that block — it's large and stable.
480498
body = {
481499
model,
482500
messages: [
483-
{ role: 'user', content: fallbackPrompt },
501+
{
502+
role: 'user',
503+
content: [{ type: 'text', text: fallbackPrompt, cache_control: { type: 'ephemeral' as const } }],
504+
},
484505
{ role: 'assistant', content: 'Understood. I will use the tools as specified.' },
485506
...messages,
486507
],
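The comment added to `agentChat()` above quotes 1.25× for a cache write and 0.1× for a cache read, and claims a net win from the second same-shape request. A short sketch of that arithmetic, assuming the cache never expires between requests and looking only at the cacheable prefix; `relativePrefixCost` is an illustrative name:

```typescript
// Multipliers as quoted in the diff, relative to the normal input rate.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Cost of sending the same prefix `requests` times, in units of
// "one uncached send of that prefix".
function relativePrefixCost(requests: number, cached: boolean): number {
  if (requests <= 0) return 0;
  if (!cached) return requests; // every request pays full price
  // First request writes the cache; each later one reads it.
  return CACHE_WRITE_MULTIPLIER + (requests - 1) * CACHE_READ_MULTIPLIER;
}

for (const n of [1, 2, 10]) {
  const withCache = relativePrefixCost(n, true).toFixed(2);
  const without = relativePrefixCost(n, false).toFixed(2);
  console.log(`${n} request(s): cached=${withCache} uncached=${without}`);
}
```

A single request costs more with caching (1.25 against 1.00), while two requests already cost less (1.35 against 2.00), which is where the "2nd same-shape request" break-even comes from.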

src/utils/agentStream.ts

Lines changed: 21 additions & 4 deletions
@@ -210,12 +210,29 @@ export async function handleAnthropicAgentStream(
         try {
           const parsed = JSON.parse(data);
 
-          // message_start has input tokens; message_delta has output tokens — merge both
+          // message_start has input tokens (incl. cache create/read fields if
+          // prompt caching is in use); message_delta has output tokens —
+          // merge both so extractAnthropicUsage sees the full picture.
           if (parsed.type === 'message_start' && parsed.message?.usage) {
-            usageData = { usage: { input_tokens: parsed.message.usage.input_tokens || 0, output_tokens: 0 } };
+            const u = parsed.message.usage;
+            usageData = {
+              usage: {
+                input_tokens: u.input_tokens || 0,
+                output_tokens: 0,
+                cache_creation_input_tokens: u.cache_creation_input_tokens || 0,
+                cache_read_input_tokens: u.cache_read_input_tokens || 0,
+              },
+            };
           } else if (parsed.type === 'message_delta' && parsed.usage) {
-            const inputTokens: number = (usageData as any)?.usage?.input_tokens || 0;
-            usageData = { usage: { input_tokens: inputTokens, output_tokens: parsed.usage.output_tokens || 0 } };
+            const prev: Record<string, number> = (usageData as { usage?: Record<string, number> } | null)?.usage ?? {};
+            usageData = {
+              usage: {
+                input_tokens: prev.input_tokens || 0,
+                output_tokens: parsed.usage.output_tokens || 0,
+                cache_creation_input_tokens: prev.cache_creation_input_tokens || 0,
+                cache_read_input_tokens: prev.cache_read_input_tokens || 0,
+              },
+            };
           }
 
           if (parsed.type === 'content_block_start') {
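The merge in this hunk can be read as a small reducer over stream events: `message_start` seeds the input-side counters, and `message_delta` fills in the output count while carrying the earlier values forward. A self-contained sketch under that reading, where `StreamUsage`, `UsageEvent` and `mergeUsage` are illustrative names rather than code from the repository:

```typescript
interface StreamUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Only the two event types that carry usage are modelled here.
type UsageEvent =
  | { type: 'message_start'; message: { usage?: Partial<StreamUsage> } }
  | { type: 'message_delta'; usage?: Partial<StreamUsage> };

function mergeUsage(prev: StreamUsage | null, ev: UsageEvent): StreamUsage | null {
  if (ev.type === 'message_start') {
    const u = ev.message.usage;
    if (!u) return prev;
    // Input-side counters arrive once, at the start of the stream.
    return {
      input_tokens: u.input_tokens || 0,
      output_tokens: 0,
      cache_creation_input_tokens: u.cache_creation_input_tokens || 0,
      cache_read_input_tokens: u.cache_read_input_tokens || 0,
    };
  }
  if (!ev.usage) return prev;
  // message_delta: keep what message_start reported, update the output count.
  return {
    input_tokens: prev?.input_tokens || 0,
    output_tokens: ev.usage.output_tokens || 0,
    cache_creation_input_tokens: prev?.cache_creation_input_tokens || 0,
    cache_read_input_tokens: prev?.cache_read_input_tokens || 0,
  };
}

const sampleEvents: UsageEvent[] = [
  { type: 'message_start', message: { usage: { input_tokens: 10, cache_read_input_tokens: 3000 } } },
  { type: 'message_delta', usage: { output_tokens: 42 } },
];

let merged: StreamUsage | null = null;
for (const ev of sampleEvents) {
  merged = mergeUsage(merged, ev);
}
console.log(JSON.stringify(merged));
```

Before this commit the `message_delta` branch rebuilt the usage object with only `input_tokens` and `output_tokens`, so any cache counters would have been dropped at that step even if `message_start` had captured them.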
