.squad/agents/bender/history.md
Lines changed: 4 additions & 0 deletions
@@ -97,3 +97,7 @@
97
97
- Hermes completed the semantic cleanup, not just file removal: plugin metadata now advertises the guide + agent surface, repo docs no longer teach repo-local skill installation, and late Part 4 numbering is contiguous again.
98
98
- Live filesystem and `rg` checks satisfied the main acceptance criteria: no top-level `skills/`, no `skills-lock.json`, and one canonical image copy under `docs/assets/`.
99
99
- Remaining gap is environmental, not yet a content failure: strict MkDocs build still cannot be executed here because `mkdocs` is not installed.
- Hermes placed RTK Windows cautions, VS Code extension/profile cleanup, custom-agent cost control, model-switch risk, and Copilot CLI AIC value framing in natural locations without overstating hidden internals or exact billing math.
.squad/decisions.md
Lines changed: 7 additions & 0 deletions
@@ -196,6 +196,13 @@
196
196
- Cleanup stays semantic, not just structural: docs and plugin metadata must stop advertising shipped installable skills.
197
197
- Validation target stays bounded: live `rg` and filesystem checks must pass; strict MkDocs build remains desirable but may be blocked if `mkdocs` is unavailable.
198
198
199
+
### 2026-06-17: Token Recommendations Placement
200
+
201
+
**Author:** Hermes | **Status:** Active | **Requested by:** Marco Olivo
202
+
203
+
- Placed RTK Windows caution in MCP/tool-cost sections, VS Code extension/profile and custom-agent guidance in MCP/practical setup, model-switch cache risk in model-pricing anti-patterns, and Copilot CLI AIC value-framing in habit-building maintenance.
204
+
- Each recommendation now sits beside the mechanism it affects, avoiding a new page and keeping README changes limited to high-impact quick-start nudges.
README.md
Lines changed: 3 additions & 3 deletions
@@ -21,12 +21,12 @@ Don't have time to read the full guide? Do these today and cut your token usage:
21
21
| 1 |**Request code-only responses** — add `Code only, no explanation.` to `copilot-instructions.md`. Highest per-token ROI: output costs 5× more than input, and this cuts 40-70% of output on every code task, permanently | Shrinks response length | 0 minutes |
22
22
| 2 |**Constrain output format by default** — add `Bullets over paragraphs. No explanations unless asked.` to `copilot-instructions.md`| Keeps answers terse | 0 minutes |
23
23
| 3 |**Shrink your always-on context** — compress `copilot-instructions.md` AND prune `AGENTS.md` to landmines only. Every token in either file is billed on every interaction (and every agent step). Strip filler, delete anything the agent discovers by reading code, delete LLM-generated `/init` boilerplate | Reduces always-on input/context | 15 minutes |
24
-
| 4 |**Default to Auto model selection** — use Auto as the baseline because it chooses from the supported Auto pool and gives a paid-plan discount. Pin higher-cost models manually when a task clearly justifies them. See [Model Selection & Pricing](docs/11-models-and-pricing.md)| Lowers billed rate on eligible usage | 0 minutes |
24
+
| 4 |**Default to Auto model selection** — use Auto as the baseline because it chooses from the supported Auto pool and gives a paid-plan discount. Pin higher-cost models manually only when a task clearly justifies them, and start a fresh chat when switching cost lanes in a long session. See [Model Selection & Pricing](docs/11-models-and-pricing.md)| Lowers billed rate on eligible usage | 0 minutes |
25
25
| 5 |**Use Ask Mode for simple questions** — reserve Agent Mode for multi-step tasks | Avoids agent overhead | 0 minutes (just choose the right mode) |
26
26
| 6 |**Scope context with `applyTo:` paths** — split one large instructions file into small scoped ones that load only when relevant | Reduces always-on input/context | 15 minutes |
27
27
| 7 |**Be precise in your prompts** — "Add null check to `getUser()`" not "Can you please look at this and maybe add some error handling?" Note: your typed prompt is a small fraction of total input; precision matters more for quality than for raw token savings | Improves task targeting | 0 minutes |
28
28
| 8 |**Retune prompts to the target model** — provider prompting guides change by model/version. Paste the official guide URL into Copilot and ask it to adapt `.github/copilot-instructions.md`, agent profiles, or app prompts for the model you actually use | Reduces rework | 10 minutes per model change |
29
-
| 9 |**Audit your MCP servers** — disable servers you're not using; each costs ~100-500 tokens per agent step | Removes tool/schema overhead | 5 minutes |
29
+
| 9 |**Audit your MCP servers and injected tools** — disable unused MCP servers and VS Code extensions that add skills/tools; use a clean coding profile or focused custom agent for repeat workflows. Each MCP tool costs ~100-500 tokens per agent step | Removes tool/schema overhead | 5-10 minutes |
30
30
| 10 |**Convert rich files to Markdown before AI work** — `.docx`, `.pdf`, `.pptx`, `.xlsx`, HTML, images, audio, video, and ZIPs carry format tax. [Marc Bara's writeup](https://medium.com/@marc.bara.iniesta/your-docx-is-wasting-33-of-your-ai-budget-86a3d229d042) shows the cost; use [Microsoft MarkItDown](https://github.com/microsoft/markitdown) before chat, agent, or RAG ingestion | Reduces noisy input context | 5 minutes |
31
31
| 11 |**Run `/chronicle improve` weekly** (**Copilot CLI only**, experimental) — this slash command works in interactive Copilot CLI sessions, not as a general Copilot Chat feature. It finds recurring confusion in your CLI session history and generates custom-instruction fixes so the same misread intent stops costing tokens forever | Cuts recurring rework | 2 minutes per run |
32
32
| 12 |**Try CodeAct for long tool chains** (**Copilot CLI only**, optional external plugin) — [`copilot-codeact-plugin`](https://github.com/jsturtevant/copilot-codeact-plugin) collapses multi-step tool chains into one sandboxed execution, which can reduce repeated replay of system prompt, prior messages, and tool definitions | Reduces tool-loop replay | 10-15 minutes |
@@ -124,7 +124,7 @@ Ranked by cost impact. Output first — it costs 5× more per token than input.
124
124
1. **Output control** — "Code only, no explanation" + terse default in `copilot-instructions.md`. 40-70% output savings on code tasks, 30-60% across all interactions. One instruction, permanent.
125
125
2. **Shrink always-on context** (`copilot-instructions.md` + `AGENTS.md`) — compress filler, prune to landmines only, delete LLM-generated boilerplate. Compounds on every interaction and agent step; 20-23% agent-task reduction plus better correctness
126
126
3. **Ask Mode for simple questions** — 60-90% savings by avoiding Agent overhead
127
-
4. **Audit MCP servers** — disable unused servers, save 5K-190K tokens per agent task
127
+
4. **Audit MCP servers and injected tools** — disable unused servers/extensions, or use a clean coding profile/custom agent, to save 5K-190K tokens per agent task
128
128
5. **Auto model selection** — lower-cost default routing plus paid-plan discount on eligible usage, zero effort
129
129
6. **Convert rich files to Markdown first** — avoid paying for Word/PDF/HTML layout noise in chat, agent, and RAG workflows
130
130
7. **Retune prompts to the target model** — better first-pass output reduces repeated clarification turns
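
As a rough worked example of the "output costs 5× more per token than input" framing in the ranked list above, the sketch below weights output tokens at 5× input and shows how an output cut changes the weighted total. The token counts used in the usage note (2,000 input, 500 output) and both function names are illustrative assumptions, not measurements from this guide.

```python
OUTPUT_WEIGHT = 5  # from the guide: output costs 5x more per token than input


def weighted_cost(input_tokens: int, output_tokens: float) -> float:
    """Return cost in input-token-equivalent units."""
    return input_tokens + OUTPUT_WEIGHT * output_tokens


def savings_fraction(input_tokens: int, output_tokens: int, output_cut: float) -> float:
    """Fraction of weighted cost saved when output shrinks by `output_cut` (0 to 1)."""
    before = weighted_cost(input_tokens, output_tokens)
    after = weighted_cost(input_tokens, output_tokens * (1 - output_cut))
    return (before - after) / before
```

With the hypothetical 2,000 input and 500 output tokens, a 40% output cut saves roughly 22% of the weighted cost and a 70% cut saves roughly 39%. The share grows as output makes up more of the request, which is why the list puts output control first.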
docs/06-workflow-optimization.md
Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ Keep the claim bounded: this guide is **not** benchmarking CodeAct itself. The p
99
99
100
100
CodeAct reduces the *number* of tool calls. [**RTK (Rust Token Killer)**](https://github.com/rtk-ai/rtk) reduces the *size* of each tool call's result. They address different sides of the same problem and can be used together.
101
101
102
-
RTK is a CLI proxy that intercepts `git`, `cargo test`, `grep`, `ls`, and 100+ other dev commands and compresses their output before it reaches the agent — 60–90% savings per command. Unlike CodeAct, RTK works in all Copilot surfaces (VS Code, CLI, and other AI tools), not just Copilot CLI. See [MCP & Tool Costs §2.7.7](08-mcp-tool-costs.md#277-compress-tool-output-at-the-source-rtk) for setup and the full command list.
102
+
RTK is a CLI proxy that intercepts `git`, `cargo test`, `grep`, `ls`, and 100+ other dev commands and compresses their output before it reaches the agent — 60–90% savings per command. Unlike CodeAct, RTK is not limited to Copilot CLI; it can help across Copilot surfaces when the shell hook is reliable. Treat Windows setups as a pilot, not a default rollout. See [MCP & Tool Costs §2.7.7](08-mcp-tool-costs.md#277-compress-tool-output-at-the-source-rtk) for setup and the full command list.
docs/08-mcp-tool-costs.md
Lines changed: 7 additions & 2 deletions
@@ -18,7 +18,7 @@ Free Space: 55.3k (28%)
18
18
Buffer: 40.4k (20%)
19
19
```
20
20
21
-
**VS Code Copilot:** no equivalent command, but you can estimate your `System/Tools` baseline by counting active MCP servers × tools × ~200 tokens average (see §2.7.2).
21
+
**VS Code Copilot:** no equivalent command, but you can estimate your `System/Tools` baseline by counting active MCP servers × tools × ~200 tokens average (see §2.7.2). Also audit extensions that add skills, agents, MCP servers, or tool surfaces. If an extension injects tools you do not need for coding, disable it for that workspace or move coding work into a VS Code profile with only the essentials enabled.
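
A minimal sketch of that estimate, assuming you can count the tools each active MCP server exposes. The server names and tool counts in the usage note are hypothetical, and the ~200-token average is the guide's rough figure, not a measured value.

```python
AVG_TOKENS_PER_TOOL = 200  # the guide's rough average; real schemas vary


def tools_baseline(tools_per_server: dict[str, int],
                   avg_tokens: int = AVG_TOKENS_PER_TOOL) -> int:
    """Estimate always-loaded tool-schema tokens: total tools x average tokens."""
    return sum(tools_per_server.values()) * avg_tokens


def per_task_overhead(baseline_tokens: int, agent_steps: int) -> int:
    """Schemas are sent on every agent step, so the baseline repeats per step."""
    return baseline_tokens * agent_steps
```

For example, hypothetical servers `server_a` (30 tools), `server_b` (10 tools), and `server_c` (20 tools) give a baseline of about 12,000 tokens; over a 10-step agent task that is about 120,000 tokens of schema overhead before any real work is counted.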
22
22
23
23
**The critical distinction — always-loaded vs. on-demand:**
24
24
@@ -151,6 +151,8 @@ Don't enable every MCP server globally. Use workspace-level configuration:
151
151
152
152
**The rule:** If you don't need it for the current task, disable it. You can always re-enable it later. Every idle MCP server costs tokens on every agent step.
153
153
154
+
**VS Code extensions count too.** MCP servers are the obvious source of tool schemas, but some extensions also add skills, chat participants, agent profiles, or tool surfaces that can appear in the AI context. For cost-sensitive coding sessions, keep a lean VS Code profile: core language tooling, GitHub Copilot, and only the MCP/tools needed for that repo. Disable everything else at the workspace or profile level.
155
+
154
156
## 2.7.6 Practical Guidance
155
157
156
158
1. **Audit your MCP servers** — run through your enabled servers. Do you actually use all of them? Disable the rest
@@ -160,7 +162,8 @@ Don't enable every MCP server globally. Use workspace-level configuration:
160
162
5. **Custom instructions help** — add "Minimize tool calls. Read files only when necessary." to reduce call frequency
161
163
6. **Use skills instead of MCPs for occasional capabilities** — MCP tool schemas load on every step whether used or not. Skills load only title and description upfront; the full content pulls on demand. If a capability is used in fewer than half your sessions, a skill is cheaper. See [Practical Setup §4.2](10-practical-setup.md#mcps-vs-skills-eager-vs-lazy-context-loading) for the full comparison
162
164
7. **Optional, Copilot CLI only: try CodeAct for long tool chains** — external plugin [`copilot-codeact-plugin`](https://github.com/jsturtevant/copilot-codeact-plugin) collapses many small tool hops into one sandboxed execution. That does not shrink any one server's schema, but it can reduce how often the full tool catalog gets replayed on CLI-heavy tasks
163
-
8. **Compress tool output at the source with RTK** — [RTK (Rust Token Killer)](https://github.com/rtk-ai/rtk) is a CLI proxy that filters the *results* of shell commands before they reach the agent. Confirmed to work well in VS Code Copilot (repo-by-repo setup). Reductions are real but vary by command and project output volume. See §2.7.7
165
+
8. **Use a focused custom agent for repeat coding workflows** — a custom agent can carry a narrow tool list and stable instructions, so the same coding workflow starts with the same active surface instead of whatever the default chat currently exposes. Where your Copilot surface supports model selection in agent/profile files, pin the intended model there too
166
+
9. **Compress tool output at the source with RTK** — [RTK (Rust Token Killer)](https://github.com/rtk-ai/rtk) is a CLI proxy that filters the *results* of shell commands before they reach the agent. Confirmed to work well in VS Code Copilot on macOS/Linux with repo-by-repo setup. Treat Windows as experimental and validate before rolling it out broadly. Reductions are real but vary by command and project output volume. See §2.7.7
164
167
165
168
## 2.7.7 Compress Tool Output at the Source: RTK
166
169
@@ -198,6 +201,8 @@ brew install rtk
198
201
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
199
202
```
200
203
204
+
**Windows caveat:** RTK is strongest today on Unix-like shell paths. On Windows, shell-hook behavior and path handling can be brittle, especially across PowerShell, Git Bash, WSL, and VS Code agent execution. Treat it as a pilot, not a default recommendation: test it on the exact repo and shell your team uses, and skip it if the setup causes command failures or noisy behavior.
205
+
201
206
**Setting up for VS Code Copilot — per-repo:**
202
207
203
208
For VS Code Copilot, RTK installs a PreToolUse hook scoped to the current repository. Run this once inside each repo where you want RTK active:
Focused agents carry less instruction overhead than a general-purpose instruction set.
300
+
Focused agents carry less instruction overhead than a general-purpose instruction set. They also give you a stable control surface: the same task profile can declare the tools it is allowed to use, the instructions it carries, and, where your Copilot surface supports it, the model it should use. For repeat coding workflows, prefer a focused custom agent over the default agent when you care about predictable cost. The default agent inherits more of the current environment: active tools, extension-provided surfaces, and whatever model is currently selected.
301
+
302
+
Keep the tool list narrow. This repo's `agents/token-saver.agent.md` is the pattern: built-in `bash`, `edit`, and `view`; no duplicate filesystem MCP; terse output rules; explicit tool minimization.
301
303
302
304
### 4.3.6 Compress Shell Command Output with RTK
303
305
@@ -317,6 +319,8 @@ rtk init --copilot
317
319
318
320
RTK installs a PreToolUse hook into the current repository. Repeat per repo — there is no global VS Code Copilot install. Once active, the hook is transparent: your terminal is unchanged; only the agent's Bash tool calls are intercepted.
319
321
322
+
On Windows, validate RTK before recommending it to a team. The hook path can be more fragile across PowerShell, Git Bash, WSL, and VS Code agent execution. If RTK adds setup friction or command failures, skip it and focus first on clean profiles, fewer MCP servers, precise prompts, and shorter command output.
323
+
320
324
Commands with verbose output (test failures, large diffs) see the biggest reductions. Short-output commands see smaller gains. Actual savings depend on your project's output volume.
321
325
322
326
Combine with `copilot-setup-steps.yml` (§4.3.2) and precise issue descriptions (§4.3.3) for maximum session efficiency. Full setup, command list, and other AI tool support: [MCP & Tool Costs §2.7.7](08-mcp-tool-costs.md#277-compress-tool-output-at-the-source-rtk).
@@ -335,8 +339,10 @@ Combine with `copilot-setup-steps.yml` (§4.3.2) and precise issue descriptions
335
339
- Review your `copilot-instructions.md` — has it grown? Compress it back down
336
340
- Check if any memory files have gotten verbose — compress them back down
337
341
- Audit which files are habitually open in your editor — close ones you're not working on (open tabs auto-feed context)
342
+
- Audit VS Code profiles and extensions — disable extensions that inject AI skills, agents, MCP servers, or tools unless the current repo needs them
338
343
- (Business/Enterprise) Review repository / org **Content Exclusion** settings for new sensitive paths
339
344
- Check your model usage — are you pinning high-effort models for tasks Auto would route to a cheaper tier?
345
+
- In Copilot CLI, watch the bottom-right **AIC** counter. Divide by 100 for the approximate dollar value, then ask whether the output was worth more, in time or money saved, than it cost. If spend is high for weak output, treat that as feedback on prompt scope, context size, tool count, or model choice
340
346
- Review budgets, user-level caps, and model policies before expanding premium access further
341
347
- When default model changes, retune prompts/instructions against that provider's current prompting guide
342
348
- Check token usage by user/team — are agents and power users driving outsized consumption? See [Enterprise Governance](12-enterprise-governance.md)
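
The AIC check in the list above can be sketched as a small calculation. The divide-by-100 conversion comes from this guide; the `minutes_saved` and `hourly_rate` inputs and both function names are hypothetical, included only to make the "was it worth it" question concrete.

```python
def aic_to_dollars(aic: float) -> float:
    """Approximate dollar value of an AIC reading (guide's rule: divide by 100)."""
    return aic / 100


def was_worth_it(aic: float, minutes_saved: float, hourly_rate: float) -> bool:
    """Compare approximate spend with the value of time saved.

    `minutes_saved` and `hourly_rate` are the user's own estimates,
    not values reported by Copilot CLI.
    """
    spend = aic_to_dollars(aic)
    value = minutes_saved / 60 * hourly_rate
    return value > spend
```

A reading of 250 AIC is about $2.50. If the session saved a hypothetical 30 minutes at $60/hour, that is clearly worth it; if it saved one minute, the spend is a signal to revisit prompt scope, context size, tool count, or model choice.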
**`maxRequests`** caps how many tool-call requests the agent can make. Lower = fewer tokens, but the agent might not finish complex tasks. Start at 10-15, increase only when needed.
463
469
470
+
For repeat workflows, pair this with a custom agent profile and a clean VS Code profile. Disable extensions that inject skills, agents, MCP servers, or tool surfaces you do not need for coding. The most predictable setup is boring: one focused agent, one intended model, and only the tools required for the repo.
471
+
464
472
### 4.5.5 Custom Instructions for Agent Efficiency
docs/11-models-and-pricing.md
Lines changed: 3 additions & 0 deletions
@@ -122,11 +122,14 @@ This is especially relevant when comparing a cheap reasoning-capable model at `m
122
122
### Anti-patterns
123
123
124
124
- Leaving an expensive premium model pinned for the whole session
125
+
- Changing models mid-chat in a long session without accounting for accumulated context. Prior messages, tool results, and cacheable prefixes can still be part of the next request; switching into a higher-cost lane can make that carried context more expensive than starting fresh
125
126
- Assuming Auto will escalate to Opus when a task gets hard
126
127
- Using vendor API prices and Copilot pricing signals as if they were the same metric
127
128
- Recommending a model without checking whether the plan includes it
128
129
- Turning on every premium model for the whole org before checking who actually needs it
129
130
131
+
**Model-switch rule:** choose the cost lane before the work starts. If you need to move from cheap/Auto to a premium model for a hard subtask, start a fresh chat with only the relevant summary and files. This keeps the original session's cache-friendly prefix stable and avoids dragging a long, low-value history into a higher-cost request. Billing details can change by surface and plan, so treat this as risk control rather than guaranteed repricing math.
132
+
130
133
## Org Rollout Rule: Review Before Enablement
131
134
132
135
For teams, model choice is a governance problem as much as a prompt problem.