Commit 001d60e

Scribe committed
Enhance documentation with updates on token recommendation placements and model-switching guidelines for improved cost efficiency
1 parent ef84db9 commit 001d60e

8 files changed

Lines changed: 41 additions & 7 deletions

.squad/agents/bender/history.md

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -97,3 +97,7 @@
9797
- Hermes completed the semantic cleanup, not just file removal: plugin metadata now advertises the guide + agent surface, repo docs no longer teach repo-local skill installation, and late Part 4 numbering is contiguous again.
9898
- Live filesystem and `rg` checks satisfied the main acceptance criteria: no top-level `skills/`, no `skills-lock.json`, and one canonical image copy under `docs/assets/`.
9999
- Remaining gap is environmental, not yet a content failure: strict MkDocs build still cannot be executed here because `mkdocs` is not installed.
100+
101+
### 2026-06-17: Token Recommendation Placement Review — Approved
102+
103+
- Hermes placed RTK Windows cautions, VS Code extension/profile cleanup, custom-agent cost control, model-switch risk, and Copilot CLI AIC value framing in natural locations without overstating hidden internals or exact billing math.

.squad/agents/hermes/history.md

Lines changed: 7 additions & 0 deletions
@@ -9,6 +9,13 @@
99

1010
## Learnings
1111

12+
### 2026-06-17: Placement for Tool, Profile, Model-Switch, and AIC Recommendations
13+
14+
- Added RTK Windows caveat beside existing RTK setup, not as a new technique. Readers need the warning where they copy commands.
15+
- Extension/profile cleanup belongs with MCP/tool costs and practical agent setup because extension-injected tools behave like hidden context surface.
16+
- Model mid-chat switching belongs in model-pricing anti-patterns with careful cache/history wording; avoid claiming fixed implementation internals.
17+
- Copilot CLI AIC counter fits habit-building/monthly maintenance: value framing is a behavior loop, not a setup prerequisite.
18+
1219
### 2026-04-14: Wrote TOKEN-OPTIMIZATION-GUIDE.md (v1)
1320

1421
- 1107 lines. Followed Leela's outline (5 parts), filled with Farnsworth's data.

.squad/decisions.md

Lines changed: 7 additions & 0 deletions
@@ -196,6 +196,13 @@
196196
- Cleanup stays semantic, not just structural: docs and plugin metadata must stop advertising shipped installable skills.
197197
- Validation target stays bounded: live `rg` and filesystem checks must pass; strict MkDocs build remains desirable but may be blocked if `mkdocs` is unavailable.
198198

199+
### 2026-06-17: Token Recommendations Placement
200+
201+
**Author:** Hermes | **Status:** Active | **Requested by:** Marco Olivo
202+
203+
- Placed RTK Windows caution in MCP/tool-cost sections, VS Code extension/profile and custom-agent guidance in MCP/practical setup, model-switch cache risk in model-pricing anti-patterns, and Copilot CLI AIC value-framing in habit-building maintenance.
204+
- Each recommendation now sits beside the mechanism it affects, avoiding a new page and keeping README changes limited to high-impact quick-start nudges.
205+
199206
## Governance
200207

201208
- All meaningful changes require team consensus

README.md

Lines changed: 3 additions & 3 deletions
@@ -21,12 +21,12 @@ Don't have time to read the full guide? Do these today and cut your token usage:
2121
| 1 | **Request code-only responses** — add `Code only, no explanation.` to `copilot-instructions.md`. Highest per-token ROI: output costs 5× more than input, and this cuts 40-70% of output on every code task, permanently | Shrinks response length | 0 minutes |
2222
| 2 | **Constrain output format by default** — add `Bullets over paragraphs. No explanations unless asked.` to `copilot-instructions.md` | Keeps answers terse | 0 minutes |
2323
| 3 | **Shrink your always-on context** — compress `copilot-instructions.md` AND prune `AGENTS.md` to landmines only. Every token in either file is billed on every interaction (and every agent step). Strip filler, delete anything the agent discovers by reading code, delete LLM-generated `/init` boilerplate | Reduces always-on input/context | 15 minutes |
24-
| 4 | **Default to Auto model selection** — use Auto as the baseline because it chooses from the supported Auto pool and gives a paid-plan discount. Pin higher-cost models manually when a task clearly justifies them. See [Model Selection & Pricing](docs/11-models-and-pricing.md) | Lowers billed rate on eligible usage | 0 minutes |
24+
| 4 | **Default to Auto model selection** — use Auto as the baseline because it chooses from the supported Auto pool and gives a paid-plan discount. Pin higher-cost models manually only when a task clearly justifies them, and start a fresh chat when switching cost lanes in a long session. See [Model Selection & Pricing](docs/11-models-and-pricing.md) | Lowers billed rate on eligible usage | 0 minutes |
2525
| 5 | **Use Ask Mode for simple questions** — reserve Agent Mode for multi-step tasks | Avoids agent overhead | 0 minutes (just choose the right mode) |
2626
| 6 | **Scope context with `applyTo:` paths** — split one large instructions file into small scoped ones that load only when relevant | Reduces always-on input/context | 15 minutes |
2727
| 7 | **Be precise in your prompts** — "Add null check to `getUser()`" not "Can you please look at this and maybe add some error handling?" Note: your typed prompt is a small fraction of total input; precision matters more for quality than for raw token savings | Improves task targeting | 0 minutes |
2828
| 8 | **Retune prompts to the target model** — provider prompting guides change by model/version. Paste the official guide URL into Copilot and ask it to adapt `.github/copilot-instructions.md`, agent profiles, or app prompts for the model you actually use | Reduces rework | 10 minutes per model change |
29-
| 9 | **Audit your MCP servers** — disable servers you're not using; each costs ~100-500 tokens per agent step | Removes tool/schema overhead | 5 minutes |
29+
| 9 | **Audit your MCP servers and injected tools** — disable unused MCP servers and VS Code extensions that add skills/tools; use a clean coding profile or focused custom agent for repeat workflows. Each MCP tool costs ~100-500 tokens per agent step | Removes tool/schema overhead | 5-10 minutes |
3030
| 10 | **Convert rich files to Markdown before AI work** — `.docx`, `.pdf`, `.pptx`, `.xlsx`, HTML, images, audio, video, and ZIPs carry format tax. [Marc Bara's writeup](https://medium.com/@marc.bara.iniesta/your-docx-is-wasting-33-of-your-ai-budget-86a3d229d042) shows the cost; use [Microsoft MarkItDown](https://github.com/microsoft/markitdown) before chat, agent, or RAG ingestion | Reduces noisy input context | 5 minutes |
3131
| 11 | **Run `/chronicle improve` weekly** (**Copilot CLI only**, experimental) — this slash command works in interactive Copilot CLI sessions, not as a general Copilot Chat feature. It finds recurring confusion in your CLI session history and generates custom-instruction fixes so the same misread intent stops costing tokens forever | Cuts recurring rework | 2 minutes per run |
3232
| 12 | **Try CodeAct for long tool chains** (**Copilot CLI only**, optional external plugin) — [`copilot-codeact-plugin`](https://github.com/jsturtevant/copilot-codeact-plugin) collapses multi-step tool chains into one sandboxed execution, which can reduce repeated replay of system prompt, prior messages, and tool definitions | Reduces tool-loop replay | 10-15 minutes |
@@ -124,7 +124,7 @@ Ranked by cost impact. Output first — it costs 5× more per token than input.
124124
1. **Output control** — "Code only, no explanation" + terse default in `copilot-instructions.md`. 40-70% output savings on code tasks, 30-60% across all interactions. One instruction, permanent.
125125
2. **Shrink always-on context** (`copilot-instructions.md` + `AGENTS.md`) — compress filler, prune to landmines only, delete LLM-generated boilerplate. Compounds on every interaction and agent step; 20-23% agent-task reduction plus better correctness
126126
3. **Ask Mode for simple questions** — 60-90% savings by avoiding Agent overhead
127-
4. **Audit MCP servers** — disable unused servers, save 5K-190K tokens per agent task
127+
4. **Audit MCP servers and injected tools** — disable unused servers/extensions, or use a clean coding profile/custom agent, to save 5K-190K tokens per agent task
128128
5. **Auto model selection** — lower-cost default routing plus paid-plan discount on eligible usage, zero effort
129129
6. **Convert rich files to Markdown first** — avoid paying for Word/PDF/HTML layout noise in chat, agent, and RAG workflows
130130
7. **Retune prompts to the target model** — better first-pass output reduces repeated clarification turns
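
Reviewer sketch: the "output costs 5× more than input" framing in this README can be turned into a quick back-of-envelope check. The Python below is a minimal illustration, not Copilot's billing logic. It assumes a flat 5× output multiplier and uses hypothetical token counts (2,000 input, 1,000 output) to show how a 40-70% output cut changes the weighted cost of one request.

```python
def weighted_cost(input_tokens: float, output_tokens: float,
                  output_multiplier: float = 5.0) -> float:
    """Cost in input-token-equivalent units (assumes a flat output multiplier)."""
    return input_tokens + output_multiplier * output_tokens


def savings_from_output_cut(input_tokens: float, output_tokens: float,
                            cut_fraction: float,
                            output_multiplier: float = 5.0) -> float:
    """Fraction of weighted cost removed when output shrinks by cut_fraction."""
    before = weighted_cost(input_tokens, output_tokens, output_multiplier)
    after = weighted_cost(input_tokens, output_tokens * (1 - cut_fraction),
                          output_multiplier)
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens, 1,000 output tokens.
    for cut in (0.4, 0.7):
        share = savings_from_output_cut(2000, 1000, cut)
        print(f"cutting {cut:.0%} of output lowers weighted cost by {share:.1%}")
```

Because output carries the multiplier, trimming it moves the total more than trimming the same number of input tokens would. The real ratio depends on the model and plan, so treat the 5× default as an adjustable parameter.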

docs/06-workflow-optimization.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ Keep the claim bounded: this guide is **not** benchmarking CodeAct itself. The p
9999

100100
CodeAct reduces the *number* of tool calls. [**RTK (Rust Token Killer)**](https://github.com/rtk-ai/rtk) reduces the *size* of each tool call's result. They address different sides of the same problem and can be used together.
101101

102-
RTK is a CLI proxy that intercepts `git`, `cargo test`, `grep`, `ls`, and 100+ other dev commands and compresses their output before it reaches the agent — 60–90% savings per command. Unlike CodeAct, RTK works in all Copilot surfaces (VS Code, CLI, and other AI tools), not just Copilot CLI. See [MCP & Tool Costs §2.7.7](08-mcp-tool-costs.md#277-compress-tool-output-at-the-source-rtk) for setup and the full command list.
102+
RTK is a CLI proxy that intercepts `git`, `cargo test`, `grep`, `ls`, and 100+ other dev commands and compresses their output before it reaches the agent — 60–90% savings per command. Unlike CodeAct, RTK is not limited to Copilot CLI; it can help across Copilot surfaces when the shell hook is reliable. Treat Windows setups as a pilot, not a default rollout. See [MCP & Tool Costs §2.7.7](08-mcp-tool-costs.md#277-compress-tool-output-at-the-source-rtk) for setup and the full command list.
103103

104104
## 2.5.4 Default to Auto Model Selection
105105

docs/08-mcp-tool-costs.md

Lines changed: 7 additions & 2 deletions
@@ -18,7 +18,7 @@ Free Space: 55.3k (28%)
1818
Buffer: 40.4k (20%)
1919
```
2020

21-
**VS Code Copilot:** no equivalent command, but you can estimate your `System/Tools` baseline by counting active MCP servers × tools × ~200 tokens average (see §2.7.2).
21+
**VS Code Copilot:** no equivalent command, but you can estimate your `System/Tools` baseline by counting active MCP servers × tools × ~200 tokens average (see §2.7.2). Also audit extensions that add skills, agents, MCP servers, or tool surfaces. If an extension injects tools you do not need for coding, disable it for that workspace or move coding work into a VS Code profile with only the essentials enabled.
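
Reviewer sketch: the baseline estimate described here (active MCP servers × tools × ~200 tokens) is simple enough to script. The Python below is a rough illustration under stated assumptions: the server names, tool counts, and the 15-step task length are hypothetical, and the ~200-token average is the guide's own heuristic rather than a measured value.

```python
def estimate_tool_baseline(tools_per_server: dict[str, int],
                           avg_tokens_per_tool: int = 200) -> int:
    """Approximate System/Tools tokens loaded on every agent step."""
    return sum(tools_per_server.values()) * avg_tokens_per_tool


def estimate_task_overhead(baseline_tokens: int, agent_steps: int) -> int:
    """Tool schemas are replayed each step, so overhead scales with step count."""
    return baseline_tokens * agent_steps


if __name__ == "__main__":
    # Hypothetical workspace: three MCP servers with made-up tool counts.
    everything_on = {"github": 30, "filesystem": 10, "browser": 20}
    lean_profile = {"github": 30}

    for label, servers in (("all servers", everything_on), ("lean profile", lean_profile)):
        baseline = estimate_tool_baseline(servers)
        per_task = estimate_task_overhead(baseline, agent_steps=15)
        print(f"{label}: ~{baseline:,} tokens per step, ~{per_task:,} per 15-step task")
```

Extension-injected tools can be added to the same dictionary as extra entries once you know roughly how many tools they expose. The estimate is only as good as those counts.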
2222

2323
**The critical distinction — always-loaded vs. on-demand:**
2424

@@ -151,6 +151,8 @@ Don't enable every MCP server globally. Use workspace-level configuration:
151151

152152
**The rule:** If you don't need it for the current task, disable it. You can always re-enable it later. Every idle MCP server costs tokens on every agent step.
153153

154+
**VS Code extensions count too.** MCP servers are the obvious source of tool schemas, but some extensions also add skills, chat participants, agent profiles, or tool surfaces that can appear in the AI context. For cost-sensitive coding sessions, keep a lean VS Code profile: core language tooling, GitHub Copilot, and only the MCP/tools needed for that repo. Disable everything else at the workspace or profile level.
155+
154156
## 2.7.6 Practical Guidance
155157

156158
1. **Audit your MCP servers** — run through your enabled servers. Do you actually use all of them? Disable the rest
@@ -160,7 +162,8 @@ Don't enable every MCP server globally. Use workspace-level configuration:
160162
5. **Custom instructions help** — add "Minimize tool calls. Read files only when necessary." to reduce call frequency
161163
6. **Use skills instead of MCPs for occasional capabilities** — MCP tool schemas load on every step whether used or not. Skills load only title and description upfront; the full content pulls on demand. If a capability is used in fewer than half your sessions, a skill is cheaper. See [Practical Setup §4.2](10-practical-setup.md#mcps-vs-skills-eager-vs-lazy-context-loading) for the full comparison
162164
7. **Optional, Copilot CLI only: try CodeAct for long tool chains** — external plugin [`copilot-codeact-plugin`](https://github.com/jsturtevant/copilot-codeact-plugin) collapses many small tool hops into one sandboxed execution. That does not shrink any one server's schema, but it can reduce how often the full tool catalog gets replayed on CLI-heavy tasks
163-
8. **Compress tool output at the source with RTK** — [RTK (Rust Token Killer)](https://github.com/rtk-ai/rtk) is a CLI proxy that filters the *results* of shell commands before they reach the agent. Confirmed to work well in VS Code Copilot (repo-by-repo setup). Reductions are real but vary by command and project output volume. See §2.7.7
165+
8. **Use a focused custom agent for repeat coding workflows** — a custom agent can carry a narrow tool list and stable instructions, so the same coding workflow starts with the same active surface instead of whatever the default chat currently exposes. Where your Copilot surface supports model selection in agent/profile files, pin the intended model there too
166+
9. **Compress tool output at the source with RTK** — [RTK (Rust Token Killer)](https://github.com/rtk-ai/rtk) is a CLI proxy that filters the *results* of shell commands before they reach the agent. Confirmed to work well in VS Code Copilot on macOS/Linux with repo-by-repo setup. Treat Windows as experimental and validate before rolling it out broadly. Reductions are real but vary by command and project output volume. See §2.7.7
164167

165168
## 2.7.7 Compress Tool Output at the Source: RTK
166169

@@ -198,6 +201,8 @@ brew install rtk
198201
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
199202
```
200203

204+
**Windows caveat:** RTK is strongest today on Unix-like shell paths. On Windows, shell-hook behavior and path handling can be brittle, especially across PowerShell, Git Bash, WSL, and VS Code agent execution. Treat it as a pilot, not a default recommendation: test it on the exact repo and shell your team uses, and skip it if the setup causes command failures or noisy behavior.
205+
201206
**Setting up for VS Code Copilot — per-repo:**
202207

203208
For VS Code Copilot, RTK installs a PreToolUse hook scoped to the current repository. Run this once inside each repo where you want RTK active:

docs/10-practical-setup.md

Lines changed: 9 additions & 1 deletion
@@ -297,7 +297,9 @@ Mock: external services only. No impl mocking.
297297
Coverage: branch coverage ≥80%.
298298
```
299299

300-
Focused agents carry less instruction overhead than a general-purpose instruction set.
300+
Focused agents carry less instruction overhead than a general-purpose instruction set. They also give you a stable control surface: the same task profile can declare the tools it is allowed to use, the instructions it carries, and, where your Copilot surface supports it, the model it should use. For repeat coding workflows, prefer a focused custom agent over the default agent when you care about predictable cost. The default agent inherits more of the current environment: active tools, extension-provided surfaces, and whatever model is currently selected.
301+
302+
Keep the tool list narrow. This repo's `agents/token-saver.agent.md` is the pattern: built-in `bash`, `edit`, and `view`; no duplicate filesystem MCP; terse output rules; explicit tool minimization.
301303

302304
### 4.3.6 Compress Shell Command Output with RTK
303305

@@ -317,6 +319,8 @@ rtk init --copilot
317319

318320
RTK installs a PreToolUse hook into the current repository. Repeat per repo — there is no global VS Code Copilot install. Once active, the hook is transparent: your terminal is unchanged; only the agent's Bash tool calls are intercepted.
319321

322+
On Windows, validate RTK before recommending it to a team. The hook path can be more fragile across PowerShell, Git Bash, WSL, and VS Code agent execution. If RTK adds setup friction or command failures, skip it and focus first on clean profiles, fewer MCP servers, precise prompts, and shorter command output.
323+
320324
Commands with verbose output (test failures, large diffs) see the biggest reductions. Short-output commands see smaller gains. Actual savings depend on your project's output volume.
321325

322326
Combine with `copilot-setup-steps.yml` (§4.3.2) and precise issue descriptions (§4.3.3) for maximum session efficiency. Full setup, command list, and other AI tool support: [MCP & Tool Costs §2.7.7](08-mcp-tool-costs.md#277-compress-tool-output-at-the-source-rtk).
@@ -335,8 +339,10 @@ Combine with `copilot-setup-steps.yml` (§4.3.2) and precise issue descriptions
335339
- Review your `copilot-instructions.md` — has it grown? Compress it back down
336340
- Check if any memory files have gotten verbose — compress them back down
337341
- Audit which files are habitually open in your editor — close ones you're not working on (open tabs auto-feed context)
342+
- Audit VS Code profiles and extensions — disable extensions that inject AI skills, agents, MCP servers, or tools unless the current repo needs them
338343
- (Business/Enterprise) Review repository / org **Content Exclusion** settings for new sensitive paths
339344
- Check your model usage — are you pinning high-effort models for tasks Auto would route to a cheaper tier?
345+
- In Copilot CLI, watch the bottom-right **AIC** counter. Divide by 100 for the approximate dollar value, then ask whether the output saved more time or cost than it consumed. If spend is high for weak output, treat that as feedback on prompt scope, context size, tool count, or model choice
340346
- Review budgets, user-level caps, and model policies before expanding premium access further
341347
- When default model changes, retune prompts/instructions against that provider's current prompting guide
342348
- Check token usage by user/team — are agents and power users driving outsized consumption? See [Enterprise Governance](12-enterprise-governance.md)
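
Reviewer sketch: the AIC habit in the checklist above (divide the counter by 100, then compare spend against value) can be written as a small helper. This is a minimal illustration. The divide-by-100 conversion comes from the guide; `minutes_saved` and `hourly_rate` are hypothetical inputs you would supply yourself, not values Copilot CLI reports.

```python
def aic_to_dollars(aic: float) -> float:
    """Approximate dollar value of an AIC reading (guide's rule: divide by 100)."""
    return aic / 100


def value_check(aic: float, minutes_saved: float, hourly_rate: float) -> dict:
    """Compare approximate spend with the value of the time the session saved."""
    spend = aic_to_dollars(aic)
    value = minutes_saved * hourly_rate / 60
    return {"spend": spend, "value": value, "worth_it": value > spend}


if __name__ == "__main__":
    # Hypothetical session: counter shows 250, task saved about 20 minutes.
    print(value_check(aic=250, minutes_saved=20, hourly_rate=60))
```

When `worth_it` comes back false, the checklist's advice applies: revisit prompt scope, context size, tool count, or model choice before the next session.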
@@ -461,6 +467,8 @@ Relevant settings that affect agent token usage:
461467

462468
**`maxRequests`** caps how many tool-call requests the agent can make. Lower = fewer tokens, but the agent might not finish complex tasks. Start at 10-15, increase only when needed.
463469

470+
For repeat workflows, pair this with a custom agent profile and a clean VS Code profile. Disable extensions that inject skills, agents, MCP servers, or tool surfaces you do not need for coding. The most predictable setup is boring: one focused agent, one intended model, and only the tools required for the repo.
471+
464472
### 4.5.5 Custom Instructions for Agent Efficiency
465473

466474
Add to `.github/copilot-instructions.md`:

docs/11-models-and-pricing.md

Lines changed: 3 additions & 0 deletions
@@ -122,11 +122,14 @@ This is especially relevant when comparing a cheap reasoning-capable model at `m
122122
### Anti-patterns
123123

124124
- Leaving an expensive premium model pinned for the whole session
125+
- Changing models mid-chat in a long session without thinking about accumulated context. Prior messages, tool results, and cacheable prefixes can still be part of the next request; switching into a higher-cost lane can make that carried context more expensive than starting fresh
125126
- Assuming Auto will escalate to Opus when a task gets hard
126127
- Using vendor API prices and Copilot pricing signals as if they were the same metric
127128
- Recommending a model without checking whether the plan includes it
128129
- Turning on every premium model for the whole org before checking who actually needs it
129130

131+
**Model-switch rule:** choose the cost lane before the work starts. If you need to move from cheap/Auto to a premium model for a hard subtask, start a fresh chat with only the relevant summary and files. This keeps the original session's context stable, which is friendlier to caching, and avoids dragging a long low-value history into a higher-cost request. The exact billing implementation can change by surface and plan, so treat this as risk control rather than guaranteed repricing math.
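
Reviewer sketch: the risk behind the model-switch rule can be illustrated with a deliberately simple comparison. The Python below is not a billing model. The token counts and the 3× premium multiplier are hypothetical, and it ignores caching discounts entirely, which may reduce the real gap on some surfaces and plans.

```python
def context_cost(context_tokens: int, rate_multiplier: float) -> float:
    """Relative cost of sending a context at a given rate (arbitrary units)."""
    return context_tokens * rate_multiplier


def compare_switch(history_tokens: int, summary_tokens: int,
                   premium_multiplier: float) -> tuple[float, float]:
    """Return (carried, fresh): cost of the first premium request each way."""
    carried = context_cost(history_tokens, premium_multiplier)
    fresh = context_cost(summary_tokens, premium_multiplier)
    return carried, fresh


if __name__ == "__main__":
    # Hypothetical long session: 80k tokens of history vs. a 6k-token summary.
    carried, fresh = compare_switch(80_000, 6_000, premium_multiplier=3.0)
    print(f"switch mid-chat: {carried:,.0f} units; fresh chat: {fresh:,.0f} units")
```

The comparison shows only that carried context is multiplied by the new rate. How much of it is actually billed at full price is exactly the part the guide declines to claim.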
132+
130133
## Org Rollout Rule: Review Before Enablement
131134

132135
For teams, model choice is a governance problem as much as a prompt problem.
