Commit c6ea6a8

Refine documentation on context management and model selection to clarify cache stability and cost implications during long sessions
1 parent a7e8eba commit c6ea6a8

2 files changed

Lines changed: 0 additions & 11 deletions

docs/04-context-management.md

Lines changed: 0 additions & 6 deletions
@@ -168,15 +168,9 @@ Most teams have it inverted — everything in always-on. Flipping the ratio cuts

 The cheapest token is the one the platform doesn't have to re-process. Modern Copilot interactions cache stable portions of context (system prompt, instruction files, recently-loaded files) so they don't pay the full input-token cost on every turn.

-<<<<<<< HEAD
 In long sessions, this is often the biggest single cost lever. When most of your input is cache-hit input, effective input cost can drop dramatically (commonly cited as up to ~90% discount on cached input, depending on provider/model/surface billing rules).

 You can lean into this. Two practical patterns:
-||||||| parent of 68607d7 (Enhance documentation on `/chronicle` commands for token optimization and add VS Code settings for improved UI customization)
-You can lean into this. Two practical patterns:
-=======
-You can lean into this. Three practical patterns:
->>>>>>> 68607d7 (Enhance documentation on `/chronicle` commands for token optimization and add VS Code settings for improved UI customization)

 **1. Stable instructions at the top, volatile work at the bottom.** Cached context only works if the prefix of your conversation is stable. Don't reshuffle your `copilot-instructions.md` or rotate which files are open between every prompt — keep the stable layer stable, and let only the most recent message change.

docs/11-models-and-pricing.md

Lines changed: 0 additions & 5 deletions
@@ -123,13 +123,8 @@ This is especially relevant when comparing a cheap reasoning-capable model at `m

 - Leaving an expensive premium model pinned for the whole session
 - Changing models mid-chat in a long session without thinking about accumulated context. Prior messages, tool results, and cacheable prefixes can still be part of the next request; switching into a higher-cost lane can make that carried context more expensive than starting fresh
-<<<<<<< HEAD
 - Enabling/disabling MCP servers mid-thread in long sessions. Tool-surface changes often invalidate stable cached prefixes
 - Switching default/custom agent profiles mid-thread during expensive runs. Agent/profile changes can break cache continuity for the same conversation
-||||||| parent of 68607d7 (Enhance documentation on `/chronicle` commands for token optimization and add VS Code settings for improved UI customization)
-=======
-- Switching the **custom agent** or toggling **MCP servers / tools** mid-session. Like a model switch, this rewrites the cacheable prefix (system prompt, tool definitions, instructions), so cache savings are forfeited and prior context becomes pollution under the new setup. See [Caching §2.3.5](04-context-management.md#235-caching-store-and-reuse-context-within-prompts)
->>>>>>> 68607d7 (Enhance documentation on `/chronicle` commands for token optimization and add VS Code settings for improved UI customization)
 - Assuming Auto will escalate to Opus when a task gets hard
 - Using vendor API prices and Copilot pricing signals as if they were the same metric
 - Recommending a model without checking whether the plan includes it
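
The prefix-stability point in both files can be made concrete with a small sketch. It is a toy model, not a description of how Copilot or any provider actually bills: it assumes a request is an ordered list of segments, that only the leading run of unchanged segments counts as cache-hit input, and that cached input gets the ~90% discount the docs cite as an upper bound. The function names, segment labels, and token counts are all hypothetical.

```python
from typing import Sequence


def stable_prefix_len(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Count leading segments that are identical in both requests."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


def relative_input_cost(
    segment_tokens: Sequence[int],
    cached_segments: int,
    cached_discount: float = 0.90,  # assumption: the "up to ~90%" figure
) -> float:
    """Input cost as a fraction of the fully uncached cost.

    The first `cached_segments` segments are billed at the discounted
    rate; everything after them is billed at the full rate.
    """
    if not 0 <= cached_segments <= len(segment_tokens):
        raise ValueError("cached_segments out of range")
    total = sum(segment_tokens)
    if total <= 0:
        raise ValueError("request must contain at least one token")
    cached = sum(segment_tokens[:cached_segments])
    fresh = total - cached
    return (fresh + cached * (1.0 - cached_discount)) / total


# Hypothetical session: the previous request, then two candidate next requests.
prev_request = ["system prompt", "tool definitions v1", "instructions", "turn 1"]
next_request = prev_request + ["turn 2"]
# Toggling an MCP server changes the tool definitions near the top of the prefix.
toggled_request = [
    "system prompt", "tool definitions v2", "instructions", "turn 1", "turn 2",
]
tokens = [2000, 3000, 1000, 500, 500]  # hypothetical token count per segment

stable_hits = stable_prefix_len(prev_request, next_request)
toggled_hits = stable_prefix_len(prev_request, toggled_request)

print(f"stable prefix:  {relative_input_cost(tokens, stable_hits):.0%} of uncached cost")
print(f"tools toggled:  {relative_input_cost(tokens, toggled_hits):.0%} of uncached cost")
```

Under these assumptions, the request that only appends a new turn keeps four segments cached, while the request with changed tool definitions keeps one, so most of its input is billed at the full rate again. Real caches typically work at token granularity and add expiry windows and minimum prefix sizes, so treat the ratios as directional only.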
