Refine documentation on context management and model selection to clarify cache stability and cost implications during long sessions

olivomarco · olivomarco · commit c6ea6a88dab5 · 2026-06-24T13:29:35.000+02:00
diff --git a/docs/04-context-management.md b/docs/04-context-management.md
@@ -168,15 +168,9 @@ Most teams have it inverted — everything in always-on. Flipping the ratio cuts
 
 The cheapest token is the one the platform doesn't have to re-process. Modern Copilot interactions cache stable portions of context (system prompt, instruction files, recently-loaded files) so they don't pay the full input-token cost on every turn.
 
-<<<<<<< HEAD
 In long sessions, this is often the biggest single cost lever. When most of your input is cache-hit input, effective input cost can drop dramatically (commonly cited as up to ~90% discount on cached input, depending on provider/model/surface billing rules).
 
 You can lean into this. Two practical patterns:
-||||||| parent of 68607d7 (Enhance documentation on `/chronicle` commands for token optimization and add VS Code settings for improved UI customization)
-You can lean into this. Two practical patterns:
-=======
-You can lean into this. Three practical patterns:
->>>>>>> 68607d7 (Enhance documentation on `/chronicle` commands for token optimization and add VS Code settings for improved UI customization)
 
 **1. Stable instructions at the top, volatile work at the bottom.** Cached context only works if the prefix of your conversation is stable. Don't reshuffle your `copilot-instructions.md` or rotate which files are open between every prompt — keep the stable layer stable, and let only the most recent message change.
 
diff --git a/docs/11-models-and-pricing.md b/docs/11-models-and-pricing.md
@@ -123,13 +123,8 @@ This is especially relevant when comparing a cheap reasoning-capable model at `m
 
 - Leaving an expensive premium model pinned for the whole session
 - Changing models mid-chat in a long session without thinking about accumulated context. Prior messages, tool results, and cacheable prefixes can still be part of the next request; switching into a higher-cost lane can make that carried context more expensive than starting fresh
-<<<<<<< HEAD
 - Enabling/disabling MCP servers mid-thread in long sessions. Tool-surface changes often invalidate stable cached prefixes
 - Switching default/custom agent profiles mid-thread during expensive runs. Agent/profile changes can break cache continuity for the same conversation
-||||||| parent of 68607d7 (Enhance documentation on `/chronicle` commands for token optimization and add VS Code settings for improved UI customization)
-=======
-- Switching the **custom agent** or toggling **MCP servers / tools** mid-session. Like a model switch, this rewrites the cacheable prefix (system prompt, tool definitions, instructions), so cache savings are forfeited and prior context becomes pollution under the new setup. See [Caching §2.3.5](04-context-management.md#235-caching-store-and-reuse-context-within-prompts)
->>>>>>> 68607d7 (Enhance documentation on `/chronicle` commands for token optimization and add VS Code settings for improved UI customization)
 - Assuming Auto will escalate to Opus when a task gets hard
 - Using vendor API prices and Copilot pricing signals as if they were the same metric
 - Recommending a model without checking whether the plan includes it
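
Reviewer note: the cost effect described in both hunks (a stable prefix gets the cached-input discount, while toggling MCP servers or agent profiles mid-thread rewrites the prefix) can be sketched with some arithmetic. This is a minimal illustration, not Copilot's actual billing logic. It assumes caching is strictly prefix-based at segment granularity, uses the "commonly cited" ~90% cached-input discount from the doc text as a default, and every segment name and token count below is hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (hypothetical segment label, token count)


def cached_prefix_tokens(prev: List[Segment], curr: List[Segment]) -> int:
    """Tokens in the leading run of segments that is identical in both requests."""
    total = 0
    for before, after in zip(prev, curr):
        if before != after:
            break  # the first difference ends the reusable prefix
        total += after[1]
    return total


def effective_input_cost(prev: List[Segment], curr: List[Segment],
                         discount_pct: int = 90) -> float:
    """Input cost of `curr`, expressed in full-price token equivalents.

    discount_pct is an assumed cached-input discount; real rates depend on
    provider, model, and surface billing rules.
    """
    cached = cached_prefix_tokens(prev, curr)
    uncached = sum(tokens for _, tokens in curr) - cached
    return uncached + cached * (100 - discount_pct) / 100


# Hypothetical long session: stable layers first, volatile work last.
previous = [("system", 2000), ("tools:v1", 3000),
            ("instructions", 1000), ("history", 10000)]

# Next turn with nothing reshuffled: only the new message is uncached.
stable_turn = previous + [("new_message", 500)]

# Next turn after toggling an MCP server: the tool definitions change,
# so everything after the system prompt is re-processed at full price.
toggled_turn = [("system", 2000), ("tools:v2", 3500),
                ("instructions", 1000), ("history", 10000),
                ("new_message", 500)]

print(effective_input_cost(previous, stable_turn))   # 2100.0
print(effective_input_cost(previous, toggled_turn))  # 15200.0
```

Under these assumptions the same turn costs roughly seven times more input after the tool surface changes, which is the reason both resolved sections advise against mid-thread MCP or agent switches in long sessions.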