Add usage optimization guide and cross-references

ntrogh · ntrogh · commit dedb3d8e65b9 · 2026-05-26T13:09:48.000+02:00
diff --git a/build/sitemap.xml b/build/sitemap.xml
@@ -610,6 +610,11 @@
         <changefreq>weekly</changefreq>
         <priority>0.8</priority>
     </url>
+    <url>
+        <loc>https://code.visualstudio.com/docs/copilot/guides/optimize-usage</loc>
+        <changefreq>weekly</changefreq>
+        <priority>0.8</priority>
+    </url>
     <url>
         <loc>https://code.visualstudio.com/docs/copilot/guides/customize-copilot-guide</loc>
         <changefreq>weekly</changefreq>
diff --git a/docs/copilot/best-practices.md b/docs/copilot/best-practices.md
@@ -133,11 +133,13 @@ Each AI model has different strengths. Some are better at reasoning, others exce
 
 * **Use BYOK for additional control.** Bring your own API key for more model choices and hosting options.
 
+* **Consider credit consumption.** More capable models consume more [AI credits](/docs/copilot/concepts/language-models.md#ai-credits-and-model-costs) per token. Auto model selection balances quality and cost for you. For more tips, see [optimize AI credit usage](/docs/copilot/guides/optimize-usage.md).
+
 For more information, see [selecting AI models](/docs/copilot/customization/language-models.md) and [available models for Copilot Chat](https://docs.github.com/en/copilot/using-github-copilot/ai-models/changing-the-ai-model-for-copilot-chat).
 
 ## Plan first, then implement
 
-For complex changes that span multiple files, separate planning from implementation. This approach prevents the AI from solving the wrong problem.
+For complex changes that span multiple files, separate planning from implementation. This approach prevents the AI from solving the wrong problem and avoids spending [AI credits](/docs/copilot/concepts/language-models.md#ai-credits-and-model-costs) on code that you later discard.
 
 1. **Explore.** Use ask mode or a subagent to read the relevant code and understand how it works before making changes.
 1. **Plan.** Use the [Plan agent](/docs/copilot/agents/planning.md) to create a structured implementation plan. Review and refine the plan before executing.
@@ -164,19 +166,21 @@ For more information, see [GitHub Copilot security](/docs/copilot/security.md) a
 
 AI responses might degrade as the conversation fills with irrelevant context. Manage your sessions proactively.
 
-* **Start new sessions for unrelated tasks.** Don't keep piling unrelated questions into one conversation. Context pollution reduces response quality.
+* **Start new sessions for unrelated tasks.** Don't keep piling unrelated questions into one conversation. Context pollution reduces response quality and wastes tokens on irrelevant history.
 
 * **Remove irrelevant history.** Delete past questions and responses that are no longer relevant, or start a fresh session.
 
-* **Compact context.** Use [/compact](/docs/copilot/chat/copilot-chat-context.md#context-compaction) and provide instructions to selectively compact the context and retain only the most relevant information.
+* **Compact context.** Use [/compact](/docs/copilot/chat/copilot-chat-context.md#context-compaction) and provide instructions to selectively compact the context and retain only the most relevant information. Compacting reduces the tokens sent with each subsequent request, which helps [manage AI credit usage](/docs/copilot/guides/optimize-usage.md).
 
 * **Use subagents for investigation.** Hint the AI to perform research and exploration in isolation by using [subagents](/docs/copilot/agents/subagents.md) so the findings don't clutter your main context.
 
 * **Choose the right session type.** Use local sessions for quick tasks on your current code that need your immediate attention, background tasks for tasks that can run locally and isolated from your main context, or cloud sessions that can benefit from team-collaboration.
 
 * **Scale with parallel sessions.** Run multiple sessions in parallel for independent tasks to save time and keep contexts separate. You can have multiple sessions running at once, across local, background, and cloud environments, and switch between them via the [sessions list](/docs/copilot/chat/chat-sessions.md#sessions-list) in VS Code.
 
-For more information, see [session management](/docs/copilot/chat/chat-sessions.md) and [workspace indexing](/docs/copilot/reference/workspace-context.md).
+* **Fork instead of re-prompting.** Use [`/fork`](/docs/copilot/chat/chat-sessions.md#fork-a-chat-session) to explore alternatives without losing context, instead of starting over and re-establishing context from scratch.
+
+For more information, see [session management](/docs/copilot/chat/chat-sessions.md), [workspace indexing](/docs/copilot/reference/workspace-context.md), and [optimize AI credit usage](/docs/copilot/guides/optimize-usage.md).
 
 ## Work with large codebases
 
diff --git a/docs/copilot/chat/copilot-chat-context.md b/docs/copilot/chat/copilot-chat-context.md
@@ -127,7 +127,7 @@ As you send more requests in a conversation, the control updates to reflect the
 
 ## Context compaction
 
-As a conversation grows, the accumulated messages and context can fill up the model's context window. Context compaction summarizes the conversation history to free up space, so you can continue working in the same session without losing important details.
+As a conversation grows, the accumulated messages and context can fill up the model's context window. Context compaction summarizes the conversation history to free up space, so you can continue working in the same session without losing important details. Compacting also reduces the number of tokens sent with each subsequent request, which helps manage [AI credit consumption](/docs/copilot/guides/optimize-usage.md).
 
 ### Automatic compaction
 
diff --git a/docs/copilot/concepts/language-models.md b/docs/copilot/concepts/language-models.md
@@ -80,6 +80,8 @@ For more details, see [About Copilot auto model selection](https://docs.github.c
 
 Each Copilot plan includes a monthly allowance of [AI credits](https://docs.github.com/en/copilot/concepts/billing/usage-based-billing-for-individuals). Different models consume AI credits at different rates, based on the model and the number of tokens processed. More capable models cost more per token, while lighter models extend your usage further. When you use auto model selection, VS Code routes each request to an efficient model that balances quality and cost.
 
+Other factors also affect credit consumption, such as [thinking effort](/docs/copilot/customization/language-models.md#configure-thinking-effort) (higher effort produces more thinking tokens), context window size, and tool usage. For practical tips on reducing credit consumption, see [optimize AI credit usage](/docs/copilot/guides/optimize-usage.md).
+
 Learn how to [choose and configure language models](/docs/copilot/customization/language-models.md) in VS Code.
 
 ## Bring your own language model key
diff --git a/docs/copilot/concepts/tools.md b/docs/copilot/concepts/tools.md
@@ -43,6 +43,7 @@ Use the **Configure Tools** button in the chat input field to enable or disable
 Limiting the available tools can help in several ways:
 
 * **Preserve context**: every tool call produces output that consumes space in the [context window](/docs/copilot/concepts/language-models.md#context-window). Fewer tools means the agent is less likely to make unnecessary calls that fill up the context.
+* **Reduce credit consumption**: unnecessary tool calls increase token usage and consume more [AI credits](/docs/copilot/concepts/language-models.md#ai-credits-and-model-costs). Disabling tools you don't need for a task helps keep costs down.
 * **Get more relevant results**: when fewer tools are available, the agent focuses on the most appropriate ones rather than choosing from a large set.
 * **Improve performance**: a smaller tool set reduces the decision space for the model, which can speed up responses.
 
diff --git a/docs/copilot/customization/language-models.md b/docs/copilot/customization/language-models.md
@@ -42,6 +42,9 @@ Some models support configurable thinking effort, which controls how much reason
 
 By default, VS Code sets recommended effort levels and has adaptive reasoning enabled, where the model dynamically determines how much to think based on the complexity of each request. For most use cases, the defaults work well.
 
+> [!TIP]
+> Higher thinking effort produces more thinking tokens, which increases [AI credit](/docs/copilot/concepts/language-models.md#ai-credits-and-model-costs) consumption. Increase thinking effort only for genuinely complex tasks. Learn more about [optimizing AI credit usage](/docs/copilot/guides/optimize-usage.md).
+
 To configure the thinking effort:
 
 1. Open the model picker in the chat input field and select a reasoning model.
diff --git a/docs/copilot/faq.md b/docs/copilot/faq.md
@@ -40,7 +40,7 @@ You can view the current Copilot usage in the Copilot status dashboard, availabl
 * **Inline suggestions**: The percentage of inline suggestions quota you have used in the current month. Paid plans have an unlimited quota for inline suggestions.
 * **AI credits**: The percentage of your monthly AI credits allowance you have used in the current month.
 
-Visit the GitHub Copilot documentation for more information about [monitoring usage and entitlements](https://docs.github.com/en/copilot/managing-copilot/monitoring-usage-and-entitlements/monitoring-your-copilot-usage-and-entitlements).
+Visit the GitHub Copilot documentation for more information about [monitoring usage and entitlements](https://docs.github.com/en/copilot/managing-copilot/monitoring-usage-and-entitlements/monitoring-your-copilot-usage-and-entitlements). For tips on reducing credit consumption, see [optimize AI credit usage](/docs/copilot/guides/optimize-usage.md).
 
 ### I reached my inline suggestions or AI credits limit
 
@@ -52,6 +52,8 @@ For users on Copilot Free, to access more inline suggestions and AI credits, you
 
 If you're on a paid plan and exhaust your AI credits, you can set a budget for additional usage and keep working, or wait until the next monthly cycle when your allowance resets. Learn more about [what happens if you exceed your included AI credits](https://docs.github.com/en/copilot/concepts/billing/usage-based-billing-for-individuals#what-happens-if-i-exceed-my-included-ai-credits) in the GitHub Copilot documentation.
 
+For tips on reducing credit consumption, see [optimize AI credit usage](/docs/copilot/guides/optimize-usage.md).
+
 ### My Copilot subscription is not detected in VS Code
 
 To use chat in Visual Studio Code, you must be signed into Visual Studio Code with a GitHub account that has access to GitHub Copilot.
diff --git a/docs/copilot/guides/context-engineering-guide.md b/docs/copilot/guides/context-engineering-guide.md
@@ -247,6 +247,8 @@ Following these best practices helps you establish a sustainable and effective c
 
 **Maintain context isolation**: Keep different types of work (planning, coding, testing, debugging) in separate chat sessions to prevent context mixing and confusion.
 
+**Be mindful of credit consumption**: More context files, larger instruction sets, and complex agent chains all increase token usage and [AI credit](/docs/copilot/concepts/language-models.md#ai-credits-and-model-costs) consumption. Start with concise context and expand only when needed. For more tips, see [optimize AI credit usage](/docs/copilot/guides/optimize-usage.md).
+
 ### Documentation strategies
 
 **Create living documents**: Treat your custom instructions, custom agents, and templates as evolving resources. Refine them based on observed AI mistakes or shortcomings.
@@ -269,6 +271,8 @@ Following these best practices helps you establish a sustainable and effective c
 
 **Version your context**: Use git to track changes to your context engineering setup, allowing you to revert problematic changes and understand what works best.
 
+**Verify cache performance**: Use the [Agent Debug Logs](/docs/copilot/chat/chat-debug-view.md) to check prompt cache hit rates and token usage. A high cache hit rate indicates that your context setup is structured so that the model provider can reuse previous request prefixes, which reduces latency and token costs.
+
 ### Anti-patterns to avoid
 
 **Context dumping**: Avoid providing excessive, unfocused information that doesn't directly help with decision-making.
@@ -279,6 +283,8 @@ Following these best practices helps you establish a sustainable and effective c
 
 **One-size-fits-all**: Different team members or project phases may need different context configurations. Be flexible in your approach.
 
+**Over-engineering agent chains**: Deeply nested subagent workflows and excessive tool calls multiply token usage and [credit consumption](/docs/copilot/concepts/language-models.md#ai-credits-and-model-costs). Keep agent chains as shallow as practical and limit tools to what each agent actually needs.
+
 ### Measuring success
 
 A successful context engineering setup should result in:
diff --git a/docs/copilot/guides/optimize-usage.md b/docs/copilot/guides/optimize-usage.md
diff --git a/docs/toc.json b/docs/toc.json