This page exists because model advice in token-optimization discussions gets mixed together too easily. There are three different pricing views in play:
- GitHub Copilot public docs — model availability by plan, Auto behavior, and whatever relative pricing signals GitHub has published
- This repo's UBB framing — how to think about the June 1, 2026 usage-based billing shift for Business and Enterprise: AI-credit metering and budgets by enterprise, org, cost center, and user
- Vendor API pricing — per-token pricing from providers like Anthropic, where input and output are billed differently
Do not treat these as identical units. Use each for the question it actually answers.
Those three pages cover almost everything a practitioner needs today:
- which models exist
- which plans include them
- which models are free vs premium on paid plans
- which models are currently documented as cheaper or more expensive relative to others
- what Auto actually does
One timing point matters for the rest of this page: usage-based billing for Copilot Business and Copilot Enterprise starts on June 1, 2026. After that date, AI-credit usage is the main billing lens for Business and Enterprise governance. Any premium-request language below should be read as legacy transition context only.
The repo already recommends Auto as the default, and that is still directionally right. But the official GitHub docs are more specific than the repo previously was.
Per About Copilot auto model selection:
- Auto chooses from the supported Auto pool, subject to your plan and org policies
- Auto selection is based on real-time system health and model performance
- On paid plans, Auto in Copilot Chat gets a 10% discount
- Auto stays in the supported default pool rather than escalating into every premium model
That last point matters.
Practical consequence: Auto is not "pick any model, including the expensive ones, when needed." It is a low-friction default across the supported Auto set. If you want a higher-cost premium model, you should expect to pin it manually.
| Pricing view | Unit | Best use | Source |
|---|---|---|---|
| GitHub public Copilot billing | Model multipliers and, after June 1, AI-credit usage | "What does GitHub currently expose about this model's cost inside Copilot?" | Requests in GitHub Copilot |
| Repo UBB framing | AI credits and future usage-based budgets | "How should we think about spend once Business and Enterprise move onto usage-based billing on June 1, 2026?" | Budgets for metered products |
| Vendor API pricing | Input/output price per MTok | "How do raw tokens differ in value, especially input vs output?" | Anthropic Pricing |
GitHub's public Copilot docs do not publish a table of input-token vs output-token prices by model. Anthropic's API pricing still gives a clear example of the asymmetry, and it remains useful for intuition. Just do not describe it as today's Copilot billing math.
| Model | Input / MTok | Output / MTok |
|---|---|---|
| Claude Haiku 4.5 | $1 | $5 |
| Claude Sonnet 4.6 | $3 | $15 |
| Claude Opus 4.6 | $5 | $25 |
Source: Anthropic Pricing
What the table shows:
- output tokens cost five times as much as input tokens on every model listed
- verbose responses can dominate cost faster than people expect
- output control remains one of the highest-ROI habits in this repo
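To make the asymmetry concrete, here is a minimal sketch that splits one request's vendor API cost into its input and output parts. The prices are copied from the Anthropic table above. The function name `api_cost_usd` and the token counts are assumptions made up for illustration. This is vendor API arithmetic only, not Copilot billing math.

```python
# USD per million tokens (MTok), copied from the Anthropic table above.
PRICES_PER_MTOK = {
    "Claude Haiku 4.5": {"input": 1.0, "output": 5.0},
    "Claude Sonnet 4.6": {"input": 3.0, "output": 15.0},
    "Claude Opus 4.6": {"input": 5.0, "output": 25.0},
}


def api_cost_usd(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Split one request's vendor API cost into input and output parts."""
    price = PRICES_PER_MTOK[model]
    input_cost = input_tokens * price["input"] / 1_000_000
    output_cost = output_tokens * price["output"] / 1_000_000
    return {
        "input": input_cost,
        "output": output_cost,
        "total": input_cost + output_cost,
    }


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
cost = api_cost_usd("Claude Sonnet 4.6", 10_000, 2_000)
output_share = cost["output"] / cost["total"]
print(f"total=${cost['total']:.2f}, output share={output_share:.0%}")
# prints: total=$0.06, output share=50%
```

In this example the output is only one sixth of the tokens but half of the cost, which is why trimming verbose responses pays off quickly.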
Model choice is not the only dial. On supported reasoning-capable models, thinking effort or reasoning effort changes how much work the model does before answering.
This is already covered in more detail in Practical Setup, but it belongs on this page too because it directly affects spend. Effort level influences:
- how many tokens the model spends thinking before responding
- how much tool use and preamble it tends to generate
- latency, especially on harder tasks
| Situation | Recommended effort | Why |
|---|---|---|
| High-volume, simple chat or classification | low | Cheapest. Good when small quality loss is acceptable |
| Typical coding, refactors, tool-heavy work on supported reasoning models | medium | Best balance of cost and quality; Anthropic recommends this as the default for Sonnet 4.6 |
| Hard architecture, security review, novel decomposition | high or max | Spend more only when the task clearly justifies it |
Use reasoning effort only on models that support it. Non-reasoning models such as GPT-4.1 and GPT-4o do not expose this control.
So the full decision stack is:
- choose the right model tier
- if the model supports it, choose the lowest reasoning effort that still gets the job done
This is especially relevant when comparing a cheap reasoning-capable model at medium effort versus a high-cost premium model at high effort. In practice, effort tuning can be a cheaper substitute for jumping to a more expensive model.
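The effort guidance above can be sketched as a small lookup. This is an illustration only: the task labels (`simple`, `typical`, `hard`), the `pick_effort` helper, and the `supports_reasoning` flag are hypothetical names, not part of any Copilot or vendor API. The hard-task row is simplified to `high`; reserve `max` for cases that clearly justify it.

```python
from typing import Optional

# Suggested effort per situation, following the effort table above.
EFFORT_BY_TASK = {
    "simple": "low",      # high-volume, simple chat or classification
    "typical": "medium",  # typical coding, refactors, tool-heavy work
    "hard": "high",       # hard architecture, security review, novel decomposition
}


def pick_effort(task_kind: str, supports_reasoning: bool) -> Optional[str]:
    """Return the suggested effort, or None when the model has no effort control."""
    if not supports_reasoning:
        # Non-reasoning models do not expose this dial, so there is nothing to set.
        return None
    return EFFORT_BY_TASK[task_kind]


print(pick_effort("typical", supports_reasoning=True))   # medium
print(pick_effort("typical", supports_reasoning=False))  # None
```

Returning `None` for non-reasoning models keeps the caller from sending an effort setting the model would not understand.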
- Use Auto by default for general day-to-day work
- Use included or lower-cost models for trivial tasks when you know you do not need anything stronger
- Pin a premium model manually when the task clearly justifies it
- Review org model policy before enablement so premium access expands intentionally, not by drift
| Task | Default choice | Why |
|---|---|---|
| Syntax lookup, quick explanation, tiny edit | Included model or Auto | Cheapest path, good enough quality |
| Typical implementation, bug fix, refactor | Auto or standard model | Best quality-cost tradeoff |
| Architecture, threat modeling, novel decomposition | Manually pin premium model | Auto will not automatically escalate into the premium lane |
- Leaving an expensive premium model pinned for the whole session
- Changing models mid-chat in a long session without thinking about accumulated context. Prior messages, tool results, and cacheable prefixes can still be part of the next request; switching into a higher-cost lane can make that carried context more expensive than starting fresh
- Enabling/disabling MCP servers mid-thread in long sessions. Tool-surface changes often invalidate stable cached prefixes
- Switching default/custom agent profiles mid-thread during expensive runs. Agent/profile changes can break cache continuity for the same conversation
- Assuming Auto will escalate to Opus when a task gets hard
- Using vendor API prices and Copilot pricing signals as if they were the same metric
- Recommending a model without checking whether the plan includes it
- Turning on every premium model for the whole org before checking who actually needs it
Cache-protection rule: choose the lane before work starts and hold it stable in long sessions:
{ model, active MCP set, active agent/profile }
If you must change the lane (for example cheap/Auto to premium for a hard subtask), start a fresh chat with only the relevant summary and files. This preserves cache-friendly stability in the original session and avoids dragging long low-value history into a higher-cost request. The exact billing implementation can change by surface and plan, so frame this as risk control rather than guaranteed repricing math.
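One way to make the cache-protection rule checkable is to treat the lane as an immutable value and compare it before each change. The sketch below assumes hypothetical names throughout (`Lane`, `changed_parts`, `should_start_fresh_chat`, and the example model and server labels); it models the rule as risk control and says nothing about how any surface actually bills or caches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    """The three settings the rule says to hold stable in a long session."""
    model: str
    mcp_servers: frozenset[str]
    agent_profile: str


def changed_parts(current: Lane, requested: Lane) -> list[str]:
    """List which parts of the lane differ between two configurations."""
    changed = []
    if current.model != requested.model:
        changed.append("model")
    if current.mcp_servers != requested.mcp_servers:
        changed.append("mcp_servers")
    if current.agent_profile != requested.agent_profile:
        changed.append("agent_profile")
    return changed


def should_start_fresh_chat(current: Lane, requested: Lane) -> bool:
    """Any lane change in a long session is a cue to start a fresh chat."""
    return bool(changed_parts(current, requested))


# Hypothetical example: moving from Auto to a pinned premium model.
current = Lane("auto", frozenset({"github"}), "default")
requested = Lane("premium-model", frozenset({"github"}), "default")
print(changed_parts(current, requested))            # ['model']
print(should_start_fresh_chat(current, requested))  # True
```

Using a frozen dataclass with a `frozenset` for the MCP set means the order in which servers were enabled does not count as a change; only the actual set does.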
For teams, model choice is a governance problem as much as a prompt problem.
Use Configuring access to AI models in Copilot to control which AI models are available. GitHub documents that organization owners and enterprise owners can enable or disable access to AI models for members.
Practical rule:
- enable cheaper models first
- review premium model need by workflow, team, and expected ROI
- enable premium access narrowly
- watch usage reports and AI-credit consumption before widening access
Use user-level AI credit budgets when you need a direct per-user cap. Remember that code completions and next edit suggestions are not billed in AI credits. More on that in Enterprise Governance.
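As a rough sketch of "watch consumption before widening access", the helper below flags users who are close to a per-user cap. Everything here is an assumption for illustration: the `users_near_cap` name, the 80% threshold, the usage figures, and the idea that usage arrives as a plain user-to-credits mapping. It does not reflect the format of any GitHub usage report or API.

```python
def users_near_cap(
    usage: dict[str, float], budget: float, threshold: float = 0.8
) -> list[str]:
    """Return users whose AI-credit usage is at or above threshold * budget."""
    limit = threshold * budget
    return sorted(user for user, used in usage.items() if used >= limit)


# Hypothetical monthly AI-credit usage per user and a hypothetical per-user budget.
usage = {"alice": 950.0, "bob": 120.0, "carol": 800.0}
flagged = users_near_cap(usage, budget=1000.0)
print(flagged)  # ['alice', 'carol']
```

A list like this is a prompt for a conversation about workflow and ROI, not an automatic reason to raise or cut anyone's budget.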
- Practical Setup — day-to-day setup and routing advice
- Practical Setup § Reasoning Effort — the deeper treatment of effort levels
- Workflow Optimization — why Auto is still the best default
- Enterprise Governance — budgets, user-level caps, model policy, instruction scope
- Home — quick terms and supporting links
Next: Enterprise Governance →