| layout | default |
|---|---|
| title | Chapter 7: Multi-Model Strategy and Providers |
| nav_order | 7 |
| parent | Kiro Tutorial |
Welcome to Chapter 7: Multi-Model Strategy and Providers. In this part of Kiro Tutorial: Spec-Driven Agentic IDE from AWS, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Kiro uses Claude Sonnet 4.0 and 3.7 by default and routes different task types to different model configurations. This chapter teaches you how to configure the model strategy for your team's workload profile.
- understand Kiro's default model routing between Claude Sonnet 4.0 and 3.7
- configure model preferences for different task categories
- understand the cost and latency tradeoffs between model tiers
- set up budget controls and usage monitoring
- plan model upgrades as new Claude versions become available
- open Kiro settings and navigate to the Model section
- confirm the default model is Claude Sonnet 4.0
- optionally override to Claude Sonnet 3.7 for faster or lower-cost interactive chat
- set a daily token budget for cost control
- review the model usage dashboard after a full session
Kiro ships with two default model profiles:
| Profile | Model | Best For |
|---|---|---|
| Primary | Claude Sonnet 4.0 | autonomous agent tasks, spec generation, complex code synthesis |
| Fast | Claude Sonnet 3.7 | interactive chat, quick edits, explanation and Q&A |
Kiro automatically selects the appropriate model based on the interaction type. You can override this selection for specific use cases.
```json
{
  "models": {
    "primary": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-0",
      "maxTokens": 8192,
      "temperature": 0.1
    },
    "fast": {
      "provider": "anthropic",
      "model": "claude-sonnet-3-7",
      "maxTokens": 4096,
      "temperature": 0.2
    },
    "routing": {
      "specGeneration": "primary",
      "taskExecution": "primary",
      "interactiveChat": "fast",
      "hookActions": "fast",
      "codeExplanation": "fast"
    }
  }
}
```

| Capability | Claude Sonnet 4.0 | Claude Sonnet 3.7 |
|---|---|---|
| Code synthesis quality | higher | good |
| Multi-step reasoning | stronger | capable |
| Response latency | moderate | faster |
| Cost per token | higher | lower |
| Context window | 200k tokens | 200k tokens |
| Best use case | spec generation, complex tasks | chat, quick edits |
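
The automatic selection described above amounts to a lookup from interaction type to model profile. The following is a minimal Python sketch of that lookup, assuming the settings shape from the example configuration. `resolve_model` and its `default_profile` fallback are hypothetical names used for illustration and are not part of Kiro.

```python
# Illustrative only: mirrors the example settings, not Kiro's internal code.
SETTINGS = {
    "models": {
        "primary": {"provider": "anthropic", "model": "claude-sonnet-4-0"},
        "fast": {"provider": "anthropic", "model": "claude-sonnet-3-7"},
        "routing": {
            "specGeneration": "primary",
            "taskExecution": "primary",
            "interactiveChat": "fast",
            "hookActions": "fast",
            "codeExplanation": "fast",
        },
    }
}


def resolve_model(settings: dict, task_type: str, default_profile: str = "primary") -> str:
    """Return the model identifier configured for a task type.

    Unknown task types fall back to default_profile (an assumption of this sketch).
    """
    models = settings["models"]
    profile_name = models["routing"].get(task_type, default_profile)
    return models[profile_name]["model"]


print(resolve_model(SETTINGS, "specGeneration"))   # claude-sonnet-4-0
print(resolve_model(SETTINGS, "interactiveChat"))  # claude-sonnet-3-7
```

Falling back to the primary profile favors quality over cost for unclassified tasks; a cost-sensitive team might prefer the opposite default.
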
Map task types to model profiles based on your team's cost and quality priorities:
```json
{
  "models": {
    "routing": {
      "specGeneration": "primary",      // requirements → design → tasks: quality matters most
      "taskExecution": "primary",       // autonomous agent: complex multi-step reasoning
      "codeReview": "primary",          // security and correctness review: quality matters
      "interactiveChat": "fast",        // quick Q&A and exploration: speed matters
      "hookActions": "fast",            // frequent event-driven actions: cost matters
      "codeExplanation": "fast",        // explaining existing code: speed and cost
      "documentationUpdate": "fast"     // doc updates: lower complexity
    }
  }
}
```

Set daily and monthly token budgets to prevent unexpected cost spikes:
```json
{
  "budget": {
    "daily": {
      "inputTokens": 500000,
      "outputTokens": 200000,
      "alertThreshold": 0.8,
      "action": "notify"
    },
    "monthly": {
      "inputTokens": 10000000,
      "outputTokens": 4000000,
      "alertThreshold": 0.9,
      "action": "restrict"
    }
  }
}
```

Budget actions:

- `notify`: send an alert to the chat panel when the threshold is reached
- `restrict`: switch all routing to the `fast` (lower-cost) model when the threshold is reached
- `pause`: stop all agent activity and require manual reset when the limit is reached
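
To make the trigger points concrete, here is a small Python sketch of how a budget check could decide which action applies. It is not Kiro's implementation: the `Budget` class and `budget_action` function are hypothetical, and comparing the larger of the input and output usage ratios against the trigger is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    input_tokens: int
    output_tokens: int
    alert_threshold: float  # fraction of the limit, e.g. 0.8
    action: str             # "notify", "restrict", or "pause"


def budget_action(budget: Budget, used_input: int, used_output: int) -> Optional[str]:
    """Return the action to take, or None while usage is below the trigger point."""
    # Use whichever of input/output is closer to its limit (sketch assumption).
    usage = max(used_input / budget.input_tokens, used_output / budget.output_tokens)
    # As described in the action list, "pause" fires at the limit itself,
    # while "notify" and "restrict" fire at the alert threshold.
    trigger = 1.0 if budget.action == "pause" else budget.alert_threshold
    return budget.action if usage >= trigger else None


daily = Budget(500_000, 200_000, 0.8, "notify")
print(budget_action(daily, 182_341, 48_902))  # None
print(budget_action(daily, 410_000, 48_902))  # notify
```
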
Track model usage in the Kiro dashboard:
```
# In the Chat panel:
> /usage

# Output:
Session token usage:
  Input:  47,832 tokens (Claude Sonnet 4.0: 31,200 | Claude Sonnet 3.7: 16,632)
  Output: 12,441 tokens (Claude Sonnet 4.0: 9,800 | Claude Sonnet 3.7: 2,641)
  Estimated cost: $0.43

Daily usage: 182,341 input / 48,902 output tokens (36% of daily budget)
```
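
The 36% figure in the sample output can be reproduced from the daily budget example, if the percentage is measured against the input token limit. That interpretation is an assumption; the sample does not say which limit it uses. A short Python check of the arithmetic:

```python
DAILY_INPUT_LIMIT = 500_000    # from the daily budget example
DAILY_OUTPUT_LIMIT = 200_000

used_input, used_output = 182_341, 48_902

input_share = used_input / DAILY_INPUT_LIMIT
output_share = used_output / DAILY_OUTPUT_LIMIT

print(f"input:  {input_share:.0%}")   # input:  36%
print(f"output: {output_share:.0%}")  # output: 24%

# The per-model session figures should also add up to the session totals.
session_input_ok = 31_200 + 16_632 == 47_832
session_output_ok = 9_800 + 2_641 == 12_441
print(session_input_ok, session_output_ok)  # True True
```
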
| Pattern | Description | Token Savings |
|---|---|---|
| Route chat to fast model | use Sonnet 3.7 for all interactive chat | 30-50% reduction on chat costs |
| Scope task context | pass only relevant spec sections to agents | 20-40% reduction per task |
| Compress steering files | remove redundant rules from steering files | 5-15% reduction on base context |
| Limit hook frequency | use commit-level hooks instead of save-level | 60-80% reduction on hook costs |
| Batch spec generation | generate all spec documents in one call | 10-20% reduction vs. sequential calls |
When AWS releases a new Claude version in Kiro, follow this upgrade protocol:
- review the release notes for the new model version
- test spec generation on a sample feature spec with the new model
- compare output quality against the previous model on the same spec
- if quality is equal or better, update the `primary` routing to the new model
- run the full test suite on an autonomous agent task using the new model
- monitor token usage for the first week on the new model
- update the model configuration in version control and notify the team
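
The "equal or better" comparison in the protocol above can be expressed as a simple gate. The sketch below assumes each model's output on the same sample spec has been scored against a team-defined rubric; the scores, the `should_upgrade` function, and the `tolerance` parameter are all hypothetical.

```python
from statistics import mean


def should_upgrade(baseline: list[float], candidate: list[float], tolerance: float = 0.0) -> bool:
    """True when the candidate's mean score is at least the baseline's, minus tolerance."""
    if not baseline or not candidate:
        raise ValueError("both score lists must be non-empty")
    return mean(candidate) >= mean(baseline) - tolerance


# Hypothetical rubric scores (1 to 5) for three sample specs.
current_model = [4.0, 3.5, 4.5]
new_model = [4.0, 4.0, 4.5]
print(should_upgrade(current_model, new_model))  # True
```

A mean hides per-spec regressions, so a stricter gate could also require that no individual score drops.
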
You now know how to configure Kiro's model routing, set budget controls, monitor usage, and plan for model upgrades.
Next: Chapter 8: Team Operations and Governance
- tutorial: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- tutorial slug: kiro-tutorial
- chapter focus: Chapter 7: Multi-Model Strategy and Providers
- system context: Kiro Tutorial
- objective: move from surface-level usage to repeatable engineering operation
- Define the runtime boundary for `Chapter 7: Multi-Model Strategy and Providers`: the model routing layer, the budget controller, and the provider API gateway.
- Separate control-plane decisions (model selection, routing policy, budget limits) from data-plane execution (token generation, inference calls).
- Capture input contracts: task type classification from interaction context; output: model-routed inference request and response.
- Trace state transitions: task initiated → type classified → routing rule applied → model selected → request sent → response received → cost tracked.
- Identify extension hooks: custom routing rules per task type, budget action policies, provider failover paths.
- Map ownership boundaries: developers choose fast/primary preference; team leads set routing policy; finance owns budget limits.
- Specify rollback paths: switch routing back to previous model; restore budget settings from version control.
- Track observability signals: token consumption per model per task type, cost per session, budget threshold alerts, model latency distribution.
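
The state transitions listed above can be written down as an ordered sequence, which makes it easy to spot where a recorded trace departs from the expected path. This is a sketch for reasoning about the flow, not a description of Kiro internals; `Stage` and `first_divergence` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    INITIATED = "task initiated"
    CLASSIFIED = "type classified"
    RULE_APPLIED = "routing rule applied"
    MODEL_SELECTED = "model selected"
    REQUEST_SENT = "request sent"
    RESPONSE_RECEIVED = "response received"
    COST_TRACKED = "cost tracked"


ORDER = list(Stage)  # Enum iteration preserves definition order


def first_divergence(trace: list[Stage]) -> Optional[int]:
    """Return the index where a trace departs from the expected order, or None."""
    for i, stage in enumerate(trace):
        if i >= len(ORDER) or stage is not ORDER[i]:
            return i
    # A matching but incomplete trace diverges where it stopped.
    return None if len(trace) == len(ORDER) else len(trace)


# A trace that skipped the routing rule lookup diverges at index 2.
print(first_divergence([Stage.INITIATED, Stage.CLASSIFIED, Stage.MODEL_SELECTED]))  # 2
```
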
| Decision Area | Low-Risk Path | High-Control Path | Tradeoff |
|---|---|---|---|
| Model selection | Kiro defaults (Sonnet 4.0 primary) | explicit routing per task type | ease vs cost optimization |
| Budget controls | monthly soft cap with notification | daily hard cap with auto-restrict | flexibility vs cost certainty |
| Upgrade cadence | upgrade immediately on release | validation protocol before upgrade | speed vs quality assurance |
| Usage monitoring | check manually via /usage | automated daily usage reports | effort vs visibility |
| Cost allocation | project-level budget | per-developer or per-team budgets | simplicity vs granularity |
| Failure Mode | Early Signal | Root Cause Pattern | Countermeasure |
|---|---|---|---|
| budget overrun | unexpected high token usage | hooks or autonomous tasks using primary model at high frequency | audit routing config and redirect high-frequency actions to fast model |
| model quality regression | lower spec generation quality after upgrade | new model performs differently on the team's task profile | run quality benchmark before upgrading primary model |
| provider outage | 503 errors on model API calls | Anthropic service disruption | configure fallback model or degrade to interactive-only mode |
| token waste on large contexts | high input token counts for simple tasks | full codebase context sent for small tasks | scope context explicitly in task descriptions |
| routing misconfiguration | wrong model used for expensive tasks | misconfigured routing JSON | audit routing config and verify with /usage after changes |
| cost spike from hook frequency | daily budget hits threshold early | save-level hooks using primary model | switch hook routing to fast model and add conditions to reduce frequency |
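
For the provider outage row, the "configure fallback model" countermeasure follows a common pattern: try providers in order and use the first that responds. The Python sketch below illustrates that pattern with stub callables. It makes no claim about how Kiro implements failover, and `ProviderUnavailable`, `call_with_fallback`, and both stubs are hypothetical.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a provider callable when the upstream service is unavailable (e.g. 503)."""


def call_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful response."""
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except ProviderUnavailable as err:
            last_error = err  # remember the failure and try the next provider
    raise RuntimeError("all providers unavailable") from last_error


def flaky_primary(prompt: str) -> str:
    # Stub that simulates an outage on the primary provider.
    raise ProviderUnavailable("503 Service Unavailable")


def stub_fallback(prompt: str) -> str:
    # Stub standing in for an alternative provider.
    return f"fallback: {prompt}"


print(call_with_fallback("explain this function", [flaky_primary, stub_fallback]))
# fallback: explain this function
```
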
- Review the Kiro model documentation to understand the current Claude Sonnet 4.0 and 3.7 capability profiles.
- Map your team's top five task types to the appropriate model tier based on quality vs. cost priority.
- Configure the routing policy in Kiro settings or `.kiro/settings.json`.
- Set a daily token budget with a notify action at 80% of the limit.
- Run a full one-day session with the new configuration and review the `/usage` output.
- Identify the three highest-cost task types and optimize their routing or context scope.
- Set the monthly budget with a restrict action at 90% of the limit.
- Document the model routing rationale in `.kiro/settings.json` comments for team transparency.
- Schedule a quarterly model upgrade review to assess whether new Claude versions improve quality or reduce cost.
- routing policy is explicitly configured for at least five task types in settings
- daily and monthly token budgets are set with appropriate alert thresholds
- budget action for monthly limit is set to `restrict` or `pause` to prevent overruns
- `/usage` is reviewed after the first full day with the new routing configuration
- high-frequency hook actions are routed to the fast model
- a model upgrade validation protocol is documented before the first upgrade
- routing configuration is committed to version control with clear comments
- team members are informed of the routing policy and budget limits
- Kiro Docs: Model Configuration
- Kiro Docs: Budget Controls
- Kiro Docs: Usage Dashboard
- Anthropic Models Overview
- Kiro Repository
- LiteLLM Tutorial
- Claude Code Tutorial
- OpenCode Tutorial
- Cline Tutorial
- Chapter 8: Team Operations and Governance
- Configure a complete routing policy for six task types and document the quality vs. cost rationale for each routing decision.
- Run identical spec generation tasks with Sonnet 4.0 and Sonnet 3.7 and compare output quality in a structured evaluation table.
- Simulate a budget overrun by setting a very low daily limit and observe the restrict action behavior; then restore the correct limit.
- Build a model upgrade validation checklist for your team's specific task profile and run it against a hypothetical new Claude version.
- Analyze one week of `/usage` output and identify the top three opportunities to reduce token consumption without reducing quality.
- Why does Kiro route spec generation to the primary (Sonnet 4.0) model rather than the fast model by default?
- What is the difference between the `restrict` and `pause` budget actions, and when should you use each?
- What tradeoff did you make between model quality and cost when routing hook actions to the fast model?
- How would you validate that a new Claude model version is safe to use as the primary routing target for your team's spec generation tasks?
- What conditions trigger an automatic routing switch in Kiro's budget control system?
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: daily token budget alert fires at 9am because file:save hooks are consuming primary model tokens at high frequency
- initial hypothesis: hooks are routing to the primary model and activating on every TypeScript file save in a large codebase
- immediate action: switch all hook routing to the fast model and add file-pattern conditions to reduce activation rate
- engineering control: update the routing config to explicitly map `hookActions` to `fast` model
- verification target: token usage at end of day stays below 60% of the daily budget after routing change
- rollback trigger: if the fast model produces hook outputs too low in quality to act on, add a flag so that critical hooks use the primary model
- communication step: notify the team of the routing change and explain the cost rationale
- learning capture: add hook routing as a required configuration step in the team's Kiro onboarding checklist
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: spec generation quality drops noticeably after the team upgraded to a new Claude version
- initial hypothesis: the new model has different default behaviors for EARS requirement parsing and design generation
- immediate action: revert the primary model routing to the previous version while the quality issue is investigated
- engineering control: run the quality benchmark suite on the new model version and document the delta
- verification target: benchmark scores for spec generation match or exceed the previous model version
- rollback trigger: if the new model cannot match previous quality after prompt adjustments, remain on the previous version
- communication step: share the benchmark results with the team and the model upgrade status
- learning capture: add a quality benchmark run as a mandatory step before any future model version upgrade
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: Anthropic API returns 503 errors causing all Kiro model calls to fail
- initial hypothesis: the Anthropic service is experiencing an outage affecting the Claude Sonnet endpoints
- immediate action: check the Anthropic status page and switch Kiro to interactive-only mode for in-flight autonomous tasks
- engineering control: configure a fallback model in Kiro settings pointing to an alternative provider if available
- verification target: team can continue interactive chat in degraded mode while the outage is active
- rollback trigger: restore full model routing once Anthropic reports the incident resolved
- communication step: notify the team of the outage status and expected recovery time from the Anthropic status page
- learning capture: add provider outage response steps to the team's Kiro incident runbook
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: `/usage` shows extremely high input token counts for tasks that should be simple
- initial hypothesis: the agent is loading the full codebase context for tasks that only require a single file or module
- immediate action: add explicit context constraints to the task descriptions in tasks.md: "only read files in src/auth/"
- engineering control: update the spec generation prompt template to include a "context scope" field for each task
- verification target: input token count per task decreases by at least 30% after scope constraints are applied
- rollback trigger: if scope constraints cause the agent to miss necessary context, expand the scope incrementally
- communication step: share the context scoping pattern with the team as a best practice in the Kiro usage guide
- learning capture: add a context scope field to the tasks.md template and document the expected files per task type
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: interactive chat is using the primary (Sonnet 4.0) model despite routing being configured for fast model
- initial hypothesis: the routing configuration in settings.json has a syntax error or the key name does not match Kiro's expected format
- immediate action: validate the settings.json against the Kiro settings schema and fix any key name mismatches
- engineering control: add a JSON schema validation step to the CI pipeline for `.kiro/settings.json`
- verification target: `/usage` confirms interactive chat is routed to Sonnet 3.7 after the configuration fix
- rollback trigger: if schema validation is not feasible, revert settings.json to the last known good commit
- communication step: share the corrected settings.json format with the team and update the configuration docs
- learning capture: add a settings.json validation step to the Kiro onboarding checklist
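
The routing misconfiguration scenario comes down to two checks: the file parses, and every routing entry uses a recognized task type and points at a defined profile. The Python sketch below shows one way a CI step could perform those checks. The set of task-type keys is taken from this chapter's examples and may not match the real settings schema, and `routing_errors` is a hypothetical helper.

```python
import json

# Task-type keys used in this chapter's examples; the real schema may differ.
KNOWN_TASK_TYPES = {
    "specGeneration", "taskExecution", "codeReview", "interactiveChat",
    "hookActions", "codeExplanation", "documentationUpdate",
}


def routing_errors(settings_text: str) -> list[str]:
    """Return human-readable problems found in the routing section."""
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(settings, dict):
        return ["top level must be a JSON object"]

    models = settings.get("models", {})
    routing = models.get("routing", {})
    profiles = {name for name in models if name != "routing"}

    errors = []
    for task_type, profile in routing.items():
        if task_type not in KNOWN_TASK_TYPES:
            errors.append(f"unknown task type: {task_type}")
        if profile not in profiles:
            errors.append(f"{task_type} routes to undefined profile: {profile}")
    return errors


sample = '{"models": {"primary": {}, "fast": {}, "routing": {"interactivechat": "quick"}}}'
for problem in routing_errors(sample):
    print(problem)
# unknown task type: interactivechat
# interactivechat routes to undefined profile: quick
```

Note that Python's `json.loads` follows strict JSON and rejects `//` comments like the ones in the annotated routing example, so a file documented that way would need a comment-tolerant parser or a separate rationale document.
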
Most agentic coding tools treat model selection as a binary choice. Kiro's multi-model routing strategy recognizes that different task types have fundamentally different quality and cost requirements. Spec generation demands the highest-quality reasoning; interactive chat demands the lowest latency. Routing these to the same model either wastes money on fast interactions or underserves the tasks that matter most.
In practical terms, this chapter helps you avoid three common failures:
- paying primary-model prices for every lint check, code explanation, and quick question
- using a fast model for spec generation and getting design documents that miss key architectural considerations
- running out of daily token budget before the high-value autonomous tasks run
After working through this chapter, you should be able to treat model routing as a cost-quality optimization policy that is explicit, versioned, and tuned to your team's actual workload distribution.
Under the hood, each model-routed request follows a repeatable control path:
- Task type classification: Kiro inspects the interaction type (chat, spec generation, hook action, etc.) to classify the task.
- Routing rule lookup: the routing policy in settings is consulted to select the model profile for the task type.
- Budget check: before dispatching, Kiro checks the current usage against the configured budget limits.
- Model API call: Kiro sends the inference request to the Anthropic API endpoint for the selected model.
- Response tracking: the token counts from the API response are recorded against the session and daily budgets.
- Usage aggregation: the dashboard aggregates usage by model, task type, and time window for monitoring.
When debugging cost or quality issues, trace this sequence from task classification through budget tracking to identify where the routing or consumption is diverging from expectations.
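
The response tracking and usage aggregation steps can be pictured as summing token counts over per-request records. In the Python sketch below, the record layout and the per-task split are hypothetical; the values were chosen so that the per-model totals match the sample `/usage` output shown earlier in the chapter.

```python
from collections import defaultdict

# Hypothetical usage records: (model, task_type, input_tokens, output_tokens)
records = [
    ("claude-sonnet-4-0", "specGeneration", 12_000, 4_000),
    ("claude-sonnet-4-0", "taskExecution", 19_200, 5_800),
    ("claude-sonnet-3-7", "interactiveChat", 10_000, 2_000),
    ("claude-sonnet-3-7", "hookActions", 6_632, 641),
]


def aggregate(records):
    """Sum [input, output] tokens per (model, task_type) and per model."""
    by_key = defaultdict(lambda: [0, 0])
    by_model = defaultdict(lambda: [0, 0])
    for model, task_type, tokens_in, tokens_out in records:
        for bucket in (by_key[(model, task_type)], by_model[model]):
            bucket[0] += tokens_in
            bucket[1] += tokens_out
    return dict(by_key), dict(by_model)


by_key, by_model = aggregate(records)
print(by_model["claude-sonnet-4-0"])  # [31200, 9800]
print(by_model["claude-sonnet-3-7"])  # [16632, 2641]
```

Grouping by task type as well as model is what lets you see, for example, that hook actions rather than chat are driving consumption.
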
Use the following upstream sources to verify implementation details while reading this chapter:
- Kiro Docs: Model Configuration. Why it matters: the primary reference for routing configuration format and available model identifiers.
- Kiro Docs: Budget Controls. Why it matters: documents the exact budget action behaviors and threshold configuration options.
- Anthropic Models Overview. Why it matters: the canonical reference for Claude model capabilities, context windows, and pricing tiers.
- Kiro Repository. Why it matters: source for model configuration schema and community discussions on routing strategies.
Suggested trace strategy:
- check the Anthropic models page before configuring routing to confirm the current model identifier strings
- run `/usage` after each configuration change to confirm routing is working as intended
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: incoming request volume spikes after release
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: introduce adaptive concurrency limits and queue bounds
- verification target: latency p95 and p99 stay within defined SLO windows
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: tool dependency latency increases under concurrency
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: enable staged retries with jitter and circuit breaker fallback
- verification target: error budget burn rate remains below escalation threshold
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: schema updates introduce incompatible payloads
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: pin schema versions and add compatibility shims
- verification target: throughput remains stable under target concurrency
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: environment parity drifts between staging and production
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: restore environment parity via immutable config promotion
- verification target: retry volume stays bounded without feedback loops
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: access policy changes reduce successful execution rates
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: re-scope credentials and rotate leaked or stale keys
- verification target: data integrity checks pass across write/read cycles
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests
- tutorial context: Kiro Tutorial: Spec-Driven Agentic IDE from AWS
- trigger condition: background jobs accumulate and exceed processing windows
- initial hypothesis: identify the smallest reproducible failure boundary
- immediate action: protect user-facing stability before optimization work
- engineering control: activate degradation mode to preserve core user paths
- verification target: audit logs capture all control-plane mutations
- rollback trigger: pre-defined quality gate fails for two consecutive checks
- communication step: publish incident status with owner and ETA
- learning capture: add postmortem and convert findings into automated tests