|
| 1 | +# Investigation: Opus 4.7 Day 0 Launch Regression - Haiku 4.5 Cost Spike on Bedrock |
| 2 | + |
| 3 | +## Issue Summary |
| 4 | + |
| 5 | +**Reported by:** User on Slack (#llms-engineering) |
| 6 | +**Date:** April 17, 2026 |
| 7 | +**Versions affected:** |
| 8 | +- **Working:** `v1.82.3-stable.patch.4` |
| 9 | +- **Broken:** `v1.82.3-stable.opus-4.7` |
| 10 | + |
| 11 | +### Symptoms |
| 12 | +- Upgrade from `v1.82.3-stable.patch.4` to `v1.82.3-stable.opus-4.7` caused a **significant cost spike** on Haiku 4.5 on Bedrock |
| 13 | +- Downgrade restored normal costs |
| 14 | +- Errors reported: **"unsupported thinking metadata"** |
| 15 | +- The issue affects Haiku 4.5 specifically, not Opus 4.7 |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## Investigation Findings |
| 20 | + |
| 21 | +### 1. Opus 4.7 Day 0 Launch PRs |
| 22 | + |
| 23 | +**PR #25867** - "Litellm day 0 opus 4.7 support" (merged into main via PR #25876) |
| 24 | +- **Merged:** April 16, 2026 |
| 25 | +- **Changes:** Added Claude Opus 4.7 support across all providers (Anthropic direct, Bedrock, Vertex AI, Azure AI) |
| 26 | + |
| 27 | +**PR #25876** - "Litellm hotfix opus 4.7" (merged April 16, 2026) |
| 28 | +- **Purpose:** Hotfix for Opus 4.7 support |
| 29 | +- **Key changes (per Greptile review):** |
| 30 | + - Added `_is_opus_4_7_model()` and `_is_claude_4_7_model()` functions |
| 31 | + - Extended `_is_adaptive_thinking_model()` to include Opus 4.7 |
| 32 | + - Updated `is_claude_4_5_on_bedrock()` to include Opus 4.7 patterns |
| 33 | + - Added Opus 4.7 to `_supports_extended_thinking_on_bedrock()` |
| 34 | + - Added model entries to `model_prices_and_context_window.json` |
| 35 | + |
| 36 | +### 2. Root Cause: Hardcoded Model Checks Violating Design Principles |
| 37 | + |
| 38 | +#### The Architectural Problem |
| 39 | + |
| 40 | +LiteLLM has documented "Day 0 Model Release Gaps" (GitHub Discussion #22555) which explicitly identifies the problem: |
| 41 | + |
| 42 | +**Gap G4 (Anthropic):** |
| 43 | +> "Thinking/reasoning gated on hardcoded Claude version strings" |
| 44 | +> |
| 45 | +> File: `litellm/llms/anthropic/chat/transformation.py:172–210` |
| 46 | +> |
| 47 | +> `_is_claude_4_6_model()` and similar checks control whether `thinking` blocks and `budget_tokens` are sent. Any Claude 5.x model added only to JSON will silently get thinking stripped out. |
| 48 | +
|
| 49 | +**Gap G13 (Bedrock):** |
| 50 | +> "Parallel tool use on Bedrock gated on hardcoded Claude 4.5/4.6 patterns" |
| 51 | +> |
| 52 | +> File: `litellm/llms/bedrock/common_utils.py:462–490` |
| 53 | +> |
| 54 | +> `is_claude_4_5_on_bedrock()` controls whether `parallel_tool_use` is enabled. Claude 5.x models on Bedrock will have features silently disabled until this list is updated. |
| 55 | +
|
| 56 | +#### What Happened with Opus 4.7 |
| 57 | + |
| 58 | +When Opus 4.7 support was added: |
| 59 | + |
| 60 | +1. **Extended existing hardcoded checks** - Added "opus-4.7", "opus_4_7", "opus-4-7", "opus_4_7" patterns to functions like: |
| 61 | + - `is_claude_4_5_on_bedrock()` (which now covers 4.5, 4.6, AND 4.7 despite its name) |
| 62 | + - `_supports_extended_thinking_on_bedrock()` |
| 63 | + - `_is_adaptive_thinking_model()` |
| 64 | + |
| 65 | +2. **Broad substring matching** - Some checks use broad patterns like: |
| 66 | + - `"claude-3-7"` in model |
| 67 | + - `"claude-sonnet-4"` in model |
| 68 | + - `"claude-opus-4"` in model |
| 69 | + - `"claude-haiku-4"` in model |
| 70 | + |
| 71 | +### 3. The Haiku 4.5 Cost Spike Root Cause |
| 72 | + |
| 73 | +#### Hypothesis 1: Thinking Metadata Incorrectly Sent to Haiku 4.5 |
| 74 | + |
| 75 | +**Most likely cause:** The broad substring checks or function extensions added for Opus 4.7 are now causing thinking/reasoning metadata to be sent to Haiku 4.5 models that don't support it (or support it differently), resulting in: |
| 76 | + |
| 77 | +1. **API errors** - Bedrock rejecting requests with "unsupported thinking metadata" |
| 78 | +2. **Retries** - LiteLLM retrying failed requests, increasing costs |
| 79 | +3. **Fallback behavior** - Potentially using different (more expensive) code paths |
| 80 | + |
| 81 | +#### Evidence from PR #24053 (Open, Not Merged) |
| 82 | + |
| 83 | +PR #24053 attempted to add thinking support for all Claude 4+ models on Bedrock (including Haiku 4.5) but was flagged by Greptile as violating design principles: |
| 84 | + |
| 85 | +> "The correct approach is to let `supports_reasoning()` (backed by `model_prices_and_context_window.json`) be the single source of truth... The real problem is that `supports_reasoning()` fails for Bedrock models due to a provider/cost-map key mismatch." |
| 86 | +
|
| 87 | +**The core bug:** `supports_reasoning()` doesn't work properly for Bedrock because: |
| 88 | +- Model entries use `litellm_provider: "bedrock_converse"` or `litellm_provider: "bedrock"` |
| 89 | +- But lookups use `custom_llm_provider: "bedrock"` or `"bedrock_converse"` |
| 90 | +- The mismatch causes `supports_reasoning()` to return `False` even when the model supports it |
| 91 | + |
| 92 | +#### Hypothesis 2: Function Name/Scope Mismatch |
| 93 | + |
| 94 | +From Greptile review of PR #25876: |
| 95 | + |
| 96 | +> "`is_claude_4_5_on_bedrock` now covers 4.5, 4.6, *and* 4.7 model families. The name and docstring are misleading to any caller." |
| 97 | +
|
| 98 | +**Risk:** If `is_claude_4_5_on_bedrock()` is used to gate Bedrock-specific thinking behavior, adding Opus 4.7 patterns might have inadvertently changed behavior for other Claude 4.x models (including Haiku 4.5). |
| 99 | + |
| 100 | +### 4. Specific Code Locations to Investigate |
| 101 | + |
| 102 | +Based on the analysis, these files likely contain the bug: |
| 103 | + |
| 104 | +1. **`litellm/llms/bedrock/common_utils.py`** |
| 105 | + - `is_claude_4_5_on_bedrock()` - Extended to include Opus 4.7 patterns |
| 106 | + - Line 565-597 per Greptile review |
| 107 | + |
| 108 | +2. **`litellm/llms/anthropic/chat/transformation.py`** |
| 109 | + - `_is_opus_4_7_model()` and thinking/reasoning logic |
| 110 | + - Lines 172-210, 237-248 per Gap G4 and Greptile review |
| 111 | + |
| 112 | +3. **`litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py`** |
| 113 | + - `_supports_extended_thinking_on_bedrock()` - Extended to include Opus 4.7 |
| 114 | + - May have incorrect logic for Haiku 4.5 |
| 115 | + |
| 116 | +4. **`litellm/llms/bedrock/chat/converse_transformation.py`** |
| 117 | + - `get_supported_openai_params()` with hardcoded substring checks |
| 118 | + - Lines 565-580 per PR #24053 review |
| 119 | + |
| 120 | +### 5. Why Downgrading Fixed It |
| 121 | + |
| 122 | +The previous version (`v1.82.3-stable.patch.4`) likely had: |
| 123 | +- **More conservative checks** - Thinking metadata only sent to models explicitly known to support it |
| 124 | +- **Narrower pattern matching** - Only Opus 4.5/4.6 covered, not all Claude 4.x |
| 125 | +- **No Opus 4.7 extensions** - The functions weren't modified to include new patterns |
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +## Reproduction Scenario |
| 130 | + |
| 131 | +1. User makes request to Bedrock with Haiku 4.5 model |
| 132 | +2. LiteLLM routes through Bedrock provider |
| 133 | +3. **Bug trigger:** One of these happens: |
| 134 | + - `is_claude_4_5_on_bedrock()` returns `True` for Haiku 4.5 due to broad pattern matching |
| 135 | + - Substring check `"claude-haiku-4" in model` matches Haiku 4.5 |
| 136 | + - Thinking metadata is added to request |
| 137 | +4. Bedrock API returns error: "unsupported thinking metadata" |
| 138 | +5. LiteLLM retries or uses fallback logic |
| 139 | +6. **Cost spike:** Multiple retries or expensive fallback code path |
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +## Recommended Investigation Steps for LiteLLM Team |
| 144 | + |
| 145 | +1. **Check git diff between versions:** |
| 146 | + ```bash |
| 147 | + git diff v1.82.3-stable.patch.4..v1.82.3-stable.opus-4.7 -- \ |
| 148 | + litellm/llms/bedrock/common_utils.py \ |
| 149 | + litellm/llms/anthropic/chat/transformation.py \ |
| 150 | + litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py |
| 151 | + ``` |
| 152 | + |
| 153 | +2. **Test Haiku 4.5 on Bedrock with thinking parameters:** |
| 154 | + - Send request with `thinking` or `reasoning_effort` params |
| 155 | + - Verify if Bedrock rejects it with "unsupported thinking metadata" |
| 156 | + |
| 157 | +3. **Review function scope:** |
| 158 | + - Audit `is_claude_4_5_on_bedrock()` call sites |
| 159 | + - Check what behavior is gated by this function |
| 160 | + - Verify if Opus 4.7 addition affected Haiku 4.5 logic |
| 161 | + |
| 162 | +4. **Check model capability flags:** |
| 163 | + - Review `model_prices_and_context_window.json` entries for Haiku 4.5 |
| 164 | + - Verify `supports_reasoning`, `supports_extended_thinking`, etc. |
| 165 | + - Ensure Haiku 4.5 is correctly marked (likely should NOT have these flags) |
| 166 | + |
| 167 | +--- |
| 168 | + |
| 169 | +## Recommended Fixes |
| 170 | + |
| 171 | +### Short-term Fix (Hotfix) |
| 172 | +Ensure Haiku 4.5 is explicitly excluded from thinking/reasoning metadata logic: |
| 173 | +- Add explicit checks: `if "haiku-4-5" in model: return False` before broad Claude 4.x checks |
| 174 | +- Or narrow the patterns in `is_claude_4_5_on_bedrock()` to only match 4.5/4.6/4.7 Opus and Sonnet |
| 175 | + |
| 176 | +### Long-term Fix (Architectural) |
| 177 | +Implement the fix described in Discussion #22555: |
| 178 | + |
| 179 | +1. **Fix `supports_reasoning()` for Bedrock:** |
| 180 | + - Normalize provider names in `_get_model_info_helper()` or `_supports_factory()` |
| 181 | + - Handle `"bedrock"` vs `"bedrock_converse"` mismatch |
| 182 | + - Make `model_prices_and_context_window.json` the single source of truth |
| 183 | + |
| 184 | +2. **Remove hardcoded model checks:** |
| 185 | + - Delete `_is_claude_4_6_model()`, `_is_opus_4_7_model()`, etc. |
| 186 | + - Replace with `supports_reasoning(model, provider)` calls |
| 187 | + - Let JSON entries drive behavior |
| 188 | + |
| 189 | +3. **Add proper capability flags to JSON:** |
| 190 | + - `supports_extended_thinking` |
| 191 | + - `supports_adaptive_thinking` |
| 192 | + - `supports_reasoning_effort` |
| 193 | + - `supports_xhigh_reasoning_effort` |
| 194 | + |
| 195 | +--- |
| 196 | + |
| 197 | +## Related Issues & Context |
| 198 | + |
| 199 | +- **GitHub Discussion #22555** - "Day 0 Model Release Gaps" (documents the architectural problem) |
| 200 | +- **PR #24053** (Open) - "fix: added thinking and reasoning support for all claude 4+ models on bedrock" |
| 201 | + - Flagged for violating no-hardcode policy |
| 202 | + - Shows the root cause in `supports_reasoning()` provider mismatch |
| 203 | +- **PR #25867** - "Litellm day 0 opus 4.7 support" (merged) |
| 204 | +- **PR #25876** - "Litellm hotfix opus 4.7" (merged April 16, 2026) |
| 205 | +- **Anthropic Opus 4.7 Announcement** - April 16, 2026 |
| 206 | + |
| 207 | +--- |
| 208 | + |
| 209 | +## Conclusion |
| 210 | + |
| 211 | +The Opus 4.7 day 0 launch PRs extended hardcoded model checks (known anti-pattern) to support the new model. This likely caused unintended side effects for Haiku 4.5 on Bedrock: |
| 212 | + |
| 213 | +1. **Thinking metadata incorrectly sent** to Haiku 4.5 models |
| 214 | +2. **Bedrock API rejections** with "unsupported thinking metadata" errors |
| 215 | +3. **Retry storms** or expensive fallback logic |
| 216 | +4. **Cost spike** observed by users |
| 217 | + |
| 218 | +The root cause is the architectural pattern of hardcoded model checks instead of using the model capability JSON as the single source of truth, combined with the `supports_reasoning()` function not working correctly for Bedrock providers. |
| 219 | + |
| 220 | +**Immediate action needed:** Revert or patch the Opus 4.7 changes to explicitly exclude Haiku 4.5 from thinking/reasoning metadata logic. |
| 221 | + |
| 222 | +**Follow-up action:** Implement the architectural fix described in Discussion #22555 to prevent this category of bugs in future model releases. |
0 commit comments