Add Amazon Bedrock support for Claude models #66

Open

gbaydin wants to merge 2 commits into huggingface:main from gbaydin:feature/bedrock-support

Conversation


gbaydin commented Apr 23, 2026

Summary

  • Routes bedrock/<model> model IDs through LiteLLM's Bedrock adapter with the same thinking/effort params as the anthropic/ path (same Claude models, same API shape)
  • Adds boto3 as a core dependency (required by LiteLLM's Bedrock adapter)
  • Updates model switcher, error messages, and README with Bedrock guidance

Usage

# Set AWS credentials
export AWS_REGION_NAME=us-east-1
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
# Or for SSO / identity-center auth:
export AWS_BEARER_TOKEN_BEDROCK=...

# Use cross-region inference profile IDs
ml-intern --model bedrock/us.anthropic.claude-opus-4-6-v1 "your prompt"

# Or switch interactively
/model bedrock/us.anthropic.claude-opus-4-6-v1
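Under the hood, the bedrock/ prefix is handed straight to LiteLLM. A minimal sketch of the call path, assuming LiteLLM's standard acompletion API (the agent's real call site also passes the thinking/effort params described above):

import litellm

async def ask_bedrock(prompt: str) -> str:
    # LiteLLM recognizes the "bedrock/" prefix and routes the call through
    # its Bedrock adapter, picking up AWS credentials from the environment
    # (access keys or AWS_BEARER_TOKEN_BEDROCK).
    response = await litellm.acompletion(
        model="bedrock/us.anthropic.claude-opus-4-6-v1",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content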

Test plan

  • Verified _resolve_llm_params returns correct params for bedrock/ prefix
  • Verified is_valid_model_id accepts bedrock/ model IDs
  • Verified _get_max_tokens_safe returns 1M for bedrock/anthropic.claude-opus-4-6-v1
  • Verified existing anthropic/, openai/, and HF router paths are unchanged
  • Tested /model bedrock/us.anthropic.claude-opus-4-6-v1 end-to-end with live AWS credentials
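For illustration, a hypothetical pytest sketch of the first two checks in the test plan above (the import paths and signatures are assumptions; the real helpers may differ). The expected params follow the thinking/output_config shape described in the review below:

from agent.core.llm_params import _resolve_llm_params
from agent.core.model_switcher import is_valid_model_id

def test_bedrock_model_id_accepted():
    assert is_valid_model_id("bedrock/us.anthropic.claude-opus-4-6-v1")

def test_bedrock_uses_anthropic_style_params():
    params = _resolve_llm_params("bedrock/us.anthropic.claude-opus-4-6-v1",
                                 reasoning_effort="high")
    assert params["thinking"] == {"type": "adaptive"}
    assert params["output_config"] == {"effort": "high"}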

🤖 Generated with Claude Code

gbaydin and others added 2 commits April 23, 2026 01:04
Route `bedrock/<model>` model IDs through LiteLLM's Bedrock adapter with
the same thinking/effort params as the `anthropic/` path. LiteLLM handles
AWS auth (access keys or bearer token for SSO). Also adds boto3 as a core
dependency (required by LiteLLM's Bedrock adapter).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ption)

The retry loop previously only wrapped the acompletion() call. Transient
errors during stream iteration (async for chunk in response) crashed the
agent. Now the entire streaming path is inside the retry loop — on a
mid-stream failure, partial content is discarded and the call retries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.


claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 2m 38s.


PR Review: Add Amazon Bedrock Support

  • Read changed files and diff
  • Analyze correctness, security, and routing impact
  • Post review findings

Verdict: LGTM with two items to address before merge, one moderate and one minor. The core routing logic is correct and the mid-stream retry fix is a genuine improvement.


Commit 1 — cbb8570: Bedrock routing

agent/core/llm_params.py — ✅ Correct

The _resolve_anthropic_effort extraction is a clean refactor. The inline logic was copied exactly — same minimal → low normalization, same _ANTHROPIC_EFFORTS guard, same thinking/output_config shape. No behavior change on the anthropic/ path.

The bedrock/ branch delegates to the same function:

# llm_params.py:165-166
if model_name.startswith("bedrock/"):
    return _resolve_anthropic_effort({"model": model_name}, reasoning_effort, strict)

LiteLLM's Bedrock adapter receives thinking={"type": "adaptive"} and output_config={"effort": level} as top-level kwargs, the same way the Anthropic adapter does. The effort probe cascade (in effort_probe.py) will handle the case where thinking isn't accepted on a specific Bedrock model — it walks down max → xhigh → high → … and caches the result. Graceful degradation is covered. ✅
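For context, an illustrative sketch of that cascade (the real logic lives in effort_probe.py; the names and the exact ladder here are assumptions):

_EFFORT_LADDER = ["max", "xhigh", "high", "medium", "low"]  # assumed ordering
_probe_cache: dict[str, str | None] = {}

async def probe_effort(model: str, try_level) -> str | None:
    # try_level(model, level) attempts a tiny completion with thinking
    # enabled and returns True if the provider accepts it.
    if model in _probe_cache:
        return _probe_cache[model]
    for level in _EFFORT_LADDER:
        if await try_level(model, level):
            _probe_cache[model] = level
            return level
    _probe_cache[model] = None  # model rejects thinking entirely
    return None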

agent/core/model_switcher.py — ✅ Correct

The _print_hf_routing_info update correctly short-circuits the HF router catalog lookup for bedrock/ models:

# model_switcher.py:67
if model_id.startswith(("anthropic/", "openai/", "bedrock/")):
    return True

Without this, a bedrock/ id would hit the HF catalog, get a "not found" warning, and confuse the user.

pyproject.toml — ⚠️ Moderate concern

"boto3>=1.35.0",   # core dependency

boto3 is ~30 MB and pulls in botocore, s3transfer, etc. All users — including those who never touch Bedrock — pay this cost. Suggest moving to an optional extra:

[project.optional-dependencies]
bedrock = ["boto3>=1.35.0"]

and documenting pip install hf-agent[bedrock] (or uv sync --extra bedrock) in the README.
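If boto3 does move to an extra, the bedrock/ branch could fail fast with an actionable message instead of a deep ImportError from the adapter. A sketch (the helper name is hypothetical):

def _require_boto3() -> None:
    try:
        import boto3  # noqa: F401  # only checking availability
    except ImportError as exc:
        raise RuntimeError(
            "bedrock/ models require boto3; install with "
            "`pip install hf-agent[bedrock]`"
        ) from exc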


Commit 2 — 83727ef: Mid-stream retry fix

agent/core/agent_loop.py — ✅ Correct, genuine bug fix

Previously the retry loop only wrapped acompletion() and broke out immediately:

response = await acompletion(...)
break  # <-- stream iteration happened OUTSIDE the retry loop

An internalServerException thrown mid-stream would escape the retry entirely and crash the turn. The fix correctly moves the async for chunk in response: iteration inside the try block so transient errors during streaming are caught and retried.

The reset of accumulators at the top of each attempt is correct:

# agent_loop.py:307-310
full_content = ""
tool_calls_acc: dict[int, dict] = {}
token_count = 0
finish_reason = None

The sentinel raise RuntimeError("Exhausted LLM retries without returning or raising") at the end is the right safeguard — previously the code would have fallen through and returned an empty LLMResult.

One thing to be aware of: if a mid-stream retry is triggered after some assistant_chunk events were already emitted to the frontend, the user sees partial text that gets "reset" invisibly. This is a UX quirk, not a correctness bug, and is inherent to the retry-from-scratch design.
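For reference, the overall shape of the fixed loop (a paraphrased sketch; the retryable error set, backoff, and helper names are assumptions, not the exact agent_loop.py code):

import asyncio

MAX_RETRIES = 5
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)  # stand-in for the real set

async def stream_with_retry(acompletion, **kwargs) -> str:
    for attempt in range(MAX_RETRIES):
        full_content = ""  # accumulators reset at the top of each attempt
        try:
            response = await acompletion(stream=True, **kwargs)
            async for chunk in response:  # iteration now inside the try block
                full_content += chunk.choices[0].delta.content or ""
            return full_content
        except TRANSIENT_ERRORS:
            if attempt == MAX_RETRIES - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("Exhausted LLM retries without returning or raising")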


Minor concern: cross-region inference profile IDs and _get_max_tokens_safe

The README example uses bedrock/us.anthropic.claude-opus-4-6-v1 (with us. prefix — the standard cross-region inference profile format). The test plan verifies _get_max_tokens_safe for bedrock/anthropic.claude-opus-4-6-v1 (no prefix).

_get_max_tokens_safe in session.py:21 calls litellm.get_model_info(model_name) with the full model name. LiteLLM's catalog knows bedrock/anthropic.claude-opus-4-6-v1 → 1M tokens, but almost certainly does NOT know bedrock/us.anthropic.claude-opus-4-6-v1. This causes silent fallback to 200k, which sets the compaction threshold at ~180k instead of ~900k — users on Opus 4.6 via cross-region profiles would see unnecessarily frequent compaction.

session.py strips the huggingface/ prefix and :tag suffix but not regional prefixes. A targeted fix would be to normalize bedrock/us.<rest> → bedrock/<rest> (and similarly for eu. / ap.) before the get_model_info lookup.
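A sketch of that normalization (the function name is hypothetical; the regex covers the three prefixes mentioned):

import re

_REGION_PREFIX = re.compile(r"^bedrock/(us|eu|ap)\.")

def _normalize_for_model_info(model_name: str) -> str:
    # bedrock/us.anthropic.claude-opus-4-6-v1
    #   -> bedrock/anthropic.claude-opus-4-6-v1
    return _REGION_PREFIX.sub("bedrock/", model_name)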


Security

No concerns. AWS credentials flow exclusively through LiteLLM's boto3 credential chain (env vars, ~/.aws/, IAM role) — nothing hardcoded, nothing logged. Model IDs follow the same unvalidated-pass-through pattern as existing anthropic/ and openai/ paths.


Summary

  • boto3 as core dep — Moderate: ~30 MB penalty for all users; should be an optional extra
  • Cross-region profile ID → 200k fallback in _get_max_tokens_safe — Minor: premature compaction for Opus users with us. prefix
  • Mid-stream retry fix — ✅ genuine bug fix, correct implementation
  • Bedrock effort/thinking routing — ✅ correct; cascade covers degradation
  • Existing paths unchanged — ✅ verified by inspection
