Add Amazon Bedrock support for Claude models #66

Open

gbaydin wants to merge 2 commits into huggingface:main from gbaydin:feature/bedrock-support

Conversation


gbaydin commented Apr 23, 2026

Summary

  • Routes bedrock/<model> model IDs through LiteLLM's Bedrock adapter with the same thinking/effort params as the anthropic/ path (same Claude models, same API shape)
  • Adds boto3 as a core dependency (required by LiteLLM's Bedrock adapter)
  • Updates model switcher, error messages, and README with Bedrock guidance

Usage

# Set AWS credentials
export AWS_REGION_NAME=us-east-1
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
# Or for SSO / identity-center auth:
export AWS_BEARER_TOKEN_BEDROCK=...

# Use cross-region inference profile IDs
ml-intern --model bedrock/us.anthropic.claude-opus-4-6-v1 "your prompt"

# Or switch interactively
/model bedrock/us.anthropic.claude-opus-4-6-v1
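Under the hood, the bedrock/ prefix is handed straight to LiteLLM. A minimal sketch of the call path, assuming LiteLLM's standard acompletion API (the agent's real call site also passes the thinking/effort params described above):

import litellm

async def ask_bedrock(prompt: str) -> str:
    # LiteLLM recognizes the "bedrock/" prefix and routes the call through
    # its Bedrock adapter, picking up AWS credentials from the environment
    # (access keys or AWS_BEARER_TOKEN_BEDROCK).
    response = await litellm.acompletion(
        model="bedrock/us.anthropic.claude-opus-4-6-v1",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content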

Test plan

  • Verified _resolve_llm_params returns correct params for bedrock/ prefix
  • Verified is_valid_model_id accepts bedrock/ model IDs
  • Verified _get_max_tokens_safe returns 1M for bedrock/anthropic.claude-opus-4-6-v1
  • Verified existing anthropic/, openai/, and HF router paths are unchanged
  • Tested /model bedrock/us.anthropic.claude-opus-4-6-v1 end-to-end with live AWS credentials
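For illustration, a hypothetical pytest sketch of the first two checks in the test plan above (the import paths and signatures are assumptions; the real helpers may differ). The expected params follow the thinking/output_config shape described in the review below:

from agent.core.llm_params import _resolve_llm_params
from agent.core.model_switcher import is_valid_model_id

def test_bedrock_model_id_accepted():
    assert is_valid_model_id("bedrock/us.anthropic.claude-opus-4-6-v1")

def test_bedrock_uses_anthropic_style_params():
    params = _resolve_llm_params("bedrock/us.anthropic.claude-opus-4-6-v1",
                                 reasoning_effort="high")
    assert params["thinking"] == {"type": "adaptive"}
    assert params["output_config"] == {"effort": "high"}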

🤖 Generated with Claude Code

gbaydin and others added 2 commits April 23, 2026 01:04
Route `bedrock/<model>` model IDs through LiteLLM's Bedrock adapter with
the same thinking/effort params as the `anthropic/` path. LiteLLM handles
AWS auth (access keys or bearer token for SSO). Also adds boto3 as a core
dependency (required by LiteLLM's Bedrock adapter).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ption)

The retry loop previously only wrapped the acompletion() call. Transient
errors during stream iteration (async for chunk in response) crashed the
agent. Now the entire streaming path is inside the retry loop — on a
mid-stream failure, partial content is discarded and the call retries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.


claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 2m 38s.


PR Review: Add Amazon Bedrock Support

  • Read changed files and diff
  • Analyze correctness, security, and routing impact
  • Post review findings

Verdict: LGTM with two items to address before merge, one moderate and one minor. The core routing logic is correct and the mid-stream retry fix is a genuine improvement.


Commit 1 — cbb8570: Bedrock routing

agent/core/llm_params.py — ✅ Correct

The _resolve_anthropic_effort extraction is a clean refactor. The inline logic was copied exactly — same minimal → low normalization, same _ANTHROPIC_EFFORTS guard, same thinking/output_config shape. No behavior change on the anthropic/ path.

The bedrock/ branch delegates to the same function:

# llm_params.py:165-166
if model_name.startswith("bedrock/"):
    return _resolve_anthropic_effort({"model": model_name}, reasoning_effort, strict)

LiteLLM's Bedrock adapter receives thinking={"type": "adaptive"} and output_config={"effort": level} as top-level kwargs, the same way the Anthropic adapter does. The effort probe cascade (in effort_probe.py) will handle the case where thinking isn't accepted on a specific Bedrock model — it walks down max → xhigh → high → … and caches the result. Graceful degradation is covered. ✅
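For context, an illustrative sketch of that cascade (the real logic lives in effort_probe.py; the names and the exact ladder here are assumptions):

_EFFORT_LADDER = ["max", "xhigh", "high", "medium", "low"]  # assumed ordering
_probe_cache: dict[str, str | None] = {}

async def probe_effort(model: str, try_level) -> str | None:
    # try_level(model, level) attempts a tiny completion with thinking
    # enabled and returns True if the provider accepts it.
    if model in _probe_cache:
        return _probe_cache[model]
    for level in _EFFORT_LADDER:
        if await try_level(model, level):
            _probe_cache[model] = level
            return level
    _probe_cache[model] = None  # model rejects thinking entirely
    return None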

agent/core/model_switcher.py — ✅ Correct

The _print_hf_routing_info update correctly short-circuits the HF router catalog lookup for bedrock/ models:

# model_switcher.py:67
if model_id.startswith(("anthropic/", "openai/", "bedrock/")):
    return True

Without this, a bedrock/ id would hit the HF catalog, get a "not found" warning, and confuse the user.

pyproject.toml — ⚠️ Moderate concern

"boto3>=1.35.0",   # core dependency

boto3 is ~30 MB and pulls in botocore, s3transfer, etc. All users — including those who never touch Bedrock — pay this cost. Suggest moving to an optional extra:

[project.optional-dependencies]
bedrock = ["boto3>=1.35.0"]

and documenting pip install hf-agent[bedrock] (or uv sync --extra bedrock) in the README.
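If boto3 does move to an extra, the bedrock/ branch could fail fast with an actionable message instead of a deep ImportError from the adapter. A sketch (the helper name is hypothetical):

def _require_boto3() -> None:
    try:
        import boto3  # noqa: F401  # only checking availability
    except ImportError as exc:
        raise RuntimeError(
            "bedrock/ models require boto3; install with "
            "`pip install hf-agent[bedrock]`"
        ) from exc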


Commit 2 — 83727ef: Mid-stream retry fix

agent/core/agent_loop.py — ✅ Correct, genuine bug fix

Previously the retry loop only wrapped acompletion() and broke out immediately:

response = await acompletion(...)
break  # <-- stream iteration happened OUTSIDE the retry loop

An internalServerException thrown mid-stream would escape the retry entirely and crash the turn. The fix correctly moves the async for chunk in response: iteration inside the try block so transient errors during streaming are caught and retried.

The reset of accumulators at the top of each attempt is correct:

# agent_loop.py:307-310
full_content = ""
tool_calls_acc: dict[int, dict] = {}
token_count = 0
finish_reason = None

The sentinel raise RuntimeError("Exhausted LLM retries without returning or raising") at the end is the right safeguard — previously the code would have fallen through and returned an empty LLMResult.

One thing to be aware of: if a mid-stream retry is triggered after some assistant_chunk events were already emitted to the frontend, the user sees partial text that gets "reset" invisibly. This is a UX quirk, not a correctness bug, and is inherent to the retry-from-scratch design.
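For reference, the overall shape of the fixed loop (a paraphrased sketch; the retryable error set, backoff, and helper names are assumptions, not the exact agent_loop.py code):

import asyncio

MAX_RETRIES = 5
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)  # stand-in for the real set

async def stream_with_retry(acompletion, **kwargs) -> str:
    for attempt in range(MAX_RETRIES):
        full_content = ""  # accumulators reset at the top of each attempt
        try:
            response = await acompletion(stream=True, **kwargs)
            async for chunk in response:  # iteration now inside the try block
                full_content += chunk.choices[0].delta.content or ""
            return full_content
        except TRANSIENT_ERRORS:
            if attempt == MAX_RETRIES - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("Exhausted LLM retries without returning or raising")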


Minor concern: cross-region inference profile IDs and _get_max_tokens_safe

The README example uses bedrock/us.anthropic.claude-opus-4-6-v1 (with us. prefix — the standard cross-region inference profile format). The test plan verifies _get_max_tokens_safe for bedrock/anthropic.claude-opus-4-6-v1 (no prefix).

_get_max_tokens_safe in session.py:21 calls litellm.get_model_info(model_name) with the full model name. LiteLLM's catalog knows bedrock/anthropic.claude-opus-4-6-v1 → 1M tokens, but almost certainly does NOT know bedrock/us.anthropic.claude-opus-4-6-v1. This causes silent fallback to 200k, which sets the compaction threshold at ~180k instead of ~900k — users on Opus 4.6 via cross-region profiles would see unnecessarily frequent compaction.

session.py strips the huggingface/ prefix and :tag suffix but not regional prefixes. A targeted fix would be to normalize bedrock/us.<rest> → bedrock/<rest> (and similarly for eu. / ap.) before the get_model_info lookup.
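A sketch of that normalization (the function name is hypothetical; the regex covers the three prefixes mentioned):

import re

_REGION_PREFIX = re.compile(r"^bedrock/(us|eu|ap)\.")

def _normalize_for_model_info(model_name: str) -> str:
    # bedrock/us.anthropic.claude-opus-4-6-v1
    #   -> bedrock/anthropic.claude-opus-4-6-v1
    return _REGION_PREFIX.sub("bedrock/", model_name)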


Security

No concerns. AWS credentials flow exclusively through LiteLLM's boto3 credential chain (env vars, ~/.aws/, IAM role) — nothing hardcoded, nothing logged. Model IDs follow the same unvalidated-pass-through pattern as existing anthropic/ and openai/ paths.


Summary

  • boto3 as core dep — Moderate: ~30 MB penalty for all users; should be an optional extra
  • Cross-region profile ID → 200k fallback in _get_max_tokens_safe — Minor: premature compaction for Opus users with us. prefix
  • Mid-stream retry fix — ✅ genuine bug fix, correct implementation
  • Bedrock effort/thinking routing — ✅ correct; cascade covers degradation
  • Existing paths unchanged — ✅ verified by inspection
