This repository was archived by the owner on Apr 20, 2026. It is now read-only.

Sync with upstream #24

Open
lionello wants to merge 37 commits into DefangLabs:defang from aws-samples:main

Conversation

@lionello
Member

No description provided.

UniMa007 and others added 5 commits May 27, 2025 21:52
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#131)

---------

Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
@lionello lionello requested a review from nullfunc July 11, 2025 14:22
heisenbergye and others added 23 commits July 21, 2025 16:44
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs

* Fix rebase issue

---------

Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
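
The `defaultdict(set)` change described in this commit can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `group_profiles_by_model`, the `(profile_arn, model_id)` input shape, and the sample ARNs are all assumptions made for the example.

```python
from collections import defaultdict


def group_profiles_by_model(profiles):
    """Map each model_id to the set of every profile ARN that points at it.

    `profiles` is a hypothetical iterable of (profile_arn, model_id) pairs.
    A plain dict assignment would keep only the last profile per model;
    a defaultdict(set) keeps all of them.
    """
    by_model = defaultdict(set)
    for profile_arn, model_id in profiles:
        by_model[model_id].add(profile_arn)
    return by_model


# Illustrative data only: two profiles share one model_id.
sample_profiles = [
    ("arn:example:profile/a", "anthropic.claude-example"),
    ("arn:example:profile/b", "anthropic.claude-example"),
    ("arn:example:profile/c", "amazon.nova-example"),
]
grouped = group_profiles_by_model(sample_profiles)
```

With a set per key, both profiles for the shared model are retained and can be emitted, instead of the second silently replacing the first.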
* chore: update requirements to fix vulnerability

* Update Python base image to version 3.13-slim
#180)

This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929),
Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.

Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement

Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation

Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
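
The temperature/topP fix and the `pop("topP", None)` change can be sketched as below. This is a hedged illustration: the helper name `drop_conflicting_top_p`, the substring-based detection, and the rule "drop topP only when temperature is also set" are assumptions, not the actual logic in `src/api/models/bedrock.py`.

```python
# Hypothetical marker list; the real detection logic may differ.
CONFLICT_MODEL_MARKERS = ("claude-sonnet-4-5",)


def drop_conflicting_top_p(model_id, inference_config):
    """Return a copy of inference_config without topP for conflicting models.

    Uses pop("topP", None) so a config that never had topP does not raise
    KeyError, which is the failure mode the commit describes.
    """
    config = dict(inference_config)  # copy so the caller's dict is untouched
    is_conflict_model = any(marker in model_id for marker in CONFLICT_MODEL_MARKERS)
    if is_conflict_model and "temperature" in config:
        config.pop("topP", None)
    return config


sonnet_45 = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"
cleaned = drop_conflicting_top_p(sonnet_45, {"temperature": 0.5, "topP": 0.9})
```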
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
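
One way the configurable CORS setting might be read is sketched below. Only the `ALLOWED_ORIGINS` variable name comes from the commit message; the comma-separated format, the wildcard default, the warning text, and the function name `parse_allowed_origins` are assumptions for illustration.

```python
import logging
import os

logger = logging.getLogger(__name__)


def parse_allowed_origins(raw):
    """Turn a comma-separated origins string into a list of origins.

    Assumed behavior: an unset or empty value falls back to the wildcard
    and logs a warning, since allowing every origin is a security risk.
    """
    origins = [o.strip() for o in (raw or "").split(",") if o.strip()]
    if not origins or "*" in origins:
        logger.warning("CORS allows all origins; set ALLOWED_ORIGINS to restrict access")
        return ["*"]
    return origins


# Reads the environment variable named in the commit message.
allowed_origins = parse_allowed_origins(os.environ.get("ALLOWED_ORIGINS"))
```

The resulting list is the kind of value a CORS middleware's allowed-origins option would take.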
Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
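
The control flow described above (ENV default, per-request override, pattern-based model detection) might look like the following sketch. The precedence of the request setting over the ENV variable, the substring patterns, and the function names are assumptions; only `ENABLE_PROMPT_CACHING` and `extra_body.prompt_caching.system` / `.messages` are taken from the commit message.

```python
import os

# Hypothetical patterns standing in for "Claude" and "Nova" detection.
CACHE_CAPABLE_PATTERNS = ("anthropic.claude", "amazon.nova")


def model_supports_caching(model_id):
    """Pattern-based check so unsupported models are never given cache settings."""
    lowered = model_id.lower()
    return any(pattern in lowered for pattern in CACHE_CAPABLE_PATTERNS)


def should_cache(model_id, extra_body, section, env=None):
    """Decide whether to cache `section` ("system" or "messages").

    Assumed precedence: an explicit per-request value wins, otherwise the
    ENABLE_PROMPT_CACHING variable applies (default: false).
    """
    env = os.environ if env is None else env
    if not model_supports_caching(model_id):
        return False
    request_value = ((extra_body or {}).get("prompt_caching") or {}).get(section)
    if request_value is not None:
        return bool(request_value)
    return env.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.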
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
Added handling for message and content block deltas, including safety checks for open thinking tags.

This makes reasoning work and makes GPT-OSS 20b/120b usable in frontends that expect closing thinking tags.
The healthcheck in Dockerfile_ecs used a hardcoded port instead of the ENV setting. This is now fixed.
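
The safety check for open thinking tags can be illustrated with a small generator. This is only a sketch of the idea: the tag strings `<think>` and `</think>`, the function name, and the approach of counting tags over the accumulated text are assumptions, not the handling added in this commit.

```python
def ensure_thinking_closed(chunks, open_tag="<think>", close_tag="</think>"):
    """Pass text chunks through and append a closing tag if one is missing.

    The full text is accumulated so that tags split across chunk
    boundaries are still counted correctly when the stream ends.
    """
    seen = []
    for chunk in chunks:
        seen.append(chunk)
        yield chunk
    text = "".join(seen)
    if text.count(open_tag) > text.count(close_tag):
        # The stream ended while a thinking block was still open.
        yield close_tag


unclosed = list(ensure_thinking_closed(["<think>", "working it out"]))
```

A frontend that waits for the closing tag then sees a well-formed block even when the model's stream stops mid-thought.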
…requiring the user to cd manually (#202)

* fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually

* Add docker-compose to support running locally
…lity

Docker BuildKit (especially with docker-container driver) may create
OCI image manifests with attestations that AWS Lambda does not support.
Lambda requires Docker V2 Schema 2 format without multi-manifest index.

This fix ensures the build script generates Lambda-compatible images
regardless of the user's Docker/BuildKit configuration.

Fixes #206
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
Replace ALB + Lambda architecture with API Gateway REST API + Lambda
using response streaming for SSE support. This provides:

- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model

Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
Update dependencies to fix HIGH severity ReDoS vulnerability:
- fastapi==0.128.0
- starlette==0.49.1

CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP
Range headers that trigger quadratic-time processing in FileResponse
Range parsing, causing CPU exhaustion and DoS.

Fixes #215
0xhmn and others added 9 commits February 12, 2026 15:21
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
PR #193 added tiktoken preloading to Dockerfile_ecs but the same fix
was not applied to the Lambda Dockerfile. This causes a ConnectTimeout
error in network-restricted environments (e.g. Lambda in VPC without
NAT Gateway) when tiktoken tries to download cl100k_base encoding at
runtime from openaipublic.blob.core.windows.net.

Cache the encoding at build time, consistent with Dockerfile_ecs.

Related to #118
* feat: add Amazon Nova 2 multimodal embeddings support

Adds support for `amazon.nova-2-multimodal-embeddings-v1:0` via the
new `NovaEmbeddingsModel` class, using the `taskType`/`singleEmbeddingParams`
request format documented in the Nova 2 user guide.

- Supports single and batch text inputs
- Respects the `dimensions` parameter (256/512/1024/2048/3072, default 3072)
- Supports `float` and `base64` encoding formats
- Includes `test_nova_embed.py` for quick end-to-end verification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove test script from repo

Test script moved to PR description instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: validate Nova embedding dimensions and fix falsy-zero bug

- Add VALID_DIMENSIONS set and upfront validation with a clear error message
- Fix `dimensions or DEFAULT` which would incorrectly ignore dimensions=0
- Add inline comment explaining approximate token counting (Nova API
  does not return token counts in the response)

* fix: address PR review comments for NovaEmbeddingsModel

- Fix VALID_DIMENSIONS to {256, 384, 1024, 3072} per Nova embeddings schema docs
  (previous values 512/2048 were mistakenly referenced from Titan embedding model docs)
- Replace str(item) fallback with HTTPException(400) to avoid silent garbage embeddings
- Update schema.py dimensions comment: 'not used' -> 'Used by Nova embeddings'
- Replace getattr() with direct .dimensions access on Pydantic model
- Move dimension validation before the loop (validates once, not per-text)
- Add enumerate to batch loop; include input index in error detail
- Switch isinstance(item, Iterable) to isinstance(item, list) for precise matching
- Add comment explaining embeddingPurpose hardcoded to GENERIC_INDEX

---------

Co-authored-by: Gabriel <gabrielkoo@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
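
The falsy-zero bug and the dimension validation from this commit series can be sketched as follows. The set `{256, 384, 1024, 3072}` and the default of 3072 come from the commit messages; the function name `resolve_dimensions` is hypothetical, and `ValueError` stands in for the `HTTPException(400)` the commits mention so the example needs no web framework.

```python
VALID_DIMENSIONS = {256, 384, 1024, 3072}
DEFAULT_DIMENSIONS = 3072


def resolve_dimensions(dimensions):
    """Return the dimensions to request, validating once before any batch loop.

    `dimensions or DEFAULT_DIMENSIONS` would treat 0 as "not provided" and
    silently use the default; an explicit `is None` check lets 0 reach
    validation and be rejected with a clear message.
    """
    if dimensions is None:
        return DEFAULT_DIMENSIONS
    if dimensions not in VALID_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {sorted(VALID_DIMENSIONS)}, got {dimensions}"
        )
    return dimensions
```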
When both reasoning_effort and extra_body are provided,
additionalModelRequestFields set by reasoning_effort (containing
reasoning_config) was silently overwritten by extra_body processing.
This prevented features like anthropic_beta for 1M context from
coexisting with reasoning_effort.
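
The overwrite described above is avoided by merging rather than assigning. The sketch below assumes both sources are plain dicts; the function name, the one-level merge depth, and the sample values (including the placeholder beta flag) are illustrative assumptions rather than the repository's implementation.

```python
def merge_additional_fields(existing, extra_body):
    """Merge extra_body into fields already set by reasoning_effort.

    Keys from extra_body are added alongside existing keys; when both
    sides hold a dict under the same key, the two dicts are combined
    one level deep, with extra_body values taking precedence.
    """
    merged = dict(existing or {})
    for key, value in (extra_body or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


# Illustrative values only; the real field contents may differ.
from_reasoning_effort = {"reasoning_config": {"type": "enabled", "budget_tokens": 4096}}
from_extra_body = {"anthropic_beta": ["example-beta-flag"]}
combined = merge_additional_fields(from_reasoning_effort, from_extra_body)
```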
Bumps [requests](https://github.com/psf/requests) from 2.32.4 to 2.33.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.32.4...v2.33.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>