Sync with upstream by lionello · Pull Request #24 · DefangLabs/openai-access-gateway

lionello · 2025-06-27T17:21:37Z

No description provided.

Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.32.3...v2.32.4) --- updated-dependencies: - dependency-name: requests dependency-version: 2.32.4 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…#131) --------- Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>

Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>

Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>

* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs * Fix rebase issue --------- Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>

* chore: update requirements to fix vulnerability * Update Python base image to version 3.13-slim

#180) This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929), Anthropic's most intelligent model with enhanced coding capabilities and complex agent support. Changes: - Added global cross-region inference profile discovery (global.anthropic.*) - Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously) - Fixed reasoning_effort parameter handling to prevent KeyError - Added extended thinking/interleaved thinking support via extra_body parameter - Updated documentation with Claude Sonnet 4.5 examples (English and Chinese) - Updated README with Sonnet 4.5 announcement Technical Details: - src/api/models/bedrock.py: Added global profile support in list_bedrock_models() - src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter - src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError - docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples - docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples - docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0

- Run Docker container as non-root user (appuser) to minimize security risks - Add Docker HEALTHCHECK for better container orchestration - Make CORS configurable via ALLOWED_ORIGINS env var with security warning - Replace assertions with proper error handling (TypeError/ValueError) - Add 30s timeout to HTTP requests to prevent hanging connections - Disable auto-reload in production uvicorn settings

…oken (#184)

Add comprehensive prompt caching support with flexible control options: Features: - ENV variable control (ENABLE_PROMPT_CACHING, default: false) - Per-request control via extra_body.prompt_caching - Pattern-based model detection (Claude, Nova) - Token limit warnings (Nova 20K limit) - OpenAI-compatible response format (prompt_tokens_details.cached_tokens) Supported models: - Claude 3+ models (anthropic.claude-*) - Nova models (amazon.nova-*) - Auto-detection prevents breaking unsupported models Implementation: - System prompts caching via extra_body.prompt_caching.system - Messages caching via extra_body.prompt_caching.messages - Non-streaming and streaming modes - Compatible with reasoning, thinking, and tool calls

- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles - Remove unused region prefix functions and defaultdict import - Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts - Improve model ARN parsing and error handling in profile enumeration - Consolidate profile metadata storage to enable consistent feature detection

Added handling for message and content block deltas, including safety checks for open thinking tags. Results in working reasoning and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.

The healthcheck in Dockerfile_ecs uses the hardcoded port instead of ENV setting. This was fixed.

…requiring the user to cd manually (#202) * fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually * Add docker-compose to support running locally

…lity Docker BuildKit (especially with docker-container driver) may create OCI image manifests with attestations that AWS Lambda does not support. Lambda requires Docker V2 Schema 2 format without multi-manifest index. This fix ensures the build script generates Lambda-compatible images regardless of the user's Docker/BuildKit configuration. Fixes #206

Co-authored-by: Hooman Yar <yarhooma@amazon.com>

Replace ALB + Lambda architecture with API Gateway REST API + Lambda using response streaming for SSE support. This provides: - No VPC required, reducing complexity and cost - Native streaming support via API Gateway response streaming - Pay-per-request pricing model Changes: - Add Lambda Web Adapter to Dockerfile for streaming support - Replace BedrockProxy.template with API Gateway configuration - Update README with new deployment options and latest models - Update architecture diagram for API Gateway flow

Update dependencies to fix HIGH severity ReDoS vulnerability: - fastapi==0.128.0 - starlette==0.49.1 CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP Range headers that trigger quadratic-time processing in FileResponse Range parsing, causing CPU exhaustion and DoS. Fixes #215

Co-authored-by: Hooman Yar <yarhooma@amazon.com>

PR #193 added tiktoken preloading to Dockerfile_ecs but the same fix was not applied to the Lambda Dockerfile. This causes a ConnectTimeout error in network-restricted environments (e.g. Lambda in VPC without NAT Gateway) when tiktoken tries to download cl100k_base encoding at runtime from openaipublic.blob.core.windows.net. Cache the encoding at build time, consistent with Dockerfile_ecs. Related to #118

* feat: add Amazon Nova 2 multimodal embeddings support Adds support for `amazon.nova-2-multimodal-embeddings-v1:0` via the new `NovaEmbeddingsModel` class, using the `taskType`/`singleEmbeddingParams` request format documented in the Nova 2 user guide. - Supports single and batch text inputs - Respects the `dimensions` parameter (256/512/1024/2048/3072, default 3072) - Supports `float` and `base64` encoding formats - Includes `test_nova_embed.py` for quick end-to-end verification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: remove test script from repo Test script moved to PR description instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: validate Nova embedding dimensions and fix falsy-zero bug - Add VALID_DIMENSIONS set and upfront validation with a clear error message - Fix `dimensions or DEFAULT` which would incorrectly ignore dimensions=0 - Add inline comment explaining approximate token counting (Nova API does not return token counts in the response) * fix: address PR review comments for NovaEmbeddingsModel - Fix VALID_DIMENSIONS to {256, 384, 1024, 3072} per Nova embeddings schema docs (previous values 512/2048 were mistakenly referenced from Titan embedding model docs) - Replace str(item) fallback with HTTPException(400) to avoid silent garbage embeddings - Update schema.py dimensions comment: 'not used' -> 'Used by Nova embeddings' - Replace getattr() with direct .dimensions access on Pydantic model - Move dimension validation before the loop (validates once, not per-text) - Add enumerate to batch loop; include input index in error detail - Switch isinstance(item, Iterable) to isinstance(item, list) for precise matching - Add comment explaining embeddingPurpose hardcoded to GENERIC_INDEX --------- Co-authored-by: Gabriel <gabrielkoo@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

When both reasoning_effort and extra_body are provided, additionalModelRequestFields set by reasoning_effort (containing reasoning_config) was silently overwritten by extra_body processing. This prevented features like anthropic_beta for 1M context from coexisting with reasoning_effort.

Bumps [requests](https://github.com/psf/requests) from 2.32.4 to 2.33.0. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](psf/requests@v2.32.4...v2.33.0) --- updated-dependencies: - dependency-name: requests dependency-version: 2.33.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…ens (#241)

UniMa007 and others added 5 commits May 27, 2025 21:52

Add Titan Embeddings G2 (#94)

aed5730

add titan G1 embeddings (#152)

844efec

feat: add support to include application inference profiles as models (…

0183608

…#131) --------- Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>

fix: properly handle tool_use messages in conversation

76a3614

lionello requested a review from nullfunc July 11, 2025 14:22

nullfunc approved these changes Jul 11, 2025

View reviewed changes

heisenbergye and others added 23 commits July 21, 2025 16:44

feat: support Claude 4 Interleaved thinking (beta) (#164)

3f1b56a

Add pagination to list_inference_profiles calls (#173)

a2110ff

Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>

chore: update requirements to fix vulnerability (#177)

bdfa57c

* chore: update requirements to fix vulnerability * Update Python base image to version 3.13-slim

docs: update deployment instructions and enhance ECR push script

e3ee9a7

chore: cleanup useless files

371d11d

Support <think> tags (#117)

8177876

fix: ECS container /health endpoint does not require API_KEY Bearer T…

7756532

…oken (#184)

🐳 preload tiktoken encoding in Dockerfile_ecs (#193)

18b68bd

fix: Fix invalid cache_creation_tokens metric key (#195)

7e03ab0

Fixed <think> </think> tags for GPT-OSS in bedrock.py (#200)

ce4cfab

Added handling for message and content block deltas, including safety checks for open thinking tags. Results in working reasoning and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.

Fix healthcheck in Dockerfile_ecs (#199)

b3c1c82

The healthcheck in Dockerfile_ecs uses the hardcoded port instead of ENV setting. This was fixed.

fix: Allow the push-to-ecr.sh script to run from anywhere instead of …

37374e7

…requiring the user to cd manually (#202) * fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually * Add docker-compose to support running locally

feat: add claude-opus-4-5 to TEMPERATURE_TOPP_CONFLICT_MODELS set (#208)

0411454

Co-authored-by: Hooman Yar <yarhooma@amazon.com>

Add support for 'developer' role in chat messages (#209)

1a7f55b

0xhmn and others added 9 commits February 12, 2026 15:21

fix: support continue response for claude opus 4.6 (#219)

a150f7b

Co-authored-by: Hooman Yar <yarhooma@amazon.com>

fix: Support reasoning_tokens at bedrock streaming response (#223)

d1dc4ed

fix: Fix ImageContent schema to use proper default value (#234)

737cf07

add claude-sonnet-4-6 to TEMPERATURE_TOPP_CONFLICT_MODELS list (#238)

e5f7d3d

fix: use None as default for max_tokens and prefer max_completion_tok…

9697a7f

…ens (#241)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync with upstream#24

Sync with upstream#24
lionello wants to merge 37 commits intoDefangLabs:defangfrom
aws-samples:main

lionello commented Jun 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

lionello commented Jun 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants