This repository was archived by the owner on Apr 20, 2026. It is now read-only.
Sync with upstream #24
Open
lionello wants to merge 37 commits into DefangLabs:defang from
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.32.3...v2.32.4)
---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#131) --------- Co-authored-by: Mengxin Zhu <843303+zxkane@users.noreply.github.com>
nullfunc
approved these changes
Jul 11, 2025
Updates boto3 from 1.37.0 to 1.40.4 and botocore from 1.37.0 to 1.40.4. This update enables support for AWS_BEARER_TOKEN_BEDROCK functionality and includes the latest AWS service features and bug fixes. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Rizvi Rahim <rizvi@rizvir.com>
* models: fix Application Inference Profiles mapping to include all profiles per model_id; switch to defaultdict(set) and emit all AIPs
* Fix rebase issue
---------
Co-authored-by: Jeremy Brockett <313937+jbrockett@users.noreply.github.com>
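The `defaultdict(set)` change mentioned above can be illustrated with a minimal sketch. The profile dictionaries, their `arn` and `model_ids` keys, and the function name `map_profiles_by_model` are hypothetical and only stand in for whatever the repository actually reads from Bedrock; the point is that a set per model ID keeps every profile instead of overwriting earlier ones.

```python
from collections import defaultdict


def map_profiles_by_model(profiles):
    """Group profile ARNs by the model IDs they reference (hypothetical input shape)."""
    profiles_by_model = defaultdict(set)
    for profile in profiles:
        for model_id in profile["model_ids"]:
            # add() keeps earlier profiles; a plain dict assignment would overwrite them
            profiles_by_model[model_id].add(profile["arn"])
    return profiles_by_model


# Illustrative data only, not real ARNs.
profiles = [
    {"arn": "arn:example:profile/a", "model_ids": ["model-x"]},
    {"arn": "arn:example:profile/b", "model_ids": ["model-x", "model-y"]},
]
mapping = map_profiles_by_model(profiles)
```

With this shape, a model referenced by two profiles ends up with both ARNs in its set.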
* chore: update requirements to fix vulnerability
* Update Python base image to version 3.13-slim
#180)
This commit adds comprehensive support for Claude Sonnet 4.5 (claude-sonnet-4-5-20250929), Anthropic's most intelligent model with enhanced coding capabilities and complex agent support.
Changes:
- Added global cross-region inference profile discovery (global.anthropic.*)
- Fixed temperature/topP compatibility for Claude Sonnet 4.5 (model doesn't support both simultaneously)
- Fixed reasoning_effort parameter handling to prevent KeyError
- Added extended thinking/interleaved thinking support via extra_body parameter
- Updated documentation with Claude Sonnet 4.5 examples (English and Chinese)
- Updated README with Sonnet 4.5 announcement
Technical Details:
- src/api/models/bedrock.py: Added global profile support in list_bedrock_models()
- src/api/models/bedrock.py: Added Claude Sonnet 4.5 detection to remove topP parameter
- src/api/models/bedrock.py: Changed pop("topP") to pop("topP", None) to prevent KeyError
- docs/Usage.md: Added Chat Completions section with Sonnet 4.5 examples
- docs/Usage.md: Updated Interleaved thinking section with Sonnet 4.5 examples
- docs/Usage_CN.md: Added Chinese versions of all Sonnet 4.5 documentation
Model ID: global.anthropic.claude-sonnet-4-5-20250929-v1:0
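The topP removal and the `pop("topP", None)` fix described above might look roughly like the following. This is a sketch under assumptions: the function name `drop_conflicting_top_p`, the substring check on the model ID, and the shape of the config dictionary are illustrative and not taken from `src/api/models/bedrock.py`.

```python
def drop_conflicting_top_p(model_id, inference_config):
    """Return a copy of the config without topP for Claude Sonnet 4.5 (assumed detection)."""
    config = dict(inference_config)
    if "claude-sonnet-4-5" in model_id:
        # The default of None means a missing topP key no longer raises KeyError.
        config.pop("topP", None)
    return config


sonnet_id = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"
with_both = drop_conflicting_top_p(sonnet_id, {"temperature": 0.7, "topP": 0.9})
without_top_p = drop_conflicting_top_p(sonnet_id, {"temperature": 0.7})
other_model = drop_conflicting_top_p("some.other-model", {"temperature": 0.7, "topP": 0.9})
```

Copying the dictionary first keeps the caller's config untouched, which is a design choice of this sketch rather than something the commit states.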
- Run Docker container as non-root user (appuser) to minimize security risks
- Add Docker HEALTHCHECK for better container orchestration
- Make CORS configurable via ALLOWED_ORIGINS env var with security warning
- Replace assertions with proper error handling (TypeError/ValueError)
- Add 30s timeout to HTTP requests to prevent hanging connections
- Disable auto-reload in production uvicorn settings
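As an illustration of the configurable CORS item, here is a minimal sketch of parsing an `ALLOWED_ORIGINS` value. The comma-separated format, the wildcard fallback, the warning text, and the function name `parse_allowed_origins` are all assumptions; the commit only says that the variable exists and that a security warning is emitted.

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def parse_allowed_origins(raw: Optional[str]) -> List[str]:
    """Split a comma-separated origin list; fall back to '*' with a warning (assumed behavior)."""
    origins = [o.strip() for o in (raw or "").split(",") if o.strip()]
    if not origins or "*" in origins:
        logger.warning("CORS allows all origins; set ALLOWED_ORIGINS to restrict access")
        return ["*"]
    return origins


restricted = parse_allowed_origins("https://a.example, https://b.example")
unset = parse_allowed_origins(None)
```

In an application this would typically be called with `os.environ.get("ALLOWED_ORIGINS")` and the result passed to the CORS middleware.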
Add comprehensive prompt caching support with flexible control options:
Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)
Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models
Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
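The control options above can be sketched as a single decision function. The precedence (per-request setting overrides the environment variable), the substring matching, and the name `should_cache` are assumptions made for illustration; the sketch also ignores the "Claude 3+" version check and the Nova token limit.

```python
import os

# Patterns taken from the "Supported models" list; substring matching is an assumption.
CACHEABLE_MODEL_PATTERNS = ("anthropic.claude-", "amazon.nova-")


def should_cache(model_id, target, extra_body=None, env=None):
    """Decide whether to cache `target` ('system' or 'messages') for this request."""
    env = os.environ if env is None else env
    if not any(pattern in model_id for pattern in CACHEABLE_MODEL_PATTERNS):
        return False  # unsupported models are never sent cache settings
    per_request = (extra_body or {}).get("prompt_caching", {})
    if target in per_request:
        return bool(per_request[target])  # per-request control wins (assumed precedence)
    return env.get("ENABLE_PROMPT_CACHING", "false").lower() == "true"
```

For example, `should_cache("amazon.nova-pro-v1:0", "messages", env={"ENABLE_PROMPT_CACHING": "true"})` returns `True`, while an unsupported model ID returns `False` regardless of the other settings.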
- Add unified profile_metadata dictionary for both SYSTEM_DEFINED and APPLICATION inference profiles
- Remove unused region prefix functions and defaultdict import
- Add TEMPERATURE_TOPP_CONFLICT_MODELS set for Claude model parameter conflicts
- Improve model ARN parsing and error handling in profile enumeration
- Consolidate profile metadata storage to enable consistent feature detection
Added handling for message and content block deltas, including safety checks for open thinking tags. This makes reasoning output work and makes GPT-OSS 80/120b usable in frontends that expect closing thinking tags.
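One way to picture the safety check for open thinking tags is a generator that appends a closing tag when a stream ends mid-thought. This is a sketch, not the repository's code: the tag strings `<think>` and `</think>` and the function name are assumptions, and tags split across chunk boundaries are not handled.

```python
def close_open_thinking(chunks, open_tag="<think>", close_tag="</think>"):
    """Yield chunks unchanged, then emit close_tag if a thinking tag was left open."""
    thinking_open = False
    for chunk in chunks:
        last_open = chunk.rfind(open_tag)
        last_close = chunk.rfind(close_tag)
        # Whichever tag appears last in the chunk decides the current state;
        # if neither appears, both are -1 and the state is unchanged.
        if last_open > last_close:
            thinking_open = True
        elif last_close > last_open:
            thinking_open = False
        yield chunk
    if thinking_open:
        yield close_tag


unterminated = list(close_open_thinking(["<think>", "reasoning", "answer"]))
terminated = list(close_open_thinking(["<think>a</think>", "b"]))
```

A frontend that waits for the closing tag then always receives one, even if the model never sent it.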
The healthcheck in Dockerfile_ecs used a hardcoded port instead of the ENV setting. It now uses the ENV setting.
…requiring the user to cd manually (#202)
* fix: Allow the push-to-ecr.sh script to run from anywhere instead of requiring the user to cd manually
* Add docker-compose to support running locally
…lity
Docker BuildKit (especially with docker-container driver) may create OCI image manifests with attestations that AWS Lambda does not support. Lambda requires Docker V2 Schema 2 format without multi-manifest index.
This fix ensures the build script generates Lambda-compatible images regardless of the user's Docker/BuildKit configuration.
Fixes #206
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
Replace ALB + Lambda architecture with API Gateway REST API + Lambda using response streaming for SSE support. This provides:
- No VPC required, reducing complexity and cost
- Native streaming support via API Gateway response streaming
- Pay-per-request pricing model
Changes:
- Add Lambda Web Adapter to Dockerfile for streaming support
- Replace BedrockProxy.template with API Gateway configuration
- Update README with new deployment options and latest models
- Update architecture diagram for API Gateway flow
Update dependencies to fix HIGH severity ReDoS vulnerability:
- fastapi==0.128.0
- starlette==0.49.1
CVE-2025-62727 allows unauthenticated attackers to send crafted HTTP Range headers that trigger quadratic-time processing in FileResponse Range parsing, causing CPU exhaustion and DoS.
Fixes #215
Co-authored-by: Hooman Yar <yarhooma@amazon.com>
PR #193 added tiktoken preloading to Dockerfile_ecs but the same fix was not applied to the Lambda Dockerfile. This causes a ConnectTimeout error in network-restricted environments (e.g. Lambda in VPC without NAT Gateway) when tiktoken tries to download cl100k_base encoding at runtime from openaipublic.blob.core.windows.net.
Cache the encoding at build time, consistent with Dockerfile_ecs.
Related to #118
* feat: add Amazon Nova 2 multimodal embeddings support
Adds support for `amazon.nova-2-multimodal-embeddings-v1:0` via the
new `NovaEmbeddingsModel` class, using the `taskType`/`singleEmbeddingParams`
request format documented in the Nova 2 user guide.
- Supports single and batch text inputs
- Respects the `dimensions` parameter (256/512/1024/2048/3072, default 3072)
- Supports `float` and `base64` encoding formats
- Includes `test_nova_embed.py` for quick end-to-end verification
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: remove test script from repo
Test script moved to PR description instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: validate Nova embedding dimensions and fix falsy-zero bug
- Add VALID_DIMENSIONS set and upfront validation with a clear error message
- Fix `dimensions or DEFAULT` which would incorrectly ignore dimensions=0
- Add inline comment explaining approximate token counting (Nova API
does not return token counts in the response)
* fix: address PR review comments for NovaEmbeddingsModel
- Fix VALID_DIMENSIONS to {256, 384, 1024, 3072} per Nova embeddings schema docs
(previous values 512/2048 were mistakenly referenced from Titan embedding model docs)
- Replace str(item) fallback with HTTPException(400) to avoid silent garbage embeddings
- Update schema.py dimensions comment: 'not used' -> 'Used by Nova embeddings'
- Replace getattr() with direct .dimensions access on Pydantic model
- Move dimension validation before the loop (validates once, not per-text)
- Add enumerate to batch loop; include input index in error detail
- Switch isinstance(item, Iterable) to isinstance(item, list) for precise matching
- Add comment explaining embeddingPurpose hardcoded to GENERIC_INDEX
---------
Co-authored-by: Gabriel <gabrielkoo@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
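The falsy-zero bug and the upfront validation described in the Nova embeddings commits can be sketched as follows. The set values and the default of 3072 come from the commit messages above; the function name `resolve_dimensions`, the constant name `DEFAULT_DIMENSIONS`, and the use of `ValueError` are assumptions made to keep the example self-contained.

```python
VALID_DIMENSIONS = {256, 384, 1024, 3072}
DEFAULT_DIMENSIONS = 3072  # hypothetical constant name


def resolve_dimensions(dimensions=None):
    """Apply the default only when dimensions is omitted, then validate once."""
    # `dimensions or DEFAULT_DIMENSIONS` would silently turn 0 into the default;
    # an explicit None check lets 0 reach validation and be rejected.
    if dimensions is None:
        return DEFAULT_DIMENSIONS
    if dimensions not in VALID_DIMENSIONS:
        raise ValueError(
            f"dimensions must be one of {sorted(VALID_DIMENSIONS)}, got {dimensions}"
        )
    return dimensions
```

Calling this once before the batch loop matches the "validates once, not per-text" item in the review fixes.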
When both reasoning_effort and extra_body are provided, additionalModelRequestFields set by reasoning_effort (containing reasoning_config) was silently overwritten by extra_body processing. This prevented features like anthropic_beta for 1M context from coexisting with reasoning_effort.
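The overwrite described above is the usual assign-versus-merge problem, which a short sketch can make concrete. The function name `merge_additional_fields`, the shallow merge, the rule that `extra_body` wins on key conflicts, and the placeholder values are all assumptions for illustration, not the repository's implementation.

```python
def merge_additional_fields(from_reasoning, from_extra_body):
    """Merge extra_body fields into the fields set by reasoning_effort instead of replacing them."""
    merged = dict(from_reasoning or {})
    # update() keeps keys such as reasoning_config unless extra_body sets the same key
    merged.update(from_extra_body or {})
    return merged


# Placeholder values only; real field contents depend on the model.
reasoning_fields = {"reasoning_config": {"placeholder": True}}
extra_fields = {"anthropic_beta": ["example-beta-flag"]}
merged_fields = merge_additional_fields(reasoning_fields, extra_fields)
```

After the merge, both `reasoning_config` and `anthropic_beta` are present, which is the coexistence the fix is meant to restore.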
Bumps [requests](https://github.com/psf/requests) from 2.32.4 to 2.33.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.32.4...v2.33.0)
---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.33.0
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
No description provided.