feat: SM87/SM89 adaptation and community PRs integration
This commit adapts text-embeddings-inference for NVIDIA Jetson Orin (SM87)
and the NVIDIA L4 GPU (SM89), and integrates five community PRs.
Changes:
1. SM87/SM89 CUDA Support
- Added compute capability 8.7 and 8.9 support
- Modified Dockerfile-cuda-all for multi-arch builds
- Updated compute_cap.rs for SM87/89 detection
Files: Dockerfile-cuda-all, cuda-all-entrypoint.sh, compute_cap.rs
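For illustration only, a minimal sketch of how a compute capability string
such as "8.7" could be mapped to an SM number and checked against a supported
list. The function names and the SUPPORTED table are hypothetical and are not
taken from compute_cap.rs; the entries other than 87 and 89 are assumptions.

```rust
/// Hypothetical list of SM versions a build accepts. Only 87 and 89 come
/// from this commit; the remaining entries are assumed for the example.
const SUPPORTED: [usize; 6] = [75, 80, 86, 87, 89, 90];

/// Parse a "major.minor" compute capability (e.g. "8.7") into an SM number
/// (e.g. 87). Returns None if the string is not in that form.
pub fn parse_compute_cap(s: &str) -> Option<usize> {
    let (major, minor) = s.trim().split_once('.')?;
    let major: usize = major.parse().ok()?;
    let minor: usize = minor.parse().ok()?;
    Some(major * 10 + minor)
}

/// Check whether an SM number is in the (hypothetical) supported list.
pub fn is_supported(cap: usize) -> bool {
    SUPPORTED.contains(&cap)
}
```

With this sketch, "8.7" (Jetson Orin) and "8.9" (L4) parse to 87 and 89, and
both pass is_supported.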
2. PR #730: Qwen3 Reranker Support
- Added classification head for Qwen3 reranking
- Implemented template formatting system for chat-based reranking
Files: models/qwen3.rs, core/templates.rs, core/lib.rs
3. PR #787: Batch Notification Performance Optimization
- Implemented AtomicUsize counter for batch processing
- Reduced unnecessary notify_one() calls
- Only last request in batch triggers thread notification
Files: core/infer.rs, router/http/server.rs, router/grpc/server.rs
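A minimal sketch of the counting idea, assuming a counter initialized to the
batch size that each queued request decrements. BatchCounter, request_queued
and count_wakeups are hypothetical names, not the code in core/infer.rs, and
the wakeup tally stands in for the real notify_one() call.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: tracks how many requests of a batch are still
/// waiting to be queued.
pub struct BatchCounter {
    remaining: AtomicUsize,
}

impl BatchCounter {
    pub fn new(batch_size: usize) -> Self {
        Self { remaining: AtomicUsize::new(batch_size) }
    }

    /// Call once per queued request. Returns true only for the call that
    /// brings the counter to zero, i.e. the last request of the batch.
    /// Must not be called more than batch_size times (the counter would wrap).
    pub fn request_queued(&self) -> bool {
        // fetch_sub returns the value held before the subtraction.
        self.remaining.fetch_sub(1, Ordering::AcqRel) == 1
    }
}

/// Queue batch_size requests and count how many wakeups would be issued.
pub fn count_wakeups(batch_size: usize) -> usize {
    let counter = BatchCounter::new(batch_size);
    let mut wakeups = 0;
    for _ in 0..batch_size {
        if counter.request_queued() {
            wakeups += 1; // stand-in for notify_one()
        }
    }
    wakeups
}
```

In this sketch a batch of any non-zero size produces a single wakeup instead
of one per request.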
4. PR #753: GeLU Activation Consistency Fix
- Changed Gelu from approximate (gelu) to exact (gelu_erf)
- Added NewGelu variant for backward compatibility
Files: layers/linear.rs
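To show why the two variants are not interchangeable, a scalar sketch of both
formulas. This is plain f64 code, not the tensor ops in layers/linear.rs.
Rust's standard library has no erf, so the erf here is the Abramowitz and
Stegun 7.1.26 polynomial (itself accurate to about 1.5e-7), an assumption
made for the example.

```rust
use std::f64::consts::{PI, SQRT_2};

/// erf via the Abramowitz-Stegun 7.1.26 polynomial approximation.
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// erf-based GeLU: 0.5 * x * (1 + erf(x / sqrt(2))).
pub fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf(x / SQRT_2))
}

/// tanh-approximated GeLU:
/// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
pub fn gelu_tanh(x: f64) -> f64 {
    let inner = (2.0 / PI).sqrt() * (x + 0.044715 * x.powi(3));
    0.5 * x * (1.0 + inner.tanh())
}
```

At x = 1.0 the two functions differ by roughly 1.5e-4, which is small per
activation but can accumulate across layers.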
5. PR #790: StaticEmbedding Model Support
- Added support for 0_StaticEmbedding/ directory structure
- Implemented fallback loading for model weights and tokenizer
- Default to Mean pooling for StaticEmbedding models
Files: models/static_embedding.rs (new), lib.rs, download.rs, router/lib.rs
6. PR #746: DebertaV2 Sequence Classification Support
- Complete DebertaV2 model implementation
- Support for sequence classification tasks (e.g., Llama Prompt Guard)
- CPU and CUDA device support
Files: models/debertav2.rs (new), lib.rs, models/mod.rs
All changes have been tested and compile successfully with:
cargo check --all-targets
Compilation verified with CUDA support:
cargo install --path router -F candle-cuda
Target Hardware: NVIDIA Jetson AGX Orin (SM87), NVIDIA L4 GPU (SM89)
Date: January 5, 2026