Add bidirectional attention and projection layer support for Qwen3-based models (#808)
Add two new config fields to Qwen3 to support voyage-4-nano and similar models:

- `use_bidirectional_attention`: When true, disables causal masking for embedding models that use full bidirectional attention
- `num_labels`: When set, loads a projection layer from linear.weight at the safetensors root level (e.g., 1024 -> 2048 for voyage-4-nano)

Both fields are backwards compatible, defaulting to the disabled behavior.

Changes:
- backends/candle/src/models/qwen3.rs: Add config fields and CPU impl
- backends/candle/src/models/flash_qwen3.rs: Add CUDA/flash-attn impl
- backends/candle/tests/test_voyage_nano.rs: CPU tests with snapshots
- backends/candle/tests/test_flash_voyage_nano.rs: CUDA tests
- README.md, docs/source/en/supported_models.md: Add voyage-4-nano

Tested with voyageai/voyage-4-nano:
- Output dimension: 2048 (correct)
- Cosine similarity vs transformers: 0.999965
- Inference time: ~9ms on L4 GPU (vs 35ms with transformers)
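For readers skimming the diff, here is a minimal sketch of how two such fields could look on the Qwen3 config struct. The field names come from this PR; the struct name, surrounding fields, and numeric values in the example are illustrative, not the PR's verbatim code:

```rust
use serde::Deserialize;

// Illustrative excerpt of a Qwen3 config struct; only the fields relevant
// to this PR are shown, and the surrounding fields are placeholders.
#[derive(Debug, Clone, Deserialize)]
pub struct Qwen3Config {
    pub hidden_size: usize,
    pub num_attention_heads: usize,
    // ... other existing Qwen3 fields ...

    /// When `true`, causal masking is disabled so every token attends to the
    /// full sequence (needed for bidirectional embedding models).
    #[serde(default)]
    pub use_bidirectional_attention: bool,

    /// When set, a `linear.weight` projection (hidden_size -> num_labels) is
    /// loaded from the safetensors root level and applied to the embeddings.
    #[serde(default)]
    pub num_labels: Option<usize>,
}

fn main() -> serde_json::Result<()> {
    // Defaults kick in when the new fields are absent (backwards compatible).
    let legacy: Qwen3Config = serde_json::from_str(
        r#"{ "hidden_size": 1024, "num_attention_heads": 16 }"#,
    )?;
    assert!(!legacy.use_bidirectional_attention);
    assert!(legacy.num_labels.is_none());

    // voyage-4-nano style config enables both features.
    let voyage: Qwen3Config = serde_json::from_str(
        r#"{ "hidden_size": 1024, "num_attention_heads": 16,
             "use_bidirectional_attention": true, "num_labels": 2048 }"#,
    )?;
    assert_eq!(voyage.num_labels, Some(2048));
    Ok(())
}
```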
alvarobartt left a comment:
Hey @williambarberjr, thanks for the PR! Indeed, I have a PR to add BF16 support for both Metal and CUDA, which I'll merge before this one to make sure we use the correct dtype for Voyage AI embedding models 🎉
Q: Did you validate the cosine similarity both with and without normalization, or only with normalized embeddings? See e.g. https://huggingface.co/voyageai/voyage-4-nano#via-sentence-transformers, which won't normalize the embeddings, as the default value for `normalize_embeddings` in sentence-transformers is `False`.
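For concreteness, a toy sketch of the kind of check being asked about (not the validation script used in this PR): compare a pair of embedding vectors both raw and after client-side L2 normalization. The vector values below are placeholders:

```rust
/// Cosine similarity between two vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Largest element-wise absolute difference; unlike cosine similarity, this
/// check actually changes depending on whether the embeddings were normalized.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

/// Client-side L2 normalization (what `normalize_embeddings=True` would do).
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

fn main() {
    // Placeholder vectors standing in for a TEI embedding and a reference embedding.
    let tei = vec![0.10_f32, -0.20, 0.30, 0.40];
    let reference = vec![0.11_f32, -0.19, 0.31, 0.41];

    println!(
        "raw:        cosine = {}, max |diff| = {}",
        cosine(&tei, &reference),
        max_abs_diff(&tei, &reference)
    );

    let (tei_n, ref_n) = (l2_normalize(&tei), l2_normalize(&reference));
    println!(
        "normalized: cosine = {}, max |diff| = {}",
        cosine(&tei_n, &ref_n),
        max_abs_diff(&tei_n, &ref_n)
    );
}
```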
See #809 for the BF16 support mentioned in the review 🤗
Correct order

Co-authored-by: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com>
Hey @williambarberjr, thanks again for the PR. Did you have time to read the comments? Would you need any help tackling them? I really want this feature in v1.9.0, which is (hopefully) releasing soon, so let me know if you need help 🤗
Sorry for being so slow after your quick review! I reviewed and agree with the changes/comments, implemented them, and re-ran the validation comparing TEI vs sentence-transformers (voyageai/voyage-4-nano, trust_remote_code=True) on an A100.

MRL dimensions parity (2048/1024/512/256):
Normalization parity (2048 dim, 5 texts):
Extra sanity checks:
Validation environment:
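As a rough illustration of what the MRL-dimension check above refers to (not the actual validation code), Matryoshka-style embeddings are typically shortened by truncating to the first N components and re-normalizing before comparing backends:

```rust
/// Truncate an embedding to `dim` components and L2-renormalize, which is the
/// usual way Matryoshka (MRL) embeddings are shortened before comparison.
fn truncate_and_renormalize(embedding: &[f32], dim: usize) -> Vec<f32> {
    let truncated = &embedding[..dim.min(embedding.len())];
    let norm: f32 = truncated.iter().map(|x| x * x).sum::<f32>().sqrt();
    truncated.iter().map(|x| x / norm).collect()
}

fn main() {
    // Placeholder full-dimension (2048) embedding.
    let full: Vec<f32> = (0..2048).map(|i| (i as f32 * 0.01).sin()).collect();

    // The dimensions checked in the validation above.
    for dim in [2048usize, 1024, 512, 256] {
        let shortened = truncate_and_renormalize(&full, dim);
        println!("dim {dim}: {} components", shortened.len());
    }
}
```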
Please merge this ;D
Yes @gazb23, the idea is to merge this today. I still have to test it myself first, but expect it to be merged by EOD today 🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@williambarberjr this is mostly a performance bug: for the Voyage model you need/should apply the projection layer after pooling. Since the pooling is activation-free, running the projection after pooling is better (2048x faster).
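A back-of-the-envelope sketch of that point (the numbers below are illustrative, not measurements from this PR): pooling collapses the `[seq_len, hidden]` hidden states to a single `[hidden]` vector at negligible cost, so applying the `hidden -> 2048` projection after pooling needs roughly seq_len times fewer multiply-adds than projecting every token first.

```rust
fn main() {
    // Illustrative sizes: sequence length, hidden size, projection output size.
    let (seq_len, hidden, out_dim) = (512usize, 1024usize, 2048usize);

    // Project every token, then pool: seq_len matrix-vector products.
    let project_then_pool = seq_len * hidden * out_dim;

    // Pool first (pooling is a cheap, activation-free reduction), then project once.
    let pool_then_project = hidden * out_dim;

    println!(
        "projection multiply-adds: project-then-pool = {project_then_pool}, \
         pool-then-project = {pool_then_project} ({}x fewer)",
        project_then_pool / pool_then_project
    );
}
```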
What does this PR do?

This PR adds support for `voyageai/voyage-4-nano`, a Qwen3-based embedding model that uses bidirectional attention and a projection layer.

Changes
1. Bidirectional Attention Support

- New `use_bidirectional_attention` config field (default: `false`)
- When `true`, disables causal masking in the attention mechanism

2. Projection Layer Support

- New `num_labels` config field for the output projection dimension
- Loads `linear.weight` from the safetensors root level and applies the projection after the final normalization

Model Configuration
Models using these features should have the following in their `config.json`:

{ "use_bidirectional_attention": true, "num_labels": 2048 }
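A hedged sketch (not the PR's literal code) of how these two settings would typically drive the forward pass: `use_bidirectional_attention` switches between a causal and an all-zeros additive attention bias, and `num_labels` gates a `linear.weight` matmul applied to the pooled hidden state. The candle usage, tensor shapes, and names below are my own illustration:

```rust
use candle_core::{Device, Tensor};

/// Build an additive attention bias: 0.0 where attention is allowed,
/// f32::NEG_INFINITY where it is masked out.
fn attention_bias(seq_len: usize, bidirectional: bool, device: &Device) -> candle_core::Result<Tensor> {
    let mut data = vec![0f32; seq_len * seq_len];
    if !bidirectional {
        // Causal: position i may only attend to positions <= i.
        for i in 0..seq_len {
            for j in (i + 1)..seq_len {
                data[i * seq_len + j] = f32::NEG_INFINITY;
            }
        }
    }
    Tensor::from_vec(data, (seq_len, seq_len), device)
}

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;

    // With `use_bidirectional_attention: true` the bias is all zeros (no masking).
    let bias = attention_bias(4, true, &device)?;
    println!("bias shape: {:?}", bias.dims());

    // Projection gated by `num_labels`: pooled [1, hidden] x linear.weight^T [hidden, num_labels].
    let hidden = 8;
    let num_labels = Some(16usize);
    let pooled = Tensor::randn(0f32, 1f32, (1, hidden), &device)?;
    if let Some(out_dim) = num_labels {
        // `linear.weight` is typically stored as [out_dim, hidden] (PyTorch convention),
        // so it is transposed before the matmul.
        let linear_weight = Tensor::randn(0f32, 1f32, (out_dim, hidden), &device)?;
        let projected = pooled.matmul(&linear_weight.t()?)?;
        println!("projected shape: {:?}", projected.dims());
    }
    Ok(())
}
```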
Testing

Tested with `voyageai/voyage-4-nano`:

- Output dimension: 2048
- Cosine similarity vs transformers: 0.999965
- Inference time: ~9ms on L4 GPU (vs ~35ms with transformers)

Files Changed
- backends/candle/src/models/flash_qwen3.rs - CUDA/flash attention implementation
- backends/candle/src/models/qwen3.rs - CPU/Metal implementation + config struct
- backends/candle/Cargo.toml - Added cudarc dev-dependency for CUDA tests
- backends/candle/tests/test_voyage_nano.rs - CPU test with snapshots
- backends/candle/tests/test_flash_voyage_nano.rs - CUDA test with snapshots
- README.md - Added voyage-4-nano to supported models table
- docs/source/en/supported_models.md - Added voyage-4-nano to docs

Before submitting
- insta snapshots?

Who can review?
@Narsil @alvarobartt - This adds two new config fields to support the voyage-4-nano embedding model. The changes are backwards compatible (both fields default to the disabled behavior).