Add M2M100/NLLB support (nllb-200-distilled-600M, 1.3B, 3.3B) #19
Open
dschulmeist wants to merge 1 commit into
Force-pushed: 0927ac4 to 11df39e
NickLucche (Collaborator) reviewed on Apr 16, 2026 and left a comment:
Hey @dschulmeist, thanks for contributing!
Would you mind pushing the v0.18 fixes in a separate PR?
I thought we were vllm-0.18 compatible with the latest release; if that's not the case, I could use your fix to issue a separate patch release.
dschulmeist (Contributor, Author):
Makes sense. I split the v0.18 compatibility changes into a separate PR and kept this one focused on M2M100/NLLB support.
dschulmeist (Contributor, Author):
Split out the generic [...] I also updated this branch so [...]
Adds M2M100ForConditionalGeneration support for the three NLLB distilled translation models: facebook/nllb-200-distilled-600M, 1.3B, and 3.3B. All three share model_type=m2m_100.

Architecture differences from BART implemented in nllb.py:
- Sinusoidal (fixed) positional embeddings instead of learned
- PRE-LayerNorm (norm before sublayer) instead of POST-LayerNorm
- Additional layer_norm after all encoder/decoder layers
- ReLU activation instead of GELU
- No final_logits_bias

Language routing:
- Decoder starts with target language token via create_decoder_prompt, which resolves the FLORES-200 code (e.g. "fra_Latn") via convert_tokens_to_ids for reliable special-token handling.
- Source language token is prepended to the encoder input via src_lang in mm_processor_kwargs (default "eng_Latn"); a make_nllb_prompt helper is provided.

Depends on the BART processor vLLM 0.18 compatibility fix (PR vllm-project#20): M2M100MultiModalProcessor inherits create_encoder_prompt from BartMultiModalProcessor and needs the [0] placeholder behavior to function under vLLM >=0.18.

Tests:
- 12 unit tests (tests/test_nllb_model_structure.py), no GPU required
- 13 integration tests (tests/test_nllb_inference.py) covering 4 target scripts, 3 non-English sources, batching, determinism, and max_tokens. All 13 pass on NVIDIA GB10 (DGX Spark) with vLLM 0.18.0.

Signed-off-by: David Schulmeister <dschulmeist@users.noreply.github.com>
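To illustrate three of the architecture differences named in the commit message (pre-LayerNorm, the extra final layer_norm, and ReLU), here is a minimal pure-Python sketch. It is not the code in nllb.py: the function names are hypothetical, the layer norm has no learned scale or shift, and an elementwise ReLU stands in for the real attention and feed-forward sublayers.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu_sublayer(x: Vector) -> Vector:
    """Stand-in sublayer: elementwise ReLU instead of a real feed-forward net."""
    return [max(0.0, v) for v in x]


def post_ln_layer(x: Vector, sublayer: Sublayer) -> Vector:
    """Post-LayerNorm (BART-style): residual add first, then normalize."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln_layer(x: Vector, sublayer: Sublayer) -> Vector:
    """Pre-LayerNorm (M2M100/NLLB-style): normalize the sublayer input,
    then add the result to the unnormalized residual stream."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_ln_stack(x: Vector, num_layers: int) -> Vector:
    for _ in range(num_layers):
        x = post_ln_layer(x, relu_sublayer)
    return x


def pre_ln_stack(x: Vector, num_layers: int) -> Vector:
    for _ in range(num_layers):
        x = pre_ln_layer(x, relu_sublayer)
    # A pre-LN layer never normalizes the residual stream itself,
    # so one additional layer_norm is applied after the last layer.
    return layer_norm(x)
```

In the pre-LN variant the residual path skips normalization inside each layer, which is why the stack ends with one extra layer_norm.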
Force-pushed: 4f0c152 to e5a46c1
dschulmeist (Contributor, Author):
Hey @NickLucche, both PRs are ready whenever you get a chance. #20 (v0.18 compat fix) goes first, then this one builds on top.
Summary
Adds M2M100ForConditionalGeneration support for Meta's NLLB distilled translation models:

- facebook/nllb-200-distilled-600M
- facebook/nllb-200-distilled-1.3B
- facebook/nllb-200-3.3B

All three share model_type=m2m_100 and are registered under M2M100ForConditionalGeneration.

Depends on #20

M2M100MultiModalProcessor inherits create_encoder_prompt from BartMultiModalProcessor, so this feature requires the vLLM 0.18 compatibility fix in #20 to function under vLLM >=0.18. The generic compatibility changes were split into #20 per maintainer request.

Architecture differences from BART

- Sinusoidal (fixed) positional embeddings instead of learned ones
- Pre-LayerNorm (norm before each sublayer) instead of post-LayerNorm
- Additional layer_norm after all encoder/decoder layers
- ReLU activation instead of GELU
- No final_logits_bias

Language routing

- The source language token is prepended to the encoder input via src_lang in mm_processor_kwargs (default eng_Latn).
- create_decoder_prompt resolves the FLORES-200 target language code to its token ID via tokenizer.convert_tokens_to_ids.

Tests

- 12 unit tests (tests/test_nllb_model_structure.py): no GPU required
- 13 integration tests (tests/test_nllb_inference.py): 4 target scripts, 3 non-English sources, batch, determinism, max_tokens

All 13 integration tests pass on NVIDIA GB10 (DGX Spark) with vLLM 0.18.0.
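The language routing described in this PR can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: ToyTokenizer, resolve_lang_id, build_encoder_ids, and build_decoder_start_ids are hypothetical names, the token IDs are made up, and whitespace splitting stands in for real subword tokenization. Only the idea of resolving a FLORES-200 code through convert_tokens_to_ids and the eng_Latn default come from the description.

```python
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer; all IDs below are made up."""

    def __init__(self) -> None:
        self.unk_token_id = 3
        self._vocab: Dict[str, int] = {
            "<unk>": 3,
            "eng_Latn": 10, "fra_Latn": 11, "deu_Latn": 12,
            "Hello": 20, "world": 21,
        }

    def convert_tokens_to_ids(self, token: str) -> int:
        # In this toy class, unknown tokens map to the unknown-token ID.
        return self._vocab.get(token, self.unk_token_id)

    def encode_words(self, text: str) -> List[int]:
        # Whitespace split instead of real subword tokenization.
        return [self.convert_tokens_to_ids(word) for word in text.split()]


def resolve_lang_id(tokenizer: ToyTokenizer, lang_code: str) -> int:
    """Map a FLORES-200 code such as "fra_Latn" to its token ID."""
    token_id = tokenizer.convert_tokens_to_ids(lang_code)
    if token_id == tokenizer.unk_token_id:
        raise ValueError(f"unknown language code: {lang_code!r}")
    return token_id


def build_encoder_ids(
    tokenizer: ToyTokenizer, text: str, src_lang: str = "eng_Latn"
) -> List[int]:
    """Prepend the source language token to the encoder input."""
    return [resolve_lang_id(tokenizer, src_lang)] + tokenizer.encode_words(text)


def build_decoder_start_ids(tokenizer: ToyTokenizer, tgt_lang: str) -> List[int]:
    """Start the decoder with the target language token."""
    return [resolve_lang_id(tokenizer, tgt_lang)]
```

Resolving the language code as a single token, rather than running it through ordinary text tokenization, avoids the risk of the code being split into several subword pieces.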