Add M2M100/NLLB support (nllb-200-distilled-600M, 1.3B, 3.3B) by dschulmeist · Pull Request #19 · vllm-project/bart-plugin

dschulmeist · 2026-04-15T20:52:47Z

Summary

Adds M2M100ForConditionalGeneration support for Meta's NLLB-200 translation models (the two distilled checkpoints and the non-distilled 3.3B checkpoint):

facebook/nllb-200-distilled-600M
facebook/nllb-200-distilled-1.3B
facebook/nllb-200-3.3B

All three share model_type=m2m_100 and are registered under M2M100ForConditionalGeneration.

Depends on #20

M2M100MultiModalProcessor inherits create_encoder_prompt from BartMultiModalProcessor, so this feature requires the vLLM 0.18 compatibility fix in #20 to function under vLLM >=0.18. The generic compatibility changes were split into #20 per maintainer request.

Architecture differences from BART

Feature	BART	M2M100/NLLB
Positional embeddings	Learned	Fixed sinusoidal
LayerNorm position	POST-norm	PRE-norm
Post-stack layer norm	No	Yes (encoder + decoder)
Activation function	GELU	ReLU
`final_logits_bias`	Yes	No
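The positional-embedding row is the least familiar change for BART users. Below is a minimal pure-Python sketch of a fixed sinusoidal table, assuming the fairseq-style layout used by Hugging Face's M2M100 implementation: sin terms fill the first half of each row, cos terms the second half, and the padding row is zeroed. It also shows the position-id rule in which real tokens count up from padding_idx + 1 while pads stay at padding_idx. This is an illustration, not the code in nllb.py; sinusoidal_table and position_ids are hypothetical names.

```python
import math


def sinusoidal_table(num_positions, dim, padding_idx=None):
    """Fixed (non-learned) sinusoidal position table.

    Assumed layout: the first half of each row holds sin terms and the second
    half holds cos terms. Requires dim >= 4 so the frequency step is defined.
    """
    half = dim // 2
    step = math.log(10000.0) / (half - 1)
    freqs = [math.exp(-step * i) for i in range(half)]
    table = []
    for pos in range(num_positions):
        angles = [pos * f for f in freqs]
        row = [math.sin(a) for a in angles] + [math.cos(a) for a in angles]
        if dim % 2 == 1:
            row.append(0.0)  # zero-pad the extra column for odd dims
        table.append(row)
    if padding_idx is not None:
        table[padding_idx] = [0.0] * dim  # padding positions embed to zeros
    return table


def position_ids(input_ids, padding_idx):
    """Non-pad tokens get padding_idx+1, padding_idx+2, ...; pads keep padding_idx."""
    out, count = [], 0
    for tok in input_ids:
        if tok == padding_idx:
            out.append(padding_idx)
        else:
            count += 1
            out.append(padding_idx + count)
    return out


table = sinusoidal_table(10, 8, padding_idx=1)
print(position_ids([5, 6, 7, 1, 1], padding_idx=1))  # [2, 3, 4, 1, 1]
```

Because the table is a deterministic function of position, no positional weights need to be loaded from the checkpoint, unlike BART's learned embeddings.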

Language routing

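from vllm import LLM, SamplingParams
from vllm_bart_plugin.nllb import make_nllb_prompt

# Assumes the bart-plugin is installed so vLLM can resolve M2M100ForConditionalGeneration.
llm = LLM(model="facebook/nllb-200-distilled-600M")

prompt = make_nllb_prompt(
    "The United Nations was founded in 1945.",
    src_lang="eng_Latn",
    tgt_lang="fra_Latn",
)
out = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=60))
print(out[0].outputs[0].text)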

Encoder: source language token prepended via src_lang in mm_processor_kwargs (default eng_Latn).
Decoder: create_decoder_prompt resolves the FLORES-200 target language code to its token ID via tokenizer.convert_tokens_to_ids.
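To make the routing concrete, here is a hedged sketch of the token-ID layout, using a toy vocabulary with made-up IDs. The encoder side follows the non-legacy NllbTokenizer layout ([src_lang] + tokens + [</s>]). The leading </s> on the decoder side mirrors Hugging Face's NLLB generation setup (decoder_start_token_id is </s>, with the target-language token forced next); whether the plugin's create_decoder_prompt emits that </s> is not stated above and is an assumption. build_nllb_ids is a hypothetical helper, not part of the plugin.

```python
def build_nllb_ids(src_ids, vocab, src_lang="eng_Latn", tgt_lang="fra_Latn",
                   eos_token="</s>"):
    """Hypothetical helper: lay out encoder and decoder prompt IDs for NLLB.

    Encoder: [src_lang] + source tokens + [</s>].
    Decoder: [</s>, tgt_lang] (assumed, mirroring Hugging Face generation).
    """
    # FLORES-200 codes are special tokens, so resolving them is a plain
    # vocabulary lookup, which is what convert_tokens_to_ids does for them.
    for code in (src_lang, tgt_lang):
        if code not in vocab:
            raise KeyError(f"unknown FLORES-200 code: {code}")
    eos = vocab[eos_token]
    encoder_ids = [vocab[src_lang], *src_ids, eos]
    decoder_ids = [eos, vocab[tgt_lang]]
    return encoder_ids, decoder_ids


# Toy vocabulary: these IDs are illustrative, not NLLB's real IDs.
toy_vocab = {"</s>": 2, "eng_Latn": 100, "fra_Latn": 101}
enc, dec = build_nllb_ids([11, 12], toy_vocab, src_lang="eng_Latn", tgt_lang="fra_Latn")
print(enc, dec)  # [100, 11, 12, 2] [2, 101]
```

Failing loudly on an unknown code is a deliberate choice in this sketch: a silent fallback to the unknown-token ID would make the decoder translate into an arbitrary language.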

Tests

12 unit tests (tests/test_nllb_model_structure.py), no GPU required
13 integration tests (tests/test_nllb_inference.py) covering 4 target scripts, 3 non-English source languages, batching, determinism, and max_tokens

All 13 integration tests pass on NVIDIA GB10 (DGX Spark) with vLLM 0.18.0.

NickLucche

Hey @dschulmeist thanks for contributing!

Would you mind pushing the v0.18 fixes in a separate PR?
I thought we were vLLM 0.18 compatible with the latest release; if that's not the case, I could use your fix to issue a separate patch release.

dschulmeist · 2026-04-16T09:23:13Z

Makes sense. I'll split the v0.18 compatibility changes into a separate PR and keep this one focused on M2M100/NLLB support.

dschulmeist · 2026-04-16T09:26:19Z

Split out the generic vLLM 0.18 compatibility changes into a separate PR: #20.

I also updated this branch so #19 now stays focused on the M2M100 / NLLB feature work and no longer carries the generic bart.py compatibility patch.

Adds M2M100ForConditionalGeneration support for three NLLB-200 translation models: facebook/nllb-200-distilled-600M, facebook/nllb-200-distilled-1.3B, and facebook/nllb-200-3.3B. All three share model_type=m2m_100.

Architecture differences from BART implemented in nllb.py:
- Sinusoidal (fixed) positional embeddings instead of learned
- PRE-LayerNorm (norm before sublayer) instead of POST-LayerNorm
- Additional layer_norm after all encoder/decoder layers
- ReLU activation instead of GELU
- No final_logits_bias

Language routing:
- Decoder starts with target language token via create_decoder_prompt, which resolves the FLORES-200 code (e.g. "fra_Latn") via convert_tokens_to_ids for reliable special-token handling.
- Source language token is prepended to the encoder input via src_lang in mm_processor_kwargs (default "eng_Latn"); a make_nllb_prompt helper is provided.

Depends on the BART processor vLLM 0.18 compatibility fix (PR vllm-project#20): M2M100MultiModalProcessor inherits create_encoder_prompt from BartMultiModalProcessor and needs the [0] placeholder behavior to function under vLLM >=0.18.

Tests:
- 12 unit tests (tests/test_nllb_model_structure.py), no GPU required
- 13 integration tests (tests/test_nllb_inference.py) covering 4 target scripts, 3 non-English sources, batching, determinism, and max_tokens.

All 13 pass on NVIDIA GB10 (DGX Spark) with vLLM 0.18.0.

Signed-off-by: David Schulmeister <dschulmeist@users.noreply.github.com>

dschulmeist · 2026-04-16T09:33:57Z

Updated the branch: v0.18 compatibility fixes are now in #20, this PR is squashed to a single clean commit with DCO signoff, and the PR body reflects the dependency on #20. Both PRs now pass DCO and are mergeable.

dschulmeist · 2026-04-17T11:53:08Z

Hey @NickLucche, both PRs are ready whenever you get a chance. Please merge #20 (v0.18 compat fix) first; this one builds on top of it.

dschulmeist mentioned this pull request Apr 15, 2026

[model] Add M2M100ForConditionalGeneration to OOT supported models vllm-project/vllm#39946

Open

dschulmeist force-pushed the add-nllb-m2m100-support branch from 0927ac4 to 11df39e on April 15, 2026 21:03

NickLucche reviewed Apr 16, 2026


dschulmeist force-pushed the add-nllb-m2m100-support branch from 4f0c152 to e5a46c1 on April 16, 2026 09:33

dschulmeist mentioned this pull request Apr 16, 2026

Fix BART processor compatibility with vLLM 0.18 #20

Merged


dschulmeist wants to merge 1 commit into vllm-project:master from dschulmeist:add-nllb-m2m100-support
