From e5a46c1db19a32b3a92ac9ee73507f4d7fe303d7 Mon Sep 17 00:00:00 2001
From: David Schulmeister
Date: Thu, 16 Apr 2026 11:32:55 +0200
Subject: [PATCH] Add M2M100/NLLB support (nllb-200-distilled-600M, 1.3B, 3.3B)

Adds M2M100ForConditionalGeneration support for three NLLB translation
models: facebook/nllb-200-distilled-600M,
facebook/nllb-200-distilled-1.3B, and facebook/nllb-200-3.3B. All three
share model_type=m2m_100.

Architecture differences from BART implemented in nllb.py:
- Sinusoidal (fixed) positional embeddings instead of learned ones
- Pre-LayerNorm (norm before each sublayer) instead of Post-LayerNorm
- Additional layer_norm after the encoder and decoder layer stacks
- ReLU activation instead of GELU
- No final_logits_bias

Language routing:
- The decoder starts with the target language token via
  create_decoder_prompt, which resolves the FLORES-200 code
  (e.g. "fra_Latn") via convert_tokens_to_ids for reliable
  special-token handling.
- The source language token is prepended to the encoder input via
  src_lang in mm_processor_kwargs (default "eng_Latn"); a
  make_nllb_prompt helper is provided.

Depends on the BART processor vLLM 0.18 compatibility fix (PR #20):
M2M100MultiModalProcessor inherits create_encoder_prompt from
BartMultiModalProcessor and needs the [0] placeholder behavior to
function under vLLM >=0.18.

Tests:
- 12 unit tests (tests/test_nllb_model_structure.py), no GPU required
- 13 integration tests (tests/test_nllb_inference.py) covering 4 target
  scripts, 3 non-English sources, batching, determinism, and max_tokens.
  All 13 pass on NVIDIA GB10 (DGX Spark) with vLLM 0.18.0.

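The architecture differences listed above (fixed sinusoidal positions, Pre-LayerNorm, ReLU) can be illustrated with a minimal pure-Python sketch. It is not taken from nllb.py: all function names are hypothetical, and the sin-half-then-cos-half layout is one common convention assumed here for illustration only.

```python
import math
from typing import Callable, List

Vector = List[float]


def sinusoidal_positions(num_positions: int, dim: int) -> List[Vector]:
    """Build a fixed (non-learned) position table.

    Assumed layout: sin values in the first half, cos values in the second.
    """
    half = dim // 2
    # Geometric frequency schedule running from 1 down to 1/10000.
    freqs = [math.exp(-math.log(10000.0) * i / max(half - 1, 1)) for i in range(half)]
    table = []
    for pos in range(num_positions):
        angles = [pos * f for f in freqs]
        row = [math.sin(a) for a in angles] + [math.cos(a) for a in angles]
        if dim % 2 == 1:
            row.append(0.0)  # pad odd dimensions with a zero column
        table.append(row)
    return table


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def post_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    # Post-LayerNorm (BART-style): residual add first, then normalize.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    # Pre-LayerNorm (M2M100/NLLB-style): normalize the sublayer input,
    # then add the residual. The block output itself is not normalized,
    # which is why a final layer_norm follows the whole layer stack.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


table = sinusoidal_positions(num_positions=4, dim=6)
x = [1.0, 2.0, 4.0]
post_out = post_ln_block(x, relu)
pre_out = pre_ln_block(x, relu)
```

In the Pre-LayerNorm variant the residual path carries the input through unnormalized, which is consistent with the additional layer_norm after the layer stacks noted in the list above.
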
Signed-off-by: David Schulmeister
---
 example_nllb_usage.py              | 105 ++++
 pyproject.toml                     |   4 +-
 tests/conftest.py                  |  14 +
 tests/test_nllb_inference.py       | 284 +++++++++
 tests/test_nllb_model_structure.py | 281 +++++++++
 vllm_bart_plugin/__init__.py       |  23 +-
 vllm_bart_plugin/nllb.py           | 903 +++++++++++++++++++++++++++++
 7 files changed, 1604 insertions(+), 10 deletions(-)
 create mode 100644 example_nllb_usage.py
 create mode 100644 tests/test_nllb_inference.py
 create mode 100644 tests/test_nllb_model_structure.py
 create mode 100644 vllm_bart_plugin/nllb.py

diff --git a/example_nllb_usage.py b/example_nllb_usage.py
new file mode 100644
index 0000000..dd55175
--- /dev/null
+++ b/example_nllb_usage.py
@@ -0,0 +1,105 @@
+"""Example: NLLB translation with vLLM via the bart-plugin.
+
+Supported models (all use model_type=m2m_100):
+  facebook/nllb-200-distilled-600M (~1.2 GB)
+  facebook/nllb-200-distilled-1.3B (~2.6 GB)
+  facebook/nllb-200-3.3B (~6.6 GB)
+
+Language codes follow the FLORES-200 format: _