Fix BART processor compatibility with vLLM 0.18 #20

Merged
NickLucche merged 2 commits into vllm-project:master from dschulmeist:v018-bart-fixes on Apr 30, 2026

dschulmeist commented Apr 16, 2026

Contributor

Summary

Fixes the existing BART multimodal processor for vLLM 0.18 compatibility.

Split out from #19 so it can be reviewed and released independently (as a patch release, per maintainer request).

Changes

Three generic compatibility fixes in vllm_bart_plugin/bart.py:

  • TextDataParser._parse_text_data: replace the removed _is_empty helper (deleted from MultiModalDataParser in 0.18) with inline emptiness checks.
  • BartMultiModalProcessor.create_encoder_prompt: in 0.18 the prompt argument is the decoder prompt text, not the encoder text. Always return [0] as a placeholder token; _get_prompt_updates replaces it with the correct number of encoder token slots.
  • BartMultiModalProcessor._call_hf_processor: the 0.18 rendering pipeline may pass the decoder prompt as an already-tokenized list[int] instead of a str. Handle both.
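The three fixes above can be illustrated with a minimal, dependency-free sketch. The function names below (`is_empty_text`, `encoder_placeholder_prompt`, `to_input_ids`) are hypothetical stand-ins and not the plugin's actual methods; treating `None` as empty and passing the tokenizer as a plain callable are also assumptions made for illustration.

```python
from typing import Callable, List, Optional, Union

PLACEHOLDER_TOKEN_ID = 0  # the [0] placeholder described in the PR


def is_empty_text(data: Optional[Union[str, list]]) -> bool:
    """Hypothetical inline replacement for the removed _is_empty helper.

    Assumption: None, "" and [] all count as empty.
    """
    if data is None:
        return True
    if isinstance(data, (str, list)):
        return len(data) == 0
    return False


def encoder_placeholder_prompt(decoder_prompt: Union[str, List[int]]) -> List[int]:
    """Ignore the decoder prompt and always emit a single placeholder token.

    The placeholder is expected to be expanded later into the real number
    of encoder token slots.
    """
    return [PLACEHOLDER_TOKEN_ID]


def to_input_ids(
    prompt: Union[str, List[int]],
    tokenize: Callable[[str], List[int]],
) -> List[int]:
    """Accept either raw text or an already-tokenized decoder prompt."""
    if isinstance(prompt, str):
        return tokenize(prompt)
    # Already token IDs: copy so the caller's list is not mutated.
    return list(prompt)
```

With a toy tokenizer such as `lambda s: [ord(c) for c in s]`, both `to_input_ids("ab", ...)` and `to_input_ids([97, 98], ...)` yield the same list, which is the behaviour the third fix aims for.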

Tests

Adds tests/test_vllm_018_compat.py with three narrow unit tests covering each path. No GPU required.

Notes

The NLLB/M2M100 feature PR (#19) depends on this compatibility fix, since M2M100MultiModalProcessor inherits create_encoder_prompt from BartMultiModalProcessor.

The multimodal processor in bart.py broke under vLLM 0.18 in three places:

- TextDataParser relied on MultiModalDataParser._is_empty, which was
  removed in 0.18. Replaced with inline emptiness checks for str and list.

- create_encoder_prompt previously tokenized `prompt` as the encoder text.
  In 0.18 `inputs.prompt` passed to this method is the DECODER prompt text,
  not the encoder text (the encoder content lives in mm_data). The method
  now returns a single [0] placeholder token; _get_prompt_updates replaces
  it with the correct number of encoder token slots during rendering.

- _call_hf_processor is now sometimes called with an already-tokenized
  decoder prompt (list[int]) instead of a str. Handle both cases when
  building result["input_ids"].
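The placeholder mechanism in the second item can be pictured as a simple list expansion. This is only a sketch: `expand_placeholder` is a hypothetical helper, and filling the slots with repeated placeholder IDs is an assumption; the real `_get_prompt_updates` works through vLLM's prompt-update machinery rather than a direct list rewrite.

```python
from typing import List


def expand_placeholder(
    token_ids: List[int],
    placeholder_id: int,
    num_encoder_tokens: int,
) -> List[int]:
    """Replace the first placeholder token with num_encoder_tokens slots."""
    out: List[int] = []
    replaced = False
    for tok in token_ids:
        if tok == placeholder_id and not replaced:
            # Assumption: slots are filled with the placeholder ID itself.
            out.extend([placeholder_id] * num_encoder_tokens)
            replaced = True
        else:
            out.append(tok)
    return out
```

For example, `expand_placeholder([0], 0, 4)` produces four slots from the single `[0]` returned by `create_encoder_prompt`.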

Adds tests/test_vllm_018_compat.py with three narrow unit tests covering
each of these paths; no GPU required.

Signed-off-by: David Schulmeister <dschulmeist@users.noreply.github.com>
dschulmeist added a commit to dschulmeist/bart-plugin that referenced this pull request Apr 16, 2026
Adds M2M100ForConditionalGeneration support for the three NLLB
distilled translation models: facebook/nllb-200-distilled-600M,
1.3B, and 3.3B. All three share model_type=m2m_100.

Architecture differences from BART implemented in nllb.py:
- Sinusoidal (fixed) positional embeddings instead of learned
- PRE-LayerNorm (norm before sublayer) instead of POST-LayerNorm
- Additional layer_norm after all encoder/decoder layers
- ReLU activation instead of GELU
- No final_logits_bias
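A dependency-free sketch of the ordering differences listed above, not taken from nllb.py. The helper names are hypothetical, the layer norm omits learned scale and bias, and the exact sinusoidal layout (concatenated sine and cosine halves, no padding offset) is an assumption that may differ from the actual implementation.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x: Vector) -> Vector:
    """ReLU activation, used in place of GELU."""
    return [max(0.0, v) for v in x]


def post_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """BART-style ordering: residual add first, then normalize."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """M2M100-style ordering: normalize first, then residual add."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def sinusoidal_position(pos: int, dim: int) -> Vector:
    """Fixed (non-learned) position vector; dim is assumed to be even."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(pos * f) for f in freqs] + [math.cos(pos * f) for f in freqs]
```

Because the pre-LayerNorm residual stream is never normalized inside the block, a final layer norm after the last layer is the usual companion, which matches the "additional layer_norm" item in the list.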

Language routing:
- Decoder starts with target language token via create_decoder_prompt,
  which resolves the FLORES-200 code (e.g. "fra_Latn") via
  convert_tokens_to_ids for reliable special-token handling.
- Source language token is prepended to the encoder input via src_lang
  in mm_processor_kwargs (default "eng_Latn"); a make_nllb_prompt
  helper is provided.
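The routing described above can be sketched with a toy vocabulary. Everything here is illustrative: the IDs are made up, `build_encoder_ids` and `build_decoder_start` are hypothetical names (the real `make_nllb_prompt` signature is not shown in this PR), and end-of-sequence handling is left out.

```python
from typing import Dict, List

# Toy vocabulary with made-up IDs; real NLLB IDs differ.
TOY_VOCAB: Dict[str, int] = {
    "eng_Latn": 10,
    "fra_Latn": 11,
    "Hello": 20,
    "world": 21,
}


def convert_tokens_to_ids(token: str) -> int:
    """Stand-in for the tokenizer lookup of a single token."""
    return TOY_VOCAB[token]


def build_encoder_ids(text_tokens: List[str], src_lang: str = "eng_Latn") -> List[int]:
    """Prepend the source language token to the encoder input."""
    return [convert_tokens_to_ids(src_lang)] + [
        convert_tokens_to_ids(t) for t in text_tokens
    ]


def build_decoder_start(tgt_lang: str) -> List[int]:
    """Start the decoder with the target language token."""
    return [convert_tokens_to_ids(tgt_lang)]
```

Looking the language code up as a single token, rather than tokenizing the string "fra_Latn" as ordinary text, avoids it being split into sub-word pieces.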

Depends on the BART processor vLLM 0.18 compatibility fix (PR vllm-project#20):
M2M100MultiModalProcessor inherits create_encoder_prompt from
BartMultiModalProcessor and needs the [0] placeholder behavior to
function under vLLM >=0.18.

Tests:
- 12 unit tests (tests/test_nllb_model_structure.py), no GPU required
- 13 integration tests (tests/test_nllb_inference.py) covering 4
  target scripts, 3 non-English sources, batching, determinism, and
  max_tokens. All 13 pass on NVIDIA GB10 (DGX Spark) with vLLM 0.18.0.

Signed-off-by: David Schulmeister <dschulmeist@users.noreply.github.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
NickLucche mentioned this pull request Apr 30, 2026
NickLucche merged commit d656d7c into vllm-project:master Apr 30, 2026
1 check passed
@NickLucche

Member

I had to make some changes to your PR because of an output mismatch I was getting, caused by add_special_tokens=False. Thanks for contributing, @dschulmeist!
