fix(vllm): use max prompt length for batch context-length check#1209

Closed
JKDasondee wants to merge 1 commit into huggingface:main from JKDasondee:fix/vllm-batch-context-length-check
Conversation

@JKDasondee
In VLLMModel._greedy_until, the context-length check performed before truncation used len(inputs[0]) — the length of only the first prompt in the batch — instead of the maximum length across all prompts. For batches with variable-length prompts, any prompt longer than the first could silently bypass truncation and be passed to vLLM with a token count exceeding max_model_len, causing runtime errors or silent truncation inside the engine.

The fix replaces len(inputs[0]) with max(len(inp) for inp in inputs) so the check is conservative over the entire batch, and updates the related warning messages to reflect that the reported size is the batch maximum.

Fixes #1204.
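A minimal sketch of the before/after behavior described above. The function name and the surrounding truncation logic here are hypothetical illustrations, not the actual lighteval source; only the core change (len(inputs[0]) → max(len(inp) for inp in inputs)) comes from this PR.

```python
def truncate_batch(inputs, max_model_len, max_new_tokens):
    """Hypothetical sketch: conservative context-length check for a batch.

    `inputs` is a list of tokenized prompts (lists of token ids).
    Before this fix, the check used len(inputs[0]) and so only
    considered the first prompt; longer prompts slipped through.
    """
    # Fixed check: take the longest prompt in the batch.
    context_size = max(len(inp) for inp in inputs)
    if context_size + max_new_tokens > max_model_len:
        # Left-truncate every prompt so prompt + generation fits.
        allowed = max_model_len - max_new_tokens
        inputs = [inp[-allowed:] for inp in inputs]
    return inputs
```

With the old check, a batch like `[[0]*5, [0]*20]` and `max_model_len=16` would pass untouched because only the 5-token first prompt was inspected; with the batch maximum, the 20-token prompt is truncated before reaching the engine.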

context_size was computed as len(inputs[0]), checking only the first
prompt in the batch. Any prompt longer than the first would bypass
truncation, causing vLLM to receive sequences exceeding max_model_len.
Fixes huggingface#1204.
@JKDasondee
Author

Closing — #1205 addresses the same issue with broader improvements. Sorry for the duplicate.

@JKDasondee JKDasondee closed this Apr 10, 2026


Development

[BUG] [vLLM] Batch truncation bug: context length check uses first prompt instead of longest

1 participant