fix(vllm): use max prompt length for batch context-length check #1209
Closed
JKDasondee wants to merge 1 commit into huggingface:main from
Conversation
`context_size` was computed as `len(inputs[0])`, checking only the first prompt in the batch. Any prompt longer than the first would bypass truncation, causing vLLM to receive sequences exceeding `max_model_len`. Fixes huggingface#1204.
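To make the failure mode concrete, here is a minimal sketch (the batch contents are invented for illustration; only the two `context_size` expressions come from the change itself):

```python
# inputs: a batch of tokenized prompts (lists of token ids)
inputs = [[1] * 10, [1] * 5000]  # the second prompt is far longer than the first

# Before: measures only the first prompt, so the 5000-token prompt
# silently escapes the truncation check.
context_size = len(inputs[0])  # -> 10

# After: measures the longest prompt, so the check covers the whole batch.
context_size = max(len(inp) for inp in inputs)  # -> 5000
```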
Author
Closing: #1205 addresses the same issue with broader improvements. Sorry for the duplicate.
In `VLLMModel._greedy_until`, the context-length check before truncation used `len(inputs[0])` (the length of only the first prompt in the batch) instead of the maximum length across all prompts. For batches with variable-length prompts, any prompt longer than the first would silently bypass truncation and be passed to vLLM with a token count exceeding `max_model_len`, causing runtime errors or silent truncation inside the engine.

The fix replaces `len(inputs[0])` with `max(len(inp) for inp in inputs)` so the check is conservative over the entire batch, and updates the related warning messages to reflect that the reported size is the batch maximum.

Fixes #1204.
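For context, here is a minimal sketch of how the corrected check might sit inside the truncation path. The function name, logger, and left-truncation logic are assumptions for illustration, not quoted from the diff; only the before/after `context_size` expressions are taken from this change:

```python
import logging

logger = logging.getLogger(__name__)


def truncate_batch(
    inputs: list[list[int]], max_model_len: int, max_new_tokens: int
) -> list[list[int]]:
    """Truncate tokenized prompts so prompt + generation fits within max_model_len."""
    # Old check: context_size = len(inputs[0]) measured only the first prompt.
    # New check: measure the longest prompt so no sequence in the batch
    # can exceed the engine's limit.
    context_size = max(len(inp) for inp in inputs)

    if context_size + max_new_tokens > max_model_len:
        logger.warning(
            f"Longest context in batch ({context_size} tokens) plus {max_new_tokens} "
            f"new tokens exceeds max_model_len ({max_model_len}); truncating prompts."
        )
        keep = max_model_len - max_new_tokens
        inputs = [inp[-keep:] for inp in inputs]  # keep the most recent tokens
    return inputs
```

Because the check uses the batch maximum, no over-long prompt can slip past it, and slicing with `inp[-keep:]` leaves prompts already within budget untouched, since slicing a shorter list from the left returns it whole.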