
Commit 4af002f
test(llm_eval): widen tiny qwen3 max_position_embeddings for fp8 e2e test
The default tiny qwen3 max_position_embeddings is 32, shorter than typical MMLU prompts (39-46 tokens). lm-eval's HFLM path tolerates this by truncating, but the TRT-LLM serve path used in test_qwen3_eval_fp8 rejects oversized prompts with `default_max_tokens (-14) must be greater than 0`. Bump to 2048 to give TRT-LLM headroom for MMLU/hellaswag/gsm8k/humaneval prompts.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
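The `-14` in the error message follows from simple arithmetic: the serve path derives a default generation budget from whatever context remains after the prompt. A minimal sketch of that check (a simplified model for illustration, not TRT-LLM's actual code):

```python
# Simplified model of why TRT-LLM serve rejects the prompt: the default
# number of tokens to generate is the context length minus the prompt length.
def default_max_tokens(max_seq_len: int, prompt_len: int) -> int:
    """Tokens left for generation after the prompt fills part of the context."""
    return max_seq_len - prompt_len

# Tiny qwen3 default (32) vs. a 46-token MMLU prompt: 32 - 46 = -14,
# matching the "default_max_tokens (-14) must be greater than 0" error.
print(default_max_tokens(32, 46))

# After bumping max_position_embeddings to 2048, there is ample headroom.
print(default_max_tokens(2048, 46))
```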
1 parent 8d83ac3 commit 4af002f

1 file changed: 3 additions & 1 deletion

tests/examples/llm_eval/test_llm_eval.py
@@ -41,7 +41,9 @@ def test_lm_eval_hf(tmp_path):
 
 @minimum_sm(89)
 def test_qwen3_eval_fp8(tmp_path):
-    model_dir = create_tiny_qwen3_dir(tmp_path, with_tokenizer=True)
+    # Bump max_position_embeddings: TRT-LLM serve rejects prompts longer than
+    # max_seq_len, and the default (32) is shorter than even simple MMLU prompts.
+    model_dir = create_tiny_qwen3_dir(tmp_path, with_tokenizer=True, max_position_embeddings=2048)
     try:
         run_llm_ptq_command(
             model=str(model_dir),
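The `create_tiny_qwen3_dir` helper itself is not shown in this diff; its actual implementation lives elsewhere in the test suite. As a hypothetical sketch of what such a helper might do, assuming it writes a Hugging Face-style config.json with an overridable max_position_embeddings (all sizes and file layout here are illustrative assumptions, not the repo's real code):

```python
import json
from pathlib import Path

def create_tiny_qwen3_dir(tmp_path, with_tokenizer=True, max_position_embeddings=32):
    # Hypothetical sketch: create a directory holding a minimal HF-style
    # config.json for a tiny qwen3 model, with max_position_embeddings
    # overridable per test (the default 32 is what the commit bumps to 2048).
    model_dir = Path(tmp_path) / "tiny_qwen3"
    model_dir.mkdir(parents=True, exist_ok=True)
    config = {
        "model_type": "qwen3",
        "hidden_size": 64,                 # illustrative tiny sizes
        "num_hidden_layers": 2,
        "num_attention_heads": 4,
        "max_position_embeddings": max_position_embeddings,
    }
    (model_dir / "config.json").write_text(json.dumps(config, indent=2))
    # with_tokenizer would additionally write tokenizer files; omitted here.
    return model_dir
```

Passing `max_position_embeddings=2048`, as the diff does, would then surface directly in the written config, giving the TRT-LLM serve path a usable max_seq_len.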
