
Commit 4af002f
test(llm_eval): widen tiny qwen3 max_position_embeddings for fp8 e2e test
The default tiny qwen3 max_position_embeddings is 32, shorter than typical MMLU prompts (39-46 tokens). lm-eval's HFLM path tolerates this by truncating, but the TRT-LLM serve path used in test_qwen3_eval_fp8 rejects oversized prompts with `default_max_tokens (-14) must be greater than 0`. Bump to 2048 to give TRT-LLM headroom for MMLU/hellaswag/gsm8k/humaneval prompts.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
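The `-14` in the error message follows from simple arithmetic: the serve path derives a default generation budget from whatever context remains after the prompt. A minimal sketch of that check (a simplified model for illustration, not TRT-LLM's actual code):

```python
# Simplified model of why TRT-LLM serve rejects the prompt: the default
# number of tokens to generate is the context length minus the prompt length.
def default_max_tokens(max_seq_len: int, prompt_len: int) -> int:
    """Tokens left for generation after the prompt fills part of the context."""
    return max_seq_len - prompt_len

# Tiny qwen3 default (32) vs. a 46-token MMLU prompt: 32 - 46 = -14,
# matching the "default_max_tokens (-14) must be greater than 0" error.
print(default_max_tokens(32, 46))

# After bumping max_position_embeddings to 2048, there is ample headroom.
print(default_max_tokens(2048, 46))
```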
1 parent 8d83ac3 commit 4af002f

1 file changed: 3 additions & 1 deletion

tests/examples/llm_eval/test_llm_eval.py
@@ -41,7 +41,9 @@ def test_lm_eval_hf(tmp_path):
 
 @minimum_sm(89)
 def test_qwen3_eval_fp8(tmp_path):
-    model_dir = create_tiny_qwen3_dir(tmp_path, with_tokenizer=True)
+    # Bump max_position_embeddings: TRT-LLM serve rejects prompts longer than
+    # max_seq_len, and the default (32) is shorter than even simple MMLU prompts.
+    model_dir = create_tiny_qwen3_dir(tmp_path, with_tokenizer=True, max_position_embeddings=2048)
     try:
         run_llm_ptq_command(
             model=str(model_dir),
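The `create_tiny_qwen3_dir` helper itself is not shown in this diff; its actual implementation lives elsewhere in the test suite. As a hypothetical sketch of what such a helper might do, assuming it writes a Hugging Face-style config.json with an overridable max_position_embeddings (all sizes and file layout here are illustrative assumptions, not the repo's real code):

```python
import json
from pathlib import Path

def create_tiny_qwen3_dir(tmp_path, with_tokenizer=True, max_position_embeddings=32):
    # Hypothetical sketch: create a directory holding a minimal HF-style
    # config.json for a tiny qwen3 model, with max_position_embeddings
    # overridable per test (the default 32 is what the commit bumps to 2048).
    model_dir = Path(tmp_path) / "tiny_qwen3"
    model_dir.mkdir(parents=True, exist_ok=True)
    config = {
        "model_type": "qwen3",
        "hidden_size": 64,                 # illustrative tiny sizes
        "num_hidden_layers": 2,
        "num_attention_heads": 4,
        "max_position_embeddings": max_position_embeddings,
    }
    (model_dir / "config.json").write_text(json.dumps(config, indent=2))
    # with_tokenizer would additionally write tokenizer files; omitted here.
    return model_dir
```

Passing `max_position_embeddings=2048`, as the diff does, would then surface directly in the written config, giving the TRT-LLM serve path a usable max_seq_len.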
