Commit e3f43c4

set max_model_len: 65536
1 parent 60c2cd8 commit e3f43c4

1 file changed

Lines changed: 1 addition & 1 deletion

configs/extractor/llm/qwen3_30b_in_process.yaml

@@ -30,7 +30,7 @@ vllm_kwargs:
   tensor_parallel_size: 1 # shard across N GPUs
   # Supports up to 256K tokens (max size for https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507).
   # Use lower value to save memory and improve performance
-  max_model_len: 131072
+  max_model_len: 65536
   # This model requires the deepseek_r1 reasoning parser (see HF model card).
   reasoning_parser: "deepseek_r1"
   gpu_memory_utilization: 0.95 # fraction of GPU memory reserved (impacts KV cache size)
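
To put the "save memory" comment in the config into numbers, here is a rough sketch of how much KV cache a single full-length sequence would need before and after this change. The architecture values (48 layers, 4 KV heads, head dimension 128, 16-bit cache) are assumptions that should be checked against the model's config.json. The helper names are hypothetical and not part of vLLM or this repository. The estimate ignores model weights, activations, and allocator overhead.

```python
# Back-of-the-envelope KV cache estimate. Not vLLM's own accounting.

def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache one token occupies: one K and one V vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_cache_gib(max_model_len, **arch):
    """GiB of KV cache needed to hold one sequence of max_model_len tokens."""
    return max_model_len * kv_cache_bytes_per_token(**arch) / 2**30


# ASSUMED architecture values; verify against the model's config.json.
ASSUMED_ARCH = dict(num_layers=48, num_kv_heads=4, head_dim=128, dtype_bytes=2)

if __name__ == "__main__":
    for max_model_len in (131072, 65536):  # old and new values from the diff
        gib = kv_cache_gib(max_model_len, **ASSUMED_ARCH)
        print(f"max_model_len={max_model_len}: {gib:.1f} GiB per full-length sequence")
```

Under these assumptions the per-sequence figure drops from about 12 GiB to about 6 GiB. Halving max_model_len halves the memory that must be available for one maximum-length request.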
