Priority
P2-High
OS type
Ubuntu
Hardware type
Xeon-GNR
Installation method
Deploy method
Running nodes
Single Node
What's the version?
latest source code
Description
While making the Helm chart changes that follow up on #1790, I found several places that still use the misspelled parameter --max-seq_len-to-capture; the correct spelling is --max-seq-len-to-capture
(see https://docs.vllm.ai/en/latest/serving/engine_args.html).
I'll cover the Helm chart files, but someone else needs to make the Docker Compose changes.
GenAIExamples$ grep -R max-seq_len-to-capture *
AgentQnA/kubernetes/helm/cpu-values.yaml: extraCmdArgs: ["--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
AgentQnA/kubernetes/helm/gaudi-values.yaml: extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
AudioQnA/docker_compose/intel/hpu/gaudi/compose.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq_len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
AudioQnA/kubernetes/helm/gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
ChatQnA/docker_compose/intel/hpu/gaudi/compose_guardrails.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
ChatQnA/docker_compose/intel/hpu/gaudi/compose_without_rerank.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
ChatQnA/kubernetes/helm/faqgen-gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
ChatQnA/kubernetes/helm/guardrails-gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
ChatQnA/kubernetes/helm/gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
CodeTrans/docker_compose/intel/hpu/gaudi/compose.yaml: command: --model $LLM_MODEL_ID --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq_len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
CodeTrans/kubernetes/helm/gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
DocSum/docker_compose/intel/hpu/gaudi/compose.yaml: command: --model $LLM_MODEL_ID --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq_len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
DocSum/kubernetes/helm/gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
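For whoever picks up the compose files, the rename could be applied in bulk with something like the sketch below. It is only a suggestion: `fix_flag` is a hypothetical helper name, not something that exists in the repo, and it assumes GNU grep, xargs, and sed (as shipped on Ubuntu).

```shell
# Hypothetical helper (not part of the repo): rewrite the misspelled flag
# in every file under the given directory. Assumes GNU grep/xargs/sed.
fix_flag() {
  root="${1:-.}"
  # grep: -r recurse, -l print only names of matching files, -Z NUL-terminate names
  # xargs: -0 read NUL-terminated names, -r do nothing if there are no matches
  # sed: -i edit the files in place
  grep -rlZ --exclude-dir=.git -e 'max-seq_len-to-capture' "$root" \
    | xargs -0 -r sed -i 's/--max-seq_len-to-capture/--max-seq-len-to-capture/g'
}
```

Running `fix_flag .` from the GenAIExamples root and then repeating the grep above should return no matches. The diff still needs a review before committing, since the replacement touches every matching file, including the Helm values files.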
Reproduce steps
cd GenAIExamples
grep -R max-seq_len-to-capture *
Raw log
Attachments
No response