[Bug] vllm-gaudi parameters --max-seq-len-to-capture typo #1805

@yongfengdu

Priority

P2-High

OS type

Ubuntu

Hardware type

Xeon-GNR

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source
  • Other
  • N/A

Deploy method

  • Docker
  • Docker Compose
  • Kubernetes Helm Charts
  • Kubernetes GMC
  • Other
  • N/A

Running nodes

Single Node

What's the version?

latest source code

Description

While making Helm chart changes following #1790, I found that several places still use the misspelled parameter --max-seq_len-to-capture; the correct spelling is --max-seq-len-to-capture, with hyphens throughout
(https://docs.vllm.ai/en/latest/serving/engine_args.html).

I'll cover the Helm charts, but someone else needs to make the Docker Compose changes.

GenAIExamples$ grep -R max-seq_len-to-capture *
AgentQnA/kubernetes/helm/cpu-values.yaml: extraCmdArgs: ["--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
AgentQnA/kubernetes/helm/gaudi-values.yaml: extraCmdArgs: ["--tensor-parallel-size", "4", "--max-seq_len-to-capture", "16384", "--enable-auto-tool-choice", "--tool-call-parser", "llama3_json"]
AudioQnA/docker_compose/intel/hpu/gaudi/compose.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq_len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
AudioQnA/kubernetes/helm/gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
ChatQnA/docker_compose/intel/hpu/gaudi/compose_faqgen.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
ChatQnA/docker_compose/intel/hpu/gaudi/compose_guardrails.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
ChatQnA/docker_compose/intel/hpu/gaudi/compose.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
ChatQnA/docker_compose/intel/hpu/gaudi/compose_without_rerank.yaml: command: --model ${LLM_MODEL_ID} --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
ChatQnA/kubernetes/helm/faqgen-gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
ChatQnA/kubernetes/helm/guardrails-gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
ChatQnA/kubernetes/helm/gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
CodeTrans/docker_compose/intel/hpu/gaudi/compose.yaml: command: --model $LLM_MODEL_ID --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq_len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
CodeTrans/kubernetes/helm/gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
DocSum/docker_compose/intel/hpu/gaudi/compose.yaml: command: --model $LLM_MODEL_ID --tensor-parallel-size ${NUM_CARDS} --host 0.0.0.0 --port 80 --block-size ${BLOCK_SIZE} --max-num-seqs ${MAX_NUM_SEQS} --max-seq_len-to-capture ${MAX_SEQ_LEN_TO_CAPTURE}
DocSum/kubernetes/helm/gaudi-values.yaml: "--max-seq_len-to-capture", "2048"
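
For whoever picks up the remaining files, the rename listed above could be applied in bulk along these lines. This is only a sketch: the function name fix_max_seq_len_flag is hypothetical, it assumes a grep that supports --exclude-dir (GNU grep on Ubuntu does), and the result should be reviewed with git diff before committing.

```shell
# Hypothetical helper: rewrite the misspelled flag in every file under a directory.
fix_max_seq_len_flag() {
  dir="${1:-.}"
  # -r: recurse, -l: print only the names of files that contain the typo.
  grep -rl --exclude-dir=.git 'max-seq_len-to-capture' "$dir" |
  while IFS= read -r f; do
    tmp=$(mktemp) || return 1
    # Replace every occurrence, then copy the result back over the original
    # (cat keeps the original file's permissions, unlike mv).
    sed 's/max-seq_len-to-capture/max-seq-len-to-capture/g' "$f" > "$tmp" &&
      cat "$tmp" > "$f"
    rm -f "$tmp"
  done
}
```

Usage, from the repository root: fix_max_seq_len_flag . followed by git diff to inspect the changed lines.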

Reproduce steps

cd GenAIExamples
grep -R max-seq_len-to-capture *
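
The same grep can be turned into a pass/fail check, for example to confirm that a fix is complete. A minimal sketch, assuming a hypothetical function name check_no_flag_typo and a grep that supports --exclude-dir:

```shell
# Hypothetical check: succeed only if the misspelled flag no longer appears.
check_no_flag_typo() {
  dir="${1:-.}"
  # grep exits 0 when it finds a match, so a match means the check fails.
  if grep -rn --exclude-dir=.git 'max-seq_len-to-capture' "$dir"; then
    echo "misspelled flag --max-seq_len-to-capture still present" >&2
    return 1
  fi
  return 0
}
```

Running check_no_flag_typo . from the GenAIExamples root prints any remaining occurrences with file names and line numbers and returns a non-zero status if there are any.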

Raw log

Attachments

No response

Labels

bug (Something isn't working)
