Triton + vLLM embeddings serving does not support list[str] input #8655

@carloscao0928

Description

Hi team, I followed this guide: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/client_guide/openai_readme.html#embedding-models to serve the "bge-large-zh-v1.5" model (https://huggingface.co/BAAI/bge-large-zh-v1.5). Sending a single request works, but when I use aiperf to run a benchmark test, it fails with the error "Input should be a valid string", so it seems the embeddings endpoint does not support list[str] input.
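A minimal sketch of the two request shapes being compared, in case it helps reproduce this without aiperf. It assumes the frontend is reachable at http://localhost:8000 and exposes the OpenAI-style /v1/embeddings route; the model name "bge-large-zh-v1.5" and the helper names are placeholders, not taken from the actual deployment.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"   # assumption: frontend started with --openai-port 8000
MODEL_NAME = "bge-large-zh-v1.5"     # assumption: model name as registered in the repository


def build_embeddings_request(model, inputs, base_url=BASE_URL):
    """Build a POST request for the OpenAI-style /v1/embeddings route."""
    body = json.dumps({"model": model, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return (HTTP status, response body as text)."""
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Validation failures come back as HTTP errors; keep the body for inspection.
        return err.code, err.read().decode("utf-8")


def reproduce():
    """Send a single-string request, then a list[str] request, and print both results."""
    print(send(build_embeddings_request(MODEL_NAME, "hello")))
    print(send(build_embeddings_request(MODEL_NAME, ["hello", "world"])))
```

Calling reproduce() against a running server sends the single-string request first and the list[str] request second; only the second should trigger the validation error described above.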

Triton Information
What version of Triton are you using?
nvcr.io/nvidia/tritonserver:26.01-vllm-python-py3

Are you using the Triton container or did you build it yourself?
nvcr.io/nvidia/tritonserver:26.01-vllm-python-py3

To Reproduce
Steps to reproduce the behavior.
I created a deployment from the image above and started the OpenAI frontend with the following commands:
cd /opt/tritonserver/python/openai
python3 openai_frontend/main.py --model-repository xxx --openai-port 8000

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

model: BAAI/bge-large-zh-v1.5

config.pbtxt:
backend: "vllm"
instance_group [{kind: KIND_MODEL}]

model.json:
{"model": "xxx","gpu_memory_utilization": 0.9}
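For anyone recreating the setup, a small sketch that writes the two files above into the usual Triton vLLM layout (config.pbtxt in the model directory, model.json in version directory 1). The function name, the repository root, and the model directory name are hypothetical; the "xxx" model value is left as a placeholder exactly as in the issue.

```python
import json
from pathlib import Path

# Same content as the config.pbtxt shown above.
CONFIG_PBTXT = 'backend: "vllm"\ninstance_group [{kind: KIND_MODEL}]\n'


def write_model_repository(root, model_name, engine_args):
    """Create <root>/<model_name>/config.pbtxt and <root>/<model_name>/1/model.json."""
    model_dir = Path(root) / model_name
    version_dir = model_dir / "1"
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)
    (version_dir / "model.json").write_text(json.dumps(engine_args))
    return model_dir
```

For example, write_model_repository("/tmp/repo", "bge-large-zh-v1.5", {"model": "xxx", "gpu_memory_utilization": 0.9}) produces a directory that could be passed to --model-repository.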

Expected behavior
The embeddings endpoint should accept list[str] input as well as a single string, as the OpenAI API specification allows.
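To illustrate the expected behavior, here is a rough sketch of input handling that accepts both forms. It is not the frontend's actual code; the function name is hypothetical, and it deliberately ignores the token-array inputs that the OpenAI specification also permits.

```python
from typing import List, Union


def normalize_embedding_input(value: Union[str, List[str]]) -> List[str]:
    """Return the input as a list of strings, accepting either str or list[str]."""
    if isinstance(value, str):
        # A single string becomes a batch of one.
        return [value]
    if isinstance(value, list) and value and all(isinstance(item, str) for item in value):
        return list(value)
    raise ValueError("input must be a string or a non-empty list of strings")
```

With this kind of normalization, a request carrying ["hello", "world"] would yield one embedding per list entry instead of being rejected during validation.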

Labels: Enhancement (New feature or request), openai (OpenAI related)