RFE: support multi-instances of vLLM on single system #123

Performance results indicate that running multiple vLLM instances scales better than adding cores to a single vLLM server instance. This RFE requests that the automation support deploying, at the user's specification, multiple symmetric vLLM server instances on a single Linux server. The expectation is that all N instances share the same configuration settings, such as KV-cache size and number of cores.
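To make the request concrete, here is a rough sketch of how the automation might plan N symmetric instances. It is illustrative only: `InstancePlan`, `plan_instances`, and `launch_spec` are hypothetical names, the contiguous core partitioning and one-port-per-instance layout are assumptions, and the environment variables `VLLM_CPU_KVCACHE_SPACE` and `VLLM_CPU_OMP_THREADS_BIND` are taken from the vLLM CPU backend documentation as I understand it and should be checked against the vLLM version in use. The sketch only builds the launch specification; it does not start any process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstancePlan:
    """Settings for one vLLM server instance (hypothetical structure)."""
    index: int
    port: int
    cores: str         # inclusive core range, e.g. "0-15"
    kv_cache_gib: int  # same value for every instance (symmetric)


def plan_instances(num_instances, cores_per_instance, kv_cache_gib,
                   base_port=8000, first_core=0):
    """Give each instance an identical config on a disjoint, contiguous core range."""
    if num_instances < 1 or cores_per_instance < 1:
        raise ValueError("num_instances and cores_per_instance must be >= 1")
    plans = []
    for i in range(num_instances):
        start = first_core + i * cores_per_instance
        end = start + cores_per_instance - 1
        plans.append(InstancePlan(i, base_port + i, f"{start}-{end}", kv_cache_gib))
    return plans


def launch_spec(plan, model):
    """Return (env, argv) for one instance; nothing is executed here.

    Assumption: the CPU backend reads these two environment variables.
    """
    env = {
        "VLLM_CPU_KVCACHE_SPACE": str(plan.kv_cache_gib),
        "VLLM_CPU_OMP_THREADS_BIND": plan.cores,
    }
    argv = ["vllm", "serve", model, "--port", str(plan.port)]
    return env, argv
```

For example, `plan_instances(2, 16, 40)` yields two plans with ports 8000 and 8001 and core ranges `0-15` and `16-31`, each with the same KV-cache size. A real implementation would also need to account for NUMA topology and total memory, which this sketch ignores.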
