RFE: support multiple vLLM instances on a single system #123

@jharriga

Performance results indicate that running multiple vLLM instances scales better than adding cores to a single vLLM server instance. This RFE requests that the automation support deploying multiple symmetric vLLM server instances on a single Linux server, at the user's specification. The expectation is that all N instances share the same configuration settings, such as KV-cache size and number of cores.
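
To make the request concrete, here is a rough sketch of how the automation might plan N symmetric instances. It assumes each instance gets a disjoint, contiguous core range and its own port. Everything in it is hypothetical and not part of the existing automation: `InstancePlan`, `plan_instances`, `launch_spec`, and the base port of 8000 are illustrative names and values. The environment variables `VLLM_CPU_KVCACHE_SPACE` and `VLLM_CPU_OMP_THREADS_BIND` are assumed to be the vLLM CPU backend settings for KV-cache size and core binding; they should be checked against the vLLM version in use.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InstancePlan:
    """Settings for one vLLM instance (hypothetical structure)."""
    index: int
    port: int
    cores: str          # contiguous core range, e.g. "0-15"
    kv_cache_gib: int   # identical for every instance


def plan_instances(
    num_instances: int,
    cores_per_instance: int,
    kv_cache_gib: int,
    base_port: int = 8000,
    total_cores: Optional[int] = None,
) -> List[InstancePlan]:
    """Return N symmetric plans with non-overlapping core ranges and ports."""
    if num_instances < 1 or cores_per_instance < 1:
        raise ValueError("num_instances and cores_per_instance must be >= 1")
    needed = num_instances * cores_per_instance
    if total_cores is not None and needed > total_cores:
        raise ValueError(f"need {needed} cores but only {total_cores} available")

    plans = []
    for i in range(num_instances):
        first = i * cores_per_instance
        last = first + cores_per_instance - 1
        plans.append(InstancePlan(i, base_port + i, f"{first}-{last}", kv_cache_gib))
    return plans


def launch_spec(plan: InstancePlan, model: str) -> Tuple[Dict[str, str], List[str]]:
    """Build the environment and argv for one instance without starting it.

    The environment variable names are assumptions about the vLLM CPU backend.
    """
    env = {
        "VLLM_CPU_KVCACHE_SPACE": str(plan.kv_cache_gib),
        "VLLM_CPU_OMP_THREADS_BIND": plan.cores,
    }
    argv = ["vllm", "serve", model, "--port", str(plan.port)]
    return env, argv
```

Keeping the planning separate from process launch lets the automation validate the core budget before anything starts. It also makes the symmetric requirement explicit: only the port and core range differ between instances.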
