Performance results indicate that running multiple vLLM instances scales better than adding cores to a single vLLM server instance. This RFE requests that the automation support deploying, at the user's specification, multiple symmetric vLLM server instances on a single Linux server. All N instances are expected to share the same configuration settings, such as KV cache size and number of cores.
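To make the request concrete, here is a minimal sketch of how the automation might plan N symmetric instances. It only builds the per-instance settings and does not launch anything. Several parts are assumptions rather than part of this RFE: the helper name `plan_instances`, the `InstancePlan` type, the placeholder model name, the base port, and the simplification that each instance gets a contiguous block of core IDs with no NUMA awareness. It also assumes the vLLM CPU backend, where KV cache size and core binding are set through the `VLLM_CPU_KVCACHE_SPACE` and `VLLM_CPU_OMP_THREADS_BIND` environment variables; other backends would need different settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstancePlan:
    """Settings for one vLLM server instance (hypothetical structure)."""
    index: int
    port: int
    cores: str      # contiguous core range, e.g. "0-15"
    env: dict       # environment variables for this instance
    command: list   # argv used to start the server


def plan_instances(model, n, cores_per_instance, kv_cache_gib, base_port=8000):
    """Build N symmetric plans: same KV cache size and core count for each.

    Only the port and the core range differ between instances.
    """
    if n < 1 or cores_per_instance < 1:
        raise ValueError("n and cores_per_instance must be >= 1")

    plans = []
    for i in range(n):
        first = i * cores_per_instance
        last = first + cores_per_instance - 1
        cores = f"{first}-{last}"
        port = base_port + i
        env = {
            # Assumed CPU-backend settings; identical size for every instance.
            "VLLM_CPU_KVCACHE_SPACE": str(kv_cache_gib),
            # Disjoint core range so instances do not compete for cores.
            "VLLM_CPU_OMP_THREADS_BIND": cores,
        }
        command = ["vllm", "serve", model, "--port", str(port)]
        plans.append(InstancePlan(i, port, cores, env, command))
    return plans


if __name__ == "__main__":
    # "my-model" is a placeholder, not a real model identifier.
    for plan in plan_instances("my-model", n=2, cores_per_instance=16, kv_cache_gib=40):
        print(plan.port, plan.cores, plan.env, " ".join(plan.command))
```

Keeping a single set of user inputs (instance count, cores per instance, KV cache size) and deriving everything else is what makes the instances symmetric; a real implementation would also need to check that N times the per-instance core count and memory fits on the host.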