Related to an existing integration?
No
Existing integration
No response
Overview
Integrating the LlamaCpp server containers into Aspire would provide benefits in areas where other integrations either under-deliver or over-deliver, including:
- Running inference in small, lightweight containers, suitable for IoT and edge devices with limited resources.
- Isolating inference models in different small containers rather than having them together in one larger one.
- Running just models, not the extra added features that other integrations add.
- Having more control over the configuration of the inference servers.
- Using fewer resources when scaling out, by running many small containers rather than a few larger ones.
- Taking advantage of specific versions/optimizations of LlamaCpp server that may not be supported by others yet.
As a point of reference, as of today the standard container image used by the Aspire.Hosting.Ollama integration is 3.75 GB, while the standard llama:server image is around 127 MB.
Usage example
A minimal usage would be:
var modelUrl = "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf";
var llamaServer = builder.AddLlamaServer("llamaserver", modelUrl);
Additional extension methods can add more functionality, such as:
- .WithReasoning(bool useReasoning = true) // Explicitly enables or disables the output of thinking (chain of thought, CoT), if supported by the model.
- .WithApikeys(params string[] keys) // Defines one or more valid API keys that will be required to make requests to the REST API.
- .WithContextSize(int size = 0) // Sets a limit on the context size for the model.
- .WithModelAlias(string alias) // Defines the alias that OpenAI-compatible clients will use to make requests to the model.
- .WithMultimodalProjection(string projectionFileUrl) // Adds a multimodal projection file for multimodal (image/text) models.
- .WithDataVolume(string? name = null, bool isReadOnly = false) // Adds a volume and sets it as the storage for the downloaded model(s).
- .WithDataVolume(IResourceBuilder<LlamaCppServerResource> volumeOwner, bool isReadOnly = false) // Explicitly uses the same volume that is used by another LlamaCppServer resource. Useful for having several server instances that use the same model files, so they are downloaded once and shared among them.
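As an illustration of how the methods listed above might be combined, here is a minimal AppHost sketch. It assumes the API exactly as proposed in this issue: AddLlamaServer and the With* methods are not part of any published Aspire package, and the resource names, alias, context size, and volume name are placeholder values chosen for the example.

```csharp
// Sketch only: AddLlamaServer and the With* extension methods are the API
// proposed in this issue, so this will not compile against current packages.
var builder = DistributedApplication.CreateBuilder(args);

var modelUrl = "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf";

// First instance: owns the named volume where the model file is stored.
var primary = builder.AddLlamaServer("llamaserver", modelUrl)
    .WithModelAlias("phi3-mini")     // placeholder alias for OpenAI-compatible clients
    .WithContextSize(4096)           // placeholder context size limit
    .WithDataVolume("llama-models"); // placeholder volume name

// Second instance: reuses the first instance's volume, so the model file
// is downloaded once and shared between both servers.
var secondary = builder.AddLlamaServer("llamaserver-2", modelUrl)
    .WithModelAlias("phi3-mini")
    .WithDataVolume(primary);

builder.Build().Run();
```

Passing the first resource builder to the second WithDataVolume overload is what would let several small server containers share one copy of the model, which is the scale-out scenario described in the overview.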
Breaking change?
No
Alternatives
I have described an integration I already implemented on my own.
Additional context
No response
Help us help you
Yes, I'd like to be assigned to work on this item