Add LlamaCpp containers hosting integration for Aspire

### Related to an existing integration?

No

### Existing integration

_No response_

### Overview

Integrating the LlamaCpp server containers into Aspire would provide benefits in areas where other integrations either under-deliver or over-deliver, including:
- Running inference in small, lightweight containers, suitable for IoT and edge devices with limited resources.
- Isolating inference models in different small containers rather than having them together in one larger one.
- Running just models, not the extra added features that other integrations add.
- Having more control over the configuration of the inference servers.
- Using fewer resources when scaling out with many small containers rather than a few larger ones.
- Taking advantage of specific versions/optimizations of LlamaCpp server that may not be supported by others yet.

As a point of reference, as of today the standard container image used by the Aspire.Hosting.Ollama integration is 3.75 GB, while the standard llama:server image is around 127 MB.

### Usage example

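A minimal usage would be:
```csharp
var builder = DistributedApplication.CreateBuilder(args);

var modelUrl = "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf";

// AddLlamaServer is the extension method proposed in this issue.
var llamaServer = builder.AddLlamaServer("llamaserver", modelUrl);
```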

Additional extension methods can add more functionality, like:

- `.WithReasoning(bool useReasoning = true)`: Explicitly enables or disables the output of thinking (chain of thought), if supported by the model.
- `.WithApikeys(params string[] keys)`: Defines one or more valid API keys that are required to make requests to the REST API.
- `.WithContextSize(int size = 0)`: Sets a limit on the context size for the model.
- `.WithModelAlias(string alias)`: Defines the alias that OpenAI-compatible clients use to make requests to the model.
- `.WithMultimodalProjection(string projectionFileUrl)`: Adds a multimodal projection file for multimodal (image/text) models.
- `.WithDataVolume(string? name = null, bool isReadOnly = false)`: Adds a volume and sets it as the storage for the downloaded models.
- `.WithDataVolume(IResourceBuilder<LlamaCppServerResource> volumeOwner, bool isReadOnly = false)`: Explicitly uses the same volume that is used by another LlamaCppServer resource. Useful for running several server instances that use the same model files, so the files are downloaded once and shared among them.
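
As a rough sketch of how these methods could be chained in an AppHost, the following runs two servers that share one downloaded model. It assumes that `AddLlamaServer` and every `With...` method return `IResourceBuilder<LlamaCppServerResource>` so that calls can be chained. The resource name `llamaserver-2`, the alias `phi3-mini`, and the context size of 4096 are illustrative values, not part of the proposal.

```csharp
var builder = DistributedApplication.CreateBuilder(args);

var modelUrl = "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf";

// First server: owns the volume the model file is downloaded into.
var primary = builder.AddLlamaServer("llamaserver", modelUrl)
    .WithDataVolume()              // default volume name, writable
    .WithContextSize(4096)         // illustrative limit
    .WithModelAlias("phi3-mini")   // hypothetical alias used by OpenAI-compatible clients
    .WithReasoning(false);         // disable thinking output

// Second server: reuses the first server's volume (assumed to be mountable
// read-only), so the model file is downloaded once and shared.
var secondary = builder.AddLlamaServer("llamaserver-2", modelUrl)
    .WithDataVolume(primary, isReadOnly: true)
    .WithModelAlias("phi3-mini");

builder.Build().Run();
```

Passing the first resource builder as `volumeOwner` keeps the volume name in one place, which is presumably the reason for offering that overload alongside the name-based one.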


### Breaking change?

No

### Alternatives

I have described an integration I already implemented on my own.

### Additional context

_No response_

### Help us help you

Yes, I'd like to be assigned to work on this item

Add LlamaCpp containers hosting integration for Aspire #1274