Llama ccp WinUI3 Control Panel

Llama ccp WinUI3 Control Panel is a native Windows app for running and managing local GGUF models through bundled llama.cpp-based backends. It is designed as a local model/runtime control surface rather than a full chat application: pick a model, load it into memory, tune runtime settings, expose a local OpenAI-compatible endpoint, monitor resources, and keep the backend resident until you unload it.

The app targets users who want a lightweight local LLM server with more direct control than a black-box runner. It can run standard llama.cpp-compatible GGUF text models and DiffusionGemma GGUF models through separate bundled backends.

Screenshots

What It Does

Runs local GGUF models through llama-server.exe.
Runs DiffusionGemma GGUF models through a bundled DiffusionGemma worker wrapper.
Keeps one selected model loaded until the user unloads it.
Exposes a local OpenAI-compatible endpoint for third-party tools.
Lets users switch models from the UI.
Downloads public Hugging Face GGUF files.
Provides an HF Library browser with most-popular/newest sorting and hardware fit estimates.
Shows CPU, RAM, process memory, uptime, GPU usage, and VRAM usage where available.
Redirects backend logs into an in-app runtime console.
Provides a small Chat Test page for checking the currently loaded model.
Supports app startup registration and close-to-system-tray behavior.

Main Features

Model Library

Select a custom app models folder.
Optionally point the UI at the local Ollama model store for discovery.
Scan for .gguf files.
Hide models from the app library without deleting files.
Reveal selected models in Explorer.
Detect model architecture and likely capabilities.
Detect vision/projector companion files such as mmproj where available.

Hugging Face Library

Search public Hugging Face repositories.
Sort compatible results by newest or most popular.
List up to 100 compatible repositories.
Show available GGUF files in a selected repo.
Display file size, quantization, backend type, capabilities, and local fit estimates.
Color-code fit:
- Green: good fit.
- Yellow: likely usable or unknown.
- Orange: tight fit or partial GPU offload likely.
- Red: too large or auxiliary/not standalone.
Download public GGUF files into the selected app models folder.
Resume interrupted downloads where the server supports byte ranges.

Runtime Control

Load, unload, and reload the selected model.
Run backends without opening a separate console window.
Configure host and port.
Copy the local endpoint.
Detect port conflicts.
Keep the backend resident until explicitly unloaded.
Stop backend processes on app exit.
Use a Windows process lifetime guard so child backends are closed if the app exits unexpectedly.

Runtime Settings

Context size.
CPU threads.
GPU layers.
Batch size.
UBatch size.
Parallel slots.
Flash attention.
mmap/mlock toggles.
Diffusion steps for DiffusionGemma.
Custom additional backend arguments.
Per-model runtime settings.
Hardware-aware optimization button.

Resource Monitor

CPU usage.
RAM usage.
Backend process memory.
Backend uptime.
Current model.
Current host and port.
GPU usage through nvidia-smi when available.
VRAM usage through nvidia-smi when available.
Clear fallback text when GPU metrics are unavailable.

Logs

Live stdout/stderr capture.
Timestamped log lines.
Warning/error highlighting.
Copy logs.
Save logs.
Clear logs.

Chat Test

The app is not intended to be a full chat client, but it includes a basic test page so users can verify the currently loaded endpoint. It supports text prompts and image attachment attempts for compatible vision models.

Supported Backends

llama.cpp / llama-server

Used for standard GGUF text, embedding, tool-capable, reasoning, and vision-capable models that are supported by the bundled llama.cpp build.

DiffusionGemma Worker

Used for DiffusionGemma GGUF models. The app keeps the worker process resident and wraps it with local /v1/chat/completions and /v1/completions endpoints.

OpenAI-Compatible Endpoint

When a model is loaded, the app exposes a local endpoint such as:

http://127.0.0.1:8080

Useful paths:

/health
/v1/completions
/v1/chat/completions

This allows other local tools to connect as long as they support OpenAI-style local server URLs.

Models Are Not Included

This repository and release package do not include model files.

Users can:

Download public GGUF models from the HF Library page.
Paste a public Hugging Face repository URL.
Paste a direct public .gguf file URL.
Place .gguf files manually in the configured models folder.

Private, gated, or token-protected Hugging Face downloads are not currently supported.

Recommended Models

For first tests, use smaller quantized GGUF files such as Q4 or Q5 variants. They load faster and are more likely to fit on consumer GPUs.

Examples of useful search terms in the HF Library:

gemma gguf
qwen3 gguf
embedding gguf
diffusiongemma

Hardware Notes

CPU-only mode can work but will usually be slower.
NVIDIA GPUs are detected through nvidia-smi.
CUDA-enabled bundled backends are included in the ready-to-run package.
GPU/VRAM monitoring depends on available drivers and nvidia-smi.
Large BF16/F16 models may require more VRAM than consumer GPUs provide.
Quantized Q4/Q5 models are usually better for local interactive use.

Installation From A Release Zip

Download the release zip.
Extract it to a normal user-writable folder, for example:
```
C:\Tools\Llama ccp WinUI3 Control Panel
```
Run:
```
Llama ccp WinUI3 Control Panel.exe
```
Open Settings if you want to enable:
- Run on startup.
- Close to system tray.
Open Models or HF Library to add GGUF models.
Select a model and click Load model on the Dashboard or Runtime page.

Building From Source

Requirements:

Windows 10/11.
.NET SDK compatible with the target framework in the project file.
Windows App SDK dependencies restored through NuGet.

Build:

dotnet restore
dotnet build

The build copies a runnable unpackaged app into:

local-launch\

The public source repository does not store bundled backend binaries or model files. Official release downloads include the ready-to-run app package with the bundled backend files already included.

The release package should be created from the runnable output, not from bin, obj, downloaded models, or local settings.

Privacy And Local Data

The app stores user settings under the current Windows user profile in:

%LOCALAPPDATA%\Llama ccp WinUI3 Control Panel

Those settings are created on the user's device at runtime. The release package does not include local settings, personal paths, downloaded models, or machine-specific data.

Packaging Rules For Maintainers

Do not include:

Models/
bin/
obj/
local-launch-next/
smoke test logs
local settings from %LOCALAPPDATA%
downloaded model files

Include:

the root launcher .exe
app DLLs/runtime files
bundled backend binaries
required backend DLLs
Assets/newlogo.*
screenshots for GitHub documentation
README.md
LICENSE
third-party notices, when distributing backend binaries

Known Limitations

Private/gated Hugging Face models are not supported.
GPU metrics are best-effort and NVIDIA-focused.
Vision models may require matching projector files.
The Chat Test page is intentionally basic.
DiffusionGemma support is a local wrapper around a resident worker process, not a full general diffusion/image generation system.
This app does not currently package image-generation backends such as Stable Diffusion, SDXL, or Flux.

Project Status

This is an MVP-level local runtime manager. It is functional, but still evolving around backend tuning, model compatibility detection, and release packaging.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
Assets		Assets
Pages		Pages
Properties		Properties
Services		Services
scripts		scripts
.gitattributes		.gitattributes
.gitignore		.gitignore
App.xaml		App.xaml
App.xaml.cs		App.xaml.cs
LICENSE		LICENSE
LlamaCppControlPanel.csproj		LlamaCppControlPanel.csproj
MainWindow.xaml		MainWindow.xaml
MainWindow.xaml.cs		MainWindow.xaml.cs
Package.appxmanifest		Package.appxmanifest
README.md		README.md
THIRD_PARTY_NOTICES.md		THIRD_PARTY_NOTICES.md
app.manifest		app.manifest

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Llama ccp WinUI3 Control Panel

Screenshots

What It Does

Main Features

Model Library

Hugging Face Library

Runtime Control

Runtime Settings

Resource Monitor

Logs

Chat Test

Supported Backends

llama.cpp / llama-server

DiffusionGemma Worker

OpenAI-Compatible Endpoint

Models Are Not Included

Recommended Models

Hardware Notes

Installation From A Release Zip

Building From Source

Privacy And Local Data

Packaging Rules For Maintainers

Known Limitations

Project Status

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Llama ccp WinUI3 Control Panel

Screenshots

What It Does

Main Features

Model Library

Hugging Face Library

Runtime Control

Runtime Settings

Resource Monitor

Logs

Chat Test

Supported Backends

llama.cpp / llama-server

DiffusionGemma Worker

OpenAI-Compatible Endpoint

Models Are Not Included

Recommended Models

Hardware Notes

Installation From A Release Zip

Building From Source

Privacy And Local Data

Packaging Rules For Maintainers

Known Limitations

Project Status

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages