Skip to content

BennyAI2/Llama.ccp-WinUI3-Control-Panel

Repository files navigation

Llama ccp WinUI3 Control Panel logo

Llama ccp WinUI3 Control Panel

Llama ccp WinUI3 Control Panel is a native Windows app for running and managing local GGUF models through bundled llama.cpp-based backends. It is designed as a local model/runtime control surface rather than a full chat application: pick a model, load it into memory, tune runtime settings, expose a local OpenAI-compatible endpoint, monitor resources, and keep the backend resident until you unload it.

The app targets users who want a lightweight local LLM server with more direct control than a black-box runner. It can run standard llama.cpp-compatible GGUF text models and DiffusionGemma GGUF models through separate bundled backends.

Screenshots

Dashboard

Models

Runtime

Monitor

Logs

What It Does

  • Runs local GGUF models through llama-server.exe.
  • Runs DiffusionGemma GGUF models through a bundled DiffusionGemma worker wrapper.
  • Keeps one selected model loaded until the user unloads it.
  • Exposes a local OpenAI-compatible endpoint for third-party tools.
  • Lets users switch models from the UI.
  • Downloads public Hugging Face GGUF files.
  • Provides an HF Library browser with most-popular/newest sorting and hardware fit estimates.
  • Shows CPU, RAM, process memory, uptime, GPU usage, and VRAM usage where available.
  • Redirects backend logs into an in-app runtime console.
  • Provides a small Chat Test page for checking the currently loaded model.
  • Supports app startup registration and close-to-system-tray behavior.

Main Features

Model Library

  • Select a custom app models folder.
  • Optionally point the UI at the local Ollama model store for discovery.
  • Scan for .gguf files.
  • Hide models from the app library without deleting files.
  • Reveal selected models in Explorer.
  • Detect model architecture and likely capabilities.
  • Detect vision/projector companion files such as mmproj where available.

Hugging Face Library

  • Search public Hugging Face repositories.
  • Sort compatible results by newest or most popular.
  • List up to 100 compatible repositories.
  • Show available GGUF files in a selected repo.
  • Display file size, quantization, backend type, capabilities, and local fit estimates.
  • Color-code fit:
    • Green: good fit.
    • Yellow: likely usable or unknown.
    • Orange: tight fit or partial GPU offload likely.
    • Red: too large or auxiliary/not standalone.
  • Download public GGUF files into the selected app models folder.
  • Resume interrupted downloads where the server supports byte ranges.

Runtime Control

  • Load, unload, and reload the selected model.
  • Run backends without opening a separate console window.
  • Configure host and port.
  • Copy the local endpoint.
  • Detect port conflicts.
  • Keep the backend resident until explicitly unloaded.
  • Stop backend processes on app exit.
  • Use a Windows process lifetime guard so child backends are closed if the app exits unexpectedly.

Runtime Settings

  • Context size.
  • CPU threads.
  • GPU layers.
  • Batch size.
  • UBatch size.
  • Parallel slots.
  • Flash attention.
  • mmap/mlock toggles.
  • Diffusion steps for DiffusionGemma.
  • Custom additional backend arguments.
  • Per-model runtime settings.
  • Hardware-aware optimization button.

Resource Monitor

  • CPU usage.
  • RAM usage.
  • Backend process memory.
  • Backend uptime.
  • Current model.
  • Current host and port.
  • GPU usage through nvidia-smi when available.
  • VRAM usage through nvidia-smi when available.
  • Clear fallback text when GPU metrics are unavailable.

Logs

  • Live stdout/stderr capture.
  • Timestamped log lines.
  • Warning/error highlighting.
  • Copy logs.
  • Save logs.
  • Clear logs.

Chat Test

The app is not intended to be a full chat client, but it includes a basic test page so users can verify the currently loaded endpoint. It supports text prompts and image attachment attempts for compatible vision models.

Supported Backends

llama.cpp / llama-server

Used for standard GGUF text, embedding, tool-capable, reasoning, and vision-capable models that are supported by the bundled llama.cpp build.

DiffusionGemma Worker

Used for DiffusionGemma GGUF models. The app keeps the worker process resident and wraps it with local /v1/chat/completions and /v1/completions endpoints.

OpenAI-Compatible Endpoint

When a model is loaded, the app exposes a local endpoint such as:

http://127.0.0.1:8080

Useful paths:

/health
/v1/completions
/v1/chat/completions

This allows other local tools to connect as long as they support OpenAI-style local server URLs.

Models Are Not Included

This repository and release package do not include model files.

Users can:

  • Download public GGUF models from the HF Library page.
  • Paste a public Hugging Face repository URL.
  • Paste a direct public .gguf file URL.
  • Place .gguf files manually in the configured models folder.

Private, gated, or token-protected Hugging Face downloads are not currently supported.

Recommended Models

For first tests, use smaller quantized GGUF files such as Q4 or Q5 variants. They load faster and are more likely to fit on consumer GPUs.

Examples of useful search terms in the HF Library:

gemma gguf
qwen3 gguf
embedding gguf
diffusiongemma

Hardware Notes

  • CPU-only mode can work but will usually be slower.
  • NVIDIA GPUs are detected through nvidia-smi.
  • CUDA-enabled bundled backends are included in the ready-to-run package.
  • GPU/VRAM monitoring depends on available drivers and nvidia-smi.
  • Large BF16/F16 models may require more VRAM than consumer GPUs provide.
  • Quantized Q4/Q5 models are usually better for local interactive use.

Installation From A Release Zip

  1. Download the release zip.

  2. Extract it to a normal user-writable folder, for example:

    C:\Tools\Llama ccp WinUI3 Control Panel
    
  3. Run:

    Llama ccp WinUI3 Control Panel.exe
    
  4. Open Settings if you want to enable:

    • Run on startup.
    • Close to system tray.
  5. Open Models or HF Library to add GGUF models.

  6. Select a model and click Load model on the Dashboard or Runtime page.

Building From Source

Requirements:

  • Windows 10/11.
  • .NET SDK compatible with the target framework in the project file.
  • Windows App SDK dependencies restored through NuGet.

Build:

dotnet restore
dotnet build

The build copies a runnable unpackaged app into:

local-launch\

The public source repository does not store bundled backend binaries or model files. Official release downloads include the ready-to-run app package with the bundled backend files already included.

The release package should be created from the runnable output, not from bin, obj, downloaded models, or local settings.

Privacy And Local Data

The app stores user settings under the current Windows user profile in:

%LOCALAPPDATA%\Llama ccp WinUI3 Control Panel

Those settings are created on the user's device at runtime. The release package does not include local settings, personal paths, downloaded models, or machine-specific data.

Packaging Rules For Maintainers

Do not include:

  • Models/
  • bin/
  • obj/
  • local-launch-next/
  • smoke test logs
  • local settings from %LOCALAPPDATA%
  • downloaded model files

Include:

  • the root launcher .exe
  • app DLLs/runtime files
  • bundled backend binaries
  • required backend DLLs
  • Assets/newlogo.*
  • screenshots for GitHub documentation
  • README.md
  • LICENSE
  • third-party notices, when distributing backend binaries

Known Limitations

  • Private/gated Hugging Face models are not supported.
  • GPU metrics are best-effort and NVIDIA-focused.
  • Vision models may require matching projector files.
  • The Chat Test page is intentionally basic.
  • DiffusionGemma support is a local wrapper around a resident worker process, not a full general diffusion/image generation system.
  • This app does not currently package image-generation backends such as Stable Diffusion, SDXL, or Flux.

Project Status

This is an MVP-level local runtime manager. It is functional, but still evolving around backend tuning, model compatibility detection, and release packaging.

About

Native WinUI 3 control panel for running local llama.cpp and DiffusionGemma backends with model management, Hugging Face downloads, runtime tuning, logs, and resource monitoring.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors