Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
-
Updated
May 26, 2026 - Python
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
MacOS menu‑bar utility to adjust Apple Silicon GPU VRAM allocation
Estimate whether a Hugging Face model fits and fine-tunes on your local GPU.
Social AI Agent Blueprint. Powered by vram.ai
First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.
Rust block device in userspace
Find which LLMs actually fit on your hardware. Client-side GPU detection, quantization-aware memory estimation, and speed predictions.
Compare API providers, local GPUs, and cloud for any model
Plug-and-play homelab dashboard in one container — GPU, local-AI model VRAM, Docker, systemd and host health. One page, no Prometheus/Grafana.
Add a description, image, and links to the vram topic page so that developers can more easily learn about it.
To associate your repository with the vram topic, visit your repo's landing page and select "manage topics."