A Docker Compose setup for running a complete local Large Language Model (LLM) stack with GPU acceleration, featuring Ollama for model inference and Open WebUI for an intuitive chat interface.
- Ollama: Run powerful LLMs locally with NVIDIA GPU acceleration
- Open WebUI: Modern, user-friendly web interface for interacting with your models
- Watchtower: Automatic container updates to keep your stack current
- Persistent Storage: Volumes for model data and chat history
- GPU Support: Full NVIDIA GPU acceleration for optimal performance
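Prerequisites:
- Docker installed
- Docker Compose installed
- NVIDIA GPU with NVIDIA Container Toolkit installed
- Windows (Docker Desktop with the WSL 2 backend) or Linux; macOS is not supported for this setup because Docker on macOS cannot use NVIDIA GPUs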
To verify that Docker can access your GPU, run:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
If this command succeeds, your GPU is properly configured for Docker.
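A quick pre-flight check can confirm that the required command-line tools are on your PATH before you start. This is a minimal POSIX shell sketch. The function name `missing_tools` is hypothetical, and the check only confirms that the commands exist. It does not check versions or whether the NVIDIA Container Toolkit is configured; the `docker run ... nvidia-smi` test above covers that. Compose v2 ships as the `docker compose` plugin rather than a separate binary, so it is not checked here.

```shell
#!/bin/sh
# Print the name of each required tool that is NOT found on PATH.
# missing_tools is a hypothetical helper name used only for this sketch.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the host tools this stack relies on.
missing=$(missing_tools docker nvidia-smi)
if [ -z "$missing" ]; then
  echo "All required tools found"
else
  echo "Missing tools:" $missing
fi
```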
Quick Start:
1. Clone this repository (or download the files), then change into its directory:
   cd path/to/local_llm
2. Start the stack:
   docker-compose up -d
3. Wait for the containers to initialize (the first run may take a few minutes while images download); press Ctrl+C to stop following the logs:
   docker-compose logs -f
4. Access Open WebUI:
   - Open your browser to http://localhost:3000
   - Create an account (the first user becomes admin)
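Instead of watching the logs, you can script the wait for the containers to initialize by polling the Ollama endpoint until it answers. This is a minimal sketch that assumes `curl` is installed on the host. `wait_for_url` and its retry defaults are illustrative choices and are not part of this repository.

```shell
#!/bin/sh
# Poll a URL until it answers successfully or the retry budget runs out.
# Usage: wait_for_url URL [TRIES] [DELAY_SECONDS]
# wait_for_url is a hypothetical helper, not part of this repository.
wait_for_url() {
  url=$1
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait for the Ollama API on the host, then report success.
# wait_for_url http://localhost:11434 && echo "Ollama is up"
```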
Ollama service:
- Purpose: LLM runtime engine
- API Endpoint: http://localhost:11434
- GPU: Uses all available NVIDIA GPUs
- Volume: ollama (stores downloaded models)
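Other tools on the host can talk to Ollama through the API endpoint above. The sketch below sends a one-shot request to Ollama's `/api/generate` endpoint with `curl`. The helper names `build_generate_body` and `ollama_generate` and the `OLLAMA_URL` variable are hypothetical. With `"stream": false`, Ollama returns a single JSON object whose `response` field holds the generated text.

```shell
#!/bin/sh
# Build the JSON body for Ollama's /api/generate endpoint.
# NOTE: the prompt is not JSON-escaped; avoid double quotes and backslashes in it.
build_generate_body() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send a single non-streaming generation request.
# OLLAMA_URL is a hypothetical override; it defaults to the host port above.
ollama_generate() {
  curl -s "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -d "$(build_generate_body "$1" "$2")"
}

# Example (requires the stack to be running and the model to be pulled):
# ollama_generate llama3.2 "Why is the sky blue?"
```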
Open WebUI service:
- Purpose: Web interface for chat and model management
- Access: http://localhost:3000
- Ollama Connection: Reaches Ollama over the internal Docker network (http://ollama:11434)
- Volume: open-webui (stores conversations and settings)
Watchtower service:
- Purpose: Automatic container updates
- Schedule: Checks for updates every hour (3600 seconds)
- Cleanup: Automatically removes old images after updating
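This README does not reproduce docker-compose.yml itself. For orientation, here is a hedged sketch of a Compose file that matches the three services described above. The script writes it to docker-compose.example.yml so that the real file is not overwritten. Several details are assumptions based on the upstream projects' defaults: the image names, the container mount paths (`/root/.ollama`, `/app/backend/data`), the `OLLAMA_BASE_URL` variable, `WATCHTOWER_CLEANUP`, and the explicit volume `name:` entries. Compare the sketch against the repository's actual file before relying on any detail.

```shell
#!/bin/sh
# Write an ILLUSTRATIVE Compose file; it is not the repository's own file.
cat > docker-compose.example.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama          # assumed model storage path
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all              # expose every NVIDIA GPU
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # service name on the Docker network
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WATCHTOWER_POLL_INTERVAL=3600
      - WATCHTOWER_CLEANUP=true

volumes:
  ollama:
    name: ollama        # fixed name, so "-v ollama:/data" backups find it
  open-webui:
    name: open-webui
EOF
```

Running `docker-compose -f docker-compose.example.yml config` validates the sketch and prints the resolved configuration without starting anything.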
Visit the Ollama Model Library (ollama.com/library) to browse all available models. The library includes:
- Detailed model descriptions and capabilities
- Parameter counts and size information
- Tags for different quantization levels
- Example prompts and use cases
- Performance characteristics
Each model page shows the exact command needed to download it.
GPU Memory Requirements Guide:
| Model Size | Minimum VRAM | Recommended VRAM | Example Models |
|---|---|---|---|
| 1B-3B params | 4GB | 6GB | llama3.2:1b, llama3.2 (3B), phi3:mini |
| 7B-8B params | 6GB | 8GB | mistral, llama3.1, codellama |
| 13B params | 10GB | 16GB | llama2:13b, codellama:13b |
| 34B params | 20GB | 24GB | codellama:34b, yi:34b |
| 70B+ params | 40GB+ | 48GB+ | llama3.1:70b, llama2:70b |
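The table follows from a rough rule of thumb: weight memory in GB is about the parameter count in billions times the bits per weight, divided by 8. The minimal shell sketch below applies that rule; the function name is hypothetical. It covers the weights only. The KV cache, which grows with context length, and runtime overhead come on top, which is why the table's minimums are higher. Real 4-bit formats also store slightly more than 4 bits per weight on average, so treat the result as a lower bound.

```shell
#!/bin/sh
# Approximate memory for model weights only:
#   weights_gb = params_in_billions * bits_per_weight / 8
# estimate_weights_gb is a hypothetical helper name.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 7 4     # 7B model at 4-bit: prints 3.5
estimate_weights_gb 13 8    # 13B model at 8-bit: prints 13.0
estimate_weights_gb 70 4    # 70B model at 4-bit: prints 35.0
```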
Check Your GPU Memory:
# Windows/Linux with NVIDIA GPU
nvidia-smi
Look for the "Memory-Usage" column, which shows used / total VRAM for each GPU.
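For scripting, `nvidia-smi` can report the numbers directly with `--query-gpu=index,name,memory.total,memory.free --format=csv,noheader,nounits`, which gives values in MiB. The minimal sketch below defines a hypothetical `fits_in_vram` helper that compares a model size in GB against that free-memory figure. It treats 1 GB as 1024 MiB, which is close enough for a rough check. The example sizes are illustrative, not measured.

```shell
#!/bin/sh
# Succeed (exit 0) if a model of size $1 GB fits in $2 MiB of free VRAM.
# fits_in_vram is a hypothetical helper; 1 GB is approximated as 1024 MiB.
fits_in_vram() {
  awk -v gb="$1" -v free_mib="$2" 'BEGIN { exit !(gb * 1024 <= free_mib + 0) }'
}

# On the host, print free memory per GPU in MiB:
#   nvidia-smi --query-gpu=index,name,memory.total,memory.free --format=csv,noheader,nounits

# Example with illustrative numbers: a 4.7 GB model and 8192 MiB free.
if fits_in_vram 4.7 8192; then
  echo "fits"
else
  echo "does not fit"
fi
```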
Understanding Model Tags:
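- llama3.2 or llama3.2:latest - Default tag (usually a Q4 quantization)
- llama3.2:1b - Smaller 1 billion parameter version
- llama3.2:3b - 3 billion parameter version (the same model as the default)
- llama3.1:70b - 70 billion parameter version (Llama 3.2 has no 70B text model, so this one comes from Llama 3.1)
- Model quantizations (lower = smaller/faster, less accurate), which appear as tag suffixes such as llama3.2:3b-instruct-q8_0:
  - q2 - Highly compressed, lowest quality
  - q4 - Good balance (default for most models)
  - q8 - Higher quality, larger size
  - fp16 - Unquantized 16-bit weights, largest size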
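The naming rule above, where a reference without a tag means `latest`, can be made concrete with a small helper. This is a simplified sketch and `model_tag` is a hypothetical name. It does not handle references that include a registry host with a port (e.g. `host:5000/model`).

```shell
#!/bin/sh
# Print the tag part of an Ollama model reference, defaulting to "latest".
# model_tag is a hypothetical helper; registry hosts with ports are not handled.
model_tag() {
  case "$1" in
    *:*) echo "${1##*:}" ;;   # strip everything up to the last colon
    *)   echo "latest" ;;
  esac
}

model_tag llama3.2      # prints latest
model_tag llama3.2:1b   # prints 1b
```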
After starting the stack, download a model using one of these methods:
Option 1: Via Open WebUI (Recommended)
1. Access the interface: Navigate to http://localhost:3000
2. Create your account: On first visit, register (the first user becomes admin)
3. Open Model Settings:
   - Click on your profile icon (bottom left)
   - Select "Admin Panel" → "Settings" → "Models"
   - Or click the Settings gear icon → "Admin Settings" → "Models"
4. Pull a model:
   - In the "Pull a model from Ollama.com" section, enter the model name from ollama.com/library
   - Examples: llama3.2, mistral, phi3, codellama:7b
   - Click the "Pull Model" button
   - Watch the download progress in the notification
5. Wait for the download: Large models can take several minutes depending on your internet speed
6. Start chatting:
   - Click "New Chat" (top left)
   - Click the model dropdown at the top
   - Select your downloaded model
   - Begin your conversation!
Option 2: Via Command Line
# Download a model
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama pull mistral
# List downloaded models (the pull command itself shows download progress)
docker exec -it ollama ollama list
Recommended Models by VRAM:
For 4-6GB VRAM:
- llama3.2:1b - Excellent small model from Meta
- phi3:mini - Microsoft's efficient 3.8B model
- gemma:2b - Google's compact model
For 8GB VRAM (Most Common):
- llama3.2 - Meta's versatile 3B model (default)
- mistral - Fast, high-quality 7B model
- codellama - Code-focused 7B model for programming tasks
- phi3 - Great general-purpose 3.8B model
For 12-16GB VRAM:
- llama3.1 - Meta's 8B model, a step up from llama3.2's 3B
- codellama:13b - Enhanced coding capabilities
For 24GB+ VRAM:
- codellama:34b - Professional-grade coding
- mixtral:8x7b - Mixture-of-experts model (about 47B total parameters, so expect partial CPU offload on a single 24GB GPU)
For 40GB+ VRAM:
- llama3.1:70b - Meta's 70B model
Explore the full catalog at ollama.com/library
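The VRAM tiers above can also be encoded in a script, for example to pick a default model when provisioning a machine. In this hypothetical sketch, `suggest_model` takes whole gigabytes of VRAM and returns one model per tier from the lists above. The picks are only one option per tier, so adjust them to taste.

```shell
#!/bin/sh
# Map available VRAM (whole GB) to one suggested model from the tiers above.
# suggest_model is a hypothetical helper; the picks are one option per tier.
suggest_model() {
  vram_gb=$1
  if [ "$vram_gb" -ge 24 ]; then
    echo "codellama:34b"
  elif [ "$vram_gb" -ge 12 ]; then
    echo "llama3.1"
  elif [ "$vram_gb" -ge 8 ]; then
    echo "mistral"
  else
    echo "llama3.2:1b"
  fi
}

suggest_model 8   # prints mistral

# Pull the suggestion into the running Ollama container:
# docker exec -it ollama ollama pull "$(suggest_model 8)"
```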
- Navigate to http://localhost:3000
- Select a downloaded model from the dropdown
- Start asking questions!
Open WebUI automatically connects to Ollama via Docker's internal networking using the service name ollama. The containers communicate directly without additional configuration.
To add external Ollama instances:
- In Open WebUI, go to Admin Panel → Settings → Connections
- Add additional endpoints as needed (e.g., remote Ollama servers)
Edit the docker-compose.yml file to customize:
- Ports: Change the host side of 3000:8080 (e.g. to 3001:8080) to serve Open WebUI on a different port
- GPU Count: Change count: all to a number (e.g. count: 1) to limit how many GPUs Ollama can use
- Update Interval: Change WATCHTOWER_POLL_INTERVAL (in seconds)
- WebUI Secret Key: Set WEBUI_SECRET_KEY, which Open WebUI uses to sign login session tokens
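One way to set `WEBUI_SECRET_KEY` without editing the main file is a Compose override file. Compose automatically merges `docker-compose.override.yml` from the same directory, and it merges `environment` entries by variable name. The minimal sketch below generates a random 64-character hex key from `/dev/urandom`. It assumes the Open WebUI service is named `open-webui`, as in the log commands, and it overwrites any existing override file. Keep the generated file out of version control, since it contains the secret.

```shell
#!/bin/sh
# Generate 32 random bytes and render them as 64 lowercase hex characters.
secret=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')

# Write a Compose override that injects the key into the open-webui service.
# WARNING: this replaces any existing docker-compose.override.yml.
cat > docker-compose.override.yml <<EOF
services:
  open-webui:
    environment:
      - WEBUI_SECRET_KEY=${secret}
EOF

echo "Wrote docker-compose.override.yml; apply it with: docker-compose up -d"
```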
Common Commands:
# View logs for all services
docker-compose logs -f
# View logs for a specific service
docker-compose logs -f ollama
docker-compose logs -f open-webui
# Stop the stack (volumes are kept)
docker-compose down
# Restart the stack
docker-compose restart
# List downloaded models
docker exec -it ollama ollama list
# See detailed model information (architecture, parameters, quantization)
docker exec -it ollama ollama show <model-name>
# Free up disk space by removing unused models
docker exec -it ollama ollama rm <model-name>
# Example:
docker exec -it ollama ollama rm llama3.1:70b
# Update to the latest images and recreate changed containers
docker-compose pull
docker-compose up -d
Troubleshooting:
Issue: "Could not connect to Ollama"
Solution:
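- Verify both containers are running:
  docker ps
- Check the Ollama logs:
  docker-compose logs ollama
- Ensure the containers are on the same Docker network (list the networks, then inspect the stack's network to see which containers are attached):
  docker network ls
  docker network inspect <network-name>
- Restart the stack:
  docker-compose restart
- Verify the Ollama API is reachable from the Open WebUI container (it should reply "Ollama is running"):
  docker exec -it open-webui curl http://ollama:11434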
Issue: Models run slowly or GPU isn't being used
Solution:
- Verify the NVIDIA drivers on the host:
  nvidia-smi
- Check Docker GPU support:
  docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
- If the first command works but the second fails, install the NVIDIA Container Toolkit and restart Docker
Issue: Docker socket permission denied (Linux)
Solution: Add your user to the docker group:
sudo usermod -aG docker $USER
# Log out and back in for the change to take effect
Issue: Port 3000 or 11434 already in use
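Solution: Edit docker-compose.yml and change the host-side (left) port of the mapping:
ports:
  - "3001:8080" # Change 3000 to 3001
For Ollama, change "11434:11434" to, for example, "11435:11434". Open WebUI is unaffected because it reaches Ollama on the internal Docker network, but tools on the host must then use http://localhost:11435.
Data Persistence:
All data is stored in Docker volumes: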
- ollama: Model files (can be large, from about 1GB for small models to 40GB+ for 70B models)
- open-webui: Chat history, settings, and user data
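# Back up both volumes to .tar.gz files in the current directory.
# Compose may prefix volume names with the project name (e.g. local_llm_ollama);
# run docker volume ls and adjust the names if needed.
docker run --rm -v ollama:/data -v "$(pwd)":/backup alpine tar czf /backup/ollama-backup.tar.gz /data
docker run --rm -v open-webui:/data -v "$(pwd)":/backup alpine tar czf /backup/webui-backup.tar.gz /data
# Remove containers AND volumes (permanently deletes all models and chat history)
docker-compose down -v
Security Notes:
- The first user to register in Open WebUI becomes the admin
- Set the WEBUI_SECRET_KEY environment variable for production use
- Consider placing the stack behind a reverse proxy for external access
- Keep containers updated via Watchtower or manual pulls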
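The backup commands in the data section have no matching restore step. The hedged sketch below provides one, reusing the same throwaway `alpine` container approach. When archiving, `tar` strips the leading `/` and stores the files as `data/...`. Extracting with `-C /` therefore puts them back under `/data`, which is the mounted volume. `restore_volume` is a hypothetical helper. Stop the stack first with `docker-compose down` (without `-v`) so nothing writes to the volume during the restore, and use the same volume names you backed up.

```shell
#!/bin/sh
# Restore a tar.gz created by the backup commands into a Docker volume.
# Usage: restore_volume VOLUME ARCHIVE  (ARCHIVE must be in the current directory)
# restore_volume is a hypothetical helper, not part of this repository.
restore_volume() {
  volume=$1
  archive=$2
  # Mount the volume at /data and the current directory at /backup,
  # then unpack the archive's data/... entries relative to /.
  docker run --rm -v "$volume":/data -v "$(pwd)":/backup alpine \
    tar xzf "/backup/$archive" -C /
}

# Example, after stopping the stack with docker-compose down:
# restore_volume ollama ollama-backup.tar.gz
# restore_volume open-webui webui-backup.tar.gz
```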
- Ollama Model Library - Browse and discover all available models
- Ollama Documentation - Official Ollama docs and guides
- Open WebUI Documentation - Web interface documentation
- Ollama Blog - Latest updates and model releases
- NVIDIA Container Toolkit - GPU setup guide
This is a configuration repository. Ollama, Open WebUI, and Watchtower are separate projects under their own licenses; see each project for details.
Feel free to submit issues or pull requests for improvements to this Docker Compose configuration.
Enjoy your local LLM stack!