bcperry/local_llm

Local LLM Stack

A Docker Compose setup for running a complete local Large Language Model (LLM) stack with GPU acceleration, featuring Ollama for model inference and Open WebUI for an intuitive chat interface.

🚀 Features

  • Ollama: Run powerful LLMs locally with NVIDIA GPU acceleration
  • Open WebUI: Modern, user-friendly web interface for interacting with your models
  • Watchtower: Automatic container updates to keep your stack current
  • Persistent Storage: Volumes for model data and chat history
  • GPU Support: Full NVIDIA GPU acceleration for optimal performance

📋 Prerequisites

Verify GPU Support

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If this command succeeds, your GPU is properly configured for Docker.
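
Before running the GPU test, it can help to confirm that the required command-line tools are on your PATH. The following is a minimal sketch; the helper names `have_cmd` and `check_prereqs` are illustrative and not part of this repository.

```shell
# Return success if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print one status line per command; return non-zero if any is missing.
check_prereqs() {
  rc=0
  for cmd in "$@"; do
    if have_cmd "$cmd"; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_prereqs docker nvidia-smi
```

A "missing" line for `nvidia-smi` usually points to a driver problem on the host rather than a Docker problem.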

🛠️ Installation

  1. Clone this repository (or download the files) and change into its directory:

    git clone https://github.com/bcperry/local_llm.git
    cd local_llm
  2. Start the stack:

    docker-compose up -d
  3. Wait for containers to initialize (first run may take a few minutes):

    docker-compose logs -f
  4. Access Open WebUI at http://localhost:3000
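
On first run the web interface may not answer immediately. As a rough sketch, a small polling helper can wait until http://localhost:3000 responds; `wait_for_url` is a hypothetical helper name, and the attempt count and delay are arbitrary defaults.

```shell
# Poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), seconds between attempts (default 2).
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, --max-time: do not hang forever
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out: $url" >&2
  return 1
}

# Example: wait_for_url http://localhost:3000 && echo "Open WebUI is up"
```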

📊 Services

Ollama (Port 11434)

  • Purpose: LLM runtime engine
  • API Endpoint: http://localhost:11434
  • GPU: Uses all available NVIDIA GPUs
  • Volume: ollama (stores downloaded models)

Open WebUI (Port 3000)

  • Purpose: Web interface for chat and model management
  • Access: http://localhost:3000
  • Ollama Connection: Auto-discovers Ollama via Docker network (http://ollama:11434)
  • Volume: open-webui (stores conversations and settings)

Watchtower

  • Purpose: Automatic container updates
  • Schedule: Checks for updates every hour (3600 seconds)
  • Cleanup: Automatically removes old images

🎯 Quick Start Guide

Finding Models

Visit the Ollama Model Library to browse all available models. The library includes:

  • Detailed model descriptions and capabilities
  • Parameter counts and size information
  • Tags for different quantization levels
  • Example prompts and use cases
  • Performance characteristics

Each model page shows the exact command needed to download it.

Choosing the Right Model for Your GPU

GPU Memory Requirements Guide:

| Model Size | Minimum VRAM | Recommended VRAM | Example Models |
|---|---|---|---|
| 1B-3B params | 4GB | 6GB | llama3.2:1b, llama3.2 (3B), phi3:mini |
| 7B params | 6GB | 8GB | mistral, codellama |
| 13B params | 10GB | 16GB | llama2:13b, codellama:13b |
| 34B params | 20GB | 24GB | codellama:34b, yi:34b |
| 70B+ params | 40GB+ | 48GB+ | llama3.1:70b, llama2:70b |

Check Your GPU Memory:

# Windows/Linux with NVIDIA GPU
nvidia-smi

The "Memory-Usage" column shows used and total VRAM for each GPU (for example, 1024MiB / 8192MiB); the second number is your total VRAM.
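
To turn the VRAM figure into a starting point, the sketch below maps total VRAM to the largest model class whose "Minimum VRAM" in the table above is met. The function name `recommend_tier` and the rounding rule are illustrative assumptions, not part of this repository.

```shell
# Map total VRAM in MiB (as nvidia-smi reports it) to the largest model
# class whose "Minimum VRAM" in the table above is satisfied.
recommend_tier() {
  # Round MiB to whole GB so an "8GB" card reporting 8188 MiB counts as 8.
  gb=$(( ($1 + 512) / 1024 ))
  if   [ "$gb" -ge 40 ]; then echo "70B+"
  elif [ "$gb" -ge 20 ]; then echo "34B"
  elif [ "$gb" -ge 10 ]; then echo "13B"
  elif [ "$gb" -ge 6 ];  then echo "7B"
  elif [ "$gb" -ge 4 ];  then echo "1B-3B"
  else echo "below table minimum"
  fi
}

# Example (first GPU only):
#   recommend_tier "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)"
```

The result is the largest class that should load at all; the "Recommended VRAM" column is the better guide for comfortable use.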

Understanding Model Tags:

  • llama3.2 or llama3.2:latest - Default tag (the 3B model, usually at Q4 quantization)
  • llama3.2:1b - Smaller 1 billion parameter version
  • llama3.2:3b - 3 billion parameter version
  • llama3.1:70b - 70 billion parameter version (Llama 3.2 does not ship a 70B text model)
  • Model quantizations (fewer bits = smaller and faster, but less accurate):
    • q2 - Highly compressed, lowest quality
    • q4 - Good balance (default for most models)
    • q8 - Higher quality, larger size
    • fp16 - Unquantized 16-bit weights, largest size
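
The effect of quantization on memory can be approximated with simple arithmetic: parameters times bits per weight, divided by 8 to get bytes. The sketch below applies that rule; the 1.2 multiplier for context and runtime overhead is an assumed rule of thumb, and `estimate_vram_gb` is a hypothetical helper name. Real quantized files often use slightly more than the nominal bits per weight, so treat the output as a lower bound.

```shell
# Rough VRAM estimate in GB for a model.
# Arguments: parameter count in billions, bits per weight (4 for q4, 8 for q8, 16 for fp16).
# The 1.2 multiplier is an assumed allowance for context and runtime overhead.
estimate_vram_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 * 1.2 }'
}

estimate_vram_gb 7 4    # 7B model at q4
estimate_vram_gb 7 16   # same model at fp16
```

For a 7B model this gives about 4.2 GB at q4 and about 16.8 GB at fp16, which is why the default q4 tags are the practical choice on consumer GPUs.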

Download Your First Model

After starting the stack, download a model using one of these methods:

Option 1: Via Open WebUI (Recommended)

  1. Access the interface: Navigate to http://localhost:3000

  2. Create your account: On first visit, register (first user becomes admin)

  3. Open Model Settings:

    • Click on your profile icon (bottom left)
    • Select "Admin Panel" → "Settings" → "Models"
    • Or click the Settings gear icon → "Admin Settings" → "Models"
  4. Pull a model:

    • In the "Pull a model from Ollama.com" section
    • Enter the model name from ollama.com/library
    • Examples: llama3.2, mistral, phi3, codellama:7b
    • Click "Pull Model" button
    • Watch the download progress in the notification
  5. Wait for download: Large models can take several minutes depending on your internet speed

  6. Start chatting:

    • Click "New Chat" (top left)
    • Click the model dropdown at the top
    • Select your downloaded model
    • Begin your conversation!

Option 2: Via Command Line

# Download a model
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama pull mistral

# List downloaded models (pull progress is printed by the pull command itself)
docker exec -it ollama ollama list
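
The same list is available over the Ollama HTTP API at http://localhost:11434/api/tags, which is convenient for scripts. The sketch below pulls the model names out of that response without extra tools; `model_names` is a hypothetical helper, and it assumes the API returns compact JSON, so prefer a real JSON parser such as jq where available.

```shell
# Extract model names from the JSON returned by Ollama's /api/tags endpoint.
# Assumes compact JSON ("name":"..." with no spaces); use jq if it is installed.
model_names() {
  grep -o '"name":"[^"]*"' | cut -d '"' -f 4
}

# Example: curl -s http://localhost:11434/api/tags | model_names
```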

Recommended Models by GPU Size

For 4-6GB VRAM:

  • llama3.2:1b - Excellent small model from Meta
  • phi3:mini - Microsoft's efficient 3.8B model
  • gemma:2b - Google's compact model

For 8GB VRAM (Most Common):

  • llama3.2 - Meta's versatile 3B model (default)
  • mistral - Fast, high-quality 7B model
  • codellama - Best for programming tasks
  • phi3 - Great general-purpose 3.8B model

For 12-16GB VRAM:

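  • llama3.1 - Meta's 8B model with a 128K-token context window
  • mixtral:8x7b - Mixture-of-experts model (about 26GB at the default quantization, so expect partial CPU offload at this VRAM size)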
  • codellama:13b - Enhanced coding capabilities

For 24GB+ VRAM:

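  • llama3.1:70b - Largest general-purpose model in this list (needs 40GB+ VRAM per the table above to run fully on the GPU; on a 24GB card expect partial CPU offload)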
  • codellama:34b - Professional-grade coding

Explore the full catalog at ollama.com/library

Start Chatting

  1. Navigate to http://localhost:3000
  2. Select a downloaded model from the dropdown
  3. Start asking questions!
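
A downloaded model can also be queried directly through the Ollama API on port 11434, without the web interface. The sketch below builds the request body for the `/api/generate` endpoint; `generate_payload` is a hypothetical helper, and it assumes a single-line prompt.

```shell
# Build the JSON body for Ollama's /api/generate endpoint.
# Escapes backslashes and double quotes; assumes a single-line prompt.
generate_payload() {
  model=$1
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Example:
#   curl -s http://localhost:11434/api/generate \
#     -d "$(generate_payload llama3.2 'Why is the sky blue?')"
```

With `"stream": false` the reply arrives as a single JSON object whose `response` field holds the generated text.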

⚙️ Configuration

Ollama Connection

Open WebUI automatically connects to Ollama via Docker's internal networking using the service name ollama. The containers communicate directly without additional configuration.

To add external Ollama instances:

  1. Open WebUI Settings → Admin Panel → Settings → Connections
  2. Add additional endpoints as needed (e.g., remote Ollama servers)

Custom Settings

Edit the docker-compose.yml file to customize:

  • Ports: Change 3000:8080 to use a different port for Open WebUI
  • GPU Count: Modify count: all to limit GPU usage
  • Update Interval: Change WATCHTOWER_POLL_INTERVAL (in seconds)
  • WebUI Secret Key: Set WEBUI_SECRET_KEY for session encryption
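
The repository's actual docker-compose.yml is not reproduced in this README, so the following is only a reconstruction from the settings mentioned here (ports, volume names, `count: all`, `WATCHTOWER_POLL_INTERVAL`). The image names and container mount paths are assumptions based on what the upstream projects commonly document. The snippet writes the sketch to a separate file, `docker-compose.example.yml`, so it can be compared against the real file without overwriting it.

```shell
# Write an illustrative compose file next to the real one (never overwrites it).
cat > docker-compose.example.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama                # assumed image name
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama            # assumed model directory
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all                # set to 1 to limit GPU usage
              capabilities: [gpu]
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main   # assumed image name
    container_name: open-webui
    ports:
      - "3000:8080"                     # host:container
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      # - WEBUI_SECRET_KEY=change-me
    volumes:
      - open-webui:/app/backend/data    # assumed data directory
    depends_on:
      - ollama
    restart: unless-stopped

  watchtower:
    image: containrrr/watchtower        # assumed image name
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - WATCHTOWER_POLL_INTERVAL=3600   # seconds
      - WATCHTOWER_CLEANUP=true         # remove old images
    restart: unless-stopped

volumes:
  ollama:
  open-webui:
EOF
```

Each customization in the list above corresponds to one line in this sketch: the host side of `3000:8080`, the `count` value, and the two Watchtower variables.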

🔧 Common Commands

View Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f ollama
docker-compose logs -f open-webui

Stop the Stack

docker-compose down

Restart Services

docker-compose restart

List Downloaded Models

docker exec -it ollama ollama list

Check Model Info

# See detailed model information including size
docker exec -it ollama ollama show <model-name>

Remove a Model

# Free up disk space by removing unused models
docker exec -it ollama ollama rm <model-name>

# Example:
docker exec -it ollama ollama rm llama3.1:70b

Update Containers Manually

docker-compose pull
docker-compose up -d

🐛 Troubleshooting

Open WebUI Can't Connect to Ollama

Issue: "Could not connect to Ollama"

Solution:

  • Verify both containers are running: docker ps
  • Check Ollama logs: docker-compose logs ollama
  • Ensure both containers are on the same Docker network: docker network ls, then docker network inspect <network-name>
  • Restart the stack: docker-compose restart
  • Verify Ollama API is accessible: docker exec -it open-webui curl http://ollama:11434
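
The first check in the list can be scripted. As a sketch, the helper below reads container names on standard input and prints any expected container that is not running; `missing_containers` is a hypothetical name, and the container names `ollama` and `open-webui` are the ones used in the commands throughout this README.

```shell
# Read running container names on stdin (one per line) and print each
# expected name, given as an argument, that is not among them.
missing_containers() {
  running=$(cat)
  for name in "$@"; do
    # -x: match the whole line, -F: treat the name as a fixed string
    printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
  done
}

# Example: docker ps --format '{{.Names}}' | missing_containers ollama open-webui
```

No output means both containers are up, and the remaining checks (logs, network, API) are the next place to look.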

GPU Not Detected

Issue: Models run slowly or GPU isn't being used

Solution:

  1. Verify NVIDIA drivers: nvidia-smi
  2. Check Docker GPU support: docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
  3. Install NVIDIA Container Toolkit

Permission Issues (Linux)

Issue: Docker socket permission denied

Solution:

sudo usermod -aG docker $USER
# Log out and back in for changes to take effect

Port Already in Use

Issue: Port 3000 or 11434 already in use

Solution: Edit docker-compose.yml and change the port mappings:

ports:
  - "3001:8080"  # Change 3000 to 3001

📁 Data Persistence

All data is stored in Docker volumes:

  • ollama: Model files (can be large, 4GB+ per model)
  • open-webui: Chat history, settings, and user data

Backup Your Data

# Backup volumes
docker run --rm -v ollama:/data -v $(pwd):/backup alpine tar czf /backup/ollama-backup.tar.gz /data
docker run --rm -v open-webui:/data -v $(pwd):/backup alpine tar czf /backup/webui-backup.tar.gz /data
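
Restoring is the reverse operation. The sketch below assumes archives created by the backup commands above, which store their contents under `data/`, and that the stack is stopped while restoring. `restore_volume` is a hypothetical helper, and the `DOCKER` variable is an illustrative override that allows a dry run with `DOCKER=echo`.

```shell
# Restore a volume from an archive created by the backup commands above.
# Arguments: volume name, archive file name in the current directory.
# DOCKER is a hypothetical override used only for dry runs (DOCKER=echo).
restore_volume() {
  volume=$1
  archive=$2
  # The backup stored paths as data/..., so extracting at / refills /data.
  ${DOCKER:-docker} run --rm \
    -v "$volume":/data \
    -v "$(pwd)":/backup \
    alpine tar xzf "/backup/$archive" -C /
}

# Example (stop the stack first with docker-compose down):
#   restore_volume ollama ollama-backup.tar.gz
#   restore_volume open-webui webui-backup.tar.gz
```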

Remove All Data

docker-compose down -v

⚠️ Warning: This deletes all downloaded models and chat history!

🔐 Security Notes

  • First user to register in Open WebUI becomes the admin
  • Set WEBUI_SECRET_KEY environment variable for production use
  • Consider placing behind a reverse proxy for external access
  • Keep containers updated via Watchtower or manual pulls
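
For the `WEBUI_SECRET_KEY` item, any long random string should do. As one possible approach, the sketch below prints 32 random bytes as hex using only standard tools; `gen_secret` is a hypothetical helper, and how the value reaches the container (for example through an `.env` file) depends on how docker-compose.yml references the variable.

```shell
# Print 32 random bytes as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Example: echo "WEBUI_SECRET_KEY=$(gen_secret)"
```

Keep the generated value stable across restarts; changing it would be expected to invalidate existing login sessions.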

📚 Additional Resources

📝 License

This is a configuration repository. See individual component licenses:

🀝 Contributing

Feel free to submit issues or pull requests for improvements to this Docker Compose configuration.


Enjoy your local LLM stack! 🎉
