A comprehensive RunPod container orchestrating multiple AI/ML services (ComfyUI, AI-Toolkit, Ollama) via supervisord, designed for GPU-accelerated workflows with intelligent monitoring and automated resource management.
All services are proxied through NGINX with basic authentication:
- ComfyUI → Port 3001 - Advanced image generation platform with 25+ custom nodes
- Jupyter Lab → Port 3002 - Interactive Python development environment
- AI-Toolkit → Port 3003 - FLUX/LoRA model training platform by Ostris
- Filebrowser → Port 3004 - Web-based file management interface
- Ollama → Port 3005 - Local LLM inference service
- SSH → Port 22 - Direct shell access
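For SSH, a typical connection looks like the following (host and port come from your RunPod console; the matching private key is the one whose public half you set via PUBLIC_KEY):

```bash
# Connect to the pod's SSH service; <POD_IP> and <SSH_PORT> are placeholders from the RunPod UI
ssh -i ~/.ssh/id_ed25519 -p <SSH_PORT> root@<POD_IP>
```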
ComfyUI and AI-Toolkit are not pre-installed. Install them on first use:
# Install ComfyUI (includes 25+ custom nodes)
/workspace/service_manager.sh install comfyui
# Install AI-Toolkit (includes web UI)
/workspace/service_manager.sh install aitoolkit
# Ollama is pre-installed and runs automatically

Note: After installation, supervisor will automatically start the services. Check status with supervisorctl status.
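Since Ollama is already running, you can exercise it right away; for example (the model name is just an illustration):

```bash
# Pull a model and run a one-off prompt against the local Ollama service
ollama pull llama3.2
ollama run llama3.2 "Say hello in five words."
```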
Once running, access services through your RunPod instance:
https://YOUR_POD_ID-3001.proxy.runpod.net # ComfyUI
https://YOUR_POD_ID-3002.proxy.runpod.net # Jupyter Lab
https://YOUR_POD_ID-3003.proxy.runpod.net # AI-Toolkit
https://YOUR_POD_ID-3004.proxy.runpod.net # Filebrowser
https://YOUR_POD_ID-3005.proxy.runpod.net # Ollama
Default credentials: Check your RunPod environment or /etc/nginx/.htpasswd
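A quick way to confirm the proxy and credentials work together (USER:PASSWORD are whatever is configured for your pod):

```bash
# Expect an HTTP 200 once ComfyUI is installed and running (401 means wrong credentials)
curl -I -u USER:PASSWORD https://YOUR_POD_ID-3001.proxy.runpod.net
```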
The unified service manager handles installation and manual service control:
# Installation
/workspace/service_manager.sh install comfyui # Install ComfyUI + custom nodes
/workspace/service_manager.sh install aitoolkit # Install AI-Toolkit + web UI
# Manual start (normally handled by supervisor)
/workspace/service_manager.sh start comfyui
/workspace/service_manager.sh start aitoolkit

Key Features:
- ComfyUI: Uses a Python 3.10 venv with --system-site-packages to share the base PyTorch install
- AI-Toolkit: Isolated Python 3.10 venv with its own PyTorch installation (CUDA-aware)
- Auto-detection: Detects the CUDA version (12.8 or 12.1) and installs the matching PyTorch build
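The detection presumably works along these lines; a simplified sketch (the authoritative logic lives in service_manager.sh, and the exact PyTorch index URLs here are assumptions):

```bash
# Sketch: pick a PyTorch wheel index based on the driver's reported CUDA version
CUDA_VERSION=$(nvidia-smi | grep -oP 'CUDA Version: \K[0-9]+\.[0-9]+')
case "$CUDA_VERSION" in
    12.8*) pip install torch --index-url https://download.pytorch.org/whl/cu128 ;;
    *)     pip install torch --index-url https://download.pytorch.org/whl/cu121 ;;
esac
```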
Supervisor manages all services as background processes:
# Check all service status
supervisorctl status
# Control individual services
supervisorctl start comfyui
supervisorctl stop aitoolkit
supervisorctl restart ollama
# Restart all services
supervisorctl restart all
# View real-time logs
supervisorctl tail -f comfyui
supervisorctl tail -f watchtower
# Exit supervisor console
supervisorctl quit

Supervisord manages the following services:
- nginx - Reverse proxy with authentication
- sshd - SSH server for remote access
- jupyter - JupyterLab server (Python 3.10)
- filebrowser - Web file manager (noauth mode)
- comfyui - Image generation service
- aitoolkit - Training service with Node.js UI
- ollama - LLM inference service
- watchtower - Activity monitor & health checker
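For a quick red/green view across all of the above, filtering the status output works well:

```bash
# Print only services that are NOT in the RUNNING state (empty output = all healthy)
supervisorctl status | grep -v RUNNING
```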
# Supervisor logs (all in /var/log/supervisor/)
/var/log/supervisor/nginx.log
/var/log/supervisor/sshd.log
/var/log/supervisor/jupyter.log
/var/log/supervisor/filebrowser.log
/var/log/supervisor/comfyui.log
/var/log/supervisor/aitoolkit.log
/var/log/supervisor/ollama.log
/var/log/supervisor/watchtower.log
# View in real-time
tail -f /var/log/supervisor/comfyui.log
tail -f /var/log/supervisor/watchtower.log
# View last 50 lines
tail -50 /var/log/supervisor/aitoolkit.log

Test service availability directly:
# ComfyUI
curl http://127.0.0.1:8188
# AI-Toolkit
curl http://127.0.0.1:8675
# Ollama
curl http://127.0.0.1:11434/api/tags
# Check listening ports
netstat -tlnp | grep -E '8188|8675|11434|8888|8080'

The watchtower service provides intelligent activity monitoring and health checks:
- Full Activity - GPU/CPU usage or SSH connections → Resets inactivity timer
- HTTP-Only Activity - Browser connections only → Shorter timeout threshold
- No Activity - Increments counter → Eventual shutdown (if enabled)
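As a rough illustration of these tiers, a hypothetical probe might look like the following (the actual logic lives in scripts/watchtower.sh and may differ; thresholds such as CPU_USAGE_THRESHOLD are omitted for brevity):

```bash
# Hypothetical activity probe (sketch only; not the real watchtower.sh logic)
GPU_UTIL=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -1)
SSH_SESSIONS=$(who | wc -l)
if [ "${GPU_UTIL:-0}" -gt 0 ] || [ "$SSH_SESSIONS" -gt 0 ]; then
    echo "full activity"       # resets the inactivity timer
elif netstat -tn 2>/dev/null | grep -qE ':300[1-5] .*ESTABLISHED'; then
    echo "http-only activity"  # the shorter HTTP_ONLY_TIMEOUT applies
else
    echo "no activity"         # increments the shutdown counter
fi
```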
Continuously monitors service endpoints and logs status changes:
- ComfyUI: http://127.0.0.1:8188
- AI-Toolkit: http://127.0.0.1:8675
- Ollama: http://127.0.0.1:11434
Status indicators: ✅ READY | ⏳ WAITING | ❌ DOWN
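A minimal probe in the same spirit (a sketch, assuming the backends answer plain HTTP; the real checks are in scripts/watchtower.sh):

```bash
# Probe each backend and print a status line (⏳ WAITING would cover services still installing)
for port in 8188 8675 11434; do
    if curl -sf -o /dev/null --max-time 5 "http://127.0.0.1:${port}"; then
        echo "port ${port}: ✅ READY"
    else
        echo "port ${port}: ❌ DOWN"
    fi
done
```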
Configure watchtower behavior through environment variables:
# Monitoring Controls
INACTIVITY_MONITOR_ENABLED=true # Enable auto-shutdown (default: false)
INACTIVITY_TIMEOUT=30 # Minutes of inactivity before shutdown (default: 30)
HTTP_ONLY_TIMEOUT=10 # Minutes of HTTP-only activity (default: 10)
# Check Intervals
CHECK_INTERVAL=60 # Seconds between activity checks (default: 60)
HEALTH_CHECK_INTERVAL=30 # Seconds between health checks (default: 30)
# Activity Thresholds
CPU_USAGE_THRESHOLD=10 # CPU % to be considered active (default: 10)
# Testing & Debug
INACTIVITY_MONITOR_DRY_RUN=true # Dry run mode - no actual shutdown (default: true)
DEBUG=false # Enable verbose logging (default: false)
# RunPod API (required for actual shutdown)
RUNPOD_POD_ID=your_pod_id # Your RunPod instance ID
RUNPOD_API_KEY=your_api_key # Your RunPod API key

Safety Note: Inactivity monitoring is disabled by default (INACTIVITY_MONITOR_ENABLED=false) and runs in dry-run mode for safety.
# 1. Verify installation
ls -la /workspace/ComfyUI
ls -la /workspace/AI-Toolkit
# 2. Check if venv exists
ls -la /workspace/ComfyUI/venv
ls -la /workspace/AI-Toolkit/venv
# 3. Review supervisor status
supervisorctl status
# 4. Check error logs
tail -50 /var/log/supervisor/comfyui.log
tail -50 /var/log/supervisor/aitoolkit.log
# 5. Try manual installation
/workspace/service_manager.sh install comfyui
/workspace/service_manager.sh install aitoolkit
# 6. Restart service
supervisorctl restart comfyui

# Check installation logs during install
/workspace/service_manager.sh install comfyui 2>&1 | tee install.log
# Verify Python version
python3.10 --version
# Check PyTorch availability (for ComfyUI troubleshooting)
python3.10 -c "import torch; print(torch.__version__)"
# Check CUDA availability
nvidia-smi
# Manual venv test
cd /workspace/ComfyUI
source venv/bin/activate
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"ComfyUI custom nodes are installed during the initial install comfyui process. If a specific node fails:
# Navigate to custom nodes directory
cd /workspace/ComfyUI/custom_nodes
# Check which nodes exist
ls -la
# Manually install a missing node (activate the ComfyUI venv first so dependencies land in it)
source /workspace/ComfyUI/venv/bin/activate
cd /workspace/ComfyUI/custom_nodes
git clone https://github.com/author/node-name.git
cd node-name
pip install -r requirements.txt # If requirements.txt exists

# Check which ports are listening
netstat -tlnp
# Check if nginx is running
supervisorctl status nginx
# Test nginx config
nginx -t
# Restart nginx
supervisorctl restart nginx

# Run manual health checks
curl -I http://127.0.0.1:8188 # ComfyUI
curl -I http://127.0.0.1:8675 # AI-Toolkit
curl -I http://127.0.0.1:8888 # Jupyter
curl -I http://127.0.0.1:8080 # Filebrowser
curl http://127.0.0.1:11434/api/tags # Ollama
# Check from outside (through nginx; expect 401 Unauthorized without -u USER:PASSWORD)
curl -I http://127.0.0.1:3001 # ComfyUI via nginx
curl -I http://127.0.0.1:3002 # Jupyter via nginx

# Stop all services
supervisorctl stop all
# Remove installations (careful!)
rm -rf /workspace/ComfyUI
rm -rf /workspace/AI-Toolkit
# Reinstall
/workspace/service_manager.sh install comfyui
/workspace/service_manager.sh install aitoolkit
# Start services
supervisorctl start all

podify-me/
├── Dockerfile                        # Main container definition
├── README.md                         # This file
│
├── .github/
│   ├── copilot-instructions.md      # AI assistant context
│   └── workflows/
│       └── main_workflow.yml        # CI/CD pipeline
│
├── conf/                             # Configuration files
│   ├── supervisord.conf             # Supervisor service definitions
│   ├── nginx.conf                   # NGINX proxy configuration
│   ├── nginx.htpasswd               # Basic auth credentials
│   ├── jupyter_lab_config.py        # Jupyter configuration
│   ├── comfyUI_extra_model_paths.yaml # ComfyUI model paths
│   └── snippets/                    # NGINX config snippets
│       ├── nginx-proxy.conf
│       └── nginx-error-handling.conf
│
└── scripts/                          # Runtime scripts
    ├── entrypoint.sh                 # Container initialization
    ├── service_manager.sh            # Unified install/start manager
    └── watchtower.sh                 # Monitoring & health checks

/workspace/                           # Persistent storage (runtime)
├── ComfyUI/                          # ComfyUI installation (post-install)
│   ├── venv/                         # Python venv (--system-site-packages)
│   └── custom_nodes/                 # 25+ custom nodes
├── AI-Toolkit/                       # AI-Toolkit installation (post-install)
│   ├── venv/                         # Isolated Python venv
│   └── ui/                           # Node.js web UI
├── _assets/                          # Data directories
│   └── ComfyUI/
│       ├── models/                   # Shared model storage
│       ├── user/                     # User settings
│       ├── output/                   # Generated images
│       └── input/                    # Input files
├── .cache/                           # Shared caches
│   ├── pip/
│   ├── uv/
│   ├── virtualenv/
│   └── huggingface/
├── .ollama/                          # Ollama models
└── service_manager.sh                # Copied for user convenience
| Service | Proxy Port | Backend Port | Description |
|---|---|---|---|
| ComfyUI | 3001 | 8188 | Image generation UI with custom nodes |
| Jupyter Lab | 3002 | 8888 | Interactive Python environment |
| AI-Toolkit | 3003 | 8675 | FLUX/LoRA training UI |
| Filebrowser | 3004 | 8080 | Web-based file manager |
| Ollama | 3005 | 11434 | LLM inference API |
| SSH | 22 | - | Direct shell access |
Services run on 127.0.0.1 and are proxied through NGINX with basic authentication:
- All services behind NGINX require authentication (see /etc/nginx/.htpasswd)
- Direct access to backend ports (8188, 8675, etc.) is blocked from external connections
- Only proxy ports (3001-3005) and SSH (22) are exposed externally
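To verify the binding yourself, check the local addresses on the backend ports (this reuses the netstat check from the health-check section above):

```bash
# Backends should show 127.0.0.1:<port>; only the proxy ports (3001-3005) face outward
netstat -tlnp | grep -E '8188|8675|11434|8888|8080'
```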
Container startup flows through three components:
- Entrypoint (scripts/entrypoint.sh)
  - Initializes the environment and SSH keys
  - Configures the filebrowser database
  - Copies service_manager.sh to /workspace
  - Exports environment variables
  - Hands off to supervisord
- Supervisord (conf/supervisord.conf)
  - Manages 8 background services
  - Handles automatic restarts
  - Logs all service output
  - Starts services in priority order
- Service Manager (scripts/service_manager.sh)
  - Handles ComfyUI and AI-Toolkit installation
  - Manages virtual environments
  - Detects the CUDA version for PyTorch
  - Starts services with the proper venv activated

Venv strategy:
- ComfyUI: Uses --system-site-packages to share the base PyTorch (~5 GB savings)
- AI-Toolkit: Isolated venv with its own PyTorch build (version control)
- Jupyter: Uses the system Python 3.10 directly
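For reference, the two venv strategies boil down to roughly these commands (a sketch; service_manager.sh handles the details):

```bash
# Shared strategy: the venv can see the base image's PyTorch via system site-packages
python3.10 -m venv --system-site-packages /workspace/ComfyUI/venv
# Isolated strategy: a clean venv that installs its own PyTorch build
python3.10 -m venv /workspace/AI-Toolkit/venv
```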
# Default build (CUDA 12.4.1, Ubuntu 22.04)
docker build -t podifyme:localdev .
# Custom CUDA/Ubuntu version
docker build \
--build-arg CUDA_VERSION=12.8.1 \
--build-arg UBUNTU_VERSION=22.04 \
-t podifyme:localdev .

# Basic run
docker run -it --rm \
-p 3001:3001 -p 3002:3002 -p 3003:3003 -p 3004:3004 -p 3005:3005 \
-p 2222:22 \
--gpus all \
podifyme:localdev
# With watchtower monitoring (dry-run)
docker run -it --rm \
-p 3001-3005:3001-3005 -p 2222:22 \
--gpus all \
-e INACTIVITY_MONITOR_ENABLED=true \
-e INACTIVITY_TIMEOUT=30 \
-e INACTIVITY_MONITOR_DRY_RUN=true \
podifyme:localdev
# With persistent volume
docker run -it --rm \
-p 3001-3005:3001-3005 -p 2222:22 \
--gpus all \
-v ./workspace:/workspace \
podifyme:localdev

This repository includes VS Code tasks for common operations:
- Docker: Build Image (Dev) - Builds the container
- Docker: Run Container - Runs with monitoring enabled
- Docker: Debug Container - Runs with debug logging
Press Ctrl+Shift+B to access build tasks or Ctrl+Shift+P → "Tasks: Run Task"
- All web services are protected by NGINX basic authentication
- Credentials are stored in /etc/nginx/.htpasswd
- SSH requires public key authentication (set via the PUBLIC_KEY env var)
- Filebrowser runs in noauth mode (protected by the NGINX layer)
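To rotate credentials, regenerate the htpasswd entry (assuming the htpasswd utility from apache2-utils is available in the image; USER and PASSWORD are placeholders):

```bash
# Replace (or add) a user's password in the basic-auth file read by NGINX
htpasswd -b /etc/nginx/.htpasswd USER PASSWORD
```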
This is a personal RunPod container, but suggestions and improvements are welcome!
- Fork the repository
- Create a feature branch
- Make your changes
- Test with local Docker build
- Submit a pull request
MIT License - See LICENSE file for details
- ComfyUI: comfyanonymous/ComfyUI
- AI-Toolkit: ostris/ai-toolkit
- Ollama: ollama/ollama
Note: This container is designed for RunPod but can run on any Docker host with NVIDIA GPU support.