thenullengine/podify-me


Podify-Me: AI/ML RunPod Container

A comprehensive RunPod container orchestrating multiple AI/ML services (ComfyUI, AI-Toolkit, Ollama) via supervisord, designed for GPU-accelerated workflows with intelligent monitoring and automated resource management.

🎯 Services Included

All services are proxied through NGINX with basic authentication:

  • ComfyUI → Port 3001 - Advanced image generation platform with 25+ custom nodes
  • Jupyter Lab → Port 3002 - Interactive Python development environment
  • AI-Toolkit → Port 3003 - FLUX/LoRA model training platform by Ostris
  • Filebrowser → Port 3004 - Web-based file management interface
  • Ollama → Port 3005 - Local LLM inference service
  • SSH → Port 22 - Direct shell access

🚀 Quick Start

First Time Setup

ComfyUI and AI-Toolkit are not pre-installed. Install them on first use:

# Install ComfyUI (includes 25+ custom nodes)
/workspace/service_manager.sh install comfyui

# Install AI-Toolkit (includes web UI)
/workspace/service_manager.sh install aitoolkit

# Ollama is pre-installed and runs automatically

Note: After installation, supervisor will automatically start the services. Check status with supervisorctl status.

Accessing Services

Once running, access services through your RunPod instance:

https://YOUR_POD_ID-3001.proxy.runpod.net  # ComfyUI
https://YOUR_POD_ID-3002.proxy.runpod.net  # Jupyter Lab
https://YOUR_POD_ID-3003.proxy.runpod.net  # AI-Toolkit
https://YOUR_POD_ID-3004.proxy.runpod.net  # Filebrowser
https://YOUR_POD_ID-3005.proxy.runpod.net  # Ollama

Default credentials: Check your RunPod environment or /etc/nginx/.htpasswd
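To inspect or change those credentials, helpers along these lines work (assuming `htpasswd` from `apache2-utils` is available in the image; `HTPASSWD_FILE` is a convenience variable for this sketch, not something the container defines):

```shell
# Path used by the NGINX basic-auth layer (override for testing).
HTPASSWD_FILE=${HTPASSWD_FILE:-/etc/nginx/.htpasswd}

# Print the usernames currently allowed through NGINX
# (the username is the field before the first colon).
list_basic_auth_users() {
  cut -d: -f1 "$HTPASSWD_FILE"
}

# Add or update a user, then restart NGINX so the change takes effect.
# -B selects bcrypt hashing.
add_basic_auth_user() {
  htpasswd -B "$HTPASSWD_FILE" "$1" && supervisorctl restart nginx
}
```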

📖 Service Manager Usage

The unified service manager handles installation and manual service control:

# Installation
/workspace/service_manager.sh install comfyui     # Install ComfyUI + custom nodes
/workspace/service_manager.sh install aitoolkit   # Install AI-Toolkit + web UI

# Manual start (normally handled by supervisor)
/workspace/service_manager.sh start comfyui
/workspace/service_manager.sh start aitoolkit

Key Features:

  • ComfyUI: Uses Python 3.10 venv with --system-site-packages to share base PyTorch
  • AI-Toolkit: Isolated Python 3.10 venv with its own PyTorch installation (CUDA-aware)
  • Auto-detection: Detects CUDA version (12.8 or 12.1) and installs appropriate PyTorch
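The detection step can be sketched like this (an illustration of the idea, not the actual `service_manager.sh` code; the wheel-index URLs follow PyTorch's `cu128`/`cu121` naming):

```shell
# Map a detected CUDA version to a PyTorch wheel index (assumed mapping).
detect_torch_index() {
  case "$1" in
    12.8*) echo "https://download.pytorch.org/whl/cu128" ;;
    12.1*) echo "https://download.pytorch.org/whl/cu121" ;;
    *)     echo "https://download.pytorch.org/whl/cpu" ;;   # fallback: CPU wheels
  esac
}

# nvidia-smi prints "CUDA Version: X.Y" in its header; extract it.
if command -v nvidia-smi >/dev/null 2>&1; then
  cuda_version=$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p')
  pip install torch --index-url "$(detect_torch_index "$cuda_version")"
fi
```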

🔧 Supervisor Management

Supervisor manages all services as background processes:

# Check all service status
supervisorctl status

# Control individual services
supervisorctl start comfyui
supervisorctl stop aitoolkit
supervisorctl restart ollama

# Restart all services
supervisorctl restart all

# View real-time logs
supervisorctl tail -f comfyui
supervisorctl tail -f watchtower

# Exit supervisor console
supervisorctl quit

Managed Services

  1. nginx - Reverse proxy with authentication
  2. sshd - SSH server for remote access
  3. jupyter - JupyterLab server (Python 3.10)
  4. filebrowser - Web file manager (noauth mode)
  5. comfyui - Image generation service
  6. aitoolkit - Training service with Node.js UI
  7. ollama - LLM inference service
  8. watchtower - Activity monitor & health checker

📊 Logging & Debugging

Service Log Files

# Supervisor logs (all in /var/log/supervisor/)
/var/log/supervisor/nginx.log
/var/log/supervisor/sshd.log
/var/log/supervisor/jupyter.log
/var/log/supervisor/filebrowser.log
/var/log/supervisor/comfyui.log
/var/log/supervisor/aitoolkit.log
/var/log/supervisor/ollama.log
/var/log/supervisor/watchtower.log

# View in real-time
tail -f /var/log/supervisor/comfyui.log
tail -f /var/log/supervisor/watchtower.log

# View last 50 lines
tail -50 /var/log/supervisor/aitoolkit.log

Health Check Endpoints

Test service availability directly:

# ComfyUI
curl http://127.0.0.1:8188

# AI-Toolkit  
curl http://127.0.0.1:8675

# Ollama
curl http://127.0.0.1:11434/api/tags

# Check listening ports
netstat -tlnp | grep -E '8188|8675|11434|8888|8080'
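The checks above can be wrapped into a single pass over every backend (a small convenience script, not part of the container):

```shell
# Probe one backend and print a status line in the watchtower style.
check_endpoint() {  # $1 = name, $2 = port, $3 = optional path
  if curl -sf -o /dev/null --max-time 3 "http://127.0.0.1:${2}${3:-/}"; then
    echo "✅ $1 (port $2) READY"
  else
    echo "❌ $1 (port $2) DOWN"
  fi
}

check_endpoint ComfyUI     8188
check_endpoint AI-Toolkit  8675
check_endpoint Jupyter     8888
check_endpoint Filebrowser 8080
check_endpoint Ollama      11434 /api/tags
```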

πŸ” Watchtower Monitoring

The watchtower service provides intelligent activity monitoring and health checks:

Activity Detection (3 Levels)

  1. Full Activity - GPU/CPU usage or SSH connections → Resets inactivity timer
  2. HTTP-Only Activity - Browser connections only → Shorter timeout threshold
  3. No Activity - Increments counter → Eventual shutdown (if enabled)
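The three-level decision can be sketched as follows (assumed logic that mirrors the description above, not the actual watchtower.sh implementation):

```shell
# CPU percentage at or above this counts as "active" (see env vars below).
CPU_USAGE_THRESHOLD=${CPU_USAGE_THRESHOLD:-10}

classify_activity() {  # $1 = GPU util %, $2 = CPU util %, $3 = SSH sessions, $4 = HTTP connections
  if [ "$1" -gt 0 ] || [ "$2" -ge "$CPU_USAGE_THRESHOLD" ] || [ "$3" -gt 0 ]; then
    echo full       # resets the inactivity timer
  elif [ "$4" -gt 0 ]; then
    echo http-only  # counted against the shorter HTTP-only timeout
  else
    echo idle       # increments the inactivity counter
  fi
}

# Example inputs could come from:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits  (GPU %)
#   who | wc -l                                                           (SSH sessions)
```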

Health Monitoring

Continuously monitors service endpoints and logs status changes:

  • ComfyUI: http://127.0.0.1:8188
  • AI-Toolkit: http://127.0.0.1:8675
  • Ollama: http://127.0.0.1:11434

Status indicators: ✅ READY | ⏳ WAITING | ❌ DOWN

Environment Variables

Configure watchtower behavior through environment variables:

# Monitoring Controls
INACTIVITY_MONITOR_ENABLED=true   # Enable auto-shutdown (default: false)
INACTIVITY_TIMEOUT=30             # Minutes of inactivity before shutdown (default: 30)
HTTP_ONLY_TIMEOUT=10              # Minutes of HTTP-only activity (default: 10)

# Check Intervals
CHECK_INTERVAL=60                 # Seconds between activity checks (default: 60)
HEALTH_CHECK_INTERVAL=30          # Seconds between health checks (default: 30)

# Activity Thresholds
CPU_USAGE_THRESHOLD=10            # CPU % to be considered active (default: 10)

# Testing & Debug
INACTIVITY_MONITOR_DRY_RUN=true   # Dry run mode - no actual shutdown (default: true)
DEBUG=false                       # Enable verbose logging (default: false)

# RunPod API (required for actual shutdown)
RUNPOD_POD_ID=your_pod_id        # Your RunPod instance ID
RUNPOD_API_KEY=your_api_key      # Your RunPod API key

Safety Note: Inactivity monitoring is disabled by default (INACTIVITY_MONITOR_ENABLED=false), and even when enabled it defaults to dry-run mode, so the pod is never actually stopped unless you opt in to both.
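For reference, an actual shutdown call would look roughly like this. The GraphQL endpoint and the `podStop` mutation are assumptions based on RunPod's public API; verify them against the current RunPod documentation before relying on this:

```shell
# Build the GraphQL body for stopping a pod (mutation name is an assumption).
build_stop_payload() {  # $1 = pod id
  printf '{"query": "mutation { podStop(input: {podId: \\"%s\\"}) { id desiredStatus } }"}' "$1"
}

# Send it to the (assumed) RunPod GraphQL endpoint.
stop_pod() {
  curl -s -X POST "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
       -H 'Content-Type: application/json' \
       -d "$(build_stop_payload "$RUNPOD_POD_ID")"
}
```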

πŸ› Troubleshooting

Service Won't Start

# 1. Verify installation
ls -la /workspace/ComfyUI
ls -la /workspace/AI-Toolkit

# 2. Check if venv exists
ls -la /workspace/ComfyUI/venv
ls -la /workspace/AI-Toolkit/venv

# 3. Review supervisor status
supervisorctl status

# 4. Check error logs
tail -50 /var/log/supervisor/comfyui.log
tail -50 /var/log/supervisor/aitoolkit.log

# 5. Try manual installation
/workspace/service_manager.sh install comfyui
/workspace/service_manager.sh install aitoolkit

# 6. Restart service
supervisorctl restart comfyui

Installation Failed

# Check installation logs during install
/workspace/service_manager.sh install comfyui 2>&1 | tee install.log

# Verify Python version
python3.10 --version

# Check PyTorch availability (for ComfyUI troubleshooting)
python3.10 -c "import torch; print(torch.__version__)"

# Check CUDA availability
nvidia-smi

# Manual venv test
cd /workspace/ComfyUI
source venv/bin/activate
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"

Custom Node Issues

ComfyUI custom nodes are installed during the initial install comfyui process. If a specific node fails:

# Navigate to custom nodes directory
cd /workspace/ComfyUI/custom_nodes

# Check which nodes exist
ls -la

# Manually install a missing node
cd /workspace/ComfyUI/custom_nodes
git clone https://github.com/author/node-name.git
cd node-name
pip install -r requirements.txt  # If requirements.txt exists
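To re-run dependency installs across every node at once, a loop of this shape helps (a convenience sketch; run the installs inside the ComfyUI venv):

```shell
# Print the requirements.txt of every custom node under the given directory.
list_node_requirements() {  # $1 = custom_nodes directory
  for req in "$1"/*/requirements.txt; do
    [ -f "$req" ] && echo "$req"
  done
}

# Usage, inside the ComfyUI venv:
#   source /workspace/ComfyUI/venv/bin/activate
#   for req in $(list_node_requirements /workspace/ComfyUI/custom_nodes); do
#     pip install -r "$req"
#   done
```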

Port Conflicts

# Check which ports are listening
netstat -tlnp

# Check if nginx is running
supervisorctl status nginx

# Test nginx config
nginx -t

# Restart nginx
supervisorctl restart nginx

Service Health Check

# Run manual health checks
curl -I http://127.0.0.1:8188    # ComfyUI
curl -I http://127.0.0.1:8675    # AI-Toolkit
curl -I http://127.0.0.1:8888    # Jupyter
curl -I http://127.0.0.1:8080    # Filebrowser
curl http://127.0.0.1:11434/api/tags  # Ollama

# Check from outside (through nginx)
curl -I http://127.0.0.1:3001    # ComfyUI via nginx
curl -I http://127.0.0.1:3002    # Jupyter via nginx

Reset Everything

# Stop all services
supervisorctl stop all

# Remove installations (careful!)
rm -rf /workspace/ComfyUI
rm -rf /workspace/AI-Toolkit

# Reinstall
/workspace/service_manager.sh install comfyui
/workspace/service_manager.sh install aitoolkit

# Start services
supervisorctl start all

πŸ“ Directory Structure

podify-me/
├── Dockerfile                          # Main container definition
├── README.md                           # This file
│
├── .github/
│   ├── copilot-instructions.md        # AI assistant context
│   └── workflows/
│       └── main_workflow.yml          # CI/CD pipeline
│
├── conf/                              # Configuration files
│   ├── supervisord.conf               # Supervisor service definitions
│   ├── nginx.conf                     # NGINX proxy configuration
│   ├── nginx.htpasswd                 # Basic auth credentials
│   ├── jupyter_lab_config.py          # Jupyter configuration
│   ├── comfyUI_extra_model_paths.yaml # ComfyUI model paths
│   └── snippets/                      # NGINX config snippets
│       ├── nginx-proxy.conf
│       └── nginx-error-handling.conf
│
└── scripts/                           # Runtime scripts
    ├── entrypoint.sh                  # Container initialization
    ├── service_manager.sh             # Unified install/start manager
    └── watchtower.sh                  # Monitoring & health checks

/workspace/                            # Persistent storage (runtime)
├── ComfyUI/                           # ComfyUI installation (post-install)
│   ├── venv/                          # Python venv (--system-site-packages)
│   └── custom_nodes/                  # 25+ custom nodes
├── AI-Toolkit/                        # AI-Toolkit installation (post-install)
│   ├── venv/                          # Isolated Python venv
│   └── ui/                            # Node.js web UI
├── _assets/                           # Data directories
│   └── ComfyUI/
│       ├── models/                    # Shared model storage
│       ├── user/                      # User settings
│       ├── output/                    # Generated images
│       └── input/                     # Input files
├── .cache/                            # Shared caches
│   ├── pip/
│   ├── uv/
│   ├── virtualenv/
│   └── huggingface/
├── .ollama/                           # Ollama models
└── service_manager.sh                 # Copied for user convenience

🌐 Port Mapping

NGINX Proxy Ports (External Access)

| Service     | Proxy Port | Backend Port | Description                           |
|-------------|------------|--------------|---------------------------------------|
| ComfyUI     | 3001       | 8188         | Image generation UI with custom nodes |
| Jupyter Lab | 3002       | 8888         | Interactive Python environment        |
| AI-Toolkit  | 3003       | 8675         | FLUX/LoRA training UI                 |
| Filebrowser | 3004       | 8080         | Web-based file manager                |
| Ollama      | 3005       | 11434        | LLM inference API                     |
| SSH         | 22         | -            | Direct shell access                   |

Internal Service Ports

Services run on 127.0.0.1 and are proxied through NGINX with basic authentication:

  • All services behind NGINX require authentication (see /etc/nginx/.htpasswd)
  • Direct access to backend ports (8188, 8675, etc.) is blocked from external connections
  • Only proxy ports (3001-3005) and SSH (22) are exposed externally
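Each proxy entry pairs `proxy_pass` with basic auth. One entry amounts to roughly this shape (an illustrative sketch; the real rules live in conf/nginx.conf and conf/snippets/):

```nginx
# Illustrative sketch only; see conf/nginx.conf for the actual configuration.
server {
    listen 3001;

    auth_basic           "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:8188;
        # WebSocket upgrade headers, needed by ComfyUI's live UI
        proxy_http_version 1.1;
        proxy_set_header   Upgrade $http_upgrade;
        proxy_set_header   Connection "upgrade";
    }
}
```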

πŸ—οΈ Architecture Overview

Three-Layer Service Management

  1. Entrypoint (scripts/entrypoint.sh)

    • Initializes environment and SSH keys
    • Configures filebrowser database
    • Copies service_manager.sh to /workspace
    • Exports environment variables
    • Hands off to supervisord
  2. Supervisord (conf/supervisord.conf)

    • Manages 8 background services
    • Handles automatic restarts
    • Logs all service output
    • Priority-based startup order
  3. Service Manager (scripts/service_manager.sh)

    • Handles ComfyUI and AI-Toolkit installation
    • Manages virtual environments
    • Detects CUDA version for PyTorch
    • Starts services with proper activation
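A program entry in conf/supervisord.conf takes roughly this shape (an illustrative sketch; the actual commands, priorities, and restart policies may differ):

```ini
; Sketch of one managed service; see conf/supervisord.conf for the real entries.
[program:comfyui]
command=/workspace/service_manager.sh start comfyui
priority=30
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisor/comfyui.log
redirect_stderr=true
```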

Virtual Environment Strategy

  • ComfyUI: Uses --system-site-packages to share base PyTorch (~5GB savings)
  • AI-Toolkit: Isolated venv with its own PyTorch build (version control)
  • Jupyter: Uses system Python 3.10 directly
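The two venv strategies reduce to one flag (assumed commands that mirror the description above, not the exact service_manager.sh steps):

```shell
PYTHON=${PYTHON:-python3.10}

# ComfyUI-style venv: --system-site-packages lets it see the base PyTorch.
make_shared_venv() {
  "$PYTHON" -m venv --system-site-packages "$1/venv"
}

# AI-Toolkit-style venv: fully isolated, so it pins its own PyTorch build.
make_isolated_venv() {
  "$PYTHON" -m venv "$1/venv"
  "$1/venv/bin/pip" install torch --index-url https://download.pytorch.org/whl/cu121
}
```

For example, `make_shared_venv /workspace/ComfyUI` recreates the ComfyUI layout described above.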

🐳 Building & Running Locally

Build the Image

# Default build (CUDA 12.4.1, Ubuntu 22.04)
docker build -t podifyme:localdev .

# Custom CUDA/Ubuntu version
docker build \
  --build-arg CUDA_VERSION=12.8.1 \
  --build-arg UBUNTU_VERSION=22.04 \
  -t podifyme:localdev .

Run the Container

# Basic run
docker run -it --rm \
  -p 3001:3001 -p 3002:3002 -p 3003:3003 -p 3004:3004 -p 3005:3005 \
  -p 2222:22 \
  --gpus all \
  podifyme:localdev

# With watchtower monitoring (dry-run)
docker run -it --rm \
  -p 3001-3005:3001-3005 -p 2222:22 \
  --gpus all \
  -e INACTIVITY_MONITOR_ENABLED=true \
  -e INACTIVITY_TIMEOUT=30 \
  -e INACTIVITY_MONITOR_DRY_RUN=true \
  podifyme:localdev

# With persistent volume
docker run -it --rm \
  -p 3001-3005:3001-3005 -p 2222:22 \
  --gpus all \
  -v ./workspace:/workspace \
  podifyme:localdev

VS Code Tasks

This repository includes VS Code tasks for common operations:

  • Docker: Build Image (Dev) - Builds the container
  • Docker: Run Container - Runs with monitoring enabled
  • Docker: Debug Container - Runs with debug logging

Press Ctrl+Shift+B to access build tasks or Ctrl+Shift+P → "Tasks: Run Task"
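Task definitions of this kind live in .vscode/tasks.json and take roughly this shape (a hypothetical sketch of one entry, not the file's exact contents):

```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Docker: Build Image (Dev)",
      "type": "shell",
      "command": "docker build -t podifyme:localdev .",
      "group": { "kind": "build", "isDefault": true }
    }
  ]
}
```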

πŸ” Security

  • All web services protected by NGINX basic authentication
  • Credentials stored in /etc/nginx/.htpasswd
  • SSH requires public key authentication (set via PUBLIC_KEY env var)
  • Filebrowser runs in noauth mode (protected by NGINX layer)

🤝 Contributing

This is a personal RunPod container, but suggestions and improvements are welcome!

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test with local Docker build
  5. Submit a pull request

πŸ“ License

MIT License - See LICENSE file for details

πŸ™ Credits


Note: This container is designed for RunPod but can run on any Docker host with NVIDIA GPU support.
