thenullengine/podify-me


Podify-Me: AI/ML RunPod Container

A comprehensive RunPod container orchestrating multiple AI/ML services (ComfyUI, AI-Toolkit, Ollama) via supervisord, designed for GPU-accelerated workflows with intelligent monitoring and automated resource management.

🎯 Services Included

All services are proxied through NGINX with basic authentication:

  • ComfyUI → Port 3001 - Advanced image generation platform with 25+ custom nodes
  • Jupyter Lab → Port 3002 - Interactive Python development environment
  • AI-Toolkit → Port 3003 - FLUX/LoRA model training platform by Ostris
  • Filebrowser → Port 3004 - Web-based file management interface
  • Ollama → Port 3005 - Local LLM inference service
  • SSH → Port 22 - Direct shell access

🚀 Quick Start

First Time Setup

ComfyUI and AI-Toolkit are not pre-installed. Install them on first use:

# Install ComfyUI (includes 25+ custom nodes)
/workspace/service_manager.sh install comfyui

# Install AI-Toolkit (includes web UI)
/workspace/service_manager.sh install aitoolkit

# Ollama is pre-installed and runs automatically

Note: After installation, supervisor will automatically start the services. Check status with supervisorctl status.

Accessing Services

Once running, access services through your RunPod instance:

https://YOUR_POD_ID-3001.proxy.runpod.net  # ComfyUI
https://YOUR_POD_ID-3002.proxy.runpod.net  # Jupyter Lab
https://YOUR_POD_ID-3003.proxy.runpod.net  # AI-Toolkit
https://YOUR_POD_ID-3004.proxy.runpod.net  # Filebrowser
https://YOUR_POD_ID-3005.proxy.runpod.net  # Ollama

Default credentials: Check your RunPod environment or /etc/nginx/.htpasswd
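To inspect or change those credentials, helpers along these lines work (assuming `htpasswd` from `apache2-utils` is available in the image; `HTPASSWD_FILE` is a convenience variable for this sketch, not something the container defines):

```shell
# Path used by the NGINX basic-auth layer (override for testing).
HTPASSWD_FILE=${HTPASSWD_FILE:-/etc/nginx/.htpasswd}

# Print the usernames currently allowed through NGINX
# (the username is the field before the first colon).
list_basic_auth_users() {
  cut -d: -f1 "$HTPASSWD_FILE"
}

# Add or update a user, then restart NGINX so the change takes effect.
# -B selects bcrypt hashing.
add_basic_auth_user() {
  htpasswd -B "$HTPASSWD_FILE" "$1" && supervisorctl restart nginx
}
```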

📖 Service Manager Usage

The unified service manager handles installation and manual service control:

# Installation
/workspace/service_manager.sh install comfyui     # Install ComfyUI + custom nodes
/workspace/service_manager.sh install aitoolkit   # Install AI-Toolkit + web UI

# Manual start (normally handled by supervisor)
/workspace/service_manager.sh start comfyui
/workspace/service_manager.sh start aitoolkit

Key Features:

  • ComfyUI: Uses Python 3.10 venv with --system-site-packages to share base PyTorch
  • AI-Toolkit: Isolated Python 3.10 venv with its own PyTorch installation (CUDA-aware)
  • Auto-detection: Detects CUDA version (12.8 or 12.1) and installs appropriate PyTorch
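The detection step can be sketched like this (an illustration of the idea, not the actual `service_manager.sh` code; the wheel-index URLs follow PyTorch's `cu128`/`cu121` naming):

```shell
# Map a detected CUDA version to a PyTorch wheel index (assumed mapping).
detect_torch_index() {
  case "$1" in
    12.8*) echo "https://download.pytorch.org/whl/cu128" ;;
    12.1*) echo "https://download.pytorch.org/whl/cu121" ;;
    *)     echo "https://download.pytorch.org/whl/cpu" ;;   # fallback: CPU wheels
  esac
}

# nvidia-smi prints "CUDA Version: X.Y" in its header; extract it.
if command -v nvidia-smi >/dev/null 2>&1; then
  cuda_version=$(nvidia-smi | sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p')
  pip install torch --index-url "$(detect_torch_index "$cuda_version")"
fi
```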

🔧 Supervisor Management

Supervisor manages all services as background processes:

# Check all service status
supervisorctl status

# Control individual services
supervisorctl start comfyui
supervisorctl stop aitoolkit
supervisorctl restart ollama

# Restart all services
supervisorctl restart all

# View real-time logs
supervisorctl tail -f comfyui
supervisorctl tail -f watchtower

# Exit supervisor console
supervisorctl quit

Managed Services

  1. nginx - Reverse proxy with authentication
  2. sshd - SSH server for remote access
  3. jupyter - JupyterLab server (Python 3.10)
  4. filebrowser - Web file manager (noauth mode)
  5. comfyui - Image generation service
  6. aitoolkit - Training service with Node.js UI
  7. ollama - LLM inference service
  8. watchtower - Activity monitor & health checker

📊 Logging & Debugging

Service Log Files

# Supervisor logs (all in /var/log/supervisor/)
/var/log/supervisor/nginx.log
/var/log/supervisor/sshd.log
/var/log/supervisor/jupyter.log
/var/log/supervisor/filebrowser.log
/var/log/supervisor/comfyui.log
/var/log/supervisor/aitoolkit.log
/var/log/supervisor/ollama.log
/var/log/supervisor/watchtower.log

# View in real-time
tail -f /var/log/supervisor/comfyui.log
tail -f /var/log/supervisor/watchtower.log

# View last 50 lines
tail -50 /var/log/supervisor/aitoolkit.log

Health Check Endpoints

Test service availability directly:

# ComfyUI
curl http://127.0.0.1:8188

# AI-Toolkit  
curl http://127.0.0.1:8675

# Ollama
curl http://127.0.0.1:11434/api/tags

# Check listening ports
netstat -tlnp | grep -E '8188|8675|11434|8888|8080'
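The checks above can be wrapped into a single pass over every backend (a small convenience script, not part of the container):

```shell
# Probe one backend and print a status line in the watchtower style.
check_endpoint() {  # $1 = name, $2 = port, $3 = optional path
  if curl -sf -o /dev/null --max-time 3 "http://127.0.0.1:${2}${3:-/}"; then
    echo "✅ $1 (port $2) READY"
  else
    echo "❌ $1 (port $2) DOWN"
  fi
}

check_endpoint ComfyUI     8188
check_endpoint AI-Toolkit  8675
check_endpoint Jupyter     8888
check_endpoint Filebrowser 8080
check_endpoint Ollama      11434 /api/tags
```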

πŸ” Watchtower Monitoring

The watchtower service provides intelligent activity monitoring and health checks:

Activity Detection (3 Levels)

  1. Full Activity - GPU/CPU usage or SSH connections → Resets inactivity timer
  2. HTTP-Only Activity - Browser connections only → Shorter timeout threshold
  3. No Activity - Increments counter → Eventual shutdown (if enabled)
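The three-level decision can be sketched as follows (assumed logic that mirrors the description above, not the actual watchtower.sh implementation):

```shell
# CPU percentage at or above this counts as "active" (see env vars below).
CPU_USAGE_THRESHOLD=${CPU_USAGE_THRESHOLD:-10}

classify_activity() {  # $1 = GPU util %, $2 = CPU util %, $3 = SSH sessions, $4 = HTTP connections
  if [ "$1" -gt 0 ] || [ "$2" -ge "$CPU_USAGE_THRESHOLD" ] || [ "$3" -gt 0 ]; then
    echo full       # resets the inactivity timer
  elif [ "$4" -gt 0 ]; then
    echo http-only  # counted against the shorter HTTP-only timeout
  else
    echo idle       # increments the inactivity counter
  fi
}

# Example inputs could come from:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits  (GPU %)
#   who | wc -l                                                           (SSH sessions)
```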

Health Monitoring

Continuously monitors service endpoints and logs status changes:

  • ComfyUI: http://127.0.0.1:8188
  • AI-Toolkit: http://127.0.0.1:8675
  • Ollama: http://127.0.0.1:11434

Status indicators: ✅ READY | ⏳ WAITING | ❌ DOWN

Environment Variables

Configure watchtower behavior through environment variables:

# Monitoring Controls
INACTIVITY_MONITOR_ENABLED=true   # Enable auto-shutdown (default: false)
INACTIVITY_TIMEOUT=30             # Minutes of inactivity before shutdown (default: 30)
HTTP_ONLY_TIMEOUT=10              # Minutes of HTTP-only activity (default: 10)

# Check Intervals
CHECK_INTERVAL=60                 # Seconds between activity checks (default: 60)
HEALTH_CHECK_INTERVAL=30          # Seconds between health checks (default: 30)

# Activity Thresholds
CPU_USAGE_THRESHOLD=10            # CPU % to be considered active (default: 10)

# Testing & Debug
INACTIVITY_MONITOR_DRY_RUN=true   # Dry run mode - no actual shutdown (default: true)
DEBUG=false                       # Enable verbose logging (default: false)

# RunPod API (required for actual shutdown)
RUNPOD_POD_ID=your_pod_id        # Your RunPod instance ID
RUNPOD_API_KEY=your_api_key      # Your RunPod API key

Safety Note: Inactivity monitoring is disabled by default (INACTIVITY_MONITOR_ENABLED=false), and even when enabled it defaults to dry-run mode, so the pod is never actually stopped unless you opt in to both.
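For reference, an actual shutdown call would look roughly like this. The GraphQL endpoint and the `podStop` mutation are assumptions based on RunPod's public API; verify them against the current RunPod documentation before relying on this:

```shell
# Build the GraphQL body for stopping a pod (mutation name is an assumption).
build_stop_payload() {  # $1 = pod id
  printf '{"query": "mutation { podStop(input: {podId: \\"%s\\"}) { id desiredStatus } }"}' "$1"
}

# Send it to the (assumed) RunPod GraphQL endpoint.
stop_pod() {
  curl -s -X POST "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
       -H 'Content-Type: application/json' \
       -d "$(build_stop_payload "$RUNPOD_POD_ID")"
}
```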

πŸ› Troubleshooting

Service Won't Start

# 1. Verify installation
ls -la /workspace/ComfyUI
ls -la /workspace/AI-Toolkit

# 2. Check if venv exists
ls -la /workspace/ComfyUI/venv
ls -la /workspace/AI-Toolkit/venv

# 3. Review supervisor status
supervisorctl status

# 4. Check error logs
tail -50 /var/log/supervisor/comfyui.log
tail -50 /var/log/supervisor/aitoolkit.log

# 5. Try manual installation
/workspace/service_manager.sh install comfyui
/workspace/service_manager.sh install aitoolkit

# 6. Restart service
supervisorctl restart comfyui

Installation Failed

# Check installation logs during install
/workspace/service_manager.sh install comfyui 2>&1 | tee install.log

# Verify Python version
python3.10 --version

# Check PyTorch availability (for ComfyUI troubleshooting)
python3.10 -c "import torch; print(torch.__version__)"

# Check CUDA availability
nvidia-smi

# Manual venv test
cd /workspace/ComfyUI
source venv/bin/activate
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"

Custom Node Issues

ComfyUI custom nodes are installed during the initial install comfyui process. If a specific node fails:

# Navigate to custom nodes directory
cd /workspace/ComfyUI/custom_nodes

# Check which nodes exist
ls -la

# Manually install a missing node
cd /workspace/ComfyUI/custom_nodes
git clone https://github.com/author/node-name.git
cd node-name
pip install -r requirements.txt  # If requirements.txt exists
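To re-run dependency installs across every node at once, a loop of this shape helps (a convenience sketch; run the installs inside the ComfyUI venv):

```shell
# Print the requirements.txt of every custom node under the given directory.
list_node_requirements() {  # $1 = custom_nodes directory
  for req in "$1"/*/requirements.txt; do
    [ -f "$req" ] && echo "$req"
  done
}

# Usage, inside the ComfyUI venv:
#   source /workspace/ComfyUI/venv/bin/activate
#   for req in $(list_node_requirements /workspace/ComfyUI/custom_nodes); do
#     pip install -r "$req"
#   done
```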

Port Conflicts

# Check which ports are listening
netstat -tlnp

# Check if nginx is running
supervisorctl status nginx

# Test nginx config
nginx -t

# Restart nginx
supervisorctl restart nginx

Service Health Check

# Run manual health checks
curl -I http://127.0.0.1:8188    # ComfyUI
curl -I http://127.0.0.1:8675    # AI-Toolkit
curl -I http://127.0.0.1:8888    # Jupyter
curl -I http://127.0.0.1:8080    # Filebrowser
curl http://127.0.0.1:11434/api/tags  # Ollama

# Check from outside (through nginx)
curl -I http://127.0.0.1:3001    # ComfyUI via nginx
curl -I http://127.0.0.1:3002    # Jupyter via nginx

Reset Everything

# Stop all services
supervisorctl stop all

# Remove installations (careful!)
rm -rf /workspace/ComfyUI
rm -rf /workspace/AI-Toolkit

# Reinstall
/workspace/service_manager.sh install comfyui
/workspace/service_manager.sh install aitoolkit

# Start services
supervisorctl start all

πŸ“ Directory Structure

podify-me/
├── Dockerfile                          # Main container definition
├── README.md                           # This file
│
├── .github/
│   ├── copilot-instructions.md        # AI assistant context
│   └── workflows/
│       └── main_workflow.yml          # CI/CD pipeline
│
├── conf/                              # Configuration files
│   ├── supervisord.conf               # Supervisor service definitions
│   ├── nginx.conf                     # NGINX proxy configuration
│   ├── nginx.htpasswd                 # Basic auth credentials
│   ├── jupyter_lab_config.py          # Jupyter configuration
│   ├── comfyUI_extra_model_paths.yaml # ComfyUI model paths
│   └── snippets/                      # NGINX config snippets
│       ├── nginx-proxy.conf
│       └── nginx-error-handling.conf
│
└── scripts/                           # Runtime scripts
    ├── entrypoint.sh                  # Container initialization
    ├── service_manager.sh             # Unified install/start manager
    └── watchtower.sh                  # Monitoring & health checks

/workspace/                            # Persistent storage (runtime)
├── ComfyUI/                           # ComfyUI installation (post-install)
│   ├── venv/                          # Python venv (--system-site-packages)
│   └── custom_nodes/                  # 25+ custom nodes
├── AI-Toolkit/                        # AI-Toolkit installation (post-install)
│   ├── venv/                          # Isolated Python venv
│   └── ui/                            # Node.js web UI
├── _assets/                           # Data directories
│   └── ComfyUI/
│       ├── models/                    # Shared model storage
│       ├── user/                      # User settings
│       ├── output/                    # Generated images
│       └── input/                     # Input files
├── .cache/                            # Shared caches
│   ├── pip/
│   ├── uv/
│   ├── virtualenv/
│   └── huggingface/
├── .ollama/                           # Ollama models
└── service_manager.sh                 # Copied for user convenience

🌐 Port Mapping

NGINX Proxy Ports (External Access)

| Service     | Proxy Port | Backend Port | Description                           |
|-------------|------------|--------------|---------------------------------------|
| ComfyUI     | 3001       | 8188         | Image generation UI with custom nodes |
| Jupyter Lab | 3002       | 8888         | Interactive Python environment        |
| AI-Toolkit  | 3003       | 8675         | FLUX/LoRA training UI                 |
| Filebrowser | 3004       | 8080         | Web-based file manager                |
| Ollama      | 3005       | 11434        | LLM inference API                     |
| SSH         | 22         | -            | Direct shell access                   |

Internal Service Ports

Services run on 127.0.0.1 and are proxied through NGINX with basic authentication:

  • All services behind NGINX require authentication (see /etc/nginx/.htpasswd)
  • Direct access to backend ports (8188, 8675, etc.) is blocked from external connections
  • Only proxy ports (3001-3005) and SSH (22) are exposed externally
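Each proxy entry pairs `proxy_pass` with basic auth. One entry amounts to roughly this shape (an illustrative sketch; the real rules live in conf/nginx.conf and conf/snippets/):

```nginx
# Illustrative sketch only; see conf/nginx.conf for the actual configuration.
server {
    listen 3001;

    auth_basic           "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:8188;
        # WebSocket upgrade headers, needed by ComfyUI's live UI
        proxy_http_version 1.1;
        proxy_set_header   Upgrade $http_upgrade;
        proxy_set_header   Connection "upgrade";
    }
}
```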

πŸ—οΈ Architecture Overview

Three-Layer Service Management

  1. Entrypoint (scripts/entrypoint.sh)

    • Initializes environment and SSH keys
    • Configures filebrowser database
    • Copies service_manager.sh to /workspace
    • Exports environment variables
    • Hands off to supervisord
  2. Supervisord (conf/supervisord.conf)

    • Manages 8 background services
    • Handles automatic restarts
    • Logs all service output
    • Priority-based startup order
  3. Service Manager (scripts/service_manager.sh)

    • Handles ComfyUI and AI-Toolkit installation
    • Manages virtual environments
    • Detects CUDA version for PyTorch
    • Starts services with proper activation
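A program entry in conf/supervisord.conf takes roughly this shape (an illustrative sketch; the actual commands, priorities, and restart policies may differ):

```ini
; Sketch of one managed service; see conf/supervisord.conf for the real entries.
[program:comfyui]
command=/workspace/service_manager.sh start comfyui
priority=30
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisor/comfyui.log
redirect_stderr=true
```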

Virtual Environment Strategy

  • ComfyUI: Uses --system-site-packages to share base PyTorch (~5GB savings)
  • AI-Toolkit: Isolated venv with its own PyTorch build (version control)
  • Jupyter: Uses system Python 3.10 directly
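The two venv strategies reduce to one flag (assumed commands that mirror the description above, not the exact service_manager.sh steps):

```shell
PYTHON=${PYTHON:-python3.10}

# ComfyUI-style venv: --system-site-packages lets it see the base PyTorch.
make_shared_venv() {
  "$PYTHON" -m venv --system-site-packages "$1/venv"
}

# AI-Toolkit-style venv: fully isolated, so it pins its own PyTorch build.
make_isolated_venv() {
  "$PYTHON" -m venv "$1/venv"
  "$1/venv/bin/pip" install torch --index-url https://download.pytorch.org/whl/cu121
}
```

For example, `make_shared_venv /workspace/ComfyUI` recreates the ComfyUI layout described above.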

🐳 Building & Running Locally

Build the Image

# Default build (CUDA 12.4.1, Ubuntu 22.04)
docker build -t podifyme:localdev .

# Custom CUDA/Ubuntu version
docker build \
  --build-arg CUDA_VERSION=12.8.1 \
  --build-arg UBUNTU_VERSION=22.04 \
  -t podifyme:localdev .

Run the Container

# Basic run
docker run -it --rm \
  -p 3001:3001 -p 3002:3002 -p 3003:3003 -p 3004:3004 -p 3005:3005 \
  -p 2222:22 \
  --gpus all \
  podifyme:localdev

# With watchtower monitoring (dry-run)
docker run -it --rm \
  -p 3001-3005:3001-3005 -p 2222:22 \
  --gpus all \
  -e INACTIVITY_MONITOR_ENABLED=true \
  -e INACTIVITY_TIMEOUT=30 \
  -e INACTIVITY_MONITOR_DRY_RUN=true \
  podifyme:localdev

# With persistent volume
docker run -it --rm \
  -p 3001-3005:3001-3005 -p 2222:22 \
  --gpus all \
  -v ./workspace:/workspace \
  podifyme:localdev

VS Code Tasks

This repository includes VS Code tasks for common operations:

  • Docker: Build Image (Dev) - Builds the container
  • Docker: Run Container - Runs with monitoring enabled
  • Docker: Debug Container - Runs with debug logging

Press Ctrl+Shift+B to access build tasks or Ctrl+Shift+P → "Tasks: Run Task"
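Task definitions of this kind live in .vscode/tasks.json and take roughly this shape (a hypothetical sketch of one entry, not the file's exact contents):

```json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Docker: Build Image (Dev)",
      "type": "shell",
      "command": "docker build -t podifyme:localdev .",
      "group": { "kind": "build", "isDefault": true }
    }
  ]
}
```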

πŸ” Security

  • All web services protected by NGINX basic authentication
  • Credentials stored in /etc/nginx/.htpasswd
  • SSH requires public key authentication (set via PUBLIC_KEY env var)
  • Filebrowser runs in noauth mode (protected by NGINX layer)

🤝 Contributing

This is a personal RunPod container, but suggestions and improvements are welcome!

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test with local Docker build
  5. Submit a pull request

πŸ“ License

MIT License - See LICENSE file for details

πŸ™ Credits


Note: This container is designed for RunPod but can run on any Docker host with NVIDIA GPU support.
