This document provides instructions for maintaining and operating the InfiniLM Service Virtualization and Control (SVC) system. The system implements a distributed architecture with centralized service discovery and load balancing.
The InfiniLM-SVC system consists of three main components (Registry, Router, and Babysitter), plus the xtask service that each Babysitter manages:
┌──────────────────────────────────────────────────────────┐
│                       Local Server                       │
│  ┌──────────────┐                      ┌──────────────┐  │
│  │   Registry   │                      │    Router    │  │
│  │ (Port 8081)  │                      │ (Port 8080)  │  │
│  └──────────────┘                      └──────────────┘  │
└──────────────────────────────────────────────────────────┘
                             │
                             │  Service Discovery & Routing
                             │
┌──────────────────────────────────────────────────────────┐
│                  Local & Remote Servers                  │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │  Babysitter  │   │  Babysitter  │   │  Babysitter  │  │
│  │   + xtask    │   │   + xtask    │   │   + xtask    │  │
│  └──────────────┘   └──────────────┘   └──────────────┘  │
└──────────────────────────────────────────────────────────┘
- Service Registry - Centralized service discovery and health monitoring
- Distributed Router - Load balancer with model-aware routing and request proxying
- Enhanced Babysitter - Service management wrapper for InfiniLM services
- xtask - Provides the OpenAI-compatible API (/chat/completions, /models)
Two implementations are available:
- Python Version (Original): Full-featured Python implementation
- Rust Version (Recommended): High-performance Rust implementation with better reliability and lower resource usage
Both versions provide the same API and functionality.
Rust Version:
Build:
cd rust
cargo build --release --bin infini-registry --bin infini-router --bin infini-babysitter
cd ..
Launch All Services:
# Configure babysitter configs (space-separated: bash arrays are not passed to child processes)
export BABYSITTER_CONFIGS="config/babysitter1.toml config/babysitter2.toml"
# Optional: Override ports and settings
export REGISTRY_PORT=18000
export ROUTER_PORT=8000
# Launch
./script/launch_all_rust.sh
Configuration Options:
- REGISTRY_PORT - Registry port (default: 18000)
- ROUTER_PORT - Router port (default: 8000)
- ROUTER_REGISTRY_URL - Registry URL used by the router (default: http://localhost:18000)
- BABYSITTER_CONFIGS - List of TOML config files for babysitters
- REGISTRY_HEALTH_INTERVAL, REGISTRY_HEALTH_TIMEOUT, REGISTRY_CLEANUP_INTERVAL - Registry settings
- ROUTER_HEALTH_INTERVAL, ROUTER_HEALTH_TIMEOUT, ROUTER_REGISTRY_SYNC_INTERVAL - Router settings
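To make the defaults above concrete, here is a minimal Python sketch of how a launcher might read these variables. It is not the logic inside launch_all_rust.sh: LaunchSettings and load_launch_settings are hypothetical names, and parsing BABYSITTER_CONFIGS as a space-separated string is an assumption taken from the Docker example.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class LaunchSettings:
    """Hypothetical container for the documented launch variables."""
    registry_port: int
    router_port: int
    router_registry_url: str
    babysitter_configs: List[str]


def load_launch_settings(env: Optional[Mapping[str, str]] = None) -> LaunchSettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    registry_port = int(env.get("REGISTRY_PORT", "18000"))
    router_port = int(env.get("ROUTER_PORT", "8000"))
    # The documented default is fixed; if REGISTRY_PORT is overridden,
    # ROUTER_REGISTRY_URL should be set too so the router still finds the registry.
    registry_url = env.get("ROUTER_REGISTRY_URL", "http://localhost:18000")
    # Assumption: config paths arrive as one space-separated string.
    configs = env.get("BABYSITTER_CONFIGS", "").split()
    if registry_port == router_port:
        raise ValueError("REGISTRY_PORT and ROUTER_PORT must differ")
    return LaunchSettings(registry_port, router_port, registry_url, configs)


if __name__ == "__main__":
    print(load_launch_settings())
```

Passing an explicit mapping (instead of the real environment) keeps the function easy to test.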
Docker Usage:
docker run -d \
-p 18000:18000 -p 8000:8000 \
-e REGISTRY_PORT=18000 \
-e ROUTER_PORT=8000 \
-e BABYSITTER_CONFIGS="config/babysitter1.toml config/babysitter2.toml" \
-v /path/to/config:/app/config \
-v /path/to/logs:/app/logs \
infinilm-svc-rust
Python Version:
Launch All Services:
./script/launch_all.sh
Individual Services:
- ./script/launch_registry.sh - Service Registry
- ./script/launch_router.sh - Distributed Router
- ./script/launch_babysitter.sh - Enhanced Babysitter (template)
- ./script/launch_babysitter_9g8b.sh - Babysitter for the 9g8b model
- ./script/launch_babysitter_qwen.sh - Babysitter for the Qwen model
Start All Services:
Rust Version:
./script/launch_all_rust.sh
Python Version:
./script/launch_all.sh
Stop All Services:
./script/stop_all.sh
Individual Stop:
- ./script/stop_registry.sh - Stop the Registry
- ./script/stop_router.sh - Stop the Router
- ./script/stop_babysitter.sh - Stop all Babysitters
- ./script/stop_babysitter.sh <PORT> - Stop a specific Babysitter
Health Checks:
# Registry
curl http://localhost:18000/health
curl http://localhost:18000/services
# Router
curl http://localhost:8000/health
curl http://localhost:8000/models
curl http://localhost:8000/services
# Babysitter (service port + 1, e.g. 8001 for a service on port 8000)
curl http://localhost:8001/health
Babysitter Configuration:
Rust Version: create a TOML config file in the config/ directory:
name = "babysitter-service-1"
port = 8000
registry_url = "http://localhost:18000"
[babysitter]
max_restarts = 10
restart_delay = 2
heartbeat_interval = 10
[backend]
type = "command"
command = "python"
args = ["/path/to/service.py", "--port", "8000"]
work_dir = "/path/to/workdir"
env = { "CUDA_VISIBLE_DEVICES" = "0" }
Python Version: edit the configuration section in script/launch_babysitter.sh:
PORT=8000
SERVICE_NAME="service-1"
SERVICE_TYPE="InfiniLM" # or "InfiniLM-Rust"
MODEL_PATH="/path/to/model"
REGISTRY_URL="http://localhost:18000"
HPCC_VISIBLE_DEVICES="0"
Running Multiple Services:
Rust Version: create multiple TOML config files and list them in BABYSITTER_CONFIGS:
# Space-separated: bash arrays are not passed to child processes
export BABYSITTER_CONFIGS="config/babysitter1.toml config/babysitter2.toml config/babysitter3.toml"
./script/launch_all_rust.sh
Python Version: duplicate and customize the launch scripts:
cp script/launch_babysitter.sh script/launch_babysitter_8000.sh
cp script/launch_babysitter.sh script/launch_babysitter_8001.sh
# Edit each script's configuration section
./script/launch_babysitter_8000.sh
./script/launch_babysitter_8001.sh
Model-Aware Routing:
The router routes requests to services based on the requested model:
- Services register their supported models during startup
- Router aggregates models from all healthy services
- Requests are load-balanced among services supporting the requested model
- Returns 503 if no service supports the requested model
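The routing rules above can be sketched in a few lines of Python. ServiceEntry and ModelRouter are hypothetical names, and per-model round-robin is an assumed load-balancing policy; the actual router may distribute requests differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class ServiceEntry:
    """A service as the router might see it after syncing with the registry (hypothetical)."""
    name: str
    url: str
    models: Set[str]
    healthy: bool = True


class ModelRouter:
    """Model-aware selection with per-model round-robin (assumed policy)."""

    def __init__(self) -> None:
        self._services: List[ServiceEntry] = []
        self._cursors: Dict[str, int] = {}  # next turn per model

    def sync(self, services: Iterable[ServiceEntry]) -> None:
        # Replace the local view with the latest list fetched from the registry.
        self._services = list(services)

    def available_models(self) -> List[str]:
        # Aggregate models across healthy services only.
        models: Set[str] = set()
        for svc in self._services:
            if svc.healthy:
                models |= svc.models
        return sorted(models)

    def route(self, model: str) -> Tuple[int, Optional[ServiceEntry]]:
        candidates = [s for s in self._services if s.healthy and model in s.models]
        if not candidates:
            return 503, None  # no healthy service supports this model
        turn = self._cursors.get(model, 0)
        self._cursors[model] = turn + 1
        return 200, candidates[turn % len(candidates)]
```

For example, after syncing two healthy services that both advertise qwen, successive route("qwen") calls alternate between them, while a model that only an unhealthy service advertises yields (503, None).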
Service Discovery:
- Services automatically register with the registry on startup
- Registry tracks service health and metadata
- Router syncs service list from registry periodically
- Unhealthy services are automatically excluded from routing
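One way to picture the registry side of this flow is an in-memory table whose entries expire without heartbeats. This is an illustrative sketch only: the class and method names are hypothetical, and it models liveness through heartbeats (suggested by the babysitter's heartbeat_interval setting) rather than the registry's own active health checks.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Registration:
    name: str
    url: str
    models: List[str]
    last_heartbeat: float


class InMemoryRegistry:
    """Hypothetical registry: entries go stale if no heartbeat arrives within the timeout."""

    def __init__(self, heartbeat_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = heartbeat_timeout
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Registration] = {}

    def register(self, name: str, url: str, models: List[str]) -> None:
        # Called on service startup; re-registering refreshes the entry.
        self._entries[name] = Registration(name, url, list(models), self._clock())

    def heartbeat(self, name: str) -> bool:
        entry = self._entries.get(name)
        if entry is None:
            return False  # unknown service: it should register again
        entry.last_heartbeat = self._clock()
        return True

    def _is_fresh(self, entry: Registration, now: float) -> bool:
        return now - entry.last_heartbeat <= self._timeout

    def healthy_services(self) -> List[Registration]:
        # What the router would receive when it syncs its service list.
        now = self._clock()
        return [e for e in self._entries.values() if self._is_fresh(e, now)]

    def cleanup(self) -> List[str]:
        # Periodic cleanup (cf. REGISTRY_CLEANUP_INTERVAL): drop stale entries.
        now = self._clock()
        stale = [n for n, e in self._entries.items() if not self._is_fresh(e, now)]
        for name in stale:
            del self._entries[name]
        return stale
```

Because stale entries are filtered out of healthy_services() even before cleanup() runs, the router never sees a service that has missed its heartbeat window.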
Health Monitoring:
- Registry performs periodic health checks on registered services
- Router monitors service health and routes only to healthy services
- Babysitter monitors managed service health and restarts on failure
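The restart behavior, bounded by max_restarts and restart_delay from the [babysitter] section of the TOML config, might look like the following Python sketch. supervise is a hypothetical function, and treating a zero exit code as a clean shutdown that should not be restarted is an assumption.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional, Tuple


def supervise(cmd: List[str], max_restarts: int = 10, restart_delay: float = 2.0,
              cwd: Optional[str] = None,
              sleep: Callable[[float], None] = time.sleep) -> Tuple[int, int]:
    """Run cmd, restarting it on non-zero exit up to max_restarts times.

    Returns (last_exit_code, restarts_used). Defaults mirror the example
    [babysitter] values; the policy itself is an assumption.
    """
    restarts = 0
    while True:
        code = subprocess.run(cmd, cwd=cwd).returncode
        if code == 0:
            return code, restarts  # clean exit: treated as intentional, no restart
        if restarts >= max_restarts:
            return code, restarts  # give up once the restart budget is spent
        restarts += 1
        sleep(restart_delay)  # back off before relaunching


if __name__ == "__main__":
    # Demo: a child that always fails is restarted twice, then abandoned.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(failing, max_restarts=2, restart_delay=0.1))
```

This only reacts to process exit; a service that hangs without exiting would additionally need the health checks described above to be detected.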
Monitoring:
# List all registered services
curl http://localhost:18000/services
# Router statistics
curl http://localhost:8000/stats
# Service health
curl http://localhost:8000/services
Logs are stored in the logs/ directory:
- logs/registry_*.log - Registry logs
- logs/router_*.log - Router logs
- logs/babysitter_*.log - Babysitter logs
# Monitor logs
tail -f logs/registry_*.log
tail -f logs/router_*.log
tail -f logs/babysitter_*.log
Troubleshooting:
Router returns 503 "No healthy services available"
- Check the registry: curl http://localhost:18000/services
- Verify that services are registered and healthy
- Check service logs for errors
Service not registering
- Verify the registry is running: curl http://localhost:18000/health
- Check network connectivity
- Review service configuration
Model not found
- Check available models: curl http://localhost:8000/models
- Verify the service's model registration in the registry
- Check babysitter logs for model fetching errors
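For the "Model not found" case, a small script can query the router's /models endpoint and report what it advertises. This sketch assumes the router answers with an OpenAI-style list body ({"object": "list", "data": [{"id": ...}]}), which is an assumption; model_ids, check_model, and the check_model.py filename are hypothetical.

```python
import json
import sys
import urllib.request
from typing import Any, List


def model_ids(payload: Any) -> List[str]:
    """Extract model IDs, assuming an OpenAI-style {"object": "list", "data": [...]} body."""
    if not isinstance(payload, dict):
        return []
    return [item["id"] for item in payload.get("data", [])
            if isinstance(item, dict) and "id" in item]


def fetch_json(url: str, timeout: float = 5.0) -> Any:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_model(router_url: str, model: str) -> bool:
    """True if the router currently advertises `model` on its /models endpoint."""
    ids = model_ids(fetch_json(router_url.rstrip("/") + "/models"))
    print(f"router advertises: {', '.join(ids) or '(none)'}")
    return model in ids


if __name__ == "__main__":
    # Usage: python check_model.py http://localhost:8000 <model-name>
    url, wanted = sys.argv[1], sys.argv[2]
    sys.exit(0 if check_model(url, wanted) else 1)
```

A non-zero exit status makes the check usable from shell scripts; if the model is missing, the registry's /services listing and the babysitter logs are the next places to look.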
Docker Deployment:
Rust Version:
# Build image
docker build -t infinilm-svc-rust -f docker/Dockerfile.rust .
# Run container
docker run -d \
-p 18000:18000 -p 8000:8000 \
-e REGISTRY_PORT=18000 \
-e ROUTER_PORT=8000 \
-e BABYSITTER_CONFIGS="config/babysitter1.toml" \
-v $(pwd)/config:/app/config \
-v $(pwd)/logs:/app/logs \
infinilm-svc-rust
Python Version:
# Build image
docker build -t infinilm-svc -f docker/Dockerfile .
# Run container
docker run -d \
-p 8000:8000 -p 8080:8080 -p 8081:8081 \
infinilm-svc
Further Documentation:
- Integration Tests: rust/tests/integration/README.md
- Babysitter Guide: rust/src/bin/README.md
- Distributed Deployment: docs/DISTRIBUTED_DEPLOYMENT_README.md
- Multi-Service Guide: docs/MULTI_SERVICE_README.md
Requirements:
Rust Version:
- Rust 1.70+ (nightly recommended)
- Cargo
Python Version:
- Python 3.8+
- Dependencies: aiohttp, requests
For detailed configuration examples and advanced usage, see the individual component documentation in the docs/ directory.