This guide helps you get up and running with VideoAnnotator using Docker (recommended) or a local uv install.
- Python 3.12+ (required)
- uv package manager (fast, modern dependency management)
- Git (for version control)
- Optional: CUDA-compatible GPU for faster processing
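A quick pre-flight check of the Python requirement (stdlib only; run it with whichever interpreter you plan to use):

```python
import sys

# VideoAnnotator requires Python 3.12 or newer
ok = sys.version_info >= (3, 12)
print(f"Python {sys.version.split()[0]}: {'OK' if ok else '3.12+ required'}")
```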
For the most reliable setup (consistent dependencies, easier GPU support), run VideoAnnotator in Docker:
```bash
# CPU (default)
docker compose up --build

# GPU (requires NVIDIA Container Toolkit)
docker compose --profile gpu up --build videoannotator-gpu
```

Open http://localhost:18011/docs for the interactive API documentation.
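Once the container is running, you can script a liveness check against the docs URL above. A stdlib-only sketch; `server_is_up` is an illustrative helper, not part of VideoAnnotator:

```python
from urllib.request import urlopen

def server_is_up(url: str = "http://localhost:18011/docs", timeout: float = 5.0) -> bool:
    """Return True if the VideoAnnotator docs page responds with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```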
To initialize the database and create an admin API key explicitly:
```bash
docker compose exec videoannotator setup-db --admin-email you@example.com --admin-username you

# If you launched the GPU service instead:
docker compose exec videoannotator-gpu setup-db --admin-email you@example.com --admin-username you
```

```bash
# Linux/Mac
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

```bash
git clone https://github.com/InfantLab/VideoAnnotator.git
cd VideoAnnotator
```
```bash
# Install all dependencies (fast!)
uv sync

# Install development dependencies
uv sync --extra dev

# Initialize the database and create an admin API key (idempotent)
uv run videoannotator setup-db --admin-email you@example.com --admin-username you
```

```bash
# Test the installation
uv run python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# Test the CLI interface
uv run videoannotator --help
```

VideoAnnotator runs everything through one integrated API server with built-in background processing:
```bash
# Quick start (simplest - server is default)
uv run videoannotator

# Or explicitly specify the server command with options
uv run videoannotator server --host 0.0.0.0 --port 18011

# For custom client development or testing (allows all CORS origins)
uv run videoannotator --dev
```

The server includes integrated background job processing, so no separate worker is needed. Interactive API documentation is served at http://localhost:18011/docs.

CORS note: the official web client (video-annotation-viewer on port 19011) is allowed automatically. For custom clients, use --dev mode or set the CORS_ORIGINS environment variable.
The modern CLI makes video processing simple:
```bash
# Submit a video processing job
uv run videoannotator job submit video.mp4 --pipelines "scene,person,face"

# Check job status (use the job ID returned by the submit command)
uv run videoannotator job status <job_id>

# Get detailed results
uv run videoannotator job results <job_id>

# Download all annotations (ZIP)
uv run videoannotator job download-annotations <job_id>

# List all jobs
uv run videoannotator job list --status completed
```

```bash
# List available pipelines
uv run videoannotator pipelines --detailed

# Show system information and database status
uv run videoannotator info

# Validate configuration files
uv run videoannotator config --validate config.yaml

# Back up the database
uv run videoannotator backup backup.db
```

Direct API access for integration with other systems:
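These endpoints can also be called from Python. A stdlib-only sketch of the status-polling side (the endpoint paths and bearer-token header match the curl examples in this guide; `get_json` and `wait_for_results` are illustrative helpers, and the `status` field name is an assumption to check against your server's responses):

```python
import json
import time
from urllib.request import Request, urlopen

API = "http://localhost:18011/api/v1"
API_KEY = "va_api_your_key_here"  # from `setup-db` or `generate-token`

def get_json(path: str) -> dict:
    """GET an API path with the bearer token and decode the JSON body."""
    req = Request(f"{API}{path}", headers={"Authorization": f"Bearer {API_KEY}"})
    with urlopen(req) as resp:
        return json.load(resp)

def wait_for_results(job_id: str, interval: float = 5.0) -> dict:
    """Poll job status until it finishes, then fetch the detailed results."""
    while get_json(f"/jobs/{job_id}").get("status") not in ("completed", "failed"):
        time.sleep(interval)
    return get_json(f"/jobs/{job_id}/results")
```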
```bash
# Use the API key printed by `setup-db` (or run `videoannotator generate-token`)
export API_KEY="va_api_your_key_here"

# Submit a video processing job
curl -X POST "http://localhost:18011/api/v1/jobs/" \
  -H "Authorization: Bearer $API_KEY" \
  -F "video=@video.mp4" \
  -F "selected_pipelines=scene,person,face"

# Check job status
curl -H "Authorization: Bearer $API_KEY" \
  "http://localhost:18011/api/v1/jobs/{job_id}"

# Get detailed results with pipeline outputs
curl -H "Authorization: Bearer $API_KEY" \
  "http://localhost:18011/api/v1/jobs/{job_id}/results"

# Download specific pipeline result files
curl -H "Authorization: Bearer $API_KEY" \
  "http://localhost:18011/api/v1/jobs/{job_id}/results/files/scene_detection" -O
```

```python
from videoannotator.pipelines.scene_detection.scene_pipeline import SceneDetectionPipeline
from videoannotator.pipelines.person_tracking.person_pipeline import PersonTrackingPipeline

# Scene detection
scene_config = {
    "threshold": 30.0,
    "min_scene_length": 1.0,
    "enabled": True,
}

pipeline = SceneDetectionPipeline(scene_config)
pipeline.initialize()

results = pipeline.process(
    video_path="path/to/video.mp4",
    start_time=0.0,
    end_time=30.0,  # Process first 30 seconds
    output_dir="output/",
)

pipeline.cleanup()
```

VideoAnnotator uses YAML configuration files for flexible setup:
```yaml
# config.yaml
scene_detection:
  threshold: 30.0
  min_scene_length: 1.0
  enabled: true

person_tracking:
  model: "yolo11n-pose.pt"
  conf_threshold: 0.4
  iou_threshold: 0.7
  track_mode: true
```

VideoAnnotator generates structured JSON files with comprehensive metadata:
```json
{
  "metadata": {
    "videoannotator": {
      "version": "1.4.1",
      "git": { "commit_hash": "359d693e..." }
    },
    "pipeline": { "name": "SceneDetectionPipeline" },
    "model": { "model_name": "PySceneDetect + CLIP" }
  },
  "annotations": [
    {
      "scene_id": "scene_001",
      "start_time": 0.0,
      "end_time": 10.0,
      "scene_type": "living_room"
    }
  ]
}
```

| Pipeline | Description | Output | Status |
|---|---|---|---|
| scene_detection | Scene boundary detection + CLIP environment classification | *_scene_detection.json | ✅ Ready |
| person_tracking | YOLO11 + ByteTrack multi-person pose tracking | *_person_tracking.json | ✅ Ready |
| face_analysis | OpenFace 3.0 + LAION facial behavior analysis | *_laion_face_annotations.json | ✅ Ready |
| audio_processing | Whisper speech recognition + pyannote diarization | *_speech_recognition.vtt | ✅ Ready |
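Because the output files are plain JSON, downstream analysis needs no special tooling. A sketch that extracts scene durations from a scene-detection file (field names taken from the example output above; `scene_durations` is an illustrative helper):

```python
import json

def scene_durations(path: str) -> dict[str, float]:
    """Map scene_id -> duration in seconds from a *_scene_detection.json file."""
    with open(path) as f:
        data = json.load(f)
    return {
        s["scene_id"]: s["end_time"] - s["start_time"]
        for s in data["annotations"]
    }
```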
All pipelines are fully integrated with the API server and process through the background job system!
- Read the Full Installation Guide for detailed setup
- Explore Pipeline Specifications for detailed pipeline documentation
- Learn about Demo Commands for complete usage examples
- Check out Testing Overview for QA information
- See Examples CLI Update Plan for updated CLI patterns
```bash
# Check CUDA availability
uv run python -c "import torch; print(torch.cuda.is_available())"

# In Docker:
docker compose exec videoannotator uv run python -c "import torch; print(torch.cuda.is_available())"
```

```bash
# Install FFmpeg
# Ubuntu/Debian:
sudo apt install ffmpeg

# macOS:
brew install ffmpeg

# Windows: download from https://ffmpeg.org/
```

Models are downloaded automatically on first use. Ensure you have:
- Stable internet connection
- Sufficient disk space (~2GB for all models)
- Proper permissions for the models directory
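To check the disk-space requirement (~2 GB for all models, per the list above) before the first run, a stdlib sketch (checking the current directory; adjust the path to wherever your models directory lives):

```python
import shutil

# Models need roughly 2 GB in total; check free space where they will land
free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free disk space: {free_gb:.1f} GB "
      f"({'enough' if free_gb >= 2 else 'too little'} for ~2 GB of models)")
```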
- 📖 Documentation: Check the docs/ folder
- 🐛 Issues: Report bugs on GitHub Issues
- 💬 Discussions: Join GitHub Discussions for questions
- 📧 Contact: Email the development team
- Use GPU: Install CUDA-compatible PyTorch for 10x speedup
- Batch Processing: Process multiple videos together
- Optimize Parameters: Reduce PPS for faster processing
- Memory Management: Process shorter segments for large videos
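The last tip can be sketched with the pipeline API shown earlier: `process()` accepts `start_time`/`end_time`, so a long video can be handled in fixed-length chunks. `chunk_spans` below is an illustrative helper, not part of VideoAnnotator:

```python
def chunk_spans(duration: float, chunk: float = 30.0) -> list[tuple[float, float]]:
    """Split a video's duration into (start_time, end_time) spans for process()."""
    spans = []
    start = 0.0
    while start < duration:
        spans.append((start, min(start + chunk, duration)))
        start += chunk
    return spans

# e.g. for s, e in chunk_spans(total_seconds):
#          pipeline.process(video_path, start_time=s, end_time=e, output_dir="output/")
```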
Happy annotating! 🎥✨