All notable changes to VideoAnnotator will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Queue position display for pending jobs
- Deterministic test fixtures with synthetic video generation
- Research workflow examples for JOSS paper
- Benchmark results and performance validation
- Additional contributor documentation improvements
This release accompanies the JOSS submission of VideoAnnotator and its companion project Video Annotation Viewer.
- CLIP migration: Migrated scene-classification pipeline from `clip` to `open_clip`, using the LAION-2B pretrained `ViT-B-32` model for improved availability and reproducibility.
- HuggingFace auth: Updated diarization and Whisper pipelines to use the current `token` parameter instead of the deprecated `use_auth_token`.
- Devcontainer: Simplified forwarded-port list to the single default API port (18011).
- Database GUID handling: Added defensive `try/except` in the `GUID` type decorator to gracefully handle malformed UUID values.
- Diarization init: Wrapped model loading in explicit error handling with a clear log message on failure.
- Voice emotion baseline: Removed `voice_emotion_baseline` pipeline metadata and associated tests (superseded by the LAION EmoNet voice pipeline).
- Added JOSS cover letter (`paper/cover_letter.md`).
- Updated paper bibliography version to v1.4.2.
- Container/Devcontainer: Baked `hadolint` into Docker images and the devcontainer so pre-commit hooks work reliably.
- Dockerfiles: Added `git-lfs` to CPU/GPU Dockerfiles for smoother model/asset workflows.
- Documentation: Consolidated the JOSS manuscript into `paper/paper.md` and replaced `docs/joss.md` with a pointer to avoid divergence.
- Repository Hygiene: Moved top-level helper scripts into organized subfolders under `scripts/` and updated imports to the `videoannotator.*` package namespace.
- Entrypoints: Updated `api_server.py` to act as a compatibility wrapper; documentation now recommends using the `videoannotator` CLI.
- README: Rationalized repeated setup/install instructions, fixed broken and non-working links, and replaced hard-coded test/coverage claims with CI status.
- Docs: Standardized examples on the canonical API port `18011` and corrected Docker run port mappings.
- Docs: Replaced placeholder `docs/usage/accessing_results.md` with a real results retrieval guide.
This release introduces a flexible storage system allowing artifact downloads and a robust database-backed authentication system.
- Flexible Storage: New artifact download capabilities, including source video retrieval.
- Authentication: Migrated from file-based to database-backed authentication for improved security and scalability.
- Artifacts API: New endpoint `GET /api/v1/jobs/{id}/artifacts` to download job results as a ZIP archive.
- Artifact Downloads: Ensured source video files are included in the downloaded artifact ZIP.
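A short client sketch shows how the artifacts endpoint above might be called. The base URL uses the canonical port from this changelog, but the Bearer auth scheme, the example job id, and the key format are illustrative assumptions, not a documented client API:

```python
# Hypothetical client sketch for the artifacts ZIP endpoint.
# Only the URL/header construction is shown; the auth scheme is assumed.

API_BASE = "http://localhost:18011"  # canonical docs port per this changelog

def artifacts_request(job_id: str, api_key: str) -> tuple[str, dict]:
    """Build the URL and headers for downloading a job's results ZIP."""
    url = f"{API_BASE}/api/v1/jobs/{job_id}/artifacts"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers

# With a server running, urllib.request.Request(url, headers=headers)
# passed to urllib.request.urlopen would stream the ZIP archive.
```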
This patch release focuses on critical performance fixes for the API and improving the developer experience in cloud environments (Codespaces).
- Critical Performance: Reduced `GET /api/v1/pipelines` response time from ~160s to <100ms by removing heavy module imports during listing.
- Critical Performance: Removed a 1-second blocking delay in `GET /api/v1/system/health` by optimizing CPU usage checks.
- CORS: Fixed Cross-Origin Resource Sharing for development environments by correctly supporting wildcard origins with credentials.
- API Routing: Resolved timeouts and 307 Redirect loops caused by trailing slash inconsistencies in API routes.
- Pipeline Discovery: Fixed discovery issues for `face_laion_clip` and reduced log spam.
- Dev Container: Fixed build issues and normalized line endings for cross-platform compatibility.
- Storage: Fixed critical issues with video storage paths and cleanup logic.
- Documentation: Added `docs/development/CORS_AND_AUTH_PROTOCOL.md` for frontend integration guidance.
- CLI: Added `setup-db` command for streamlined database initialization.
This release addresses critical production blockers identified during client integration testing and establishes a solid foundation for JOSS publication.
🔧 Job Management & Concurrency Control
- Job cancellation API endpoint (`POST /api/v1/jobs/{id}/cancel`) with `CancellationManager` for async task tracking
- `CANCELLED` job status with proper state-machine transitions
- `MAX_CONCURRENT_JOBS` environment variable (default: 2) with worker queue enforcement
- Worker retry logic with exponential backoff
- Enhanced worker signal handling for graceful cancellation
- 24 comprehensive tests for cancellation (15 unit + 9 integration)
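The worker retry logic above uses exponential backoff; a minimal sketch of such a delay schedule follows. The parameter names mirror the `RETRY_BASE_DELAY`, `RETRY_MAX_DELAY`, and `RETRY_JITTER` variables documented later in this file, but the exact formula is an assumption for illustration:

```python
import random

def retry_delay(attempt: int, base: float = 1.0, max_delay: float = 60.0,
                jitter: float = 0.1) -> float:
    """Exponential backoff: base * 2**attempt, capped at max_delay,
    plus a random jitter proportional to the delay."""
    delay = min(base * (2 ** attempt), max_delay)
    return delay + random.uniform(0, jitter * delay)
```

Jitter spreads retries from concurrent workers so they do not hammer a recovering resource in lockstep.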
💾 Persistent Storage System
- Persistent storage implementation with `STORAGE_DIR` environment variable (default: `./storage`)
- Automatic directory structure creation (`uploads/`, `results/`, `temp/`, `logs/`)
- Storage cleanup module with configurable retention policies (`STORAGE_RETENTION_DAYS`)
- Dry-run mode and multiple safety checks to prevent data loss
- Audit logging for all storage operations
- 15 tests for storage paths and cleanup logic
✅ Configuration Validation
- Schema-based config validation using pipeline metadata
- Validation API endpoint (`POST /api/v1/pipelines/{name}/validate`)
- Field-level error messages with specific paths, types, and valid values
- Pre-flight validation integrated into job submission workflow
- `ConfigValidator` with comprehensive validation logic
- 49 tests (26 unit + 14 API + 9 job submission)
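To make "field-level error messages" concrete, here is a minimal validation sketch. The schema shape and error-dict fields are illustrative assumptions; the shipped `ConfigValidator` is more thorough:

```python
# Hypothetical field-level validation against a pipeline metadata schema.

def validate_config(config: dict, schema: dict) -> list[dict]:
    """Return a list of field-level errors; an empty list means valid."""
    errors = []
    for key, value in config.items():
        spec = schema.get(key)
        if spec is None:
            errors.append({"field": key, "message": "unknown option"})
            continue
        if not isinstance(value, spec["type"]):
            errors.append({"field": key,
                           "message": f"expected {spec['type'].__name__}"})
        elif "choices" in spec and value not in spec["choices"]:
            errors.append({"field": key,
                           "message": f"must be one of {spec['choices']}"})
    return errors
```

Returning structured errors rather than raising on the first problem lets the API report every bad field at once.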
🔒 Security Hardening
- Secure-by-default configuration with `AUTH_REQUIRED=true`
- Automatic API key generation on first startup with database-backed token storage
- `videoannotator generate-token` CLI command for additional API keys
- CORS restrictions defaulting to `http://localhost:19011` (configurable via `ALLOWED_ORIGINS`)
- Frictionless CORS configuration for web client developers
- Security warnings logged on startup for insecure configurations
- Comprehensive security documentation suite (`docs/security/`)
- 15 tests (7 startup + 8 CORS)
📦 Package Namespace Migration
- Restructured to standard src layout (`src/videoannotator/`)
- Modern Python package structure following PEP 517/518 best practices
- All imports updated to the `videoannotator.*` namespace
- Better test isolation and cleaner package boundaries
- Migration guide with automated migration script (`docs/UPGRADING_TO_v1.3.0.md`)
- 20 namespace tests (11 passing for core functionality)
🏥 Enhanced Diagnostics & Health Monitoring
- Comprehensive diagnostic CLI commands:
  - `videoannotator diagnose system` (Python, FFmpeg, OS info)
  - `videoannotator diagnose gpu` (CUDA, device info, memory)
  - `videoannotator diagnose storage` (free space, write permissions)
  - `videoannotator diagnose database` (connectivity, schema version)
  - `videoannotator diagnose all` (combined report)
- Enhanced health endpoint (`/api/v1/system/health?detailed=true`) with:
  - GPU compute capability detection and compatibility warnings
  - Worker status and active job count
  - Storage diagnostics with disk space warnings
  - Database health checks
  - Pipeline registry status
- ASCII-safe output with `--json` flag for scripting
- Exit codes: 0 = pass, 1 = errors, 2 = warnings
- 15 diagnostic tests + 22 health endpoint tests
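The exit-code convention above (0 = pass, 1 = errors, 2 = warnings) amounts to a small mapping; the function name is hypothetical:

```python
def diagnose_exit_code(errors: int, warnings: int) -> int:
    """Map diagnostic counts to the documented exit codes:
    0 = pass, 1 = errors, 2 = warnings (errors take precedence)."""
    if errors:
        return 1
    if warnings:
        return 2
    return 0
```

Scripts can then gate on the exit code, e.g. treating 2 as a soft failure in CI.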
⚙️ Environment Configuration System
- Comprehensive environment variable configuration module (`src/videoannotator/config_env.py`)
- 19 configurable options including:
  - `STORAGE_DIR`, `STORAGE_RETENTION_DAYS`
  - `MAX_CONCURRENT_JOBS`
  - `AUTH_REQUIRED`, `ALLOWED_ORIGINS`
  - `RETRY_BASE_DELAY`, `RETRY_MAX_DELAY`, `RETRY_JITTER`
  - Database, logging, and pipeline configuration
- Complete documentation at `docs/usage/environment_variables.md`
- Updated `.env.example` with all options
- 19 passing configuration tests
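Typed environment lookups of this kind are commonly implemented with small helpers; the sketch below uses the defaults documented in this release (`MAX_CONCURRENT_JOBS=2`, `AUTH_REQUIRED=true`), but the helper names and the accepted truthy strings are assumptions:

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer env var, falling back to a documented default."""
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

def env_bool(name: str, default: bool) -> bool:
    """Read a boolean env var; truthy spellings here are illustrative."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")
```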
🐛 Critical Bug Fixes
- Fixed broken import paths causing "No pipeline classes available" errors
- Added missing pipeline metadata (speaker_diarization, speech_recognition, face_analysis, LAION voice)
- Fixed unit test and integration test imports to use videoannotator package paths
- Resolved pipeline name resolution failures
📊 API Enhancements
- Video metadata in job responses (filename, size, duration)
- Disabled trailing slash redirects for better API compatibility
- Job error messages exposed in API responses
- Standardized `ErrorEnvelope` with consistent structure across all endpoints:
  - Fields: `code`, `message`, `detail`, `hint`, `field`, `timestamp`
  - Unified exception handlers (VideoAnnotatorException, APIError)
  - 6 integration tests for error format consistency
📚 JOSS Publication Requirements
- Installation verification script (`scripts/verify_installation.py`) with 30 tests
  - Progressive environment validation (Python, FFmpeg, imports, database, GPU, video processing)
  - Platform detection (Linux, macOS, Windows, WSL2)
  - ASCII-safe output with exit codes
- Test coverage validation system (`scripts/validate_coverage.py`)
  - Module-specific thresholds: API (90%), pipelines (80%), database (85%), storage (85%)
  - HTML and XML report generation
  - Comprehensive documentation (`docs/testing/coverage_report.md`)
- Enhanced API endpoint documentation
- Comprehensive docstrings with curl examples for all major endpoints
- Detailed request/response examples in Swagger UI
- Success and error response examples
- JOSS reviewer documentation
  - Quick start guide (`docs/GETTING_STARTED_REVIEWERS.md`) with <15 minute evaluation
  - Comprehensive troubleshooting guide (`docs/installation/troubleshooting.md`)
  - Security configuration guide (`docs/security/`)
- Made `scripts/` a proper Python package for cleaner imports
📖 Documentation Improvements
- `docs/UPGRADING_TO_v1.3.0.md` - Complete migration guide
- `docs/archive/v1.3.0/V1.3.0_CLIENT_UPDATE.md` - Client team integration notes (archived)
- `docs/archive/2025/API_IMPROVEMENTS_2025-10-30.md` - API enhancement details (archived)
- `docs/archive/2025/CORS_IMPROVEMENTS_OCT2025.md` - CORS configuration guide (archived)
- `docs/archive/2025/CLIENT_TEAM_UPDATE.md` - Updated client integration info (archived)
- `docs/development/PRE_COMMIT_GUIDE.md` - Pre-commit hook guidance
- `docs/development/scripts_inventory.md` - Scripts audit and documentation
- Enhanced `README.md` and getting started guides
- BREAKING: Package namespace changed to `videoannotator.*` (migration guide provided)
- BREAKING: Authentication now required by default (`AUTH_REQUIRED=true`)
- BREAKING: CORS restricted to localhost by default (`ALLOWED_ORIGINS=http://localhost:19011`)
- Default storage moved from `/tmp` to `./storage` for persistence
- All curl examples in documentation updated with Authorization headers
- API version updated to 1.3.0-dev during development
- Pipeline registry validation and name resolution failures
- Import path issues preventing pipeline loading
- Data loss risk from ephemeral `/tmp` storage
- Runaway jobs continuing after delete request
- Invalid configurations passing validation
- Inconsistent error formats across endpoints
- Test import errors across unit and integration tests
See docs/UPGRADING_TO_v1.3.0.md for detailed migration instructions including:
- Import path updates for the `videoannotator.*` namespace
- Environment variable configuration
- API authentication setup
- Storage migration from temp to persistent directories
- Total: 234 tests passing across all modules
- Coverage: Meeting module-specific thresholds (80-90%)
- New Tests:
- 24 cancellation tests
- 49 validation tests
- 15 security tests
- 20 namespace tests
- 15 diagnostic tests
- 22 health endpoint tests
- 30 installation verification tests
- 19 configuration tests
- 10+ new documentation files
- Complete API documentation with examples
- Security configuration guide
- JOSS reviewer quick start
- Troubleshooting guide
- Migration guide
- Environment variables reference
Major Testing Infrastructure Enhancements
- Improved test suite from 607 passing (79.6%) to 720 passing (94.4%), fixing 113 tests
- Created comprehensive test fixtures infrastructure:
  - Real test audio: `tests/fixtures/audio/test.wav` (1.4MB speech audio)
  - Real test video: `tests/fixtures/video/test.mp4` (825KB)
  - Fixtures documentation and recording guidelines
- Fixed integration tests to use real audio instead of synthetic sine waves
- Installed ffmpeg system-wide and added to all Dockerfiles
- Updated conftest.py to prefer real media when available, fall back to synthetic for unit tests
Test Fixes
- Fixed 5 database permission tests (removed unnecessary skip decorators)
- Fixed 4 size_analysis config tests (updated to match actual implementation structure)
- Fixed 6 enhanced logging tests (removed emoji for Windows compatibility)
- Fixed 1 pipeline spec documentation test (namespace + regeneration)
- All integration tests now work with real media files
Test Infrastructure
- 18 legitimate skipped tests (external dependencies, future features)
- 25 remaining failures (complex integration tests, non-blocking)
- Exceeds 95% passing target (697) by 23 tests
Special thanks to the Video Annotation Viewer team for extensive integration testing that identified critical production issues addressed in this release.
- Uniform absolute import normalization across API, pipelines, storage, auth, exporters, and CLI to eliminate fragile `src.` and relative (`..`) paths after previous layout adjustments.
- CLI server invocation now targets `api.main:app` directly (removing the stale `src.` reference), improving reliability of `videoannotator server`.
- Restored and merged accidentally truncated `docs/archive/development/roadmap_v1.3.0.md` content; added an explicit "Package Layout Normalization" technical debt section without loss of prior feature timeline, risks, or metrics.
- Updated Windows console output in version/dependency reporting to ASCII-safe tags only (reinforcing the earlier 1.2.1 patch policy); ensured no reintroduction of emojis in modified modules.
- Status annotations in the v1.2.1 roadmap marking tasks as COMPLETED / DEFERRED / PARTIAL to synchronize roadmap with actual delivered scope.
- Explicit release date and version bump in `src/version.py` for 1.2.2.
- Technical debt narrative enumerating the upcoming packaging namespace migration (planned for v1.3.0) and the associated deprecation shim strategy.
- Server startup failure (`ModuleNotFoundError: No module named 'src'`) caused by inconsistent import paths after flattening; all runtime imports are now resolvable when installed in editable or built form.
- Documentation integrity regression where large sections of the v1.3.0 roadmap were temporarily overwritten; fully restored from history.
- No API surface changes. Downstream code referencing `src.` prefixes should be updated to plain absolute module imports (e.g. `from api.main import app`).
- The future v1.3.0 namespace migration will introduce `videoannotator.*` package paths; current absolute imports were chosen to minimize churn (deprecation shims will map old paths temporarily).
- Consolidated import approach reduces risk of duplicate module objects under mixed relative/absolute resolution, aiding forthcoming plugin/registry enhancements.
- Roadmap adjustments documented to prevent silent scope shrinkage in strategic planning artifacts.
- Smoke import test: `import api.main, pipelines.base_pipeline, exporters.native_formats` succeeds post-normalization.
- API key optional validation behavior unchanged; 401 is still returned only for explicitly invalid provided keys.
- Fully backward compatible at API & CLI command level; only internal import paths refactored. Any third-party code using undocumented relative imports must adjust.
- Establishes a clean, predictable import baseline before larger v1.3.0 restructuring (namespaced package, extras, plugin hooks) to reduce compounded technical debt.
- Pipeline Registry: YAML-driven pipeline metadata under `src/registry/metadata/` dynamically exposed via `/api/v1/pipelines` (single source of truth).
- Extended Taxonomy Fields: `pipeline_family`, `variant`, `tasks`, `modalities`, `capabilities`, `backends`, and optional `stability`, replacing the former coarse `category` concept.
- Auto-generated Pipeline Specification: `docs/pipelines_spec.md` produced by `scripts/generate_pipeline_specs.py` (regenerate to update docs; diffs signal drift).
- Emotion Output Format Specification: Standard segment-based JSON schema at `docs/specs/emotion_output_format.md` for emotion-recognition task outputs.
- New Pipelines Registered: `face_openface3_embedding`, `face_laion_clip`, `voice_emotion_baseline` (with combined speech-transcription + emotion-recognition tasks).
- CLI Enhancements: `videoannotator pipelines` now supports `--json`, `--detailed`, and markdown table output.
- API Enhancements: `/api/v1/pipelines` and `/api/v1/pipelines/{name}` now return full metadata including `display_name` and all taxonomy arrays.
- Standard Error Envelope: Introduced `APIError` with a consistent JSON structure (`error.code`, `error.message`, `error.hint`) across pipeline and job endpoints.
- Health Enrichment: `/api/v1/system/health` now includes pipeline count, a capped name list, `uptime_seconds`, and explicit embedded job queue status.
- Error Handling Tests: Added a unit test ensuring 404 pipeline detail responses use the standardized envelope.
- CLI Emotion Validation: Added `videoannotator validate-emotion` command for schema checking `.emotion.json` outputs.
- Output Naming Conventions Spec: Canonical file naming patterns documented at `docs/specs/output_naming_conventions.md` (frozen for v1.2.x).
- Emotion Validator Utility: Lightweight schema validator in `src/validation/emotion_validator.py` with tests ensuring emotion JSON conformance.
- Deprecated Single `category` Field: Replaced by a multi-dimensional taxonomy (no longer emitted in the API; remove any downstream reliance on it).
- Documentation Alignment: README and release notes now direct users to `/api/v1/pipelines` and `docs/pipelines_spec.md` instead of hard-coded lists.
- Canonical Discovery: All pipeline listings and attributes should be consumed from the API or the generated spec, not from ad hoc YAML enumeration in user code.
- CLI Versioning: CLI now derives its version from the single source `src/version.py` (removed hardcoded API version strings).
- OpenFace 3.0 Import Safety: Converted eager OpenFace imports to lazy loading in `openface3_pipeline` to prevent argparse side effects and enable test collection without OpenFace installed.
- If prior tooling referenced `category`, map logic to one or more of `tasks`, `modalities`, or `pipeline_family` depending on intent.
- Update any scripts that enumerated pipelines manually to call `videoannotator pipelines --json` for stable machine parsing.
- To regenerate the pipeline spec after adding or editing metadata, run the provided generation script (see header comments in `scripts/generate_pipeline_specs.py`).
- Emotion analysis consumers should validate outputs against the documented schema instead of reverse-engineering per-pipeline fields.
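As a sense of what validating against the documented schema looks like, here is a deliberately minimal check of segment-based emotion output. The real schema lives in `docs/specs/emotion_output_format.md` and the shipped validator in `src/validation/emotion_validator.py`; the field names used here (`segments`, `start`, `end`, `emotion`) are illustrative assumptions:

```python
# Hypothetical minimal structural check; field names are assumptions.

def check_emotion_segments(doc: dict) -> list[str]:
    """Return a list of human-readable problems; empty means plausible."""
    problems = []
    segments = doc.get("segments")
    if not isinstance(segments, list):
        return ["missing or non-list 'segments'"]
    for i, seg in enumerate(segments):
        for key in ("start", "end", "emotion"):
            if key not in seg:
                problems.append(f"segment {i}: missing '{key}'")
        if "start" in seg and "end" in seg and seg["start"] > seg["end"]:
            problems.append(f"segment {i}: start > end")
    return problems
```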
- These changes prepare the groundwork for richer capability/resource descriptors planned for v1.3.0 without introducing breaking runtime behaviors in existing pipelines.
- All additions are backward compatible except for the removal of the legacy `category` field; no other API contracts changed.
Date: 2025-09-17 (post initial 1.2.1 feature merge)
Added:
- Optional legacy API key validation helper (`validate_optional_api_key`) enforcing 401 on explicitly invalid `va_`-style keys while preserving anonymous access for endpoints that allowed it.
Changed:
- Replaced runtime and test console emojis with ASCII tags (`[OK]`, `[WARNING]`, `[ERROR]`) in `version.py`, `coco_validator.py`, person tracking pipeline logging, and integration test prints for Windows console compatibility.
- Injected conditional auth dependency into job endpoints (no behavior change for anonymous requests unless an invalid key is supplied).
Documentation:
- Appended a "Technical Debt & Deferred Stabilization Items" section to `docs/archive/development/roadmap_v1.3.0.md` enumerating deferred heavier tasks (BatchStatus semantics, retry backoff policy, pipeline config defaults, synthetic video fixtures, storage lifecycle cleanup, Whisper CUDA fallback test adjustments, error envelope taxonomy, registry extensions, residual emoji cleanup, auth follow-up tests).
Testing / Validation:
- Targeted integration tests confirm: invalid API key now returns 401; anonymous job submission paths unaffected; no remaining emoji assumptions in modified tests.
Backward Compatibility:
- No breaking API changes; only invalid provided API keys now correctly rejected. Anonymous behavior unchanged where previously permitted.
Rationale:
- Scope intentionally limited to low-risk hardening and Windows-safe output formatting ahead of broader v1.3.0 feature work.
- 🎯 Modern FastAPI Server: Complete REST API with interactive documentation at `/docs`
- ⚡ Integrated Background Processing: Built-in job processing system - no separate worker processes needed
- 🛠️ Modern CLI Interface: Comprehensive `uv run videoannotator` command-line tools for server and job management
- 📊 Real-time Job Status: Live job tracking with detailed progress updates and results retrieval
- 🔄 Async Job Processing: Handle multiple video processing jobs simultaneously
- 🌐 Cross-platform API: RESTful endpoints compatible with Python, JavaScript, R, and any HTTP client
- 🏗️ API-First Design: All pipelines accessible through standardized HTTP endpoints
- 📋 Job Management System: Complete job lifecycle with submit → status → results workflow
- 🔧 Configuration API: Validate and manage pipeline configurations via API
- 📁 File Management: Secure video upload, processing, and result file downloads
- 🔐 Authentication Ready: JWT token infrastructure for secure API access
- 📦 uv Package Manager: Migrated from pip to uv for 10x faster dependency management
- 🧹 Ruff Integration: Modern linting and formatting with Ruff (replaces Black, isort, flake8)
- 🐳 Fixed Docker Support: Resolved build issues with proper file copying and modern license formats
- 📖 DeepWiki Integration: Interactive documentation available at deepwiki.com/InfantLab/VideoAnnotator
```
# Submit video processing job
POST /api/v1/jobs/

# Monitor job status
GET /api/v1/jobs/{job_id}

# Retrieve detailed results
GET /api/v1/jobs/{job_id}/results

# Download specific pipeline outputs
GET /api/v1/jobs/{job_id}/results/files/{pipeline}

# Health check and server info
GET /health
GET /api/v1/debug/server-info

# List available pipelines
GET /api/v1/pipelines

# Configuration validation
POST /api/v1/config/validate
```

```bash
# Start integrated API server
uv run videoannotator server --port 8000

# Job management via CLI
uv run videoannotator job submit video.mp4 --pipelines scene,person,face
uv run videoannotator job status <job_id>
uv run videoannotator job results <job_id>
uv run videoannotator job list --status completed

# System information
uv run videoannotator info
uv run videoannotator pipelines --detailed
```

- 📖 Complete Documentation Refresh: Updated all docs for v1.2.0 with modern API patterns
- 🧭 Navigation System: Added consistent navigation bars across all documentation files
- 🎮 Interactive Examples: Updated demo_commands.md with modern CLI and API usage patterns
- 🔗 Cross-references: Fixed all internal documentation links with proper relative paths
- 📋 API Reference: Complete API documentation with request/response examples
- Replaced: Old `python demo.py` patterns → Modern `uv run videoannotator` CLI
- Updated: Direct pipeline usage → API-first architecture examples
- Enhanced: Configuration examples with modern YAML structure
- Improved: Getting started guide with 30-second setup process
- ⚡ Fast Package Management: uv provides 10-100x faster dependency resolution
- 🧹 Unified Tooling: Single Ruff command replaces multiple linting/formatting tools
- 🏗️ Modern Build System: Updated pyproject.toml with modern license format and dependency groups
- 🐳 Container Optimization: Fixed Docker builds with proper source file copying
- 🔄 Integrated Processing: Background job processing runs within API server process
- 📊 Status Tracking: Real-time job status updates with detailed pipeline progress
- 🗄️ Database Integration: SQLite-based job storage with full CRUD operations
- 🔐 Security Framework: JWT authentication ready for production deployment
- CLI Interface: Legacy `python demo.py` replaced with `uv run videoannotator` commands
- Configuration: Updated to API-first workflow - direct pipeline usage now for development only
- Dependencies: Requires uv package manager for optimal performance
```bash
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh                  # Linux/Mac
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"       # Windows

# Update existing installation
uv sync              # Fast dependency installation
uv sync --extra dev  # Include development dependencies

# Start using modern API server
uv run videoannotator server  # Replaces old direct processing
```

- ✅ Pipeline Architecture: All pipelines remain fully functional with same output formats
- ✅ Configuration Files: Existing YAML configs work with new API system
- ✅ Output Formats: JSON schemas unchanged - existing analysis code continues working
- ✅ Docker Support: Updated containers with same functionality
- 🚀 Single Command Startup: `uv run videoannotator server` starts the complete system
- 📊 Health Monitoring: Built-in health endpoints for system monitoring
- 🔄 Graceful Shutdowns: Proper cleanup of background processes and resources
- 📱 API Documentation: Auto-generated OpenAPI/Swagger documentation
- 🐳 Container Support: Fixed Docker builds for both CPU and GPU deployment
- ⚡ Fast Startup: Models load on-demand, reducing initial startup time
- 🔄 Concurrent Processing: Handle multiple video jobs simultaneously
- 💾 Resource Management: Proper cleanup prevents memory leaks
- 🛡️ Error Recovery: Robust error handling with detailed status reporting
- ✅ Comprehensive API Testing: Full test coverage for job management and processing workflows
- ✅ Integration Testing: End-to-end tests with real video processing
- ✅ Docker Validation: Verified container builds and deployments
- ✅ Documentation Accuracy: All examples tested and validated for v1.2.0
- 🧹 Modern Code Quality: Ruff-based linting and formatting with consistent style
- 📋 Type Safety: Maintained mypy type checking across codebase
- 📊 Test Coverage: High test coverage maintained across API and processing layers
- PyTorch Meta Tensor Errors: Fixed "Cannot copy out of meta tensor" errors in face analysis and audio pipelines by implementing proper `to_empty()` fallback handling
- Person Pipeline Model Corruption: Added robust error recovery for "'Conv' object has no attribute 'bn'" errors with automatic model reinitialization
- Batch Processing Stability: Enhanced error handling and recovery mechanisms across all pipelines
- Logging System: Suppressed verbose debug output from ByteTracker, YOLO, and numba for cleaner batch processing logs
- Performance Optimization: Pre-initialize all pipelines during setup instead of lazy loading for each video, significantly improving batch processing speed
- GPU Memory Management: Added proper cleanup methods with CUDA cache clearing and resource management
- Error Recovery: Implemented automatic model reinitialization when corruption is detected during processing
- Pipeline Initialization: Models now load once during VideoAnnotator initialization rather than per-video for better batch performance
- Memory Management: Added destructor and cleanup methods to prevent GPU memory leaks during batch processing
- PersonIdentityManager for consistent person identification across pipelines
- Automatic labeling system with size-based and spatial heuristics for person role detection
- Face-to-person linking across all face analysis pipelines using IoU matching
- Person identity configuration via `configs/person_identity.yaml`
- Comprehensive test suite for person identity functionality in `tests/test_phase2_integration.py`
- Command-line tools for person labeling and validation in the `scripts/` directory
- OpenFace 3.0 pipeline with comprehensive facial behavior analysis
- 98-point facial landmarks (2D and 3D coordinates)
- Facial Action Units (AUs) intensity and presence detection
- Head pose estimation with rotation and translation parameters
- Gaze direction tracking and eye movement analysis
- COCO format output for annotation tool compatibility
- Demo scripts showcasing full OpenFace 3.0 capabilities
- LAION Face pipeline with CLIP-based face analysis and emotion detection
- LAION Voice pipeline with advanced voice emotion recognition
- 40+ emotion categories for comprehensive emotional analysis
- Multimodal emotion analysis combining face and voice modalities
- High-precision embeddings for research applications
- All face analysis pipelines now support person identity linking
- Person tracking pipeline exports consistent person IDs in COCO format
- Cross-pipeline data sharing through standardized person tracks files
- COCO format compliance with industry-standard annotation fields
- Configuration system extended with person identity settings
- Testing framework enhanced with integration and performance tests
- Documentation consolidation: PersonID phase completion files merged into main documentation
- File organization: Legacy backup files and duplicates removed
- Test structure: All tests properly organized in the `tests/` directory with the pytest framework
- Legacy file cleanup: Removed backup files and duplicates (`speech_pipeline_backup.py`, etc.)
- Documentation consistency: Updated all docs to reflect current implementation status
- Test organization: Moved standalone test files to proper test directory structure
- Initial release of modernized VideoAnnotator
- Complete pipeline architecture implementation
- Comprehensive documentation and examples
- Full testing suite with unit, integration, and performance tests
- Docker support for development and production
- CI/CD pipeline with automated testing and deployment
- Basic video annotation capabilities
- Jupyter notebook examples
- Initial audio processing features
- Improved video processing performance
- Updated dependencies
- Various bug fixes and stability improvements
- Face detection and analysis
- Person tracking capabilities
- Data visualization tools
- Refactored code organization
- Updated documentation
- Memory usage optimization
- Cross-platform compatibility
- Initial project structure
- Basic video processing
- Scene detection capabilities
- Audio extraction
- Data annotation framework
The v1.0.0 release introduces significant architectural changes. Here's how to migrate:
Old (v0.x):
```python
# Direct pipeline initialization
from src.processors.video_processor import VideoProcessor
processor = VideoProcessor(config_dict)
```

New (v1.0.0):

```python
# Modern pipeline architecture
from src.pipelines import SceneDetectionPipeline
pipeline = SceneDetectionPipeline(config)
```

Old:

```python
# Direct method calls
results = processor.process_video(video_path)
```

New:

```python
# Standardized pipeline interface
results = pipeline.process(video_path, start_time=0, end_time=None)
```

Old:

```python
# Python dictionary configuration
config = {
    'video_settings': {'fps': 30},
    'audio_settings': {'sample_rate': 16000}
}
```

New:

```yaml
# YAML configuration
video:
  fps: 30
audio:
  sample_rate: 16000
```

Old:

```bash
python process_video.py --video video.mp4 --output output/
```

New:

```bash
python main.py --input video.mp4 --output output/ --config configs/default.yaml
```

- Pipeline Architecture: Complete rewrite of processing pipelines
- Configuration System: Moved from Python dictionaries to YAML files
- CLI Interface: New unified command-line interface
- Output Formats: Standardized output schemas
- Dependencies: Updated to modern ML libraries
- Legacy processor classes will be removed in v2.0.0
- Python dictionary configuration deprecated in favor of YAML
- Old CLI scripts will be removed in v2.0.0
- Update Dependencies: `pip install -r requirements.txt`
- Convert Configuration: Use the new YAML format
- Update Code: Migrate to new pipeline architecture
- Test Integration: Run comprehensive tests
- Update Documentation: Review API changes
For technical specifications, see the Pipeline Specs.
Special thanks to all contributors who helped shape VideoAnnotator:
- Development Team - Core architecture and implementation
- Research Team - Algorithm development and optimization
- Documentation Team - Comprehensive documentation and examples
- Bug reports and feature requests
- Code contributions and improvements
- Documentation improvements
- Testing and validation
This project builds upon the excellent work of:
- BabyJokes - Original research foundation
- Open source computer vision and machine learning communities
- Contributors to the libraries and tools we depend on
For more information about releases and changes, see the GitHub Releases page.