
Commit 013634a

InfantLab and claude committed
release: v1.4.2 — JOSS review version
Migrate CLIP to open_clip (LAION-2B ViT-B-32), update deprecated HuggingFace auth parameters, harden GUID handling, remove superseded voice_emotion_baseline pipeline, and add JOSS cover letter. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent d24bae7 commit 013634a

14 files changed

Lines changed: 97 additions & 103 deletions


.devcontainer/devcontainer.json

Lines changed: 2 additions & 15 deletions
@@ -16,24 +16,11 @@
 
   },
   "forwardPorts": [
-    18011,
-    18012,
-    18013,
-    18014,
-    18015,
-    19011,
-    19012,
-    19013,
-    19014,
-    19015
+    18011
   ],
   "portsAttributes": {
     "18011": {
-      "label": "VideoAnnotator API (default)",
-      "onAutoForward": "notify"
-    },
-    "19011": {
-      "label": "Video Annotation Viewer",
+      "label": "VideoAnnotator API",
       "onAutoForward": "notify"
     }
   },

CHANGELOG.md

Lines changed: 26 additions & 0 deletions
@@ -15,6 +15,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Benchmark results and performance validation
 - Additional contributor documentation improvements
 
+## [1.4.2] - 2026-03-04
+
+### JOSS Review Version
+
+This release accompanies the JOSS submission of VideoAnnotator and its companion project Video Annotation Viewer.
+
+#### Changed
+
+- **CLIP migration**: Migrated scene-classification pipeline from `clip` to `open_clip`, using the LAION-2B pretrained `ViT-B-32` model for improved availability and reproducibility.
+- **HuggingFace auth**: Updated diarization and Whisper pipelines to use the current `token` parameter instead of the deprecated `use_auth_token`.
+- **Devcontainer**: Simplified forwarded-port list to the single default API port (18011).
+
+#### Fixed
+
+- **Database GUID handling**: Added defensive `try/except` in the `GUID` type decorator to gracefully handle malformed UUID values.
+- **Diarization init**: Wrapped model loading in explicit error handling with a clear log message on failure.
+
+#### Removed
+
+- **Voice emotion baseline**: Removed `voice_emotion_baseline` pipeline metadata and associated tests (superseded by LAION EmoNet voice pipeline).
+
+#### Documentation
+
+- Added JOSS cover letter (`paper/cover_letter.md`).
+- Updated paper bibliography version to v1.4.2.
+
 ## [1.4.1] - 2025-12-26
 
 ### Release Quality, Docs, and Developer Experience

CITATION.cff

Lines changed: 2 additions & 2 deletions
@@ -28,5 +28,5 @@ authors:
     orcid: "https://orcid.org/0000-0001-5846-3444"
 license: "MIT"
 repository-code: "https://github.com/InfantLab/VideoAnnotator"
-version: "1.4.1"
-date-released: "2025-12-18"
+version: "1.4.2"
+date-released: "2026-03-04"

paper/cover_letter.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+**Cover letter — JOSS submission**
+**VideoAnnotator: an extensible, reproducible toolkit for automated video annotation in behavioral research**
+
+Dear Editor,
+
+We are pleased to submit *VideoAnnotator* for consideration by the Journal of Open Source Software. This is an open-source Python toolkit that provides a unified, locally deployed framework for automated multi-modal video annotation — covering person tracking, facial analysis, scene detection, and audio processing — aimed at behavioral, social, and health researchers.
+
+We believe the submission addresses the JOSS review criteria as follows:
+
+- **Open license**: The software is released under the MIT license.
+- **Repository and archival**: The source is hosted on GitHub at [InfantLab/VideoAnnotator](https://github.com/InfantLab/VideoAnnotator) and we will generate a versioned Zenodo DOI upon acceptance.
+- **Contribution and community guidelines**: The repository includes `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, issue templates, and `CITATION.cff`.
+- **Automated tests and CI**: A pytest suite of 74 test files (unit, integration, and performance) runs via GitHub Actions on Ubuntu, Windows, and macOS with Python 3.12, alongside ruff linting, mypy type-checking, Trivy security scanning, and Codecov reporting.
+- **Functionality documentation**: Full API documentation and usage guides are provided in the repository and rendered online.
+- **Statement of need and state of the field**: The paper includes both sections, positioning VideoAnnotator relative to existing tools (ELAN, Datavyu, DeepLabCut, Py-Feat, OpenFace, openSMILE, PySceneDetect, pyannote) and explaining the gap it fills.
+- **References**: All key upstream models and comparable tools are cited in the bibliography.
+- **Research application**: The paper includes a research-impact statement describing current use at Stellenbosch University and the University of Oxford for caregiver–child interaction studies under the Global Parenting Initiative.
+- **AI disclosure**: Included per JOSS policy.
+
+We would also like to note that a parallel JOSS submission is being prepared for the companion project, **Video Annotation Viewer** ([InfantLab/video-annotation-viewer](https://github.com/InfantLab/video-annotation-viewer)), which provides the interactive browser-based interface for reviewing and validating VideoAnnotator outputs. The two packages are designed to work together but are independently installable and have distinct codebases.
+
+This is our first submission to JOSS, so we very much appreciate any guidance you can offer throughout the review process. We are happy to address any feedback promptly.
+
+Thank you for your time and consideration.
+
+Sincerely,
+
+Caspar Addyman (corresponding author), Jeremiah Ishaya, Irene Uwerikowe, Daniel Stamate, Jamie Lachman, and Mark Tomlinson

paper/paper.bib

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ @misc{videoannotator
   author = {Addyman, Caspar and Ishaya, Jeremiah and Uwerikowe, Irene and Stamate, Daniel and Lachman, Jamie and Tomlinson, Mark},
   year = {2026},
   howpublished = {\url{https://github.com/InfantLab/VideoAnnotator}},
-  note = {Version v1.4.1}
+  note = {Version v1.4.2}
 }
 
 @inproceedings{openface3,
@@ -113,7 +113,7 @@ @software{viewer
   author = {Addyman, Caspar and Uwerikowe, Irene and Ishaya, Jeremiah and Stamate, Daniel and Lachman, Jamie and Tomlinson, Mark},
   year = {2026},
   url = {https://github.com/InfantLab/video-annotation-viewer},
-  version = {0.4.2}
+  version = {0.6.2}
 }
 
 @book{observer,

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "videoannotator"
-version = "1.4.1"
+version = "1.4.2"
 description = "A modern, modular toolkit for analyzing, processing, and visualizing human interaction videos"
 readme = "README.md"
 license = "MIT"

src/videoannotator/database/models.py

Lines changed: 5 additions & 3 deletions
@@ -53,9 +53,11 @@ def process_result_value(self, value, dialect):
         """Convert database values back into UUID instances."""
         if value is None:
             return value
-        else:
-            if not isinstance(value, uuid.UUID):
-                return uuid.UUID(value)
+        if isinstance(value, uuid.UUID):
+            return value
+        try:
+            return uuid.UUID(value)
+        except (ValueError, AttributeError):
             return value
 
 
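The hardened `process_result_value` now passes malformed values through instead of raising mid-query. A minimal behavior sketch of the same logic as a standalone function; the enclosing SQLAlchemy `TypeDecorator` boilerplate and the sample values are illustrative, not taken from the repository:

```python
import uuid


def process_result_value(value, dialect=None):
    """Sketch of the patched GUID.process_result_value logic."""
    if value is None:
        return value
    if isinstance(value, uuid.UUID):
        return value
    try:
        return uuid.UUID(value)
    except (ValueError, AttributeError):
        # Malformed strings (or non-string junk) are returned unchanged
        # rather than aborting result-row processing.
        return value


print(process_result_value("0d9f2c4e-9b1a-4f6b-8c3d-2a7e5f1b9c0d"))  # -> uuid.UUID instance
print(process_result_value("not-a-uuid"))                            # -> "not-a-uuid", unchanged
print(process_result_value(None))                                    # -> None
```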

src/videoannotator/pipelines/audio_processing/diarization_pipeline.py

Lines changed: 12 additions & 8 deletions
@@ -65,14 +65,18 @@ def initialize(self) -> None:
 
         self.logger.info(f"Loading diarization model: {self.config['model']}")
 
-        if hf_token:
-            self.diarization_model = PyAnnotePipeline.from_pretrained(
-                self.config["model"], use_auth_token=hf_token
-            )
-        else:
-            self.diarization_model = PyAnnotePipeline.from_pretrained(
-                self.config["model"]
-            )
+        try:
+            if hf_token:
+                self.diarization_model = PyAnnotePipeline.from_pretrained(
+                    self.config["model"], token=hf_token
+                )
+            else:
+                self.diarization_model = PyAnnotePipeline.from_pretrained(
+                    self.config["model"]
+                )
+        except Exception as e:
+            self.logger.error(f"Failed to load diarization model: {e}")
+            raise
 
         self.is_initialized = True
         self.logger.info("DiarizationPipeline initialized")
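Because `initialize()` now logs the failure and re-raises, callers can decide whether a missing or gated pyannote model should abort a job or just disable diarization. A hypothetical caller-side sketch; the constructor arguments and how the pipeline resolves its HuggingFace token are assumptions, not shown in this diff:

```python
from videoannotator.pipelines.audio_processing.diarization_pipeline import (
    DiarizationPipeline,
)

pipeline = DiarizationPipeline()  # assumed default config; token resolution not shown here
try:
    pipeline.initialize()
except Exception as exc:
    # initialize() has already logged "Failed to load diarization model: ..."
    # before re-raising, so a job runner can skip diarization instead of crashing.
    print(f"Skipping diarization: {exc}")
```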

src/videoannotator/pipelines/audio_processing/whisper_base_pipeline.py

Lines changed: 2 additions & 2 deletions
@@ -255,7 +255,7 @@ def _load_hf_whisper_model(
         # Load processor
         processor_kwargs = {"cache_dir": cache_dir}
         if auth_token:
-            processor_kwargs["use_auth_token"] = auth_token
+            processor_kwargs["token"] = auth_token
 
         self.whisper_processor = WhisperProcessor.from_pretrained(
             model_id, **processor_kwargs
@@ -264,7 +264,7 @@ def _load_hf_whisper_model(
         # Load model
         model_kwargs = {"cache_dir": cache_dir}
         if auth_token:
-            model_kwargs["use_auth_token"] = auth_token
+            model_kwargs["token"] = auth_token
 
         # Add FP16 if requested and on GPU
         if self.config.get("use_fp16", True) and self.device.type == "cuda":
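The `use_auth_token` to `token` rename follows the current `transformers` `from_pretrained` API, where `use_auth_token` is deprecated in favor of `token`. A minimal standalone sketch of the same pattern; the model id, cache directory, and model class here are placeholders, not the pipeline's actual configuration:

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "openai/whisper-base"  # placeholder; the pipeline supplies its own model_id
auth_token = None                 # or a HuggingFace access token string for gated models

kwargs = {"cache_dir": "./models"}
if auth_token:
    kwargs["token"] = auth_token  # current parameter name; use_auth_token is deprecated

processor = WhisperProcessor.from_pretrained(model_id, **kwargs)
model = WhisperForConditionalGeneration.from_pretrained(model_id, **kwargs)
```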

src/videoannotator/pipelines/scene_detection/scene_pipeline.py

Lines changed: 12 additions & 5 deletions
@@ -29,7 +29,7 @@
     SCENEDETECT_AVAILABLE = False
 
 try:
-    import clip
+    import open_clip
     import torch
 
     CLIP_AVAILABLE = True
@@ -59,7 +59,7 @@ def __init__(self, config: dict[str, Any] | None = None):
                 "office",
                 "playground",
             ],
-            "clip_model": "ViT-B/32",
+            "clip_model": "ViT-B-32",
             "use_gpu": True,
             "keyframe_extraction": "middle",  # Extract keyframe from middle of scene
         }
@@ -70,6 +70,7 @@ def __init__(self, config: dict[str, Any] | None = None):
         self.logger = logging.getLogger(__name__)
         self.clip_model = None
         self.clip_preprocess = None
+        self.clip_tokenizer = None
         self.device = None
 
     def process(
@@ -230,7 +231,7 @@ def _classify_scenes(
 
         # Prepare text prompts
         text_prompts = [f"a {prompt}" for prompt in self.config["scene_prompts"]]
-        text = clip.tokenize(text_prompts).to(self.device)
+        text = self.clip_tokenizer(text_prompts).to(self.device)
 
         classified_segments = []
         cap = cv2.VideoCapture(video_path)
@@ -312,9 +313,14 @@ def _initialize_clip(self):
         self.device = (
             "cuda" if self.config["use_gpu"] and torch.cuda.is_available() else "cpu"
         )
-        self.clip_model, self.clip_preprocess = clip.load(
-            self.config["clip_model"], device=self.device
+        self.clip_model, _, self.clip_preprocess = (
+            open_clip.create_model_and_transforms(
+                self.config["clip_model"],
+                pretrained="laion2b_s34b_b79k",
+                device=self.device,
+            )
         )
+        self.clip_tokenizer = open_clip.get_tokenizer(self.config["clip_model"])
         self.logger.info(
             f"CLIP model loaded: {self.config['clip_model']} on {self.device}"
         )
@@ -417,6 +423,7 @@ def cleanup(self) -> None:
 
         self.clip_model = None
         self.clip_preprocess = None
+        self.clip_tokenizer = None
         self.device = None
         self.is_initialized = False
         self.logger.info("Scene Detection Pipeline cleaned up")
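For anyone checking the CLIP migration locally, the new `_initialize_clip` and `_classify_scenes` code follows the standard `open_clip` zero-shot classification recipe with the same model name and pretrained tag as the pipeline default. A standalone sketch; the prompt list and image path are placeholders:

```python
import open_clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k", device=device
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

prompts = ["a living room", "an office", "a playground"]  # placeholder scene prompts
text = tokenizer(prompts).to(device)
image = preprocess(Image.open("keyframe.jpg")).unsqueeze(0).to(device)  # placeholder frame

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(prompts, probs[0].tolist())))  # per-prompt probabilities for the keyframe
```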
