
Commit 6629003

InfantLab and claude committed

docs: finalize JOSS paper draft, bibliography, and submission checklist

Add paper/checklist.md tracking JOSS submission requirements. Update paper.md and paper.bib with final references and formatting. Update CITATION.cff with author ORCIDs, README with badges/structure, and devcontainer.json configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent b24b007 commit 6629003

6 files changed: 329 additions & 129 deletions


.devcontainer/devcontainer.json

Lines changed: 64 additions & 61 deletions
@@ -1,63 +1,66 @@
 {
   "name": "VideoAnnotator (GPU)",
   "build": {
     "dockerfile": "../Dockerfile.gpu",
     "context": "..",
     "args": {
-      "SKIP_IMAGE_UV_SYNC": "false",
-      "SKIP_TORCH_INSTALL": "false"
+      "SKIP_IMAGE_UV_SYNC": "true",
+      "SKIP_TORCH_INSTALL": "true"
     }
   },
   "runArgs": [
     "--gpus",
     "all"
   ],
-  "features": {},
+  "features": {
+
+  },
   "forwardPorts": [
     18011,
     18012,
     18013,
     18014,
     18015,
     19011,
     19012,
     19013,
     19014,
     19015
   ],
   "portsAttributes": {
     "18011": {
       "label": "VideoAnnotator API (default)",
       "onAutoForward": "notify"
     },
     "19011": {
       "label": "Video Annotation Viewer",
       "onAutoForward": "notify"
     }
   },
   "postCreateCommand": "uv sync && uv sync --extra dev && HADOLINT_DEST_DIR=/usr/local/bin bash scripts/install_hadolint.sh && uv run pre-commit install",
   "containerEnv": {
     "UV_LINK_MODE": "copy"
   },
   "customizations": {
     "vscode": {
       "settings": {
         "python.defaultInterpreterPath": ".venv/bin/python",
         "python.formatting.provider": "none",
         "[python]": {
           "editor.defaultFormatter": "astral-sh.ruff",
           "editor.formatOnSave": true,
           "editor.codeActionsOnSave": {
             "source.organizeImports": true
           }
         }
       },
       "extensions": [
         "astral-sh.ruff",
         "ms-python.python",
         "ms-python.vscode-pylance",
         "GitHub.copilot-chat"
       ]
     }
-  }
+  },
+  "shutdownAction": "stopContainer"
 }
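With the build args flipped to `"true"`, the image no longer bakes in the uv environment or Torch; dependencies arrive via `postCreateCommand` after the container starts. A quick way to confirm the container came up usable is to hit the forwarded API port. This is a minimal sketch, assuming a VideoAnnotator server is running inside the container on the default port 18011 and serving the `/api/v1/pipelines` endpoint the README points clients at; the response shape is not specified here, so the sketch just prints whatever JSON comes back:

```python
# Smoke test for the forwarded API port (18011 by default).
# Assumes a running VideoAnnotator server exposing /api/v1/pipelines,
# as described in the README; response shape is an assumption.
import json
import urllib.request


def list_pipelines(host: str = "localhost", port: int = 18011):
    url = f"http://{host}:{port}/api/v1/pipelines"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for pipeline in list_pipelines():
        print(pipeline)
```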

CITATION.cff

Lines changed: 5 additions & 0 deletions
@@ -9,18 +9,23 @@ authors:
 - family-names: Ishaya
   given-names: Jeremiah
   affiliation: "Institute for Life Course Health Research (ILCHR), Stellenbosch University"
+  orcid: "https://orcid.org/0000-0002-9014-9372"
 - family-names: Uwerikowe
   given-names: Irene
   affiliation: "Institute for Life Course Health Research (ILCHR), Stellenbosch University"
+  orcid: "https://orcid.org/0000-0002-1293-7349"
 - family-names: Stamate
   given-names: Daniel
   affiliation: "Department of Computing, Goldsmiths, University of London"
+  orcid: "https://orcid.org/0000-0001-8565-6890"
 - family-names: Lachman
   given-names: Jamie
   affiliation: "Department of Social Policy and Intervention (DISP), University of Oxford"
+  orcid: "https://orcid.org/0000-0001-9475-9218"
 - family-names: Tomlinson
   given-names: Mark
   affiliation: "Institute for Life Course Health Research (ILCHR), Stellenbosch University"
+  orcid: "https://orcid.org/0000-0001-5846-3444"
 license: "MIT"
 repository-code: "https://github.com/InfantLab/VideoAnnotator"
 version: "1.4.1"
README.md

Lines changed: 32 additions & 19 deletions
@@ -110,29 +110,35 @@ Additional Specs:
 - CLI Validation: `uv run videoannotator validate-emotion path/to/file.emotion.json` returns non-zero exit on failure
 Client tools (e.g. the Video Annotation Viewer) should rely on those sources or the `/api/v1/pipelines` endpoint rather than hard-coding pipeline assumptions.

-### **Person Tracking Pipeline**
+### Person Tracking (1 pipeline)

-- **Technology**: YOLO11 + ByteTrack multi-object tracking
-- **Outputs**: Bounding boxes, pose keypoints, persistent person IDs
-- **Use cases**: Movement analysis, social interaction tracking, activity recognition
+| Pipeline | Technology | Outputs | Stability |
+|----------|-----------|---------|-----------|
+| **Person Tracking & Pose** | YOLO11 + ByteTrack | COCO bounding boxes, 17-point pose keypoints, persistent person IDs | beta |

-### **Face Analysis Pipeline**
+### Face Analysis (3 pipelines)

-- **Technology**: [OpenFace 3.0](https://github.com/CMU-MultiComp-Lab/OpenFace-3.0), LAION Face ([LAION](https://laion.ai/)), OpenCV backends
-- **Outputs**: 68-point landmarks, emotions, action units, gaze direction, head pose
-- **Use cases**: Emotional analysis, attention tracking, facial expression studies
+| Pipeline | Technology | Outputs | Stability |
+|----------|-----------|---------|-----------|
+| **Face Analysis** | DeepFace (TensorFlow/OpenCV) | Emotion labels, age/gender, action units | stable |
+| **LAION CLIP Face Embedding** | LAION CLIP-derived model | 512-D semantic embeddings, zero-shot attribute & emotion tagging | experimental |
+| **OpenFace3 Face Embedding** | OpenFace 3.0 (ONNX/PyTorch) | 512-D face embeddings for recognition or clustering | experimental |

-### **Scene Detection Pipeline**
+### Scene Detection (1 pipeline)

-- **Technology**: PySceneDetect + CLIP environment classification
-- **Outputs**: Scene boundaries, environment labels, temporal segmentation
-- **Use cases**: Context analysis, setting classification, behavioral context
+| Pipeline | Technology | Outputs | Stability |
+|----------|-----------|---------|-----------|
+| **Scene Detection** | PySceneDetect + CLIP | Scene boundaries, environment classification, temporal segmentation | beta |

-### **Audio Processing Pipeline**
+### Audio Processing (4 pipelines + 1 combined)

-- **Technology**: OpenAI Whisper + pyannote speaker diarization
-- **Outputs**: Speech transcripts, speaker identification, voice emotions
-- **Use cases**: Conversation analysis, language development, vocal behavior
+| Pipeline | Technology | Outputs | Stability |
+|----------|-----------|---------|-----------|
+| **Speech Recognition** | OpenAI Whisper | WebVTT transcripts with word-level timestamps | stable |
+| **Speaker Diarization** | pyannote.audio | RTTM speaker turns with timestamps | stable |
+| **Audio Processing** | Whisper + pyannote (combined) | WebVTT transcripts + RTTM speaker turns | beta |
+| **LAION Empathic Voice** | LAION Empathic Insight + Whisper embeddings | Emotion segments, empathic scores, emotion timeline | stable |
+| **Voice Emotion Baseline** | Spectral CNN over Whisper embeddings | _(planned — not yet implemented)_ | experimental |

 ## 💡 Why VideoAnnotator?

@@ -275,8 +281,11 @@ docker run -p 18011:18011 --gpus all videoannotator:dev

 - **FastAPI** - High-performance REST API with automatic documentation
 - **YOLO11** - State-of-the-art object detection and pose estimation
-- **OpenFace 3.0** - Comprehensive facial behavior analysis
+- **DeepFace / OpenFace 3.0 / LAION CLIP** - Facial analysis, embeddings, and emotion recognition
 - **Whisper** - Robust speech recognition and transcription
+- **pyannote.audio** - Speaker diarization and segmentation
+- **LAION Empathic Insight** - Voice emotion analysis from Whisper embeddings
+- **PySceneDetect + CLIP** - Scene boundary detection and environment classification
 - **PyTorch** - GPU-accelerated machine learning inference

 ### **Performance Characteristics**
@@ -340,9 +349,13 @@ MIT License - Full terms in [LICENSE](LICENSE)

 Built with and grateful to:

-- **[YOLO & Ultralytics](https://ultralytics.com/)** - Object detection and tracking
-- **[OpenFace 3.0](https://github.com/CMU-MultiComp-Lab/OpenFace-3.0)** - Facial behavior analysis
+- **[YOLO & Ultralytics](https://ultralytics.com/)** - Object detection, tracking, and pose estimation
+- **[DeepFace](https://github.com/serengil/deepface)** - Face detection and emotion recognition
+- **[OpenFace 3.0](https://github.com/CMU-MultiComp-Lab/OpenFace-3.0)** - Facial behavior analysis and embeddings
+- **[LAION](https://laion.ai/)** - CLIP face embeddings and empathic voice emotion models
 - **[OpenAI Whisper](https://github.com/openai/whisper)** - Speech recognition
+- **[pyannote.audio](https://github.com/pyannote/pyannote-audio)** - Speaker diarization
+- **[PySceneDetect](https://www.scenedetect.com/)** - Scene boundary detection
 - **[FastAPI](https://github.com/tiangolo/fastapi)** - Modern web framework
 - **[PyTorch](https://pytorch.org/)** - Machine learning infrastructure
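The CLI validation contract preserved in the first hunk's context (non-zero exit on failure) makes the validator easy to wire into scripts. A minimal sketch, assuming `uv run videoannotator validate-emotion` behaves as the README states; the output path below is illustrative only:

```python
# Gate on the documented exit-code contract of the validator:
# zero means the .emotion.json file passed, non-zero means it failed.
# The file path here is hypothetical.
import subprocess

result = subprocess.run(
    ["uv", "run", "videoannotator", "validate-emotion", "out/video1.emotion.json"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    print("validation failed:", result.stderr.strip())
```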

paper/checklist.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+# JOSS Submission Checklist for VideoAnnotator
+
+Based on https://joss.readthedocs.io/en/latest/submitting.html and
+https://joss.readthedocs.io/en/latest/review_checklist.html (checked 2026-02-27).
+
+---
+
+## Paper (paper.md)
+
+- [x] Word count within 750–1750 range (~1255 words)
+- [x] Summary section — describes all 10 pipelines across 4 modalities
+- [x] Statement of need section
+- [x] State of the field section — covers ELAN, Datavyu, DeepLabCut, YOLO, Py-Feat, OpenFace, openSMILE, PySceneDetect, pyannote
+- [x] Software design section — four architectural layers with trade-offs
+- [x] Research impact statement section — GPI/Stellenbosch/Oxford context, pilot corpus
+- [x] AI usage disclosure section — Copilot and Claude; code/test/docs scope; human-review assertion
+- [x] Quality control section
+- [x] Statement of limitations section
+- [x] Acknowledgements with funding
+- [x] YAML frontmatter (title, tags, authors, affiliations, date, bibliography)
+- [x] Corresponding author marked (`corresponding: true`)
+- [x] All 6 authors have ORCIDs
+
+## Bibliography (paper.bib)
+
+- [x] Correct entry types (`@inproceedings`, `@article`, `@software`, `@book`)
+- [x] DOIs present where available
+- [x] Full venue names (not abbreviated)
+- [x] 14 references covering all cited tools and upstream models
+- [x] Companion project (Video Annotation Viewer) cited
+
+## Repository & Metadata
+
+- [x] OSI-approved license (MIT) with plain-text LICENSE file
+- [x] Open repository on GitHub (InfantLab/VideoAnnotator)
+- [x] Public issue tracker (GitHub Issues)
+- [x] 6+ months public history (Sep 2023 – present, 2.5 years, 236 commits)
+- [x] Multiple releases (v0.5alpha through v1.4.1, 8 tags)
+- [x] CITATION.cff present, matches paper metadata, all ORCIDs included
+- [x] Dependency management (pyproject.toml + uv.lock)
+
+## Documentation
+
+- [x] Installation docs (uv, Docker, DevContainer)
+- [x] Usage examples (examples/ directory, 7+ scripts)
+- [x] API and usage documentation (docs/ directory)
+- [x] README with badges, quick start, architecture, pipeline tables
+- [x] Docker support (CPU/GPU/Dev Dockerfiles + compose)
+
+## Testing & CI
+
+- [x] Automated tests (pytest, 64 test files, ~94% pass rate)
+- [x] CI/CD pipeline (GitHub Actions: test matrix, lint, type check, security scan, build)
+
+## Community Guidelines
+
+- [x] CONTRIBUTING.md (476 lines — setup, style, PR process, release process)
+- [x] CODE_OF_CONDUCT.md (Contributor Covenant v2.1 + research ethics)
+- [x] SECURITY.md (disclosure process, response timeline, compliance)
+
+---
+
+## Before submission
+
+- [x] v1.4.1 git tag created locally
+- [ ] Push tag to remote: `git push origin v1.4.1`
+- [ ] Create GitHub Release for v1.4.1
+- [ ] Improve git contributor attribution (only 1 contributor visible in history)
+
+## After acceptance
+
+- [ ] Create a Zenodo archive and obtain a DOI for the archived version
+- [ ] Update the JOSS review issue with the version number and archive DOI
+- [ ] Ensure the GitHub Release matches the tagged version
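The first checklist item pins the paper to a 750–1750 word budget, which is worth re-checking before pushing the tag. A rough sketch that assumes the paper lives at `paper/paper.md`; it strips the YAML front matter and only approximates the count JOSS itself computes:

```python
# Approximate word count for paper.md, excluding YAML front matter.
# The paper path is an assumption based on the paper/ directory above.
from pathlib import Path

text = Path("paper/paper.md").read_text(encoding="utf-8")
if text.startswith("---"):
    # Drop the front matter block delimited by the first two '---' markers.
    _, _, text = text.split("---", 2)
words = len(text.split())
print(f"~{words} words (target: 750-1750)")
```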
