Commit 0fa884b

docs(archive): record transcription statistics and processing metrics
1 parent c3b462a commit 0fa884b

1 file changed

Lines changed: 21 additions & 0 deletions

@@ -0,0 +1,21 @@
# Transcription Archive Statistics (May 2026)

This document captures a snapshot of the archive's transcription metrics during the implementation of the asynchronous `zdots-ctx` + `whisper.cpp` pipeline.

## ⏱️ Video Runtime vs. Processing Time (The Hardware)
Based on logs from the background worker running the `max-accuracy` (Large v3) Whisper model on an Apple M4 GPU:
* **Sample Video Duration:** 865.6 seconds (~14.4 minutes).
* **Processing Time:** Transcription and diarization finished in 213.6 seconds (~3.6 minutes).
* **The Ratio:** On this sample, the system processes audio at roughly **4x real-time speed** (865.6 s / 213.6 s ≈ 4.05): about 1 minute of processing for every 4 minutes of video to generate a high-accuracy text record.
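The ratio above can be reproduced from the two logged figures. The following is a minimal sketch; the function names `real_time_factor` and `estimate_processing_seconds` are hypothetical helpers, not part of the pipeline, and the estimate assumes the single-sample ratio holds for other videos.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of wall time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def estimate_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate wall time for a video, assuming the same real-time factor applies."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


# Figures taken from the sample logged above.
rtf = real_time_factor(audio_seconds=865.6, processing_seconds=213.6)
print(f"{rtf:.2f}x real-time")  # prints "4.05x real-time"

# Rough estimate for a hypothetical one-hour video.
one_hour_estimate = estimate_processing_seconds(3600.0, rtf)
print(f"~{one_hour_estimate / 60:.1f} minutes for a one-hour video")
```

Because the ratio comes from one sample, treat any estimate derived from it as indicative rather than a guaranteed throughput.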

## 📚 Video Duration vs. Transcript Length (The Content)
Calculated across the `_data` models before the final batch of 40 videos completed:
* **Known Transcribed Video Duration:** 16 hours, 42 minutes (1,002 minutes, across the 94 videos that have exact durations logged).
* **Total Transcript Length:** **247,896 words** across 159 transcript files.
* **The Density Ratio:** Dividing total words by known duration gives **~247 words per minute of video**. Because the word count covers all 159 transcript files while the duration covers only the 94 videos with logged durations, this figure likely overstates the true speaking density.
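The density figure is a straightforward division of the two totals. A minimal sketch of that arithmetic follows; the helper name `words_per_minute` is hypothetical, and the inputs are the totals listed above.

```python
def words_per_minute(total_words: int, hours: int, minutes: int) -> float:
    """Return average words per minute of video for a given total duration."""
    total_minutes = hours * 60 + minutes
    if total_minutes <= 0:
        raise ValueError("duration must be positive")
    return total_words / total_minutes


# Totals from the list above: 247,896 words and 16 h 42 min of logged duration.
density = words_per_minute(total_words=247_896, hours=16, minutes=42)
print(f"{density:.1f} words per minute")  # prints "247.4 words per minute"
```

Note that the numerator and denominator cover different sets of files (159 transcripts versus 94 videos with logged durations), so a per-video calculation restricted to those 94 videos would give a more representative number.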

*Perspective:* The archive is currently roughly equivalent to **4 to 5 full-length non-fiction books** of high-density technical conversation. Once the final batch of 40 videos finishes processing, the archive is expected to cross the 300,000-word mark.
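As a rough sanity check on the 300,000-word expectation, the sketch below projects the total under one stated assumption: that each of the 40 remaining videos yields about as many words as the current average transcript file. That assumption is illustrative and is not taken from the pipeline logs.

```python
TOTAL_WORDS = 247_896       # current word count across all transcript files
TRANSCRIPT_FILES = 159      # number of transcript files counted so far
REMAINING_VIDEOS = 40       # final batch still processing
TARGET_WORDS = 300_000      # milestone mentioned above

# Assumption: remaining videos match the current average words per file.
avg_words_per_file = TOTAL_WORDS / TRANSCRIPT_FILES
projected_total = TOTAL_WORDS + REMAINING_VIDEOS * avg_words_per_file

# Average words each remaining video must contribute to reach the target.
needed_per_video = (TARGET_WORDS - TOTAL_WORDS) / REMAINING_VIDEOS

print(f"Average words per file so far: {avg_words_per_file:.0f}")
print(f"Projected total: {projected_total:.0f}")
print(f"Needed per remaining video: {needed_per_video:.1f}")
```

Under that assumption the projection lands above the target, since the average file so far is longer than the per-video length needed to reach it.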

## Future Milestones
- Re-review previously transcribed interviews (legacy transcripts) for quality control and phonetic accuracy against the newer Whisper Large v3 baseline.
- Verify the completion and ingestion of the SCMC and general archive videos.
