README.md: 68 additions & 33 deletions
@@ -1,13 +1,13 @@
 # yt-transcribe
 
-CLI tool that extracts transcripts from YouTube videos. Fetches existing captions when available; otherwise downloads audio and transcribes locally with [Whisper](https://github.com/SYSTRAN/faster-whisper).
+CLI tool that transcribes YouTube videos and local audio/video files. For YouTube, it fetches existing captions when available; otherwise downloads audio and transcribes locally with [Whisper](https://github.com/SYSTRAN/faster-whisper). Local files always go through Whisper directly.
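The routing described above can be sketched roughly as follows. This is an illustrative outline, not the tool's actual code: `is_url`, `choose_pipeline`, the `captions_available` argument, and the URL heuristic are all hypothetical names and assumptions introduced here.

```python
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Treat anything with an http(s) scheme as a remote video (assumed heuristic)."""
    return urlparse(source).scheme in ("http", "https")


def choose_pipeline(source: str, captions_available: bool = False,
                    force_whisper: bool = False) -> str:
    """Return "captions" or "whisper" for a given input (hypothetical helper)."""
    if not is_url(source):
        # Local files always go through Whisper directly.
        return "whisper"
    if captions_available and not force_whisper:
        # Existing YouTube captions are preferred when they exist.
        return "captions"
    # No captions (or --force-whisper): download audio and transcribe locally.
    return "whisper"


print(choose_pipeline("recording.mp3"))
print(choose_pipeline("https://www.youtube.com/watch?v=VIDEO_ID", captions_available=True))
```

Keeping the decision in one small pure function makes it easy to see why `--force-whisper` has no effect on local files: they never reach the caption branch.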
 
 ## Requirements
 
 - Python 3.11+
 - [uv](https://docs.astral.sh/uv/)
 - [just](https://just.systems/)
-- [ffmpeg](https://ffmpeg.org/) — required when Whisper is used (no captions available)
+- [ffmpeg](https://ffmpeg.org/) — required by `yt-dlp` when downloading YouTube audio (not needed for local files, which are decoded by faster-whisper's bundled FFmpeg)
 
 ### macOS
 
@@ -43,7 +43,7 @@ Then restart your terminal so the new PATH entries take effect.
 > **Note:** `faster-whisper` requires the [Microsoft Visual C++ Redistributable](https://aka.ms/vs/17/release/vc_redist.x64.exe) (x64). Install it if you see a DLL error on first Whisper run.
 
 ## Setup
-
+""
 ```bash
 cd transcribe
 just install
@@ -52,6 +52,11 @@ just install
 ## Usage
 
 ```bash
+# Transcribe a local audio or video file (any format FFmpeg supports)
+just run recording.mp3
+just run /path/to/interview.mp4
+just run ~/Downloads/lecture.m4a
+
 # Fetch captions if available, otherwise run Whisper — saves to ~/Downloads by default
 | `--output` | `-o` | — | Save to this exact path (overrides `output_dir` in config) |
 | `--model` | `-m` | from config | Whisper model size |
 | `--language` | `-l` | auto-detect | Override language, e.g. `en`, `fr`, `de`. Omit to auto-detect — useful only when detection gets it wrong or the video has mixed-language content. |
-| `--force-whisper` | `-w` | off | Skip caption lookup, always use Whisper |
+| `--force-whisper` | `-w` | off | Skip caption lookup, always use Whisper (ignored for local files — Whisper is always used) |
 
 By default the transcript is **saved to `~/Downloads`** using the video title as the filename. Change `output_dir` in config to save elsewhere. Use `--print` to get stdout behaviour instead.
 
 CLI flags always override config values.
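The override rule can be pictured as a simple merge. The sketch below is only an illustration of that precedence: `resolve_options` is a hypothetical helper, the `model` and `language` config keys are assumed (only `output_dir` is named in this README), and `None` is assumed to mean "flag not passed".

```python
def resolve_options(config: dict, cli_flags: dict) -> dict:
    """Merge settings so that any flag the user actually passed wins."""
    resolved = dict(config)  # start from config values
    for key, value in cli_flags.items():
        if value is not None:  # None means the flag was not given on the command line
            resolved[key] = value
    return resolved


# Hypothetical config and flags, for illustration only.
config = {"model": "turbo", "output_dir": "~/Downloads", "language": None}
cli_flags = {"model": "base", "language": None}

print(resolve_options(config, cli_flags))
```

Here `--model base` replaces the configured `turbo`, while `output_dir` and `language` keep their config values because no flag was passed for them.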
 
+## Whisper performance
+
+Measured on a 4m 52s audio clip (Polish speech, CPU, `int8`):
+
+| Model | Elapsed | Seconds per audio-minute | Time for 1h video |
+|---|---|---|---|
+| `tiny` | 15.4s | 3.2s | ~3 min |
+| `base` | 25.4s | 5.2s | ~5 min |
+| `small` | 70.1s | 14.4s | ~14 min |
+| `turbo` | 90.8s | 18.7s | ~19 min |
+
+`medium` and `large-v3` were not measured; extrapolate from the trend above (larger models take longer).
+
+- `tiny` is nearly 6× faster than `turbo` on CPU (15.4s vs 90.8s)
+- `turbo` is the default because it trades that speed for much better accuracy, especially on non-English audio
+- For quick drafts or batch jobs where accuracy matters less, `base` is a good middle ground
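For reference, the derived columns in the table follow from the elapsed times and the 4m 52s clip length. The sketch below reproduces that arithmetic; the function name is made up here, and the one-hour figure assumes transcription time scales linearly with audio length, which is an extrapolation rather than a measurement.

```python
CLIP_SECONDS = 4 * 60 + 52  # the 4m 52s benchmark clip

# Elapsed wall-clock seconds per model, copied from the table above.
ELAPSED = {"tiny": 15.4, "base": 25.4, "small": 70.1, "turbo": 90.8}


def seconds_per_audio_minute(elapsed_s: float, clip_s: float = CLIP_SECONDS) -> float:
    """Processing seconds needed for each minute of audio."""
    return elapsed_s / (clip_s / 60)


for model, elapsed in ELAPSED.items():
    rate = seconds_per_audio_minute(elapsed)
    # 60 audio-minutes * rate seconds / 60 = rate minutes, so the same number
    # reads as "minutes per one-hour video" under the linear-scaling assumption.
    print(f"{model}: {rate:.1f}s per audio-minute, ~{rate:.0f} min for a 1h video")

speedup = ELAPSED["turbo"] / ELAPSED["tiny"]
print(f"tiny vs turbo: {speedup:.1f}x faster")
```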
+
 ## Whisper models
 
 Model weights are downloaded from HuggingFace on first use and cached at `~/.cache/huggingface/hub/` (macOS/Linux) or `%USERPROFILE%\.cache\huggingface\hub\` (Windows). Subsequent runs use the cached copy — no re-download. Override with the `HF_HUB_CACHE` environment variable.
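To check where the weights will land on a given machine, something like the following can help. It is a simplified sketch of the lookup described above, not the HuggingFace library's own resolution logic (which also consults other variables such as `HF_HOME`); `hub_cache_dir` is a hypothetical helper.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None) -> Path:
    """Return the assumed model cache directory, honouring HF_HUB_CACHE if set."""
    env = os.environ if env is None else env
    override = env.get("HF_HUB_CACHE")
    if override:
        return Path(override).expanduser()
    # Default location on macOS/Linux; on Windows Path.home() is %USERPROFILE%.
    return Path.home() / ".cache" / "huggingface" / "hub"


print(hub_cache_dir())
```

Passing a plain dict as `env` makes it possible to preview the effect of an override without changing the real environment.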
@@ -143,35 +165,37 @@ Model weights are downloaded from HuggingFace on first use and cached at `~/.cac
 - The video is too new for auto-generation to finish
 - YouTube rate-limits the request
 
+### Supported file formats
+
+faster-whisper uses [PyAV](https://github.com/PyAV-Org/PyAV) (bundled FFmpeg) to decode audio, so any format FFmpeg handles is accepted — no system ffmpeg installation required for local files.
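As a rough illustration of what the local-file path amounts to, the sketch below calls faster-whisper directly on a file. `WhisperModel` and its `transcribe` method come from faster-whisper; `transcribe_local`, `format_timestamp`, and the output layout are hypothetical and are not taken from this tool's source.

```python
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Render a position in the audio as MM:SS (illustrative format)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def transcribe_local(path: str, model_size: str = "tiny") -> list[str]:
    """Transcribe a local audio/video file and return timestamped lines."""
    # Imported lazily so format_timestamp works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # Decoding is handled inside faster-whisper, so the file is passed as a path.
    segments, _info = model.transcribe(str(Path(path).expanduser()))
    return [f"[{format_timestamp(s.start)}] {s.text.strip()}" for s in segments]


# Example (downloads the model on first use):
# for line in transcribe_local("~/Downloads/lecture.m4a"):
#     print(line)
```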