This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
3
+
This file provides guidance to Claude Code when working with code in this repository.
4
4
5
5
## Commands
6
6
7
7
```bash
8
8
just install # create venv and install dependencies via uv
9
-
just run <url> # transcribe a YouTube URL or bare video ID
9
+
just run <source> # transcribe a YouTube URL, video ID, or local file
10
10
just config --show # print current config
11
11
just config --edit # open config in $EDITOR
12
12
just models # list available Whisper model sizes
13
+
just test # unit tests (no network)
14
+
just smoke # integration tests (requires network)
13
15
```
14
16
15
-
There are no tests and no linter configured.
17
+
## Source files
16
18
17
-
## Architecture
19
+
- **`src/transcribe/cli.py`**: all CLI logic via Typer. Two commands: `run` and `config`. `run` sends YouTube sources to `_transcribe_youtube` and calls `_run_whisper` directly for local files.
20
+
- **`src/transcribe/config.py`**: loads `~/.config/yt-transcribe/config.toml` via `platformdirs`. Merges user TOML over hardcoded `_DEFAULTS`.
18
21
19
-
Two source files under `src/transcribe/`:
22
+
## Documentation
20
23
21
-
- **`cli.py`**: all CLI logic via Typer. Two commands: `run` (transcribe) and `config` (show/edit config file). The `run` command tries YouTube captions first (`fetch_youtube_captions`) and falls back to Whisper (`transcribe_with_whisper`). Output path resolution order: `--output` flag → `output_dir` in config → `~/Downloads`.
22
-
23
-
- **`config.py`**: loads `~/.config/yt-transcribe/config.toml` (path determined by `platformdirs.user_config_dir`). Merges user TOML over hardcoded `_DEFAULTS`. Creates the file with documented defaults on `config --edit` if it doesn't exist yet.
- `faster-whisper` and `yt-dlp` are lazy imports, loaded only when captions are unavailable, so caption-only runs have no heavy dependency startup cost.
28
-
- Audio is downloaded to `tempfile.TemporaryDirectory()` and deleted automatically after transcription.
- `--print` suppresses all Rich console output and writes only the transcript to stdout, so it is safe for piping.
31
-
- Config path is platform-specific via `platformdirs`: `~/Library/Application Support/yt-transcribe/` on macOS, `~/.config/yt-transcribe/` on Linux, `%LOCALAPPDATA%\yt-transcribe\` on Windows.
29
+
- `faster-whisper` and `yt-dlp` are lazy imports, loaded only when captions are unavailable.
30
+
- Audio is downloaded to `tempfile.TemporaryDirectory()` and deleted automatically.
31
+
- `unique_path()` appends `(1)`, `(2)`, … to avoid overwriting existing files.
32
+
- `--print` suppresses all Rich output and writes only the transcript to stdout.
33
+
- Output path resolution: `--output` flag → `output_dir` in config → `~/Downloads`.
34
+
- `YOUTUBE_COOKIES_FILE` env var enables authenticated caption/download requests.
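
The output path resolution and the `unique_path()` behaviour listed above can be sketched as follows. This is a minimal illustration under stated assumptions: `resolve_output_path` is a hypothetical helper, the exact suffix format (a space before the counter) is a guess, and it assumes an explicit `--output` path is used as given rather than deduplicated.

```python
from pathlib import Path


def unique_path(path: Path) -> Path:
    """Return path unchanged if it is free, else add ' (1)', ' (2)', ... before the suffix."""
    if not path.exists():
        return path
    counter = 1
    while True:
        candidate = path.with_name(f"{path.stem} ({counter}){path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def resolve_output_path(output_flag, config, title, extension="txt"):
    """Pick the target file: --output flag, then output_dir in config, then ~/Downloads."""
    if output_flag is not None:
        # Assumption: an explicit path is taken as-is.
        return Path(output_flag).expanduser()
    base = config.get("output_dir") or "~/Downloads"
    return unique_path(Path(base).expanduser() / f"{title}.{extension}")
```

Calling `resolve_output_path(None, {"output_dir": "~/transcripts"}, "My talk")` would target `~/transcripts/My talk.txt`, or `My talk (1).txt` if that file already exists.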
CLI tool that transcribes YouTube videos and local audio/video files. For YouTube, it fetches existing captions when available; otherwise downloads audio and transcribes locally with [Whisper](https://github.com/SYSTRAN/faster-whisper). Local files always go through Whisper directly.
3
+
CLI tool that transcribes YouTube videos and local audio/video files. For YouTube, it fetches existing captions when available; otherwise downloads audio and transcribes locally with [Whisper](https://github.com/SYSTRAN/faster-whisper).
4
4
5
5
## Requirements
6
6
7
7
- Python 3.11+
8
8
- [uv](https://docs.astral.sh/uv/)
9
9
- [just](https://just.systems/)
10
-
- [ffmpeg](https://ffmpeg.org/): required by `yt-dlp` when downloading YouTube audio (not needed for local files, which are decoded by faster-whisper's bundled FFmpeg)
10
+
- [ffmpeg](https://ffmpeg.org/): required when downloading YouTube audio (not needed for local files)
For other distros replace `apt install ffmpeg` with your package manager (`dnf`, `pacman`, etc.). The `just` binary can also be downloaded from its [GitHub releases](https://github.com/casey/just/releases).
32
-
33
26
### Windows
34
27
35
28
```powershell
@@ -38,200 +31,31 @@ winget install Casey.Just
38
31
winget install ffmpeg
39
32
```
40
33
41
-
Then restart your terminal so the new PATH entries take effect.
42
-
43
34
> **Note:**`faster-whisper` requires the [Microsoft Visual C++ Redistributable](https://aka.ms/vs/17/release/vc_redist.x64.exe) (x64). Install it if you see a DLL error on first Whisper run.
44
35
45
36
## Setup
46
-
""
37
+
47
38
```bash
48
39
cd transcribe
49
40
just install
50
41
```
51
42
52
-
## Usage
43
+
## Quick start
53
44
54
45
```bash
55
-
# Transcribe a local audio or video file (any format FFmpeg supports)
56
-
just run recording.mp3
57
-
just run /path/to/interview.mp4
58
-
just run ~/Downloads/lecture.m4a
59
-
60
-
# Fetch captions if available, otherwise run Whisper — saves to ~/Downloads by default
46
+
# Transcribe a YouTube video (captions if available, Whisper otherwise)
61
47
just run "https://youtube.com/watch?v=VIDEO_ID"
62
48
63
-
# Save to a specific file
64
-
just run "https://youtube.com/watch?v=VIDEO_ID" --output transcript.txt
49
+
# Transcribe a local file
50
+
just run recording.mp3
65
51
66
-
# Print to stdout (all status output suppressed — safe to pipe)
52
+
# Print to stdout (safe to pipe)
67
53
just run "https://youtube.com/watch?v=VIDEO_ID" --print
68
-
69
-
# Copy to clipboard (macOS: pbcopy, Linux: xclip, Windows: clip)
70
-
just run "https://youtube.com/watch?v=VIDEO_ID" --print | pbcopy
71
-
just run "https://youtube.com/watch?v=VIDEO_ID" --print | xclip -selection clipboard
72
-
just run "https://youtube.com/watch?v=VIDEO_ID" --print | clip
73
-
74
-
# Force Whisper even if captions exist
75
-
just run "https://youtube.com/watch?v=VIDEO_ID" --force-whisper
76
-
77
-
# Use a specific Whisper model
78
-
just run "https://youtube.com/watch?v=VIDEO_ID" --model large-v3
79
-
80
-
# Force language (auto-detected by default)
81
-
just run "https://youtube.com/watch?v=VIDEO_ID" --language de
82
-
83
-
# See all model options
84
-
just models
85
54
```
86
55
87
-
## Config
88
-
89
-
Defaults are stored in a platform-specific config file:
**`output_dir`** — when set, every transcription is auto-saved to `<output_dir>/<video title>.<output_extension>` without needing `--output`. Useful for batch use. Supports `~` expansion.
120
-
121
-
## Options
122
-
123
-
| Flag | Short | Default | Description |
124
-
|---|---|---|---|
125
-
| `--print` | `-p` | off | Print to stdout instead of saving; suppresses all status output (safe to pipe) |
126
-
| `--output` | `-o` | none | Save to this exact path (overrides `output_dir` in config) |
127
-
| `--model` | `-m` | from config | Whisper model size |
128
-
| `--language` | `-l` | auto-detect | Override language, e.g. `en`, `fr`, `de`. Needed only when detection gets it wrong or the video has mixed-language content. |
129
-
| `--force-whisper` | `-w` | off | Skip caption lookup and always use Whisper (ignored for local files, which always use Whisper) |
130
-
131
-
By default the transcript is **saved to `~/Downloads`** using the video title as the filename. Change `output_dir` in config to save elsewhere. Use `--print` to get stdout behaviour instead.
132
-
133
-
CLI flags always override config values.
134
-
135
-
## Whisper performance
136
-
137
-
Measured on a 4m 52s audio clip (Polish speech, CPU, `int8`):
138
-
139
-
| Model | Elapsed | Seconds per audio-minute | Time for 1h video |
140
-
|---|---|---|---|
141
-
| `tiny` | 15.4s | 3.2s | ~3 min |
142
-
| `base` | 25.4s | 5.2s | ~5 min |
143
-
| `small` | 70.1s | 14.4s | ~14 min |
144
-
| `turbo` | 90.8s | 18.7s | ~19 min |
145
-
146
-
`medium` and `large-v3` skipped — extrapolate the trend.
147
-
148
-
- `tiny` is nearly 6× faster than `turbo` on CPU
149
-
- `turbo` is the default because it trades that speed for much better accuracy, especially on non-English audio
150
-
- For quick drafts or batch jobs where accuracy matters less, `base` is a good middle ground
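
The derived columns in the table above follow directly from the elapsed times and the clip length (4m 52s = 292 s). The short sketch below reproduces that arithmetic; the function names are illustrative only.

```python
CLIP_SECONDS = 4 * 60 + 52  # 4m 52s benchmark clip

# Elapsed wall-clock seconds per model, copied from the table above.
ELAPSED = {"tiny": 15.4, "base": 25.4, "small": 70.1, "turbo": 90.8}


def seconds_per_audio_minute(elapsed_s: float, clip_s: float = CLIP_SECONDS) -> float:
    """Processing seconds spent per minute of audio."""
    return elapsed_s / (clip_s / 60)


def projected_minutes(elapsed_s: float, video_minutes: float = 60) -> float:
    """Estimated processing time in minutes for a video of the given length."""
    return seconds_per_audio_minute(elapsed_s) * video_minutes / 60


for model, elapsed in ELAPSED.items():
    rate = seconds_per_audio_minute(elapsed)
    print(f"{model}: {rate:.1f}s per audio-minute, ~{projected_minutes(elapsed):.0f} min per hour")
```

The projection assumes processing time scales linearly with audio length, which is how the "Time for 1h video" column is obtained.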
151
-
152
-
## Whisper models
153
-
154
-
Model weights are downloaded from HuggingFace on first use and cached at `~/.cache/huggingface/hub/` (macOS/Linux) or `%USERPROFILE%\.cache\huggingface\hub\` (Windows). Subsequent runs use the cached copy — no re-download. Override with the `HF_HUB_CACHE` environment variable.
155
-
156
-
| Model | Size | Speed | Accuracy |
157
-
|---|---|---|---|
158
-
| `tiny` | ~75 MB | fastest | lowest |
159
-
| `base` | ~140 MB | fast | decent |
160
-
| `small` | ~460 MB | moderate | good |
161
-
| `medium` | ~1.5 GB | slow | better |
162
-
| `turbo` | ~800 MB | fast | best for size (**default**) |
│ (any available lang) │ prefers manual over auto-generated
176
-
└────────────┬───────────┘ │
177
-
│ │
178
-
┌──────────────┴─────────────┐ │
179
-
│ Captions found? │ │
180
-
▼ ▼ │
181
-
Yes: done No: fallback │
182
-
│ │
183
-
┌──────▼──────┐ │
184
-
│ Download │ ◄── yt-dlp + ffmpeg
185
-
│ audio │ best quality stream
186
-
└──────┬──────┘ │
187
-
│ │
188
-
└──────────┬──────────┘
189
-
│
190
-
┌──────▼──────┐
191
-
│ Whisper │ ◄── faster-whisper
192
-
│ transcribe │ CTranslate2, CPU/GPU
193
-
└──────┬──────┘
194
-
│ temp audio deleted (YouTube only)
195
-
▼
196
-
┌─────────────────────────────┐
197
-
│ --output / output_dir / stdout │
198
-
└─────────────────────────────┘
199
-
```
200
-
201
-
### Caption lookup
202
-
203
-
Uses [`youtube-transcript-api`](https://github.com/jdepoix/youtube-transcript-api) to fetch captions from YouTube's internal API — no audio download needed, instant. Prefers manually uploaded captions over auto-generated ones. Falls back to Whisper when:
204
-
205
-
- The video owner disabled captions
206
-
- The video is too new for auto-generation to finish
207
-
- YouTube rate-limits the request
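
The captions-first flow with a Whisper fallback can be sketched as below. This is an assumption-laden illustration: `transcribe_youtube` is a hypothetical wrapper, and the two callables stand in for the project's `fetch_youtube_captions` and `transcribe_with_whisper`, whose real signatures are not shown here.

```python
from typing import Callable, Optional, Tuple


def transcribe_youtube(
    video_id: str,
    fetch_captions: Callable[[str], Optional[str]],
    run_whisper: Callable[[str], str],
    force_whisper: bool = False,
) -> Tuple[str, str]:
    """Return (transcript, source), where source is 'captions' or 'whisper'."""
    if not force_whisper:
        try:
            text = fetch_captions(video_id)
        except Exception:
            # Sketch only: real code would catch the caption library's specific
            # errors (captions disabled, not yet generated, rate limited).
            text = None
        if text:
            return text, "captions"
    # No usable captions, or the caller asked to skip the lookup.
    return run_whisper(video_id), "whisper"
```

Passing the two steps in as callables keeps the fallback logic testable without any network access.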
208
-
209
-
### Supported file formats
210
-
211
-
faster-whisper uses [PyAV](https://github.com/PyAV-Org/PyAV) (bundled FFmpeg) to decode audio, so any format FFmpeg handles is accepted — no system ffmpeg installation required for local files.
| Video (audio extracted) | `.mp4` `.mkv` `.mov` `.avi` `.webm` `.ts` |
219
-
220
-
### Whisper transcription
221
-
222
-
1. **Download**: `yt-dlp` fetches the best available audio stream to a temp directory.
223
-
2. **Transcribe**: `faster-whisper` runs the Whisper model with `vad_filter=True` to skip silent segments. Language is auto-detected unless overridden.
`faster-whisper` uses [CTranslate2](https://github.com/OpenNMT/CTranslate2) under the hood — 4–8× faster than OpenAI's original Whisper on CPU using `int8` quantization.
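
As a rough sketch of the transcription step described above, the snippet below calls `faster-whisper` with `vad_filter=True` and `int8` on CPU. The helper names `join_segments` and `transcribe_file` are hypothetical, and the import is placed inside the function to mirror the lazy-import approach this project describes; the repository's actual code may differ.

```python
from pathlib import Path
from typing import Iterable, Optional


def join_segments(segments: Iterable) -> str:
    """Concatenate the non-empty text of each segment into one transcript string."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe_file(
    audio_path: Path,
    model_size: str = "turbo",
    language: Optional[str] = None,
) -> str:
    """Transcribe one audio file on CPU with int8 quantization."""
    # Lazy import: the heavy dependency is loaded only when Whisper is needed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # language=None lets the model auto-detect; vad_filter skips silent stretches.
    segments, _info = model.transcribe(
        str(audio_path), vad_filter=True, language=language
    )
    # segments is a generator, so decoding happens while it is consumed here.
    return join_segments(segments)
```

Joining segments with a single space is one simple choice; a real tool might keep timestamps or paragraph breaks instead.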
56
+
Transcripts are saved to `~/Downloads` by default using the video title as the filename.