Commit 27c0f98

Transcribing a single video works
0 parents  commit 27c0f98

7 files changed

Lines changed: 517 additions & 0 deletions

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
.venv/
__pycache__/
*.py[cod]
dist/
*.egg-info/
.env
uv.lock

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
3.11

README.md

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
# yt-transcribe

CLI tool that extracts transcripts from YouTube videos. Fetches existing captions when available; otherwise downloads audio and transcribes locally with [Whisper](https://github.com/SYSTRAN/faster-whisper).

## Requirements

- Python 3.11+
- [uv](https://docs.astral.sh/uv/): `curl -LsSf https://astral.sh/uv/install.sh | sh`
- [just](https://just.systems/): `brew install just`

## Setup

```bash
cd transcribe
just install
```

## Usage

```bash
# Fetch captions if available, otherwise run Whisper; saves to the current dir by default
just run "https://youtube.com/watch?v=VIDEO_ID"

# Save to a specific file
just run "https://youtube.com/watch?v=VIDEO_ID" --output transcript.txt

# Print to stdout (all status output suppressed, so it is safe to pipe)
just run "https://youtube.com/watch?v=VIDEO_ID" --print
just run "https://youtube.com/watch?v=VIDEO_ID" --print | pbcopy

# Force Whisper even if captions exist
just run "https://youtube.com/watch?v=VIDEO_ID" --force-whisper

# Use a specific Whisper model
just run "https://youtube.com/watch?v=VIDEO_ID" --model large-v3

# Force language (auto-detected by default)
just run "https://youtube.com/watch?v=VIDEO_ID" --language de

# See all model options
just models
```

## Config

Defaults are stored in `~/.config/yt-transcribe/config.toml`. The first run of `just config --edit` creates the file with all options documented inline.

```bash
just config          # show config file path
just config --show   # print current config
just config --edit   # open in $EDITOR
```

Default config:

```toml
[defaults]
model = "turbo"            # tiny | base | small | medium | turbo | large-v3
language = ""              # empty = auto-detect per video
output_dir = ""            # if set, transcripts are auto-saved here (uses video title as filename)
output_extension = "txt"

[whisper]
device = "cpu"             # cpu | cuda (use cuda if you have a GPU)
compute_type = "int8"      # int8 (fast CPU) | float16 (GPU) | float32 (precise)
beam_size = 5              # higher = more accurate, slower (1–10)
vad_filter = true          # skip silent segments (recommended)
```

**`output_dir`**: when set, every transcription is auto-saved to `<output_dir>/<video title>.<output_extension>` without needing `--output`. Useful for batch use.

## Options

| Flag | Short | Default | Description |
|---|---|---|---|
| `--print` | `-p` | off | Print to stdout instead of saving; suppresses all status output (safe to pipe) |
| `--output` | `-o` | none | Save to this exact path (overrides `output_dir` in config) |
| `--model` | `-m` | from config | Whisper model size |
| `--language` | `-l` | auto-detect | Override language, e.g. `en`, `fr`, `de`. Omit to auto-detect; an override is useful only when detection gets it wrong or the video has mixed-language content. |
| `--force-whisper` | `-w` | off | Skip caption lookup, always use Whisper |

By default the transcript is **saved to a file**: to `output_dir` from config if set, otherwise to the current directory, using the video title as the filename. Use `--print` to write to stdout instead.

CLI flags always override config values.
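
The save-location precedence above (`--output`, then `output_dir`, then the current directory) can be sketched as a single function. The names `resolve_output_path` and `safe_filename` are hypothetical, and the filename clean-up rule is an assumption: the README does not say how video titles are sanitised.

```python
import re
from pathlib import Path
from typing import Optional


def safe_filename(title: str) -> str:
    """Replace characters that are invalid or awkward in filenames.

    Assumed rule: runs of \\ / : * ? " < > | become a single underscore.
    """
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", title).strip(" .")
    return cleaned or "transcript"


def resolve_output_path(
    title: str,
    output: Optional[str] = None,   # value of --output, if given
    output_dir: str = "",           # output_dir from config ("" = unset)
    extension: str = "txt",         # output_extension from config
    cwd: Optional[Path] = None,     # injectable for testing
) -> Path:
    # 1. An explicit --output path wins and is used exactly as given.
    if output:
        return Path(output)
    # 2. Otherwise use output_dir from config, 3. else the current directory.
    base = Path(output_dir).expanduser() if output_dir else (cwd or Path.cwd())
    return base / f"{safe_filename(title)}.{extension}"
```

With `--print` none of this applies, since nothing is written to disk.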

## Whisper models

Model weights are downloaded from HuggingFace on first use and cached at `~/.cache/huggingface/hub/`. Subsequent runs reuse the cached copy without re-downloading. Override the location with the `HF_HUB_CACHE` environment variable.

| Model | Size | Speed | Accuracy |
|---|---|---|---|
| `tiny` | ~75 MB | fastest | lowest |
| `base` | ~140 MB | fast | decent |
| `small` | ~460 MB | moderate | good |
| `medium` | ~1.5 GB | slow | better |
| `turbo` | ~800 MB | fast | best for size (**default**) |
| `large-v3` | ~3 GB | slowest | highest |
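
To check where the weights will actually land on a given machine, the cache lookup can be reproduced in a few lines. The sketch mirrors the order documented by `huggingface_hub` (`HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`, then `~/.cache`); `hub_cache_dir` is a hypothetical helper, not part of this project or of `huggingface_hub`.

```python
import os
from pathlib import Path
from typing import Mapping


def hub_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve the HuggingFace hub cache directory from environment variables."""
    # An explicit cache location takes priority over everything else.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # HF_HOME relocates all HuggingFace data; the hub cache lives in its "hub" subfolder.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # Fall back to the XDG cache root, or ~/.cache when that is unset.
    cache_root = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(cache_root).expanduser() / "huggingface" / "hub"


if __name__ == "__main__":
    print(hub_cache_dir())
```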

## How it works

```
      ┌─────────────────────┐
      │  YouTube URL / ID   │
      └──────────┬──────────┘
                 │ extract video ID
                 ▼
    ┌────────────────────────┐
    │ Fetch YouTube captions │ ◄── youtube-transcript-api
    │  (any available lang)  │     prefers manual over auto-generated
    └────────────┬───────────┘
                 │
       ┌─────────┴─────────┐
       │  Captions found?  │
       ▼                   ▼
   Yes: done          No: fallback
                           │
                    ┌──────▼──────┐
                    │  Download   │ ◄── yt-dlp
                    │    audio    │     best quality stream
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Whisper   │ ◄── faster-whisper
                    │ transcribe  │     CTranslate2, CPU/GPU
                    └──────┬──────┘
                           │ audio deleted from temp dir
                           ▼
          ┌────────────────────────────────┐
          │ --output / output_dir / stdout │
          └────────────────────────────────┘
```
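
The first step in the diagram, extracting the video ID from a URL or a bare ID, might look like the sketch below. It is not the code from this commit: the function name, the set of accepted URL shapes, and the rule that an ID is 11 characters from `[A-Za-z0-9_-]` (a widely observed convention, not a documented guarantee) are all assumptions.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed ID shape: 11 URL-safe characters.
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube URL, or the input if it already is one."""
    text = url_or_id.strip()
    if _ID_RE.match(text):
        return text

    # URLs without a scheme have no hostname and are rejected below.
    parsed = urlparse(text)
    host = (parsed.hostname or "").lower().removeprefix("www.").removeprefix("m.")

    candidate = ""
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "music.youtube.com"}:
        if parsed.path == "/watch":
            # Standard links: https://youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            # Path-style links: /shorts/<id>, /embed/<id>, /live/<id>
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]

    if not _ID_RE.match(candidate):
        raise ValueError(f"could not find a video ID in {url_or_id!r}")
    return candidate
```

Validating the ID up front lets the tool fail with a clear message before any network request is made.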

### Caption lookup

Uses [`youtube-transcript-api`](https://github.com/jdepoix/youtube-transcript-api) to fetch captions from YouTube's internal API. This needs no audio download, so it returns almost immediately. Manually uploaded captions are preferred over auto-generated ones. Falls back to Whisper when:

- The video owner disabled captions
- The video is too new for auto-generation to finish
- YouTube rate-limits the request
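
The preference rule and the fallback can be sketched independently of any library. In the code below, `CaptionTrack` is a hypothetical stand-in for whatever object the caption client returns (its `is_generated` field is modelled on the attribute of the same name in `youtube-transcript-api`), and `list_tracks` / `run_whisper` are injected callables, so this shows the control flow only, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class CaptionTrack:
    """Hypothetical stand-in for one caption track of a video."""
    language_code: str
    is_generated: bool  # True for auto-generated captions
    text: str


def pick_track(tracks: Iterable[CaptionTrack]) -> Optional[CaptionTrack]:
    """Prefer a manually uploaded track; otherwise take any available one."""
    tracks = list(tracks)
    manual = [t for t in tracks if not t.is_generated]
    if manual:
        return manual[0]
    return tracks[0] if tracks else None


def transcript_for(
    video_id: str,
    list_tracks: Callable[[str], Iterable[CaptionTrack]],
    run_whisper: Callable[[str], str],
    force_whisper: bool = False,
) -> tuple[str, str]:
    """Return (text, source), where source is "captions" or "whisper"."""
    if not force_whisper:
        try:
            track = pick_track(list_tracks(video_id))
        except Exception:
            # Captions disabled, not generated yet, or the request was
            # rate-limited: every such failure triggers the fallback.
            track = None
        if track is not None:
            return track.text, "captions"
    return run_whisper(video_id), "whisper"
```

Catching every exception is a deliberate simplification here; a real implementation would likely catch the client library's specific error types and log the reason.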

### Whisper transcription

1. **Download**: `yt-dlp` fetches the best available audio stream to a temp directory.
2. **Transcribe**: `faster-whisper` runs the Whisper model with `vad_filter=True` to skip silent segments. Language is auto-detected unless overridden.
3. **Cleanup**: the temp audio file is deleted automatically.

`faster-whisper` uses [CTranslate2](https://github.com/OpenNMT/CTranslate2) under the hood; with `int8` quantization it is 4–8× faster than OpenAI's original Whisper on CPU.
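
The three steps can be sketched as below. This is an illustration under stated assumptions, not the code from this commit: the function names are hypothetical, and the wiring is a guess. The library calls themselves (`yt_dlp.YoutubeDL` with the `format`, `outtmpl` and `quiet` options, and `faster_whisper.WhisperModel.transcribe` with `beam_size`, `vad_filter` and `language`) follow those libraries' public APIs, with values taken from the default config. Both libraries are imported lazily so the orchestration can be exercised without them.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional


def download_audio(url: str, dest_dir: Path) -> Path:
    """Step 1: fetch the best available audio stream into dest_dir."""
    import yt_dlp  # lazy import: only needed on the Whisper path

    opts = {
        "format": "bestaudio/best",
        "outtmpl": str(dest_dir / "%(id)s.%(ext)s"),
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        return Path(ydl.prepare_filename(info))


def whisper_segments(audio: Path, model: str = "turbo", language: Optional[str] = None):
    """Step 2: run faster-whisper; returns a lazy generator of segments."""
    from faster_whisper import WhisperModel

    whisper = WhisperModel(model, device="cpu", compute_type="int8")
    # language=None lets the model auto-detect the spoken language.
    segments, _info = whisper.transcribe(
        str(audio), beam_size=5, vad_filter=True, language=language
    )
    return segments


def join_segments(segments: Iterable) -> str:
    """Join segment texts into one transcript, dropping empty segments."""
    return "\n".join(s.text.strip() for s in segments if s.text.strip())


def transcribe_url(
    url: str,
    download: Callable[[str, Path], Path] = download_audio,
    segmenter: Callable[[Path], Iterable] = whisper_segments,
) -> str:
    # Step 3: the temp directory and the audio in it are removed on exit.
    with tempfile.TemporaryDirectory() as tmp:
        audio = download(url, Path(tmp))
        # Segments are produced lazily, so they must be consumed while
        # the audio file still exists, i.e. inside the with-block.
        return join_segments(segmenter(audio))
```

Using `tempfile.TemporaryDirectory` as a context manager ensures the audio is cleaned up even if transcription raises.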

## Dependencies

| Package | Purpose |
|---|---|
| [`youtube-transcript-api`](https://github.com/jdepoix/youtube-transcript-api) | Fetch YouTube captions |
| [`yt-dlp`](https://github.com/yt-dlp/yt-dlp) | Download audio |
| [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper) | Local speech-to-text |
| [`typer`](https://typer.tiangolo.com/) | CLI framework |
| [`rich`](https://rich.readthedocs.io/) | Terminal output |

justfile

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
# Install dependencies and set up the venv
install:
    uv sync

# Transcribe a YouTube video
# Usage: just run <url> [--model turbo] [--force-whisper] [--output file.txt]
[positional-arguments]
run *args:
    @uv run transcribe run "$@"

# Show, edit, or init the config file
# Usage: just config [--edit] [--show]
[positional-arguments]
config *args:
    uv run transcribe config "$@"

# List available Whisper models
models:
    @echo "tiny (~75MB) fastest, lowest accuracy"
    @echo "base (~140MB) fast, decent"
    @echo "small (~460MB) balanced"
    @echo "medium (~1.5GB) good"
    @echo "turbo (~800MB) fast + accurate [default]"
    @echo "large-v3 (~3GB) highest accuracy"

pyproject.toml

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
[project]
name = "yt-transcribe"
version = "0.1.0"
description = "YouTube video transcription CLI"
requires-python = ">=3.11"
dependencies = [
    "youtube-transcript-api>=1.2.4",
    "yt-dlp",
    "faster-whisper>=1.2.0",
    "typer>=0.12.0",
    "rich>=13.0.0",
]

[project.scripts]
transcribe = "transcribe.cli:app"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/transcribe"]
