
Commit ec8e000

Docs refactored, docker test for linux
1 parent d0c2eb2 commit ec8e000

17 files changed

Lines changed: 555 additions & 254 deletions

CLAUDE.md

Lines changed: 18 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -1,31 +1,36 @@
11
# CLAUDE.md
22

3-
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
3+
This file provides guidance to Claude Code when working with code in this repository.
44

55
## Commands
66

77
```bash
88
just install # create venv and install dependencies via uv
9-
just run <url> # transcribe a YouTube URL or bare video ID
9+
just run <source> # transcribe a YouTube URL, video ID, or local file
1010
just config --show # print current config
1111
just config --edit # open config in $EDITOR
1212
just models # list available Whisper model sizes
13+
just test # unit tests (no network)
14+
just smoke # integration tests (requires network)
1315
```
1416

15-
There are no tests and no linter configured.
17+
## Source files
1618

17-
## Architecture
19+
- **`src/transcribe/cli.py`** — all CLI logic via Typer. Two commands: `run` and `config`. `run` routes to `_transcribe_youtube` or `_run_whisper` directly for local files.
20+
- **`src/transcribe/config.py`** — loads `~/.config/yt-transcribe/config.toml` via `platformdirs`. Merges user TOML over hardcoded `_DEFAULTS`.
1821

19-
Two source files under `src/transcribe/`:
22+
## Documentation
2023

21-
- **`cli.py`** — all CLI logic via Typer. Two commands: `run` (transcribe) and `config` (show/edit config file). The `run` command tries YouTube captions first (`fetch_youtube_captions`), falls back to Whisper (`transcribe_with_whisper`). Output path resolution order: `--output` flag → `output_dir` in config → `~/Downloads`.
22-
23-
- **`config.py`** — loads `~/.config/yt-transcribe/config.toml` (path determined by `platformdirs.user_config_dir`). Merges user TOML over hardcoded `_DEFAULTS`. Creates the file with documented defaults on `config --edit` if it doesn't exist yet.
24+
- [docs/USER_MANUAL.md](docs/USER_MANUAL.md) — usage, options, config reference
25+
- [docs/TECHNICAL.md](docs/TECHNICAL.md) — architecture, flow, performance, dependencies
2426

2527
## Key behaviours
2628

27-
- `faster-whisper` and `yt-dlp` are lazy imports — only loaded when captions are unavailable, so caption-only runs have no heavy dependency startup cost.
28-
- Audio is downloaded to `tempfile.TemporaryDirectory()` and deleted automatically after transcription.
29-
- `unique_path()` appends `(1)`, `(2)`, … to avoid silently overwriting existing files.
30-
- `--print` suppresses all Rich console output and writes only the transcript to stdout — safe for piping.
31-
- Config path is platform-specific via `platformdirs`: `~/Library/Application Support/yt-transcribe/` on macOS, `~/.config/yt-transcribe/` on Linux, `%LOCALAPPDATA%\yt-transcribe\` on Windows.
29+
- `faster-whisper` and `yt-dlp` are lazy imports — only loaded when captions are unavailable.
30+
- Audio is downloaded to `tempfile.TemporaryDirectory()` and deleted automatically.
31+
- `unique_path()` appends `(1)`, `(2)`, … to avoid overwriting existing files.
32+
- `--print` suppresses all Rich output and writes only the transcript to stdout.
33+
- Output path resolution: `--output` flag → `output_dir` in config → `~/Downloads`.
34+
- `YOUTUBE_COOKIES_FILE` env var enables authenticated caption/download requests.
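
The `unique_path()` behaviour listed above could look roughly like the sketch below. Only the function name and the `(1)`, `(2)`, … suffix come from the notes; the exact placement of the counter (before the extension, separated by a space) is an assumption made for illustration, not the repository's confirmed implementation.

```python
from pathlib import Path


def unique_path(path: Path) -> Path:
    """Return `path` if it is free, else the first free "stem (n).suffix" variant.

    Assumption: the counter goes before the file extension, after a space.
    """
    if not path.exists():
        return path
    counter = 1
    while True:
        # e.g. "talk.txt" -> "talk (1).txt", "talk (2).txt", ...
        candidate = path.with_name(f"{path.stem} ({counter}){path.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```

Checking for a free name and writing the file are separate steps here, so two concurrent runs could still pick the same name; for a single-user CLI that is probably acceptable.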
35+
36+
## No linter or formatter configured.

Dockerfile.smoke

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
ARG PYTHON_VERSION=3.13
2+
FROM python:${PYTHON_VERSION}-slim
3+
4+
# Install uv
5+
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /usr/local/bin/
6+
7+
# Install git (needed to clone)
8+
RUN apt-get update && apt-get install -y --no-install-recommends git && rm -rf /var/lib/apt/lists/*
9+
10+
WORKDIR /app
11+
12+
# Clone the repository
13+
RUN git clone https://github.com/virtuecoder/transcribe.git .
14+
15+
# Install dependencies (including dev group for pytest)
16+
RUN uv sync --group dev
17+
18+
# Run smoke tests by default
19+
CMD ["uv", "run", "pytest", "tests/test_smoke.py", "-m", "network", "-v"]

README.md

Lines changed: 12 additions & 188 deletions
@@ -1,13 +1,13 @@
11
# yt-transcribe
22

3-
CLI tool that transcribes YouTube videos and local audio/video files. For YouTube, it fetches existing captions when available; otherwise downloads audio and transcribes locally with [Whisper](https://github.com/SYSTRAN/faster-whisper). Local files always go through Whisper directly.
3+
CLI tool that transcribes YouTube videos and local audio/video files. For YouTube, it fetches existing captions when available; otherwise downloads audio and transcribes locally with [Whisper](https://github.com/SYSTRAN/faster-whisper).
44

55
## Requirements
66

77
- Python 3.11+
88
- [uv](https://docs.astral.sh/uv/)
99
- [just](https://just.systems/)
10-
- [ffmpeg](https://ffmpeg.org/) — required by `yt-dlp` when downloading YouTube audio (not needed for local files, which are decoded by faster-whisper's bundled FFmpeg)
10+
- [ffmpeg](https://ffmpeg.org/) — required when downloading YouTube audio (not needed for local files)
1111

1212
### macOS
1313

@@ -18,18 +18,11 @@ brew install uv just ffmpeg
1818
### Linux (Debian/Ubuntu)
1919

2020
```bash
21-
# uv
2221
curl -LsSf https://astral.sh/uv/install.sh | sh
23-
24-
# just
2522
curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to /usr/local/bin
26-
27-
# ffmpeg
2823
sudo apt install ffmpeg
2924
```
3025

31-
For other distros replace `apt install ffmpeg` with your package manager (`dnf`, `pacman`, etc.). The `just` binary can also be downloaded from its [GitHub releases](https://github.com/casey/just/releases).
32-
3326
### Windows
3427

3528
```powershell
@@ -38,200 +31,31 @@ winget install Casey.Just
3831
winget install ffmpeg
3932
```
4033

41-
Then restart your terminal so the new PATH entries take effect.
42-
4334
> **Note:** `faster-whisper` requires the [Microsoft Visual C++ Redistributable](https://aka.ms/vs/17/release/vc_redist.x64.exe) (x64). Install it if you see a DLL error on first Whisper run.
4435
4536
## Setup
46-
""
37+
4738
```bash
4839
cd transcribe
4940
just install
5041
```
5142

52-
## Usage
43+
## Quick start
5344

5445
```bash
55-
# Transcribe a local audio or video file (any format FFmpeg supports)
56-
just run recording.mp3
57-
just run /path/to/interview.mp4
58-
just run ~/Downloads/lecture.m4a
59-
60-
# Fetch captions if available, otherwise run Whisper — saves to ~/Downloads by default
46+
# Transcribe a YouTube video (captions if available, Whisper otherwise)
6147
just run "https://youtube.com/watch?v=VIDEO_ID"
6248

63-
# Save to a specific file
64-
just run "https://youtube.com/watch?v=VIDEO_ID" --output transcript.txt
49+
# Transcribe a local file
50+
just run recording.mp3
6551

66-
# Print to stdout (all status output suppressed — safe to pipe)
52+
# Print to stdout (safe to pipe)
6753
just run "https://youtube.com/watch?v=VIDEO_ID" --print
68-
69-
# Copy to clipboard (macOS: pbcopy, Linux: xclip, Windows: clip)
70-
just run "https://youtube.com/watch?v=VIDEO_ID" --print | pbcopy
71-
just run "https://youtube.com/watch?v=VIDEO_ID" --print | xclip -selection clipboard
72-
just run "https://youtube.com/watch?v=VIDEO_ID" --print | clip
73-
74-
# Force Whisper even if captions exist
75-
just run "https://youtube.com/watch?v=VIDEO_ID" --force-whisper
76-
77-
# Use a specific Whisper model
78-
just run "https://youtube.com/watch?v=VIDEO_ID" --model large-v3
79-
80-
# Force language (auto-detected by default)
81-
just run "https://youtube.com/watch?v=VIDEO_ID" --language de
82-
83-
# See all model options
84-
just models
8554
```
8655

87-
## Config
88-
89-
Defaults are stored in a platform-specific config file:
90-
91-
| Platform | Path |
92-
|---|---|
93-
| macOS | `~/Library/Application Support/yt-transcribe/config.toml` |
94-
| Linux | `~/.config/yt-transcribe/config.toml` |
95-
| Windows | `%LOCALAPPDATA%\yt-transcribe\config.toml` |
96-
97-
```bash
98-
just config # show config file path
99-
just config --show # print current config
100-
just config --edit # open in $EDITOR (or notepad on Windows)
101-
```
102-
103-
Default config:
104-
105-
```toml
106-
[defaults]
107-
model = "turbo" # tiny | base | small | medium | turbo | large-v3
108-
language = "" # empty = auto-detect per video
109-
output_dir = "~/Downloads" # transcripts are auto-saved here (uses video title as filename)
110-
output_extension = "txt"
111-
112-
[whisper]
113-
device = "cpu" # cpu | cuda (use cuda if you have a GPU)
114-
compute_type = "int8" # int8 (fast CPU) | float16 (GPU) | float32 (precise)
115-
beam_size = 5 # higher = more accurate, slower (1–10)
116-
vad_filter = true # skip silent segments (recommended)
117-
```
118-
119-
**`output_dir`** — when set, every transcription is auto-saved to `<output_dir>/<video title>.<output_extension>` without needing `--output`. Useful for batch use. Supports `~` expansion.
120-
121-
## Options
122-
123-
| Flag | Short | Default | Description |
124-
|---|---|---|---|
125-
| `--print` | `-p` | off | Print to stdout instead of saving; suppresses all status output (safe to pipe) |
126-
| `--output` | `-o` | | Save to this exact path (overrides `output_dir` in config) |
127-
| `--model` | `-m` | from config | Whisper model size |
128-
| `--language` | `-l` | auto-detect | Override language, e.g. `en`, `fr`, `de`. Omit to auto-detect — useful only when detection gets it wrong or the video has mixed-language content. |
129-
| `--force-whisper` | `-w` | off | Skip caption lookup, always use Whisper (ignored for local files — Whisper is always used) |
130-
131-
By default the transcript is **saved to `~/Downloads`** using the video title as the filename. Change `output_dir` in config to save elsewhere. Use `--print` to get stdout behaviour instead.
132-
133-
CLI flags always override config values.
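
The precedence described here (`--output` flag, then `output_dir` from config, then `~/Downloads`) can be sketched as a small function. The name `resolve_output_path` and its parameters are hypothetical and chosen for illustration; the config layout follows the documented `[defaults]` table.

```python
from pathlib import Path


def resolve_output_path(output_flag, config, title, default_dir="~/Downloads"):
    """Pick the destination file: flag first, then config, then the built-in default.

    Hypothetical helper; `config` is assumed to be a dict with a "defaults" table.
    """
    if output_flag:
        # An explicit --output path always wins.
        return Path(output_flag).expanduser()
    defaults = config.get("defaults", {})
    out_dir = defaults.get("output_dir") or default_dir
    ext = defaults.get("output_extension") or "txt"
    return Path(out_dir).expanduser() / f"{title}.{ext}"
```

Calling `expanduser()` on each branch covers the `~` expansion mentioned for `output_dir`.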
134-
135-
## Whisper performance
136-
137-
Measured on a 4m 52s audio clip (Polish speech, CPU, `int8`):
138-
139-
| Model | Elapsed | Seconds per audio-minute | Time for 1h video |
140-
|---|---|---|---|
141-
| `tiny` | 15.4s | 3.2s | ~3 min |
142-
| `base` | 25.4s | 5.2s | ~5 min |
143-
| `small` | 70.1s | 14.4s | ~14 min |
144-
| `turbo` | 90.8s | 18.7s | ~19 min |
145-
146-
`medium` and `large-v3` skipped — extrapolate the trend.
147-
148-
- `tiny` is nearly 6× faster than `turbo` on CPU
149-
- `turbo` is the default because it trades that speed for much better accuracy — especially on non-English audio
150-
- For quick drafts or batch jobs where accuracy matters less, `base` is a good middle ground
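
The derived columns in the table follow directly from the elapsed times and the clip length (4m 52s = 292 s). A short worked calculation, using only the figures reported above (the function names are illustrative):

```python
CLIP_SECONDS = 4 * 60 + 52  # the 4m 52s benchmark clip

# Elapsed wall-clock seconds per model, as reported in the table.
ELAPSED = {"tiny": 15.4, "base": 25.4, "small": 70.1, "turbo": 90.8}


def seconds_per_audio_minute(elapsed: float) -> float:
    """Processing seconds needed for one minute of audio."""
    return elapsed / (CLIP_SECONDS / 60)


def minutes_for_one_hour(elapsed: float) -> float:
    """Processing minutes for 60 minutes of audio, assuming linear scaling."""
    return seconds_per_audio_minute(elapsed) * 60 / 60


def speedup(fast: str, slow: str) -> float:
    """How many times faster `fast` is than `slow` on this clip."""
    return ELAPSED[slow] / ELAPSED[fast]
```

Rounded to one decimal, `seconds_per_audio_minute` gives 3.2, 5.2, 14.4 and 18.7 for the four models, and `speedup("tiny", "turbo")` is about 5.9, which matches the "nearly 6×" claim. The one-hour estimate assumes processing time scales linearly with audio length, which the benchmark itself does not verify.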
151-
152-
## Whisper models
153-
154-
Model weights are downloaded from HuggingFace on first use and cached at `~/.cache/huggingface/hub/` (macOS/Linux) or `%USERPROFILE%\.cache\huggingface\hub\` (Windows). Subsequent runs use the cached copy — no re-download. Override with the `HF_HUB_CACHE` environment variable.
155-
156-
| Model | Size | Speed | Accuracy |
157-
|---|---|---|---|
158-
| `tiny` | ~75 MB | fastest | lowest |
159-
| `base` | ~140 MB | fast | decent |
160-
| `small` | ~460 MB | moderate | good |
161-
| `medium` | ~1.5 GB | slow | better |
162-
| `turbo` | ~800 MB | fast | best for size — **default** |
163-
| `large-v3` | ~3 GB | slowest | highest |
164-
165-
## How it works
166-
167-
```
168-
┌─────────────────────┐ ┌──────────────────────┐
169-
│ YouTube URL / ID │ │ Local audio/video │
170-
└──────────┬──────────┘ └───────────┬──────────┘
171-
│ extract video ID │
172-
▼ │
173-
┌────────────────────────┐ │
174-
│ Fetch YouTube captions │ ◄── youtube-transcript-api
175-
│ (any available lang) │ prefers manual over auto-generated
176-
└────────────┬───────────┘ │
177-
│ │
178-
┌──────────────┴─────────────┐ │
179-
│ Captions found? │ │
180-
▼ ▼ │
181-
Yes: done No: fallback │
182-
│ │
183-
┌──────▼──────┐ │
184-
│ Download │ ◄── yt-dlp + ffmpeg
185-
│ audio │ best quality stream
186-
└──────┬──────┘ │
187-
│ │
188-
└──────────┬──────────┘
189-
190-
┌──────▼──────┐
191-
│ Whisper │ ◄── faster-whisper
192-
│ transcribe │ CTranslate2, CPU/GPU
193-
└──────┬──────┘
194-
│ temp audio deleted (YouTube only)
195-
196-
┌─────────────────────────────┐
197-
│ --output / output_dir / stdout │
198-
└─────────────────────────────┘
199-
```
200-
201-
### Caption lookup
202-
203-
Uses [`youtube-transcript-api`](https://github.com/jdepoix/youtube-transcript-api) to fetch captions from YouTube's internal API — no audio download needed, instant. Prefers manually uploaded captions over auto-generated ones. Falls back to Whisper when:
204-
205-
- The video owner disabled captions
206-
- The video is too new for auto-generation to finish
207-
- YouTube rate-limits the request
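
The captions-first, Whisper-fallback flow can be sketched without touching any real library by passing the two steps in as callables. Everything here is an assumption for illustration: the function name, the parameter names, and the choice to treat any exception from the caption fetch as "no captions".

```python
from typing import Callable, Optional, Tuple


def transcribe_youtube(
    video_id: str,
    fetch_captions: Callable[[str], Optional[str]],
    run_whisper: Callable[[str], str],
    force_whisper: bool = False,
) -> Tuple[str, str]:
    """Return (transcript, source), where source is "captions" or "whisper"."""
    if not force_whisper:
        try:
            text = fetch_captions(video_id)
        except Exception:
            # Assumption: disabled captions, unfinished auto-generation and
            # rate limiting all surface as exceptions and trigger the fallback.
            text = None
        if text:
            return text, "captions"
    # Only reached when captions are missing, failed, or explicitly skipped.
    return run_whisper(video_id), "whisper"
```

Injecting the callables also makes the fallback path easy to exercise in unit tests without network access.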
208-
209-
### Supported file formats
210-
211-
faster-whisper uses [PyAV](https://github.com/PyAV-Org/PyAV) (bundled FFmpeg) to decode audio, so any format FFmpeg handles is accepted — no system ffmpeg installation required for local files.
212-
213-
Common formats:
214-
215-
| Type | Extensions |
216-
|---|---|
217-
| Audio | `.mp3` `.m4a` `.aac` `.wav` `.flac` `.ogg` `.opus` `.wma` `.aiff` |
218-
| Video (audio extracted) | `.mp4` `.mkv` `.mov` `.avi` `.webm` `.ts` |
219-
220-
### Whisper transcription
221-
222-
1. **Download**: `yt-dlp` fetches the best available audio stream to a temp directory.
223-
2. **Transcribe**: `faster-whisper` runs the Whisper model with `vad_filter=True` to skip silent segments. Language is auto-detected unless overridden.
224-
3. **Cleanup** — temp audio file deleted automatically.
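
The three steps above map naturally onto `tempfile.TemporaryDirectory()`, which CLAUDE.md names as the mechanism for temp audio. In this sketch the `download` and `transcribe` callables are hypothetical stand-ins for the `yt-dlp` and `faster-whisper` calls, so the example stays self-contained.

```python
import tempfile
from pathlib import Path
from typing import Callable


def transcribe_via_temp_audio(
    url: str,
    download: Callable[[str, Path], Path],
    transcribe: Callable[[Path], str],
) -> str:
    """Download audio into a temp dir, transcribe it, and let the dir clean itself up."""
    with tempfile.TemporaryDirectory() as tmp:
        audio_path = download(url, Path(tmp))  # step 1: fetch audio
        text = transcribe(audio_path)          # step 2: run speech-to-text
    # Step 3: leaving the `with` block removes the directory and the audio file,
    # even if download or transcribe raised an exception.
    return text
```

Because cleanup is tied to the context manager, no explicit delete call is needed on either the success or the error path.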
225-
226-
`faster-whisper` uses [CTranslate2](https://github.com/OpenNMT/CTranslate2) under the hood — 4–8× faster than OpenAI's original Whisper on CPU using `int8` quantization.
56+
Transcripts are saved to `~/Downloads` by default using the video title as the filename.
22757

228-
## Dependencies
58+
## Documentation
22959

230-
| Package | Purpose |
231-
|---|---|
232-
| [`youtube-transcript-api`](https://github.com/jdepoix/youtube-transcript-api) | Fetch YouTube captions |
233-
| [`yt-dlp`](https://github.com/yt-dlp/yt-dlp) | Download audio |
234-
| [`faster-whisper`](https://github.com/SYSTRAN/faster-whisper) | Local speech-to-text |
235-
| [`typer`](https://typer.tiangolo.com/) | CLI framework |
236-
| [`rich`](https://rich.readthedocs.io/) | Terminal output |
237-
| [`platformdirs`](https://platformdirs.readthedocs.io/) | Platform-appropriate config paths |
60+
- [User Manual](docs/USER_MANUAL.md) — all options, config, examples, clipboard usage
61+
- [Technical Details](docs/TECHNICAL.md) — architecture, dependencies, performance benchmarks
