Commit f22f57f

maxitoon and claude committed
Fix live transcription with incremental chunking and add docs
Live transcription previously re-transcribed the entire growing recording file every 0.5s, making it progressively slower and unable to keep up. Now uses sox trim to extract only new audio chunks (5s minimum), transcribes each chunk independently, and displays text incrementally.

- Add whisper-transcribe-with-download.sh to repo (was external)
- Update Makefile to reference local script instead of ../
- Rewrite README with full usage docs, prerequisites, and architecture
- Add CLAUDE.md with shell script architecture notes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
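The chunking logic described in the commit message can be illustrated with a small sketch. This is not the script's code (the script itself is shell); it is a Python model of the bookkeeping, assuming the script tracks how many seconds it has already transcribed. The names `next_chunk`, `sox_trim_command`, `live.wav`, and `chunk.wav` are hypothetical. The only external syntax relied on is the SoX `trim <start> <length>` effect.

```python
from typing import List, Optional, Tuple

# Minimum amount of new audio before a chunk is cut (5s, per the commit message).
MIN_CHUNK_SECONDS = 5.0


def next_chunk(
    recorded: float, processed: float, min_chunk: float = MIN_CHUNK_SECONDS
) -> Optional[Tuple[float, float]]:
    """Return (start, length) of the next chunk, or None if too little new audio."""
    new_audio = recorded - processed
    if new_audio < min_chunk:
        return None
    return processed, new_audio


def sox_trim_command(src: str, dst: str, start: float, length: float) -> List[str]:
    """Build the argument list for `sox <src> <dst> trim <start> <length>`."""
    return ["sox", src, dst, "trim", f"{start:.2f}", f"{length:.2f}"]


# Simulate successive polls while the recording grows (durations in seconds).
processed = 0.0
for recorded in (2.0, 4.0, 6.0, 8.0, 12.5):
    chunk = next_chunk(recorded, processed)
    if chunk is None:
        continue  # not enough new audio yet; wait for the next poll
    start, length = chunk
    print(" ".join(sox_trim_command("live.wav", "chunk.wav", start, length)))
    processed = start + length  # only audio after this point is new
```

Because each chunk starts where the previous one ended, the work per poll depends on the amount of new audio rather than on the total recording length, which is the property the old approach lacked.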
1 parent b36921a commit f22f57f

4 files changed

Lines changed: 497 additions & 147 deletions

CLAUDE.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
# CLAUDE.md
2+
3+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4+
5+
## Overview
6+
7+
Local-first audio/video transcription tool using whisper-cli (whisper.cpp). Provides a Python CLI that wraps OpenAI Whisper for transcription and yt-dlp for YouTube audio downloading. Outputs in txt, srt, vtt, or json formats.
8+
9+
## Commands
10+
11+
```bash
12+
make dev # Install dev deps + editable install
13+
make test # Run pytest with coverage
14+
make lint # ruff check + mypy
15+
make format # black + isort
16+
make transcribe # Run shell-based transcription (interactive)
17+
18+
# Run a single test
19+
pytest tests/test_engine.py -v
20+
pytest tests/test_cli.py::test_transcribe_command -v
21+
22+
# CLI usage (after `pip install -e .`)
23+
whisper-transcriber transcribe <file> --model base --format txt
24+
whisper-transcriber youtube <url> --model base --format srt
25+
whisper-transcriber models
26+
```
27+
28+
## Architecture
29+
30+
### Shell script (primary workflow)
31+
- **`whisper-transcribe-with-download.sh`** — Interactive menu-driven script. Handles live mic recording, YouTube downloads, and file transcription using `whisper-cli`, `sox`, `yt-dlp`, and `ffmpeg`.
32+
- `original_live_transcription()` — Incremental chunk-based live transcription. Records via `rec` (sox), polls every 2s, extracts new audio chunks with `sox trim` when 5+ seconds of new audio is available, transcribes each chunk independently with `whisper-cli`, and displays text incrementally. On Ctrl+C, performs a final full-file transcription for accuracy.
33+
34+
### Python CLI
35+
The CLI (`click`-based) has three commands: `transcribe` (local file), `youtube` (download + transcribe), and `models` (list available models).
36+
37+
- **`src/transcriber/cli.py`** - Click CLI with commands. The `youtube` command delegates to `downloader` then calls `transcribe` internally.
38+
- **`src/transcriber/engine.py`** - `TranscriptionEngine` wraps the Python `whisper` library. Handles model loading, transcription, and saving results in multiple formats (txt/srt/vtt/json). Python Whisper (`openai-whisper`, `torch`) is an optional dependency; the primary workflow uses `whisper-cli` via the shell script.
39+
- **`src/transcriber/downloader.py`** - `YouTubeDownloader` wraps `yt-dlp` for audio extraction. Also optional import guarded.
40+
41+
Both `engine.py` and `downloader.py` use try/except imports with `*_AVAILABLE` flags so the package can be installed without `torch` or `yt-dlp`.
42+
43+
## Key Details
44+
45+
- Python >=3.8, line length 88 (black), isort profile "black"
46+
- mypy strict mode enabled (`disallow_untyped_defs`, etc.)
47+
- Package installed from `src/` layout via setuptools
48+
- Entry point: `whisper-transcriber` CLI → `transcriber.cli:main`
49+
- External tool dependencies: `whisper-cli`, `ffmpeg`, `sox`, `yt-dlp`
50+
- Whisper model files (ggml-*.bin) expected in `~/whisper-models/`
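
As an aside to the CLAUDE.md notes above (and not part of the committed file), the optional-import guard they mention for `engine.py` and `downloader.py` might look roughly like the following. This is a sketch under assumptions: the helper `optional_import`, the function `require_yt_dlp`, and the flag name `YT_DLP_AVAILABLE` are illustrative, and the real modules may use a plain `try`/`except ImportError` around the import statement instead of `importlib`.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def optional_import(name: str) -> Tuple[Optional[ModuleType], bool]:
    """Try to import a module; return (module, True) or (None, False)."""
    try:
        return importlib.import_module(name), True
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is caught too.
        return None, False


# Module-level flag, set once at import time (hypothetical name).
yt_dlp, YT_DLP_AVAILABLE = optional_import("yt_dlp")


def require_yt_dlp() -> ModuleType:
    """Return the yt_dlp module, or fail with a clear message if it is missing."""
    if not YT_DLP_AVAILABLE or yt_dlp is None:
        raise RuntimeError("yt-dlp is not installed; try `pip install yt-dlp`")
    return yt_dlp
```

With a guard of this shape, importing the package never fails on a missing optional dependency; the error surfaces only when a command that needs the dependency is actually run.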

Makefile

Lines changed: 4 additions & 5 deletions
@@ -40,21 +40,20 @@ clean: ## Clean build artifacts
4040

4141
transcribe: ## Run the main transcription script
4242
@echo "🎙️ Starting Whisper Transcription..."
43-
@if [ ! -f "../whisper-transcribe-with-download.sh" ]; then \
44-
echo "❌ Main script not found in parent directory!"; \
43+
@if [ ! -f "./whisper-transcribe-with-download.sh" ]; then \
44+
echo "❌ Main script not found!"; \
4545
echo ""; \
4646
echo "📋 Setup Instructions:"; \
4747
echo "1. Install whisper-cli: https://github.com/ggerganov/whisper.cpp"; \
4848
echo "2. Install dependencies: brew install ffmpeg sox yt-dlp"; \
49-
echo "3. Place whisper-transcribe-with-download.sh in the parent directory"; \
50-
echo "4. Download Whisper models to ~/whisper-models/"; \
49+
echo "3. Download Whisper models to ~/whisper-models/"; \
5150
echo ""; \
5251
echo "💡 Or run: make quick-setup"; \
5352
exit 1; \
5453
fi
5554
@echo "Make sure whisper-cli and dependencies are installed!"
5655
@echo ""
57-
../whisper-transcribe-with-download.sh
56+
./whisper-transcribe-with-download.sh
5857

5958
quick-setup: ## Quick setup and run
6059
@echo "🔧 Quick Setup Guide:"

README.md

Lines changed: 69 additions & 142 deletions
@@ -1,185 +1,112 @@
11
# Whisper Transcriber
22

3-
A local-first transcription tool using OpenAI Whisper with YouTube download capabilities.
3+
Local-first audio/video transcription tool powered by [whisper-cli](https://github.com/ggerganov/whisper.cpp) (whisper.cpp). Features an interactive shell script for live microphone transcription, YouTube downloads, and file-based transcription, plus a Python CLI for programmatic use.
44

55
## Features
66

7-
- 🎵 **YouTube Audio Download**: Extract audio from YouTube videos
8-
- 🎙️ **High-Quality Transcription**: Powered by OpenAI Whisper
9-
- 📁 **Local Processing**: Everything runs on your machine
10-
- 🔧 **Multiple Scripts**: Various configurations for different use cases
7+
- **Live microphone transcription** with incremental chunk-based output (text appears every ~5 seconds while you speak)
8+
- **YouTube download + transcribe** — paste a URL, get a transcript
9+
- **File transcription** — supports Zoom recordings, WhatsApp audio, and any audio/video file
10+
- **Multiple output formats** — txt, srt, vtt, json
11+
- **Multi-language support** — English, French, auto-detect
12+
- **Multiple Whisper models** — base, small, medium, large
1113

12-
## Quick Start
13-
14-
### Prerequisites
15-
16-
- **whisper-cli** (command-line Whisper tool)
17-
- **yt-dlp** (YouTube downloader)
18-
- **ffmpeg** (audio processing)
19-
- **sox** (for live recording)
20-
21-
### Installation
22-
23-
**1. Install whisper-cli:**
24-
```bash
25-
# Download from: https://github.com/ggerganov/whisper.cpp
26-
# Follow installation instructions for your platform
27-
```
28-
29-
**2. Install system dependencies:**
14+
## Prerequisites
3015

31-
**macOS:**
32-
```bash
33-
brew install ffmpeg sox yt-dlp
34-
```
16+
Install the following system dependencies:
3517

36-
**Ubuntu/Debian:**
3718
```bash
38-
sudo apt install ffmpeg sox yt-dlp
39-
```
19+
# macOS (Homebrew)
20+
brew install whisper-cpp ffmpeg sox yt-dlp
4021

41-
**Windows:**
42-
```bash
43-
# Install via Chocolatey or download manually:
44-
choco install ffmpeg sox yt-dlp
22+
# The whisper-cli binary must be available in PATH
4523
```
4624

47-
**3. Download Whisper models:**
48-
```bash
49-
# The script will prompt you to download models to ~/whisper-models/
50-
# Download: ggml-base.en.bin, ggml-small.bin, ggml-medium.bin, ggml-large.bin
51-
```
25+
Download at least one Whisper model to `~/whisper-models/`:
5226

53-
**4. Run transcription:**
5427
```bash
55-
# From the whisper-transcriber directory
56-
../whisper-transcribe-with-download.sh
28+
# Example: download the base English model
29+
curl -L -o ~/whisper-models/ggml-base.en.bin \
30+
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
5731

58-
# Or create a convenient symlink:
59-
ln -s ../whisper-transcribe-with-download.sh transcribe.sh
60-
./transcribe.sh
32+
# For multi-language support, also grab small or medium:
33+
curl -L -o ~/whisper-models/ggml-small.bin \
34+
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
6135
```
6236

63-
## Usage Examples
37+
## Quick Start
6438

65-
### Interactive Mode
6639
```bash
67-
# Run the main script (interactive menu)
68-
../whisper-transcribe-with-download.sh
69-
70-
# Choose from options:
71-
# 1) 🔴 ORIGINAL Live Recording + Live Transcript
72-
# 2) 🎥 YouTube Video + Transcript
73-
# 3) 📥 YouTube Video Download Only
74-
# 4) 💼 Zoom Recording + Transcript
75-
# 5) 💬 WhatsApp Audio + Transcript
76-
# 6) 📁 Other Audio/Video File + Transcript
77-
```
40+
# Interactive transcription menu
41+
make transcribe
7842

79-
### YouTube Transcription
80-
```bash
81-
# The script will prompt for URL and handle download + transcription
82-
# Choose option 2 for "YouTube Video + Transcript"
43+
# Or run directly
44+
./whisper-transcribe-with-download.sh
8345
```
8446

85-
### Live Transcription
86-
```bash
87-
# Records from microphone and shows real-time transcription
88-
# Choose option 1 for "ORIGINAL Live Recording + Live Transcript"
89-
```
47+
This launches an interactive menu with these options:
9048

91-
## Main Script
49+
1. **Live Recording** — Record from microphone with real-time transcription
50+
2. **YouTube Video** — Download and transcribe a YouTube video
51+
3. **YouTube Download Only** — Download video without transcription
52+
4. **Zoom Recording** — Transcribe a Zoom recording file
53+
5. **WhatsApp Audio** — Transcribe a WhatsApp voice message
54+
6. **Other File** — Transcribe any audio/video file
9255

93-
This project uses the comprehensive `whisper-transcribe-with-download.sh` script which provides:
56+
Transcripts are saved to `~/Desktop/Transcripts/`. Audio downloads are saved to `~/whisper-downloads/` and auto-cleaned after 7 days.
9457

95-
- 🎙️ **Live transcription** with real-time text display
96-
- 🎥 **YouTube download + transcription** in one command
97-
- 📁 **Local file transcription** (Zoom, WhatsApp, audio/video files)
98-
- 🧹 **Automatic cleanup** of old files (7-day retention)
58+
## How Live Transcription Works
9959

100-
See the [scripts documentation](docs/scripts.md) for detailed usage.
60+
The live recording mode (option 1) uses incremental chunk-based transcription:
10161

102-
## Development
62+
1. `rec` (from sox) records audio from your microphone in the background
63+
2. Every 2 seconds, the script checks if at least 5 seconds of new audio is available
64+
3. New audio is extracted with `sox trim` and transcribed with `whisper-cli`
65+
4. Only the new text is displayed — no re-processing of already-transcribed audio
66+
5. On Ctrl+C, a final full-file transcription is performed for maximum accuracy and saved to disk
10367

104-
The project is currently focused on the main `whisper-transcribe-with-download.sh` script. The Python modules in `src/` provide a foundation for future development.
68+
## Python CLI
69+
70+
A Python CLI is also available for programmatic use:
10571

10672
```bash
107-
# Install Python dependencies (for future development)
108-
make dev
73+
# Install
74+
pip install -e .
10975

110-
# Run tests
111-
make test
76+
# Transcribe a local file
77+
whisper-transcriber transcribe <file> --model base --format txt
11278

113-
# Format code
114-
make format
79+
# Download and transcribe a YouTube video
80+
whisper-transcriber youtube <url> --model base --format srt
11581

116-
# Run linting
117-
make lint
82+
# List available models
83+
whisper-transcriber models
11884
```
11985

120-
## Architecture
121-
122-
- **Main Script**: `../whisper-transcribe-with-download.sh` - Complete transcription solution
123-
- **Python Modules**: `src/transcriber/` - Future development framework
124-
- **Documentation**: Complete guides and examples
125-
126-
See [architecture.md](docs/architecture.md) for detailed design.
127-
128-
## Contributing
129-
130-
1. Fork the repository
131-
2. Create a feature branch
132-
3. Make your changes
133-
4. Add tests if applicable
134-
5. Submit a pull request
86+
> **Note:** The Python CLI requires `openai-whisper` and `torch` as optional dependencies. The shell script (recommended) uses `whisper-cli` instead and has no Python dependencies.
13587
136-
See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
137-
138-
## Configuration
139-
140-
The application uses environment variables and configuration files:
141-
142-
- **Environment variables**: Model selection, device settings
143-
- **Config files**: Copy `env.example` to `.env` and customize
144-
- **CLI arguments**: Runtime options
88+
## Development
14589

14690
```bash
147-
cp env.example .env
148-
# Edit .env with your preferred settings
91+
make dev # Install dev dependencies + editable install
92+
make test # Run pytest with coverage
93+
make lint # ruff check + mypy
94+
make format # black + isort
14995
```
15096

151-
## License
97+
## Project Structure
15298

153-
MIT - see [LICENSE](LICENSE) file.
154-
155-
## Troubleshooting
156-
157-
### Common Issues
158-
159-
**whisper-cli not found:**
160-
- Download and install whisper-cli from: https://github.com/ggerganov/whisper.cpp
161-
- Ensure it's in your PATH or provide full path to executable
162-
163-
**FFmpeg/Sox not found:**
164-
- Ensure FFmpeg and Sox are installed and in your PATH
165-
- On macOS: `brew install ffmpeg sox`
166-
- On Ubuntu: `sudo apt install ffmpeg sox`
167-
168-
**yt-dlp not found:**
169-
- Install yt-dlp: `pip install yt-dlp` or `brew install yt-dlp`
170-
- Update regularly: `yt-dlp -U`
171-
172-
**Whisper models missing:**
173-
- Download models to `~/whisper-models/` directory
174-
- Available models: `ggml-base.en.bin`, `ggml-small.bin`, `ggml-medium.bin`, `ggml-large.bin`
175-
- The script will guide you through model selection
99+
```
100+
whisper-transcriber/
101+
whisper-transcribe-with-download.sh # Main interactive script (shell)
102+
src/transcriber/
103+
cli.py # Python CLI (click-based)
104+
engine.py # Whisper engine wrapper
105+
downloader.py # YouTube downloader (yt-dlp wrapper)
106+
tests/
107+
Makefile
108+
```
176109

177-
**Memory issues:**
178-
- Use smaller models (base, small) for limited RAM
179-
- Close other applications during transcription
180-
- Ensure adequate disk space for downloads
110+
## License
181111

182-
**YouTube download fails:**
183-
- Check internet connection stability
184-
- Some videos may be region-restricted or private
185-
- Try different video quality settings
112+
MIT
