|
1 | 1 | # Whisper Transcriber |
2 | 2 |
|
3 | | -A local-first transcription tool using OpenAI Whisper with YouTube download capabilities. |
| 3 | +Local-first audio/video transcription tool powered by [whisper-cli](https://github.com/ggerganov/whisper.cpp), the command-line tool from whisper.cpp. It includes an interactive shell script for live microphone transcription, YouTube downloads, and file-based transcription, plus a Python CLI for programmatic use. |
4 | 4 |
|
5 | 5 | ## Features |
6 | 6 |
|
7 | | -- 🎵 **YouTube Audio Download**: Extract audio from YouTube videos |
8 | | -- 🎙️ **High-Quality Transcription**: Powered by OpenAI Whisper |
9 | | -- 📁 **Local Processing**: Everything runs on your machine |
10 | | -- 🔧 **Multiple Scripts**: Various configurations for different use cases |
| 7 | +- **Live microphone transcription** with incremental chunk-based output (text appears every ~5 seconds while you speak) |
| 8 | +- **YouTube download + transcribe** — paste a URL, get a transcript |
| 9 | +- **File transcription** — supports Zoom recordings, WhatsApp audio, and any audio/video file |
| 10 | +- **Multiple output formats** — txt, srt, vtt, json |
| 11 | +- **Multi-language support** — English, French, auto-detect |
| 12 | +- **Multiple Whisper models** — base, small, medium, large |
11 | 13 |
|
12 | | -## Quick Start |
13 | | - |
14 | | -### Prerequisites |
15 | | - |
16 | | -- **whisper-cli** (command-line Whisper tool) |
17 | | -- **yt-dlp** (YouTube downloader) |
18 | | -- **ffmpeg** (audio processing) |
19 | | -- **sox** (for live recording) |
20 | | - |
21 | | -### Installation |
22 | | - |
23 | | -**1. Install whisper-cli:** |
24 | | -```bash |
25 | | -# Download from: https://github.com/ggerganov/whisper.cpp |
26 | | -# Follow installation instructions for your platform |
27 | | -``` |
28 | | - |
29 | | -**2. Install system dependencies:** |
| 14 | +## Prerequisites |
30 | 15 |
|
31 | | -**macOS:** |
32 | | -```bash |
33 | | -brew install ffmpeg sox yt-dlp |
34 | | -``` |
| 16 | +Install the following system dependencies: |
35 | 17 |
|
36 | | -**Ubuntu/Debian:** |
37 | 18 | ```bash |
38 | | -sudo apt install ffmpeg sox yt-dlp |
39 | | -``` |
| 19 | +# macOS (Homebrew) |
| 20 | +brew install whisper-cpp ffmpeg sox yt-dlp |
40 | 21 |
|
41 | | -**Windows:** |
42 | | -```bash |
43 | | -# Install via Chocolatey or download manually: |
44 | | -choco install ffmpeg sox yt-dlp |
| 22 | +# The whisper-cli binary must be available in PATH |
45 | 23 | ``` |
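A quick way to confirm that the tools above are reachable is to look each one up on `PATH`. The sketch below is a hypothetical helper and not part of this repository: `missing_tools` and `REQUIRED_TOOLS` are illustrative names, and the list assumes `rec` is installed alongside sox.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Binaries this README asks for; `rec` is assumed to ship with sox.
REQUIRED_TOOLS = ("whisper-cli", "ffmpeg", "sox", "rec", "yt-dlp")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools are available.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.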
46 | 24 |
|
47 | | -**3. Download Whisper models:** |
48 | | -```bash |
49 | | -# The script will prompt you to download models to ~/whisper-models/ |
50 | | -# Download: ggml-base.en.bin, ggml-small.bin, ggml-medium.bin, ggml-large.bin |
51 | | -``` |
| 25 | +Download at least one Whisper model to `~/whisper-models/`: |
52 | 26 |
|
53 | | -**4. Run transcription:** |
54 | 27 | ```bash |
55 | | -# From the whisper-transcriber directory |
56 | | -../whisper-transcribe-with-download.sh |
| 28 | +# Example: create the models directory, then download the base English model |
| 29 | +mkdir -p ~/whisper-models && curl -L -o ~/whisper-models/ggml-base.en.bin \ |
| 30 | + https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin |
57 | 31 |
|
58 | | -# Or create a convenient symlink: |
59 | | -ln -s ../whisper-transcribe-with-download.sh transcribe.sh |
60 | | -./transcribe.sh |
| 32 | +# For multi-language support, also grab small or medium: |
| 33 | +curl -L -o ~/whisper-models/ggml-small.bin \ |
| 34 | + https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin |
61 | 35 | ``` |
62 | 36 |
|
63 | | -## Usage Examples |
| 37 | +## Quick Start |
64 | 38 |
|
65 | | -### Interactive Mode |
66 | 39 | ```bash |
67 | | -# Run the main script (interactive menu) |
68 | | -../whisper-transcribe-with-download.sh |
69 | | - |
70 | | -# Choose from options: |
71 | | -# 1) 🔴 ORIGINAL Live Recording + Live Transcript |
72 | | -# 2) 🎥 YouTube Video + Transcript |
73 | | -# 3) 📥 YouTube Video Download Only |
74 | | -# 4) 💼 Zoom Recording + Transcript |
75 | | -# 5) 💬 WhatsApp Audio + Transcript |
76 | | -# 6) 📁 Other Audio/Video File + Transcript |
77 | | -``` |
| 40 | +# Interactive transcription menu |
| 41 | +make transcribe |
78 | 42 |
|
79 | | -### YouTube Transcription |
80 | | -```bash |
81 | | -# The script will prompt for URL and handle download + transcription |
82 | | -# Choose option 2 for "YouTube Video + Transcript" |
| 43 | +# Or run directly |
| 44 | +./whisper-transcribe-with-download.sh |
83 | 45 | ``` |
84 | 46 |
|
85 | | -### Live Transcription |
86 | | -```bash |
87 | | -# Records from microphone and shows real-time transcription |
88 | | -# Choose option 1 for "ORIGINAL Live Recording + Live Transcript" |
89 | | -``` |
| 47 | +This launches an interactive menu with these options: |
90 | 48 |
|
91 | | -## Main Script |
| 49 | +1. **Live Recording** — Record from microphone with real-time transcription |
| 50 | +2. **YouTube Video** — Download and transcribe a YouTube video |
| 51 | +3. **YouTube Download Only** — Download video without transcription |
| 52 | +4. **Zoom Recording** — Transcribe a Zoom recording file |
| 53 | +5. **WhatsApp Audio** — Transcribe a WhatsApp voice message |
| 54 | +6. **Other File** — Transcribe any audio/video file |
92 | 55 |
|
93 | | -This project uses the comprehensive `whisper-transcribe-with-download.sh` script which provides: |
| 56 | +Transcripts are saved to `~/Desktop/Transcripts/`. Audio downloads are saved to `~/whisper-downloads/` and auto-cleaned after 7 days. |
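The 7-day retention rule can be illustrated with a small age check. This is a sketch under assumptions, not the script's actual cleanup code: it assumes age is measured from each file's modification time, and `is_stale` and `stale_files` are hypothetical names.

```python
import time
from pathlib import Path
from typing import List, Optional

RETENTION_DAYS = 7  # retention window stated in this README
SECONDS_PER_DAY = 86_400


def is_stale(mtime: float, now: float, max_age_days: int = RETENTION_DAYS) -> bool:
    """True when a file last modified at `mtime` is older than the retention window."""
    return (now - mtime) > max_age_days * SECONDS_PER_DAY


def stale_files(directory: Path, now: Optional[float] = None) -> List[Path]:
    """List regular files in `directory` whose modification time is past retention."""
    if not directory.is_dir():
        return []
    current = time.time() if now is None else now
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and is_stale(path.stat().st_mtime, current)
    )
```

Calling `stale_files(Path.home() / "whisper-downloads")` only lists candidates; deleting them (for example with `path.unlink()`) is left to the caller so the check stays side-effect free.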
94 | 57 |
|
95 | | -- 🎙️ **Live transcription** with real-time text display |
96 | | -- 🎥 **YouTube download + transcription** in one command |
97 | | -- 📁 **Local file transcription** (Zoom, WhatsApp, audio/video files) |
98 | | -- 🧹 **Automatic cleanup** of old files (7-day retention) |
| 58 | +## How Live Transcription Works |
99 | 59 |
|
100 | | -See the [scripts documentation](docs/scripts.md) for detailed usage. |
| 60 | +The live recording mode (option 1) uses incremental chunk-based transcription: |
101 | 61 |
|
102 | | -## Development |
| 62 | +1. `rec` (from sox) records audio from your microphone in the background |
| 63 | +2. Every 2 seconds, the script checks whether at least 5 seconds of new audio is available |
| 64 | +3. New audio is extracted with `sox trim` and transcribed with `whisper-cli` |
| 65 | +4. Only the new text is displayed — no re-processing of already-transcribed audio |
| 66 | +5. On Ctrl+C, a final full-file transcription is performed for maximum accuracy and saved to disk |
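The chunking logic in steps 2 to 4 can be sketched as a small planner that decides which slice of the recording to transcribe next. This is an illustration of the idea, not the shell script's code: `next_chunk`, `sox_trim_args`, and `whisper_args` are hypothetical names, and the file and model paths are placeholders.

```python
from typing import List, Optional, Tuple

MIN_CHUNK_SECONDS = 5.0  # wait for at least 5 s of new audio, as in step 2


def next_chunk(
    recorded: float,
    transcribed: float,
    min_chunk: float = MIN_CHUNK_SECONDS,
) -> Optional[Tuple[float, float]]:
    """Return (start, length) of the next slice, or None if too little new audio."""
    pending = recorded - transcribed
    if pending < min_chunk:
        return None
    return (transcribed, pending)


def sox_trim_args(src: str, dst: str, start: float, length: float) -> List[str]:
    """Build a `sox <src> <dst> trim <start> <length>` command for the new slice."""
    return ["sox", src, dst, "trim", f"{start:.2f}", f"{length:.2f}"]


def whisper_args(model_path: str, wav_path: str) -> List[str]:
    """Build a whisper-cli call using its model (-m) and input file (-f) flags."""
    return ["whisper-cli", "-m", model_path, "-f", wav_path]
```

With a 2-second poll, a recording that has reached 6 seconds yields the slice `(0.0, 6.0)`; the next slice starts at 6 seconds once another 5 seconds have accumulated, so audio that was already transcribed is never processed twice.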
103 | 67 |
|
104 | | -The project is currently focused on the main `whisper-transcribe-with-download.sh` script. The Python modules in `src/` provide a foundation for future development. |
| 68 | +## Python CLI |
| 69 | + |
| 70 | +A Python CLI is also available for programmatic use: |
105 | 71 |
|
106 | 72 | ```bash |
107 | | -# Install Python dependencies (for future development) |
108 | | -make dev |
| 73 | +# Install |
| 74 | +pip install -e . |
109 | 75 |
|
110 | | -# Run tests |
111 | | -make test |
| 76 | +# Transcribe a local file |
| 77 | +whisper-transcriber transcribe <file> --model base --format txt |
112 | 78 |
|
113 | | -# Format code |
114 | | -make format |
| 79 | +# Download and transcribe a YouTube video |
| 80 | +whisper-transcriber youtube <url> --model base --format srt |
115 | 81 |
|
116 | | -# Run linting |
117 | | -make lint |
| 82 | +# List available models |
| 83 | +whisper-transcriber models |
118 | 84 | ``` |
119 | 85 |
|
120 | | -## Architecture |
121 | | - |
122 | | -- **Main Script**: `../whisper-transcribe-with-download.sh` - Complete transcription solution |
123 | | -- **Python Modules**: `src/transcriber/` - Future development framework |
124 | | -- **Documentation**: Complete guides and examples |
125 | | - |
126 | | -See [architecture.md](docs/architecture.md) for detailed design. |
127 | | - |
128 | | -## Contributing |
129 | | - |
130 | | -1. Fork the repository |
131 | | -2. Create a feature branch |
132 | | -3. Make your changes |
133 | | -4. Add tests if applicable |
134 | | -5. Submit a pull request |
| 86 | +> **Note:** The Python CLI needs the optional dependencies `openai-whisper` and `torch`. The shell script (recommended) uses `whisper-cli` instead and has no Python dependencies. |
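For programmatic use from another Python program, one option is to wrap the documented `transcribe` command in a subprocess call. The sketch below is hypothetical: `transcribe_command` and `transcribe` are illustrative names, it assumes the Python CLI accepts all four formats listed under Features, and it makes no assumption about what the CLI prints.

```python
import subprocess
from typing import List

# Formats listed under Features; assumed here to apply to the Python CLI too.
SUPPORTED_FORMATS = ("txt", "srt", "vtt", "json")


def transcribe_command(path: str, model: str = "base", fmt: str = "txt") -> List[str]:
    """Build the argument list for `whisper-transcriber transcribe`."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return ["whisper-transcriber", "transcribe", path, "--model", model, "--format", fmt]


def transcribe(path: str, model: str = "base", fmt: str = "txt") -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(
        transcribe_command(path, model, fmt),
        check=True,
        capture_output=True,
        text=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.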
135 | 87 |
|
136 | | -See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines. |
137 | | - |
138 | | -## Configuration |
139 | | - |
140 | | -The application uses environment variables and configuration files: |
141 | | - |
142 | | -- **Environment variables**: Model selection, device settings |
143 | | -- **Config files**: Copy `env.example` to `.env` and customize |
144 | | -- **CLI arguments**: Runtime options |
| 88 | +## Development |
145 | 89 |
|
146 | 90 | ```bash |
147 | | -cp env.example .env |
148 | | -# Edit .env with your preferred settings |
| 91 | +make dev # Install dev dependencies + editable install |
| 92 | +make test # Run pytest with coverage |
| 93 | +make lint # ruff check + mypy |
| 94 | +make format # black + isort |
149 | 95 | ``` |
150 | 96 |
|
151 | | -## License |
| 97 | +## Project Structure |
152 | 98 |
|
153 | | -MIT - see [LICENSE](LICENSE) file. |
154 | | - |
155 | | -## Troubleshooting |
156 | | - |
157 | | -### Common Issues |
158 | | - |
159 | | -**whisper-cli not found:** |
160 | | -- Download and install whisper-cli from: https://github.com/ggerganov/whisper.cpp |
161 | | -- Ensure it's in your PATH or provide full path to executable |
162 | | - |
163 | | -**FFmpeg/Sox not found:** |
164 | | -- Ensure FFmpeg and Sox are installed and in your PATH |
165 | | -- On macOS: `brew install ffmpeg sox` |
166 | | -- On Ubuntu: `sudo apt install ffmpeg sox` |
167 | | - |
168 | | -**yt-dlp not found:** |
169 | | -- Install yt-dlp: `pip install yt-dlp` or `brew install yt-dlp` |
170 | | -- Update regularly: `yt-dlp -U` |
171 | | - |
172 | | -**Whisper models missing:** |
173 | | -- Download models to `~/whisper-models/` directory |
174 | | -- Available models: `ggml-base.en.bin`, `ggml-small.bin`, `ggml-medium.bin`, `ggml-large.bin` |
175 | | -- The script will guide you through model selection |
| 99 | +``` |
| 100 | +whisper-transcriber/ |
| 101 | + whisper-transcribe-with-download.sh # Main interactive script (shell) |
| 102 | + src/transcriber/ |
| 103 | + cli.py # Python CLI (click-based) |
| 104 | + engine.py # Whisper engine wrapper |
| 105 | + downloader.py # YouTube downloader (yt-dlp wrapper) |
| 106 | + tests/ |
| 107 | + Makefile |
| 108 | +``` |
176 | 109 |
|
177 | | -**Memory issues:** |
178 | | -- Use smaller models (base, small) for limited RAM |
179 | | -- Close other applications during transcription |
180 | | -- Ensure adequate disk space for downloads |
| 110 | +## License |
181 | 111 |
|
182 | | -**YouTube download fails:** |
183 | | -- Check internet connection stability |
184 | | -- Some videos may be region-restricted or private |
185 | | -- Try different video quality settings |
| 112 | +MIT |