Fix live transcription with incremental chunking and add docs

maxitoon · claude · maxitoon · commit f22f57f6615c · 2026-02-12T05:37:49.000+01:00
Live transcription previously re-transcribed the entire growing recording
file every 0.5s, making it progressively slower and unable to keep up.
Now uses sox trim to extract only new audio chunks (5s minimum), transcribes
each chunk independently, and displays text incrementally.
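
As a rough illustration of the chunking decision described above (not the
script's actual code), the sketch below works out when enough new audio has
accumulated and which `sox trim` offsets would cover it. The names
`next_chunk` and `sox_trim_command` and the file names are hypothetical. Only
the 5s minimum and the use of `sox trim` come from this commit.

```python
from typing import List, Optional, Tuple

MIN_CHUNK_SECONDS = 5.0  # minimum amount of new audio, per this commit message


def next_chunk(
    recorded: float, transcribed: float, min_chunk: float = MIN_CHUNK_SECONDS
) -> Optional[Tuple[float, float]]:
    """Return (start, length) of the next chunk, or None if too little new audio."""
    new_audio = recorded - transcribed
    if new_audio < min_chunk:
        return None
    return (transcribed, new_audio)


def sox_trim_command(src: str, dst: str, start: float, length: float) -> List[str]:
    """Build a `sox SRC DST trim START LENGTH` argument list (not executed here)."""
    return ["sox", src, dst, "trim", f"{start:.2f}", f"{length:.2f}"]


# Simulate the recording growing across successive polls (durations in seconds).
transcribed = 0.0
chunks = []
for recorded in [2.0, 4.0, 6.0, 8.0, 12.0]:
    chunk = next_chunk(recorded, transcribed)
    if chunk is not None:
        chunks.append(chunk)
        transcribed = recorded  # everything up to this point is now covered

print(chunks)
print(sox_trim_command("recording.wav", "chunk.wav", *chunks[-1]))
```

Each chunk starts where the previous one ended, so no audio is transcribed
twice during the live phase.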

- Add whisper-transcribe-with-download.sh to repo (was external)
- Update Makefile to reference local script instead of ../
- Rewrite README with full usage docs, prerequisites, and architecture
- Add CLAUDE.md with shell script architecture notes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,50 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+Local-first audio/video transcription tool using whisper-cli (whisper.cpp). Provides a Python CLI that wraps OpenAI Whisper for transcription and yt-dlp for YouTube audio downloading. Outputs in txt, srt, vtt, or json formats.
+
+## Commands
+
+```bash
+make dev          # Install dev deps + editable install
+make test         # Run pytest with coverage
+make lint         # ruff check + mypy
+make format       # black + isort
+make transcribe   # Run shell-based transcription (interactive)
+
+# Run a single test
+pytest tests/test_engine.py -v
+pytest tests/test_cli.py::test_transcribe_command -v
+
+# CLI usage (after `pip install -e .`)
+whisper-transcriber transcribe <file> --model base --format txt
+whisper-transcriber youtube <url> --model base --format srt
+whisper-transcriber models
+```
+
+## Architecture
+
+### Shell script (primary workflow)
+- **`whisper-transcribe-with-download.sh`** — Interactive menu-driven script. Handles live mic recording, YouTube downloads, and file transcription using `whisper-cli`, `sox`, `yt-dlp`, and `ffmpeg`.
+  - `original_live_transcription()` — Incremental chunk-based live transcription. Records via `rec` (sox), polls every 2s, extracts new audio chunks with `sox trim` when 5+ seconds of new audio is available, transcribes each chunk independently with `whisper-cli`, and displays text incrementally. On Ctrl+C, performs a final full-file transcription for accuracy.
+
+### Python CLI
+The CLI (`click`-based) has three commands: `transcribe` (local file), `youtube` (download + transcribe), and `models` (list available models).
+
+- **`src/transcriber/cli.py`** - Click CLI with commands. The `youtube` command delegates to `downloader` then calls `transcribe` internally.
+- **`src/transcriber/engine.py`** - `TranscriptionEngine` wraps the Python `whisper` library. Handles model loading, transcription, and saving results in multiple formats (txt/srt/vtt/json). Python Whisper (`openai-whisper`, `torch`) is an optional dependency; the primary workflow uses `whisper-cli` via the shell script.
+- **`src/transcriber/downloader.py`** - `YouTubeDownloader` wraps `yt-dlp` for audio extraction. Its `yt-dlp` import is likewise guarded as optional.
+
+Both `engine.py` and `downloader.py` use try/except imports with `*_AVAILABLE` flags so the package can be installed without `torch` or `yt-dlp`.
+
+## Key Details
+
+- Python >=3.8, line length 88 (black), isort profile "black"
+- mypy strict mode enabled (`disallow_untyped_defs`, etc.)
+- Package installed from `src/` layout via setuptools
+- Entry point: `whisper-transcriber` CLI → `transcriber.cli:main`
+- External tool dependencies: `whisper-cli`, `ffmpeg`, `sox`, `yt-dlp`
+- Whisper model files (ggml-*.bin) expected in `~/whisper-models/`
diff --git a/Makefile b/Makefile
@@ -40,21 +40,20 @@ clean: ## Clean build artifacts
 
 transcribe: ## Run the main transcription script
 	@echo "🎙️  Starting Whisper Transcription..."
-	@if [ ! -f "../whisper-transcribe-with-download.sh" ]; then \
-		echo "❌ Main script not found in parent directory!"; \
+	@if [ ! -f "./whisper-transcribe-with-download.sh" ]; then \
+		echo "❌ Main script not found!"; \
 		echo ""; \
 		echo "📋 Setup Instructions:"; \
 		echo "1. Install whisper-cli: https://github.com/ggerganov/whisper.cpp"; \
 		echo "2. Install dependencies: brew install ffmpeg sox yt-dlp"; \
-		echo "3. Place whisper-transcribe-with-download.sh in the parent directory"; \
-		echo "4. Download Whisper models to ~/whisper-models/"; \
+		echo "3. Download Whisper models to ~/whisper-models/"; \
 		echo ""; \
 		echo "💡 Or run: make quick-setup"; \
 		exit 1; \
 	fi
 	@echo "Make sure whisper-cli and dependencies are installed!"
 	@echo ""
-	../whisper-transcribe-with-download.sh
+	./whisper-transcribe-with-download.sh
 
 quick-setup: ## Quick setup and run
 	@echo "🔧 Quick Setup Guide:"
diff --git a/README.md b/README.md
@@ -1,185 +1,112 @@
 # Whisper Transcriber
 
-A local-first transcription tool using OpenAI Whisper with YouTube download capabilities.
+Local-first audio/video transcription tool powered by [whisper-cli](https://github.com/ggerganov/whisper.cpp) (whisper.cpp). Features an interactive shell script for live microphone transcription, YouTube downloads, and file-based transcription, plus a Python CLI for programmatic use.
 
 ## Features
 
-- 🎵 **YouTube Audio Download**: Extract audio from YouTube videos
-- 🎙️ **High-Quality Transcription**: Powered by OpenAI Whisper
-- 📁 **Local Processing**: Everything runs on your machine
-- 🔧 **Multiple Scripts**: Various configurations for different use cases
+- **Live microphone transcription** with incremental chunk-based output (text appears every ~5 seconds while you speak)
+- **YouTube download + transcribe** — paste a URL, get a transcript
+- **File transcription** — supports Zoom recordings, WhatsApp audio, and any audio/video file
+- **Multiple output formats** — txt, srt, vtt, json
+- **Multi-language support** — English, French, auto-detect
+- **Multiple Whisper models** — base, small, medium, large
 
-## Quick Start
-
-### Prerequisites
-
-- **whisper-cli** (command-line Whisper tool)
-- **yt-dlp** (YouTube downloader)
-- **ffmpeg** (audio processing)
-- **sox** (for live recording)
-
-### Installation
-
-**1. Install whisper-cli:**
-```bash
-# Download from: https://github.com/ggerganov/whisper.cpp
-# Follow installation instructions for your platform
-```
-
-**2. Install system dependencies:**
+## Prerequisites
 
-**macOS:**
-```bash
-brew install ffmpeg sox yt-dlp
-```
+Install the following system dependencies:
 
-**Ubuntu/Debian:**
 ```bash
-sudo apt install ffmpeg sox yt-dlp
-```
+# macOS (Homebrew)
+brew install whisper-cpp ffmpeg sox yt-dlp
 
-**Windows:**
-```bash
-# Install via Chocolatey or download manually:
-choco install ffmpeg sox yt-dlp
+# The whisper-cli binary must be available in PATH
 ```
 
-**3. Download Whisper models:**
-```bash
-# The script will prompt you to download models to ~/whisper-models/
-# Download: ggml-base.en.bin, ggml-small.bin, ggml-medium.bin, ggml-large.bin
-```
+Download at least one Whisper model to `~/whisper-models/`:
 
-**4. Run transcription:**
 ```bash
-# From the whisper-transcriber directory
-../whisper-transcribe-with-download.sh
+# Example: download the base English model
+curl -L -o ~/whisper-models/ggml-base.en.bin \
+  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
 
-# Or create a convenient symlink:
-ln -s ../whisper-transcribe-with-download.sh transcribe.sh
-./transcribe.sh
+# For multi-language support, also grab small or medium:
+curl -L -o ~/whisper-models/ggml-small.bin \
+  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
 ```
 
-## Usage Examples
+## Quick Start
 
-### Interactive Mode
 ```bash
-# Run the main script (interactive menu)
-../whisper-transcribe-with-download.sh
-
-# Choose from options:
-# 1) 🔴 ORIGINAL Live Recording + Live Transcript
-# 2) 🎥 YouTube Video + Transcript
-# 3) 📥 YouTube Video Download Only
-# 4) 💼 Zoom Recording + Transcript
-# 5) 💬 WhatsApp Audio + Transcript
-# 6) 📁 Other Audio/Video File + Transcript
-```
+# Interactive transcription menu
+make transcribe
 
-### YouTube Transcription
-```bash
-# The script will prompt for URL and handle download + transcription
-# Choose option 2 for "YouTube Video + Transcript"
+# Or run directly
+./whisper-transcribe-with-download.sh
 ```
 
-### Live Transcription
-```bash
-# Records from microphone and shows real-time transcription
-# Choose option 1 for "ORIGINAL Live Recording + Live Transcript"
-```
+This launches an interactive menu with these options:
 
-## Main Script
+1. **Live Recording** — Record from microphone with real-time transcription
+2. **YouTube Video** — Download and transcribe a YouTube video
+3. **YouTube Download Only** — Download video without transcription
+4. **Zoom Recording** — Transcribe a Zoom recording file
+5. **WhatsApp Audio** — Transcribe a WhatsApp voice message
+6. **Other File** — Transcribe any audio/video file
 
-This project uses the comprehensive `whisper-transcribe-with-download.sh` script which provides:
+Transcripts are saved to `~/Desktop/Transcripts/`. Audio downloads are saved to `~/whisper-downloads/` and auto-cleaned after 7 days.
 
-- 🎙️ **Live transcription** with real-time text display
-- 🎥 **YouTube download + transcription** in one command
-- 📁 **Local file transcription** (Zoom, WhatsApp, audio/video files)
-- 🧹 **Automatic cleanup** of old files (7-day retention)
+## How Live Transcription Works
 
-See the [scripts documentation](docs/scripts.md) for detailed usage.
+The live recording mode (option 1) uses incremental chunk-based transcription:
 
-## Development
+1. `rec` (from sox) records audio from your microphone in the background
+2. Every 2 seconds, the script checks if at least 5 seconds of new audio is available
+3. New audio is extracted with `sox trim` and transcribed with `whisper-cli`
+4. Only the new text is displayed — no re-processing of already-transcribed audio
+5. On Ctrl+C, a final full-file transcription is performed for maximum accuracy and saved to disk
 
-The project is currently focused on the main `whisper-transcribe-with-download.sh` script. The Python modules in `src/` provide a foundation for future development.
+## Python CLI
+
+A Python CLI is also available for programmatic use:
 
 ```bash
-# Install Python dependencies (for future development)
-make dev
+# Install
+pip install -e .
 
-# Run tests
-make test
+# Transcribe a local file
+whisper-transcriber transcribe <file> --model base --format txt
 
-# Format code
-make format
+# Download and transcribe a YouTube video
+whisper-transcriber youtube <url> --model base --format srt
 
-# Run linting
-make lint
+# List available models
+whisper-transcriber models
 ```
 
-## Architecture
-
-- **Main Script**: `../whisper-transcribe-with-download.sh` - Complete transcription solution
-- **Python Modules**: `src/transcriber/` - Future development framework
-- **Documentation**: Complete guides and examples
-
-See [architecture.md](docs/architecture.md) for detailed design.
-
-## Contributing
-
-1. Fork the repository
-2. Create a feature branch
-3. Make your changes
-4. Add tests if applicable
-5. Submit a pull request
+> **Note:** The Python CLI requires `openai-whisper` and `torch` as optional dependencies. The shell script (recommended) uses `whisper-cli` instead and has no Python dependencies.
 
-See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
-
-## Configuration
-
-The application uses environment variables and configuration files:
-
-- **Environment variables**: Model selection, device settings
-- **Config files**: Copy `env.example` to `.env` and customize
-- **CLI arguments**: Runtime options
+## Development
 
 ```bash
-cp env.example .env
-# Edit .env with your preferred settings
+make dev       # Install dev dependencies + editable install
+make test      # Run pytest with coverage
+make lint      # ruff check + mypy
+make format    # black + isort
 ```
 
-## License
+## Project Structure
 
-MIT - see [LICENSE](LICENSE) file.
-
-## Troubleshooting
-
-### Common Issues
-
-**whisper-cli not found:**
-- Download and install whisper-cli from: https://github.com/ggerganov/whisper.cpp
-- Ensure it's in your PATH or provide full path to executable
-
-**FFmpeg/Sox not found:**
-- Ensure FFmpeg and Sox are installed and in your PATH
-- On macOS: `brew install ffmpeg sox`
-- On Ubuntu: `sudo apt install ffmpeg sox`
-
-**yt-dlp not found:**
-- Install yt-dlp: `pip install yt-dlp` or `brew install yt-dlp`
-- Update regularly: `yt-dlp -U`
-
-**Whisper models missing:**
-- Download models to `~/whisper-models/` directory
-- Available models: `ggml-base.en.bin`, `ggml-small.bin`, `ggml-medium.bin`, `ggml-large.bin`
-- The script will guide you through model selection
+```
+whisper-transcriber/
+  whisper-transcribe-with-download.sh  # Main interactive script (shell)
+  src/transcriber/
+    cli.py          # Python CLI (click-based)
+    engine.py       # Whisper engine wrapper
+    downloader.py   # YouTube downloader (yt-dlp wrapper)
+  tests/
+  Makefile
+```
 
-**Memory issues:**
-- Use smaller models (base, small) for limited RAM
-- Close other applications during transcription
-- Ensure adequate disk space for downloads
+## License
 
-**YouTube download fails:**
-- Check internet connection stability
-- Some videos may be region-restricted or private
-- Try different video quality settings
+MIT
diff --git a/whisper-transcribe-with-download.sh b/whisper-transcribe-with-download.sh