A professional-grade, local, web-based tool for generating, translating, and editing subtitles for videos using state-of-the-art AI models.
AI Subtitle Studio is a standalone tool designed to streamline the subtitle creation process for content creators. It leverages extensive AI capabilities to:
- Transcribe audio from videos using OpenAI's Whisper (or Faster-Whisper) models.
- Translate subtitles into multiple languages using Helsinki-NLP models.
- Refine and polish text using Qwen large language models (optional).
- Edit subtitles in a modern, Google-style web interface with real-time video seeking.
All processing happens locally on your machine, ensuring privacy and zero data leakage.
This project is built using the following technologies:
- Backend / Web Framework: Python, Flask, Flask-SocketIO
- AI / ML:
  - ASR: openai-whisper, faster-whisper
  - Translation: transformers (Helsinki-NLP/OPUS-MT)
  - LLM Refinement: transformers (Qwen/Qwen3-4B)
  - Deep Learning Framework: PyTorch (GPU accelerated)
- Utilities: ffmpeg (audio processing), yt-dlp (video downloading)
- Frontend: HTML5, Tailwind CSS, Google Material Design (Custom CSS), JavaScript (Vanilla)
- Python 3.10+
- FFmpeg installed and added to system PATH.
- (Optional) NVIDIA GPU with drivers installed for acceleration.
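The prerequisites above can be checked before installing anything else. The following is a minimal sketch, not part of the project: the function name `check_prerequisites` is hypothetical, and the CUDA check only reports a result once PyTorch is already installed.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the documented prerequisites appear to be satisfied."""
    report = {
        # The README asks for Python 3.10 or newer.
        "python_ok": sys.version_info >= (3, 10),
        # FFmpeg must be discoverable on the system PATH.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Optional: stays False when PyTorch is missing or no GPU is visible.
        "cuda_available": False,
    }
    try:
        import torch  # optional dependency, imported only for the GPU check

        report["cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" for `cuda_available` is not an error; the application can still run on CPU, only more slowly.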
- Clone the repository:
    git clone https://github.com/your-repo/AI-Subtitle-Studio.git
    cd AI-Subtitle-Studio
- Create a Conda environment (Recommended):
    conda create -n ai-subtitle python=3.10
    conda activate ai-subtitle
- Install Dependencies:
    pip install -r requirements.txt
  Note: Ensure you install the GPU version of PyTorch if you have a compatible NVIDIA card (see the PyTorch Get Started page).
- Start the Server:
    # chmod +x run.sh
    ./run.sh
  Or manually:
    python src/ai_subtitle_generator/web/server.py
- Access the Interface: Open your browser and navigate to:
    http://localhost:7860
- Generate Subtitles:
  - Paste a YouTube URL or upload a local video file.
  - Select your Model (e.g., medium, large-v2).
  - Choose Display Mode (Original/Translated/Bilingual).
  - Click "Generate Subtitles".
This application is packaged with a lightweight, GPU-ready Docker image based on python:3.10-slim.
- Docker Engine
- NVIDIA Container Toolkit (for GPU acceleration)
- Build the Image:
    docker build -t ai-subtitle-studio .
  Note: The first build handles large AI dependencies (Torch, CUDA libs), so it may take some time depending on your network.
- Run with GPU (Recommended):
    docker run --gpus all -d -p 7860:7860 --name subtitle-studio ai-subtitle-studio
- Run on CPU (Optional): If you don't have a GPU, you can run it in CPU mode (slower):
    docker run -d -p 7860:7860 --name subtitle-studio ai-subtitle-studio
- Access: Open http://localhost:7860 in your browser.
The core logic is modularized within the src/ai_subtitle_generator package:
- Class: SubtitleGenerator
- Method: generate_subtitles
- Logic:
  - Extracts audio from the input video using ffmpeg.
  - Slices audio into manageable segments (default 300s) to avoid memory issues.
  - Loads the Whisper model (either openai-whisper or faster-whisper).
  - Iterates through chunks, transcribing speech to text with timestamps.
  - Merges segment results into a unified WebVTT format.
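The slicing and merging steps can be sketched as follows. This is an illustration under stated assumptions, not the project's code: all function names are hypothetical, the 16 kHz mono WAV target is an assumption (it is the input rate Whisper models work with), and transcription itself is left out, so `merge_to_vtt` takes already-transcribed segments whose timestamps are relative to their chunk.

```python
from typing import Iterable, Iterator, List, Tuple

CHUNK_SECONDS = 300  # default segment length mentioned in the README

Segment = Tuple[float, float, str]  # (start, end, text), relative to the chunk


def chunk_bounds(duration: float, chunk: float = CHUNK_SECONDS) -> Iterator[Tuple[float, float]]:
    """Yield (start, end) windows that cover the whole recording."""
    start = 0.0
    while start < duration:
        yield start, min(start + chunk, duration)
        start += chunk


def ffmpeg_slice_cmd(video: str, out_wav: str, start: float, end: float) -> List[str]:
    """Build an ffmpeg argv extracting one audio-only slice (assumed 16 kHz mono)."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{end - start:.3f}",
            "-i", video, "-vn", "-ac", "1", "-ar", "16000", out_wav]


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def merge_to_vtt(chunks: Iterable[Tuple[float, List[Segment]]]) -> str:
    """Merge per-chunk segments into one WebVTT document.

    Each item is (chunk_start, segments); the chunk start is added to every
    segment so that timestamps refer to the full video again.
    """
    lines = ["WEBVTT", ""]
    for offset, segments in chunks:
        for seg_start, seg_end, text in segments:
            lines.append(f"{vtt_timestamp(offset + seg_start)} --> {vtt_timestamp(offset + seg_end)}")
            lines.append(text.strip())
            lines.append("")
    return "\n".join(lines)
```

The important detail is the offset: each chunk is transcribed with timestamps starting at zero, so the chunk's start time has to be added back before the cues are merged.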
- Class: SubtitleTranslator
- Method: translate_text / refine_subtitle
- Logic:
  - Uses Helsinki-NLP models via Hugging Face transformers pipelines for efficient translation between specific language pairs (e.g., En-Zh).
  - Supports LLM Refinement using Qwen/Qwen3-4B: it constructs prompts tailored for subtitle correction (removing ASR errors, fixing grammar) and feeds the text to the local LLM for polish.
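As a rough sketch of the two steps above: the helper names are hypothetical, and the prompt wording is invented for illustration, since the README does not show the project's actual prompt. The model id follows the Helsinki-NLP/opus-mt-<src>-<tgt> naming pattern used on the Hugging Face Hub; not every language pair has a published model.

```python
from typing import List


def opus_mt_model_name(source_lang: str, target_lang: str) -> str:
    """Return the Hub id for a language pair, e.g. Helsinki-NLP/opus-mt-en-zh."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def translate_lines(lines: List[str], source_lang: str, target_lang: str) -> List[str]:
    """Translate subtitle lines with a transformers translation pipeline."""
    # Imported lazily so the other helpers work without transformers installed.
    from transformers import pipeline

    translator = pipeline("translation", model=opus_mt_model_name(source_lang, target_lang))
    # The pipeline returns one {"translation_text": ...} dict per input line.
    return [result["translation_text"] for result in translator(lines)]


def build_refine_prompt(cue_text: str) -> str:
    """Build a correction prompt for one subtitle line (hypothetical wording)."""
    return (
        "You are a subtitle editor. Fix speech-recognition errors and grammar "
        "in the subtitle line below. Keep the meaning and the language unchanged, "
        "and reply with the corrected line only.\n\n"
        f"Subtitle: {cue_text.strip()}"
    )
```

Calling `translate_lines(["Hello World"], "en", "zh")` downloads the model on first use, so it needs network access once; later calls use the local cache.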
- Class: VideoDownloader
- Logic:
  - Wraps yt-dlp to fetch videos from YouTube or other supported platforms.
  - Handles caching locally to prevent re-downloading the same content.
  - Returns a distinct local filepath for processing.
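One way to get the caching behavior described above is to derive the local filename from the URL. The sketch below assumes that approach; the function names, the hash-based naming, and the ./save directory default (borrowed from the API examples) are illustrative choices, not necessarily what VideoDownloader does.

```python
import hashlib
from pathlib import Path


def cache_path(url: str, save_dir: str = "./save") -> Path:
    """Map a URL to a stable local filename so repeat requests hit the cache."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(save_dir) / f"{digest}.mp4"


def download(url: str, save_dir: str = "./save") -> Path:
    """Return the local path for a video, downloading only on a cache miss."""
    target = cache_path(url, save_dir)
    if target.exists():
        return target  # cached: skip the network entirely

    from yt_dlp import YoutubeDL  # lazy import, only needed on a cache miss

    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: restrict the download to an mp4 format and write it to `target`.
    options = {"outtmpl": str(target), "format": "mp4", "quiet": True}
    with YoutubeDL(options) as ydl:
        ydl.download([url])
    return target
```

Because the filename depends only on the URL, two requests for the same video resolve to the same path, and different videos never collide in practice.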
- Models are loaded only when needed to keep memory usage low, and model access is guarded by thread locks (Lock) to prevent race conditions during concurrent requests.
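The lazy-loading and locking pattern can be sketched like this. The class name `LazyModelCache` and its methods are hypothetical, and a single lock shared by all models is a simplification chosen for brevity.

```python
import threading
from typing import Any, Callable, Dict


class LazyModelCache:
    """Load each model on first use and serialize access with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._models: Dict[str, Any] = {}

    def get(self, name: str, loader: Callable[[], Any]) -> Any:
        """Return the cached model, calling `loader` only if it is not loaded yet."""
        with self._lock:
            if name not in self._models:
                self._models[name] = loader()
            return self._models[name]

    def unload(self, name: str) -> None:
        """Drop a model so its memory can be reclaimed."""
        with self._lock:
            self._models.pop(name, None)
```

Holding the lock during loading means that concurrent requests for the same model wait for the first load instead of each starting their own, which is what prevents duplicate copies in memory.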
The backend exposes a RESTful API for integration with other services.
- Endpoint: POST /fetch
- Body:
    { "url": "https://youtube.com/watch?v=..." }
- Response:
    { "video_id": "uuid", "message": "Download started" }
- Endpoint: GET /fetch/status?video_id=<video_id>
- Response:
    { "status": "Ready", "path": "./save/video.mp4", "error": null }
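A client for the two fetch endpoints above might look like the following sketch, which uses only the standard library. The helper names are hypothetical, and the base URL assumes the server is running locally on the default port 7860.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: local server, default port


def json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(video_id: str) -> str:
    """Build the GET /fetch/status URL with a safely encoded query string."""
    return f"{BASE_URL}/fetch/status?" + urllib.parse.urlencode({"video_id": video_id})


def send(request) -> dict:
    """Send a Request object or URL string and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running server; the video URL is a placeholder.
    started = send(json_post("/fetch", {"url": "https://youtube.com/watch?v=..."}))
    print(send(status_url(started["video_id"])))
```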
- Endpoint: POST /upload
- Form-Data: file (Binary file)
- Response:
    { "video_path": "./save/filename.mp4" }
- Endpoint: POST /extract_async
- Body:
    {
      "video_path": "./save/video.mp4",
      "model": "medium",      // tiny, base, small, medium, large
      "language": "en",       // optional, auto-detect if null
      "use_faster": true      // use faster-whisper backend
    }
- Response:
    { "job_id": "uuid" }
- Endpoint: GET /extract/status?job_id=<job_id>
- Response:
    {
      "state": "running",         // queued, running, done, error
      "percent": 45,
      "message": "Transcribing...",
      "vtt_content": "WEBVTT..."  // Included when state is 'done'
    }
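Because extraction is asynchronous, a client has to poll the status endpoint until the job reaches a final state. Below is a minimal polling sketch based on the states listed above; the function name `wait_for_job` and its parameters are hypothetical, and `get_status` stands in for whatever function performs the GET /extract/status request and returns the decoded JSON.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_status: Callable[[], Dict],
    interval: float = 2.0,
    timeout: float = 3600.0,
) -> str:
    """Poll until the job is done and return its WebVTT content.

    Raises RuntimeError if the job reports an error and TimeoutError if it
    does not finish within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        state = status.get("state")
        if state == "done":
            return status["vtt_content"]
        if state == "error":
            raise RuntimeError(status.get("message", "extraction failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        time.sleep(interval)  # still queued or running: wait, then ask again
```

Passing the status call in as a function keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.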
- Endpoint: POST /translate
- Body:
    {
      "vtt_content": "WEBVTT...",
      "source_lang": "en",
      "target_lang": "zh",
      "video_path": "./save/video.mp4"
    }
- Response:
    { "translated_vtt_content": "WEBVTT..." }
- Endpoint: POST /api/translate
- Body:
    {
      "text": "Hello World",
      "source_lang": "en",
      "target_lang": "zh",
      "use_qwen": false
    }
- Response:
    { "translated_text": "你好世界" }
- GET /models: List available Whisper models.
- GET /translation_pairs: List supported translation language pairs.