AI Subtitle Studio

A professional-grade, local, web-based tool for generating, translating, and editing subtitles for videos using state-of-the-art AI models.

Introduction

AI Subtitle Studio is a standalone tool designed to streamline subtitle creation for content creators. It combines several locally run AI models to:

- Transcribe audio from videos using OpenAI's Whisper (or Faster-Whisper) models.
- Translate subtitles into multiple languages using Helsinki-NLP models.
- Refine and polish text using Qwen large language models (optional).
- Edit subtitles in a modern, Google-style web interface with real-time video seeking.

All transcription, translation, and refinement run locally on your machine, so your media and subtitle text are not sent to any external service.

Tech Stack

This project is built using the following technologies:

- Backend / Web Framework: Python, Flask, Flask-SocketIO
- AI / ML:
  - ASR: openai-whisper, faster-whisper
  - Translation: transformers (Helsinki-NLP/OPUS-MT)
  - LLM Refinement: transformers (Qwen/Qwen3-4B)
  - Deep Learning Framework: PyTorch (GPU accelerated)
- Utilities: ffmpeg (audio processing), yt-dlp (video downloading)
- Frontend: HTML5, Tailwind CSS, Google Material Design (custom CSS), JavaScript (vanilla)

Installation

Prerequisites

- Python 3.10+
- FFmpeg installed and added to the system PATH.
- (Optional) NVIDIA GPU with drivers installed for acceleration.

Manual Setup (Conda/Pip)

Clone the repository:

git clone https://github.com/your-repo/AI-Subtitle-Studio.git
cd AI-Subtitle-Studio

Create a Conda environment (Recommended):

conda create -n ai-subtitle python=3.10
conda activate ai-subtitle

Install Dependencies:
```
pip install -r requirements.txt
```
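Note: If you have a compatible NVIDIA card, make sure you install the GPU (CUDA) build of PyTorch. See the PyTorch "Get Started" page (https://pytorch.org/get-started/locally/) for the matching install command.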
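
To confirm which device PyTorch will actually use after installation, a quick check along the following lines may help. This is a minimal sketch; the helper name `describe_torch_device` is hypothetical and not part of this project.

```python
def describe_torch_device() -> str:
    """Report whether PyTorch can see a CUDA GPU (hypothetical helper)."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"CUDA GPU available: {torch.cuda.get_device_name(0)}"
    return "No CUDA GPU detected; PyTorch will run on CPU"


if __name__ == "__main__":
    print(describe_torch_device())
```

If the CPU message appears on a machine with an NVIDIA card, the CPU-only build of PyTorch was probably installed.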

Usage

Running Locally

Start the Server:

# If run.sh is not executable yet, first run: chmod +x run.sh
./run.sh

Or manually:

python src/ai_subtitle_generator/web/server.py

Access the Interface: Open your browser and navigate to: http://localhost:7860
Generate Subtitles:
- Paste a YouTube URL or upload a local video file.
- Select your Model (e.g., medium, large-v2).
- Choose Display Mode (Original/Translated/Bilingual).
- Click "Generate Subtitles".

Docker Support (Optimized)

This application is packaged with a lightweight, GPU-ready Docker image based on python:3.10-slim.

Prerequisites

Docker Engine
NVIDIA Container Toolkit (for GPU acceleration)

Build & Run

Build the Image:
```
docker build -t ai-subtitle-studio .
```
Note: The first build downloads large AI dependencies (PyTorch, CUDA libraries), so it may take a while depending on your network speed.

Run with GPU (Recommended):

docker run --gpus all -d -p 7860:7860 --name subtitle-studio ai-subtitle-studio

Run on CPU (Optional): If you don't have a GPU, you can run it in CPU mode (slower):
```
docker run -d -p 7860:7860 --name subtitle-studio ai-subtitle-studio
```
Access: Open http://localhost:7860 in your browser.

Implementation Details

The core logic is modularized within the src/ai_subtitle_generator package:

1. Audio Transcription (`transcriber.py`)

Class: SubtitleGenerator
Method: generate_subtitles
Logic:
- Extracts audio from the input video using ffmpeg.
- Slices audio into manageable segments (default 300s) to avoid memory issues.
- Loads the Whisper model (either openai-whisper or faster-whisper).
- Iterates through chunks, transcribing speech to text with timestamps.
- Merges segment results into a unified WebVTT format.
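
The chunk-and-merge steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `transcriber.py`: the function names, the `(start, end, text)` segment tuple, and the exact ffmpeg flags (mono, 16 kHz WAV) are all hypothetical choices. The key point is that each chunk's timestamps are relative to the chunk, so the chunk offset must be added back before writing WebVTT cues.

```python
from typing import Iterable, List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, text)


def chunk_ranges(duration: float, chunk_seconds: float = 300.0) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive (start, end) windows."""
    ranges = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        ranges.append((start, end))
        start = end
    return ranges


def ffmpeg_slice_command(src: str, dst: str, start: float, end: float) -> List[str]:
    """Build (but do not run) an ffmpeg command that cuts one audio chunk."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}", "-t", f"{end - start:.3f}",
        "-i", src,
        "-vn", "-ac", "1", "-ar", "16000",  # drop video, mono, 16 kHz (assumed settings)
        dst,
    ]


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def merge_to_vtt(chunks: Iterable[Tuple[float, List[Segment]]]) -> str:
    """Merge per-chunk segments into one WebVTT string.

    Each item is (chunk offset in seconds, segments with chunk-relative times).
    """
    lines = ["WEBVTT", ""]
    for offset, segments in chunks:
        for start, end, text in segments:
            lines.append(f"{vtt_timestamp(offset + start)} --> {vtt_timestamp(offset + end)}")
            lines.append(text.strip())
            lines.append("")
    return "\n".join(lines)
```

The list returned by `ffmpeg_slice_command` could be passed to `subprocess.run(cmd, check=True)`, and the transcription results for each chunk then fed to `merge_to_vtt` together with that chunk's start offset.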

2. Machine Translation (`translator.py`)

Class: SubtitleTranslator
Method: translate_text / refine_subtitle
Logic:
- Uses Helsinki-NLP models via Hugging Face transformers pipelines for efficient translation between specific language pairs (e.g., En-Zh).
- Supports LLM Refinement using Qwen/Qwen3-4B: It constructs prompts tailored for subtitle correction (removing ASR errors, fixing grammar) and feeds the text to the local LLM for polish.
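
As a rough illustration of how subtitle translation can keep cue timing intact, the sketch below translates only the cue text of a WebVTT string and passes header and timing lines through unchanged. The function names are hypothetical, and the `Helsinki-NLP/opus-mt-<src>-<tgt>` naming used in `make_opus_mt_translator` is an assumption that holds for pairs such as `en`/`zh` but not necessarily for every pair; the real `translator.py` may be structured differently.

```python
from typing import Callable, List


def translate_vtt(vtt_content: str, translate: Callable[[str], str]) -> str:
    """Translate cue text in a WebVTT string, leaving header and timing lines intact."""
    out: List[str] = []
    in_cue = False
    for line in vtt_content.splitlines():
        if "-->" in line:
            in_cue = True           # timing line: the following lines are cue text
            out.append(line)
        elif not line.strip():
            in_cue = False          # a blank line ends the current cue
            out.append(line)
        elif in_cue:
            out.append(translate(line))
        else:
            out.append(line)        # header, cue identifiers, NOTE blocks
    return "\n".join(out)


def make_opus_mt_translator(source_lang: str, target_lang: str) -> Callable[[str], str]:
    """Assumed wiring to a Helsinki-NLP OPUS-MT model via a transformers pipeline."""
    from transformers import pipeline  # heavy import, deferred until actually needed

    model_name = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"  # assumed naming
    translator = pipeline("translation", model=model_name)
    return lambda text: translator(text)[0]["translation_text"]
```

Passing the translator in as a callable keeps the VTT handling testable without downloading a model, for example `translate_vtt(vtt, str.upper)` in a unit test.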

3. Video Downloading (`downloader.py`)

Class: VideoDownloader
Logic:
- Wraps yt-dlp to fetch videos from YouTube or other supported platforms.
- Handles caching locally to prevent re-downloading the same content.
- Returns a distinct local filepath for processing.
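
One way to implement the caching behavior described above is to derive a stable file name from the URL and skip the download when that file already exists. This is a minimal sketch: `cached_path`, `fetch_with_cache`, the hash-based naming, and the `download` callable are hypothetical, and the actual `VideoDownloader` may key its cache differently.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_path(url: str, cache_dir: str = "./save", ext: str = "mp4") -> Path:
    """Map a URL to a stable local file path using a short SHA-256 prefix."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.{ext}"


def fetch_with_cache(url: str, download: Callable[[str, Path], None],
                     cache_dir: str = "./save") -> Path:
    """Return the local path for url, calling download(url, path) only on a cache miss."""
    target = cached_path(url, cache_dir)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(url, target)
    return target
```

In practice the `download` callable would wrap yt-dlp; it is left as a parameter here so the caching logic stays independent of the downloader.
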

Across these modules, memory usage is kept down by loading models only when they are needed, and model access is guarded by thread locks (Lock) to prevent race conditions during concurrent requests.
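
The lazy-loading and locking pattern mentioned here could look roughly like the sketch below. `LazyModelCache` and its `loader` callable are hypothetical names introduced for illustration; the project's own classes may hold their locks differently.

```python
import threading
from typing import Any, Callable, Dict


class LazyModelCache:
    """Load each model on first use and serialize access with a lock (hypothetical helper)."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def use(self, name: str, fn: Callable[[Any], Any]) -> Any:
        """Run fn(model) while holding the lock, loading the model if necessary."""
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loader(name)  # loaded only when first needed
            return fn(self._models[name])

    def unload(self, name: str) -> None:
        """Drop a model so its memory can be reclaimed."""
        with self._lock:
            self._models.pop(name, None)
```

Holding the lock for the whole call is the simplest safe choice: concurrent requests neither load the same model twice nor interleave inference on a shared model, at the cost of processing one request at a time.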

API Documentation

The backend exposes a RESTful API for integration with other services.

Video Management

1. Fetch Video (URL)

Endpoint: POST /fetch

Body:

{ "url": "https://youtube.com/watch?v=..." }

Response:

{ "video_id": "uuid", "message": "Download started" }

2. Check Fetch Status

Endpoint: GET /fetch/status?video_id=<video_id>

Response:

{ "status": "Ready", "path": "./save/video.mp4", "error": null }
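
A client for the two fetch endpoints can be sketched with the Python standard library alone. The helper names are hypothetical, the base URL is the default from the Usage section, and sending the body as `application/json` is an assumption based on the JSON example above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7860"  # default address from the Usage section


def build_fetch_request(video_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /fetch request for a video URL."""
    body = json.dumps({"url": video_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/fetch",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def build_status_url(video_id: str, base_url: str = BASE_URL) -> str:
    """Build the GET /fetch/status URL with a properly escaped video_id."""
    query = urllib.parse.urlencode({"video_id": video_id})
    return f"{base_url}/fetch/status?{query}"


def send_json(request) -> dict:
    """Send a Request object or URL and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

With the server running, `send_json(build_fetch_request(url))["video_id"]` yields the ID to pass to `build_status_url`, and the resulting URL can be polled with `send_json` until `status` is `"Ready"` or `error` is set.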

3. Upload Video

Endpoint: POST /upload
Form-Data: file (Binary file)
Response:
```
{ "video_path": "./save/filename.mp4" }
```
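
For scripted uploads without third-party libraries, the multipart body can be assembled by hand. The sketch below is illustrative: the helper names are hypothetical, and it reads the whole file into memory, which may be unsuitable for very large videos.

```python
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def encode_multipart_file(field: str, filename: str, payload: bytes,
                          content_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Return (body, Content-Type header) for a single-file multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(path: str,
                         base_url: str = "http://localhost:7860") -> urllib.request.Request:
    """Build the POST /upload request; the form field is named 'file' as documented."""
    file_path = Path(path)
    body, content_type = encode_multipart_file("file", file_path.name, file_path.read_bytes())
    return urllib.request.Request(
        f"{base_url}/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen`, and the returned `video_path` passed on to the extraction endpoint.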

Subtitle Extraction

1. Start Extraction (Async)

Endpoint: POST /extract_async

Body:

{
  "video_path": "./save/video.mp4",
  "model": "medium",      // tiny, base, small, medium, large
  "language": "en",       // optional, auto-detect if null
  "use_faster": true      // use faster-whisper backend
}

Response:
```
{ "job_id": "uuid" }
```

2. Check Extraction Status

Endpoint: GET /extract/status?job_id=<job_id>

Response:

{
  "state": "running",     // queued, running, done, error
  "percent": 45,
  "message": "Transcribing...",
  "vtt_content": "WEBVTT..." // Included when state is 'done'
}
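
Because extraction is asynchronous, a client has to poll the status endpoint until the job reaches `done` or `error`. The loop below is a minimal sketch: `wait_for_extraction` is a hypothetical helper, `get_status` stands for any callable that returns the decoded status JSON, and reading the failure reason from `message` is an assumption.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_extraction(
    get_status: Callable[[], Dict],
    interval: float = 2.0,
    timeout: float = 3600.0,
    on_progress: Optional[Callable[[int, str], None]] = None,
) -> str:
    """Poll until the job is done and return its WebVTT content."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        state = status.get("state")
        if state == "done":
            return status["vtt_content"]
        if state == "error":
            # Assumption: the server reports the failure reason in "message".
            raise RuntimeError(status.get("message", "extraction failed"))
        if on_progress is not None:
            on_progress(status.get("percent", 0), status.get("message", ""))
        if time.monotonic() >= deadline:
            raise TimeoutError("extraction did not finish in time")
        time.sleep(interval)
```

Injecting `get_status` keeps the loop independent of the HTTP library in use; in practice it would issue `GET /extract/status?job_id=<job_id>` and decode the JSON response.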

Translation Services

1. Translate Subtitles (VTT)

Endpoint: POST /translate

Body:

{
  "vtt_content": "WEBVTT...",
  "source_lang": "en",
  "target_lang": "zh",
  "video_path": "./save/video.mp4"
}

Response:

{ "translated_vtt_content": "WEBVTT..." }

2. Translate Text (Raw)

Endpoint: POST /api/translate

Body:

{
  "text": "Hello World",
  "source_lang": "en",
  "target_lang": "zh",
  "use_qwen": false
}

Response:
```
{ "translated_text": "你好世界" }
```

System

GET /models: List available Whisper models.
GET /translation_pairs: List supported translation language pairs.
