
Commit 8f6a847

refactor(whisper): modularize crate into folder-based modules
Refactor keyless-whisper crate to follow modern Rust module organization:

- Split lib.rs into focused modules: config, device, transcriber, whisper
- Modularize decode.rs into decode/ folder with submodules: constants, fallback, helpers, language, result, temperature
- Modularize model.rs into model/ folder with submodules: files, loader, mel_filters, types
- Modularize whisper.rs into whisper/ folder with submodules: construct, inference_thread, worker_thread, trait_impl, types
- Move mel filter generation from preprocessing.rs to model/mel_filters.rs (model-specific logic)
- Update README.md to document new module structure
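Splitting decoding into `decode/temperature.rs` (single-temperature decoding) and `decode/fallback.rs` (temperature fallback) matches the fallback scheme used by OpenAI's reference Whisper decoder: decode at temperature 0 first, then retry at higher temperatures when the output looks degenerate. The sketch below illustrates that scheme under stated assumptions. The temperature ladder (0.0 to 1.0 in steps of 0.2) and the thresholds (compression ratio 2.4, average log-probability -1.0) are the reference decoder's defaults, not values taken from `decode/constants.rs`. `DecodeResult`, `is_acceptable`, and `decode_with_fallback` are hypothetical names.

```rust
/// Hypothetical result type; the crate's decode/result.rs defines its own quality metrics.
#[derive(Debug)]
struct DecodeResult {
    text: String,
    avg_logprob: f32,
    compression_ratio: f32,
}

// Assumed values, borrowed from OpenAI's reference Whisper defaults.
const TEMPERATURES: [f32; 6] = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0];
const COMPRESSION_RATIO_THRESHOLD: f32 = 2.4;
const LOGPROB_THRESHOLD: f32 = -1.0;

/// Reject output that is too repetitive (high compression ratio)
/// or that the model was too unsure of (low average log-probability).
fn is_acceptable(r: &DecodeResult) -> bool {
    r.compression_ratio <= COMPRESSION_RATIO_THRESHOLD && r.avg_logprob >= LOGPROB_THRESHOLD
}

/// Try each temperature in order and return the first acceptable result,
/// or the last attempt if none pass.
fn decode_with_fallback<F>(mut decode_at: F) -> Option<DecodeResult>
where
    F: FnMut(f32) -> Option<DecodeResult>,
{
    let mut last = None;
    for &t in TEMPERATURES.iter() {
        if let Some(result) = decode_at(t) {
            if is_acceptable(&result) {
                return Some(result);
            }
            last = Some(result);
        }
    }
    last
}

fn main() {
    // Fake decoder: repetitive output at t = 0.0, a clean result at higher temperatures.
    let result = decode_with_fallback(|t| {
        Some(if t == 0.0 {
            DecodeResult { text: "la la la la".into(), avg_logprob: -0.3, compression_ratio: 3.1 }
        } else {
            DecodeResult { text: format!("hello (t={t})"), avg_logprob: -0.4, compression_ratio: 1.2 }
        })
    });
    println!("{:?}", result.map(|r| r.text));
}
```

Passing the per-temperature decoder in as a closure keeps the fallback policy testable without loading a model. That is presumably the kind of separation the two new files draw.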
1 parent 0d13597 commit 8f6a847

24 files changed

Lines changed: 1875 additions & 1699 deletions

keyless-whisper/README.md

Lines changed: 24 additions & 5 deletions
```diff
@@ -190,11 +190,30 @@ Unit tests cover preprocessing and token filtering. Full model inference is exer
 
 ### Module structure
 
-- `src/lib.rs`: threading, public API, event channels, rubato resampling
-- `src/model.rs`: model/tokenizer download and load
-- `src/preprocessing.rs`: mel filter generation and PCM→mel
-- `src/decode.rs`: token generation/decoding
-- `src/inference.rs`: end‑to‑end inference pipeline
+- `src/lib.rs`: Public API re-exports and module declarations
+- `src/config.rs`: Configuration types (`WhisperConfig`, `WhisperLoadPhase`, `PhaseState`)
+- `src/device.rs`: Device selection and caching (Metal > CUDA > CPU)
+- `src/transcriber.rs`: `RealtimeTranscriber` trait definition
+- `src/whisper.rs`: Main `Whisper` transcriber implementation
+  - `whisper/construct.rs`: Construction and initialization
+  - `whisper/inference_thread.rs`: Inference thread (runs Whisper model)
+  - `whisper/worker_thread.rs`: Worker thread (resampling, accumulation, partial previews)
+  - `whisper/trait_impl.rs`: Trait implementations (`RealtimeTranscriber`, `Drop`)
+  - `whisper/types.rs`: Type definitions (`Whisper`, `WhisperCmd`, `InferReq`)
+- `src/model.rs`: Model loading and management
+  - `model/loader.rs`: Model loading with progress callbacks
+  - `model/mel_filters.rs`: Mel filter bank generation
+  - `model/files.rs`: File detection and location helpers
+  - `model/types.rs`: Model type definitions (`Model`, `WhisperModel`, `WhisperTokens`)
+- `src/decode.rs`: Token generation and text decoding
+  - `decode/fallback.rs`: Temperature fallback decoding
+  - `decode/language.rs`: Language detection from audio features
+  - `decode/temperature.rs`: Single-temperature decoding
+  - `decode/helpers.rs`: Helper functions (token decoding, repetition detection)
+  - `decode/constants.rs`: Decoding constants (thresholds, temperatures)
+  - `decode/result.rs`: Decoding result type with quality metrics
+- `src/preprocessing.rs`: PCM→mel spectrogram conversion (uses pre-generated mel filters)
+- `src/inference.rs`: End‑to‑end inference pipeline
 
 ### Platform support (macOS, Windows, Linux)
 
```
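The README describes `src/device.rs` as selecting a device in the order Metal > CUDA > CPU and caching the result. A minimal sketch of that pattern with `std::sync::OnceLock` follows. `Backend`, `select_backend`, and `cached_backend` are hypothetical, and the `is_available` closure stands in for whatever backend probes the crate actually performs.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the crate's device type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    Metal,
    Cuda,
    Cpu,
}

/// Preference order described in the README: Metal > CUDA > CPU.
const PREFERENCE: [Backend; 3] = [Backend::Metal, Backend::Cuda, Backend::Cpu];

/// Pick the first backend the probe reports as usable; CPU is always accepted.
fn select_backend(is_available: impl Fn(Backend) -> bool) -> Backend {
    PREFERENCE
        .iter()
        .copied()
        .find(|&b| b == Backend::Cpu || is_available(b))
        .unwrap_or(Backend::Cpu)
}

/// Process-wide cache: the probe runs at most once.
static SELECTED: OnceLock<Backend> = OnceLock::new();

fn cached_backend(is_available: impl Fn(Backend) -> bool) -> Backend {
    *SELECTED.get_or_init(|| select_backend(is_available))
}

fn main() {
    // Pretend only CUDA is present on this machine.
    let first = cached_backend(|b| b == Backend::Cuda);
    // This probe is ignored because the first result is already cached.
    let second = cached_backend(|b| b == Backend::Metal);
    println!("{first:?} {second:?}");
}
```

Because the choice lives in a process-wide `OnceLock`, the probe runs once and every later caller sees the same backend, even if it passes a different probe.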
keyless-whisper/src/config.rs

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
```rust
use std::path::PathBuf;

/// Loading phase identifiers for Whisper model initialization.
#[derive(Clone, Copy, Debug)]
pub enum WhisperLoadPhase {
    /// Resolving model source (local cache path or remote HF ID).
    ModelSource,
    /// Detecting model format (safetensors vs. gguf quantized).
    DetectFormat,
    /// Reading model configuration (config.json).
    ReadConfig,
    /// Parsing model configuration JSON.
    ParseConfig,
    /// Mapping model weights into memory (mmap or gguf loader).
    MapWeights,
    /// Constructing model layers/graph.
    ConstructModel,
    /// Loading tokenizer JSON.
    LoadTokenizer,
    /// Resolving special token IDs (SOT/EOT/etc.).
    ResolveTokens,
    /// Building mel filter bank.
    BuildMelFilters,
}

impl WhisperLoadPhase {
    /// Return a user-facing label for this phase.
    pub fn as_label(&self) -> &'static str {
        match self {
            WhisperLoadPhase::ModelSource => "locating model",
            WhisperLoadPhase::DetectFormat => "detecting model format",
            WhisperLoadPhase::ReadConfig => "reading config",
            WhisperLoadPhase::ParseConfig => "parsing config",
            WhisperLoadPhase::MapWeights => "mapping weights",
            WhisperLoadPhase::ConstructModel => "constructing model",
            WhisperLoadPhase::LoadTokenizer => "loading tokenizer",
            WhisperLoadPhase::ResolveTokens => "resolving tokens",
            WhisperLoadPhase::BuildMelFilters => "building mel filters",
        }
    }
}

/// Loading phase state (begin/end).
#[derive(Clone, Copy, Debug)]
pub enum PhaseState {
    /// Phase begins.
    Begin,
    /// Phase ends successfully.
    End,
}

/// Configuration for the Whisper transcriber.
///
/// Specifies the model source, language hint, and source audio sample rate.
/// The transcriber will handle model downloading and resampling automatically.
#[derive(Clone, Debug)]
pub struct WhisperConfig {
    /// Model source: either a Hugging Face model ID or a local directory path.
    ///
    /// **Hugging Face model IDs** (auto-downloads if not cached):
    /// - "openai/whisper-tiny" (~75 MB, fastest, multilingual, supports auto-detection)
    /// - "openai/whisper-base" (~142 MB, balanced, multilingual, supports auto-detection)
    /// - "openai/whisper-small" (~466 MB, good accuracy, multilingual)
    /// - "openai/whisper-tiny.en" (~75 MB, fastest, English-only)
    /// - "openai/whisper-base.en" (~142 MB, balanced, English-only)
    /// - ...
    ///
    /// **Local directory paths** (must contain config.json, model.safetensors, tokenizer.json):
    /// - Example: "/Users/name/.cache/keyless/models/whisper-tiny"
    ///
    /// If not specified or if the path doesn't exist, defaults to "openai/whisper-tiny".
    pub model_path: PathBuf,

    /// Language code for transcription (ISO 639-1, e.g., "en", "es", "fr").
    /// Providing a hint significantly improves accuracy vs. auto-detection.
    /// `None` = let Whisper auto-detect (slower and less accurate).
    pub language: Option<String>,

    /// Sample rate of the incoming audio (e.g., 48000, 44100, 16000).
    /// The transcriber will resample to 16 kHz mono if necessary.
    pub source_sample_hz: u32,
}
```
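`WhisperLoadPhase` and `PhaseState` pair naturally with a progress callback fired around each loading step; the README attributes progress callbacks to `model/loader.rs`. A minimal sketch follows, assuming a hypothetical `with_phase` helper and an `FnMut(WhisperLoadPhase, PhaseState)` callback. The crate's real callback signature may differ. The enums are trimmed copies of the ones above so the block compiles on its own. Since `PhaseState` has no failure variant, a failed step in this sketch simply never reports `End`.

```rust
// Trimmed copies of the enums from config.rs so this block compiles standalone.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WhisperLoadPhase {
    ReadConfig,
    LoadTokenizer,
}

impl WhisperLoadPhase {
    pub fn as_label(&self) -> &'static str {
        match self {
            WhisperLoadPhase::ReadConfig => "reading config",
            WhisperLoadPhase::LoadTokenizer => "loading tokenizer",
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PhaseState {
    Begin,
    End,
}

/// Hypothetical helper: report Begin, run the step, and report End only on success.
fn with_phase<T, E>(
    on_progress: &mut dyn FnMut(WhisperLoadPhase, PhaseState),
    phase: WhisperLoadPhase,
    step: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    on_progress(phase, PhaseState::Begin);
    let value = step()?; // on error, End is never emitted
    on_progress(phase, PhaseState::End);
    Ok(value)
}

fn main() -> Result<(), String> {
    let mut log: Vec<String> = Vec::new();
    let mut report = |phase: WhisperLoadPhase, state: PhaseState| {
        log.push(format!("{:?}: {}", state, phase.as_label()));
    };

    // A step that succeeds: Begin and End are both reported.
    let n_mels = with_phase(&mut report, WhisperLoadPhase::ReadConfig, || {
        Ok::<u32, String>(80) // pretend config.json was read
    })?;

    // A step that fails: only Begin is reported.
    let err = with_phase(&mut report, WhisperLoadPhase::LoadTokenizer, || {
        Err::<(), String>("tokenizer.json missing".to_string())
    })
    .unwrap_err();

    println!("n_mels = {n_mels}, error = {err}");
    for line in &log {
        println!("{line}");
    }
    Ok(())
}
```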
