Hello, I encountered a performance inconsistency while testing the [large-v3](https://huggingface.co/openai/whisper-large-v3) model for speech transcription. Specifically, Whisper.cpp in CPU mode is noticeably slower than the Python implementation using CPU, which is unexpected given that C++ implementations are typically more efficient.
I would like to clarify whether this is due to technical limitations, implementation specifics, or configuration issues.
Test Setup & Conditions
- Test audio: Multiple
.mp3 files (10 rounds per test)
- Model:
large-v3 (same model used across implementations)
- Language: Chinese (
-l zh)
- Model loading time excluded from runtime measurement
- Each mode logs audio duration, processing time, RTF, and memory usage
- Whisper.cpp was invoked via Python using
subprocess.Popen, to programmatically measure execution time and capture stdout/stderr
- Same machine and environment were used for all tests (no container/OS-level changes)
Summary of Results (CPU mode)
| Mode |
Avg RTF (excluding load time) |
Notes |
| Whisper.cpp (CPU) |
Significantly higher |
with --no-gpu |
| Whisper (Python) |
Noticeably faster |
using torch + whisper |
Use python call whisper c++ cli:
whisper_cli_path = "/app/whisper.cpp/build/bin/whisper-cli"
cmd = [
whisper_cli_path,
"-f", audio_file,
"-m", "/app/whisper.cpp/models/ggml-large-v3.bin",
"-l", "zh",
"--no-gpu"
]
Use python whisper:
import whisper
model = whisper.load_model("large-v3", device="cpu")
result = model.transcribe(str(audio_path))
Suggested Labels
- performance
- question
- help wanted
Hello, I encountered a performance inconsistency while testing the
[large-v3](https://huggingface.co/openai/whisper-large-v3)model for speech transcription. Specifically, Whisper.cpp in CPU mode is noticeably slower than the Python implementation using CPU, which is unexpected given that C++ implementations are typically more efficient.I would like to clarify whether this is due to technical limitations, implementation specifics, or configuration issues.
Test Setup & Conditions
.mp3files (10 rounds per test)large-v3(same model used across implementations)-l zh)subprocess.Popen, to programmatically measure execution time and capture stdout/stderrSummary of Results (CPU mode)
--no-gputorch+whisperUse python call whisper c++ cli:
Use python whisper:
Suggested Labels