Context
RTranslator does real-time on-device translation, which requires fast speech recognition. FunASR (16.5K stars, MIT) provides models that run at roughly 15-17x real time on CPU, versus about 1x for Whisper, which matters for mobile devices.
Key advantages for mobile/on-device use
| | Whisper | FunASR SenseVoice | FunASR Paraformer |
|---|---|---|---|
| CPU speed | Slow (~1x realtime) | 17x realtime | 15x realtime |
| Model size | 1.5GB (large) | 234MB | 220MB |
| Languages | 57 | 50+ | zh/en |
| Architecture | Autoregressive | Non-autoregressive | Non-autoregressive |
| Latency per 5s of audio | ~5s on CPU | <300ms on CPU | <400ms on CPU |
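As a quick sanity check, the latency row can be converted into a real-time speedup by dividing the clip length by the processing time. The sketch below only redoes that arithmetic with the table's figures; the function name `realtime_speedup` is made up for illustration, and because the FunASR latencies are upper bounds ("<300ms", "<400ms"), the resulting speedups are lower bounds.

```python
def realtime_speedup(audio_seconds: float, latency_seconds: float) -> float:
    """Return how many times faster than real time a recognizer runs."""
    if audio_seconds <= 0 or latency_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / latency_seconds


# Latency figures taken from the table above, for a 5-second clip.
# The FunASR values are upper bounds, so their speedups are lower bounds.
latency_per_5s = {"Whisper": 5.0, "SenseVoice": 0.3, "Paraformer": 0.4}

speedups = {
    name: realtime_speedup(5.0, latency)
    for name, latency in latency_per_5s.items()
}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.1f}x realtime")
# Whisper: 1.0x realtime
# SenseVoice: 16.7x realtime
# Paraformer: 12.5x realtime
```

These bounds are consistent with the CPU speed row: at least 16.7x for SenseVoice (listed as 17x) and at least 12.5x for Paraformer (listed as 15x).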
The non-autoregressive architecture is the key difference: the model emits the whole transcript for an audio segment in a single forward pass instead of decoding one token at a time, which makes it inherently faster and better suited to real-time applications on resource-constrained devices.
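To make the difference concrete, here is a toy sketch that counts forward passes for the two decoding styles. It is not FunASR or Whisper code: `CountingModel` is a hypothetical stand-in that returns canned labels, and the CTC-style collapse is just one common non-autoregressive scheme, assumed here for illustration rather than taken from either FunASR model's internals.

```python
BLANK = "_"


def ctc_greedy_collapse(frame_labels):
    """Merge repeated frame labels, then drop blanks (greedy CTC rule)."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


class CountingModel:
    """Toy stand-in for a network: returns canned labels, counts passes."""

    def __init__(self, frame_labels):
        self.frame_labels = list(frame_labels)
        self.target = ctc_greedy_collapse(self.frame_labels)
        self.forward_passes = 0

    def predict_all_frames(self):
        # Non-autoregressive: a single pass labels every frame at once.
        self.forward_passes += 1
        return list(self.frame_labels)

    def predict_next_token(self, prefix):
        # Autoregressive: one pass per token, conditioned on the prefix.
        self.forward_passes += 1
        if len(prefix) < len(self.target):
            return self.target[len(prefix)]
        return None  # end of sequence


def decode_non_autoregressive(model):
    return ctc_greedy_collapse(model.predict_all_frames())


def decode_autoregressive(model):
    prefix = ""
    while True:
        token = model.predict_next_token(prefix)
        if token is None:
            return prefix
        prefix += token


frames = list("hh_e_ll_ll_oo")

nar_model = CountingModel(frames)
nar_text = decode_non_autoregressive(nar_model)

ar_model = CountingModel(frames)
ar_text = decode_autoregressive(ar_model)

print(nar_text, nar_model.forward_passes)  # hello 1
print(ar_text, ar_model.forward_passes)    # hello 6
```

Both decoders produce the same text, but the autoregressive loop needs one pass per output token plus one to detect the end, so its cost grows with transcript length while the single-pass decoder's pass count stays at one.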
Model options for mobile
- SenseVoice-Small (234MB): 50+ languages, best for general multilingual use
- Paraformer-zh (220MB): Best Chinese + English accuracy
Both models have ONNX exports available for mobile deployment:
- ONNX Runtime Mobile compatible
- funasr-onnx package for Python
- C++ runtime available
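For a quick desktop check before attempting a mobile port, the Python package can be exercised roughly as follows. This is a sketch based on my recollection of the funasr-onnx usage example, not a verified integration: the class name `SenseVoiceSmall`, the helper `rich_transcription_postprocess`, the `language`/`use_itn` arguments, and the model id `iic/SenseVoiceSmall` are all assumptions that should be checked against the installed version, and the wrapper function `transcribe_with_sensevoice` is hypothetical.

```python
def transcribe_with_sensevoice(wav_paths, model_dir="iic/SenseVoiceSmall"):
    """Transcribe audio files with the SenseVoice-Small ONNX export.

    Assumes `pip install funasr-onnx`; the names below follow the package's
    published example as I remember it and may differ between versions.
    """
    # Imported lazily so this module still loads without funasr-onnx installed.
    from funasr_onnx import SenseVoiceSmall
    from funasr_onnx.utils.postprocess_utils import rich_transcription_postprocess

    # quantize=True is assumed to select the quantized ONNX model.
    model = SenseVoiceSmall(model_dir, batch_size=1, quantize=True)

    # language="auto" is assumed to enable language detection;
    # use_itn=True is assumed to enable inverse text normalization.
    raw_results = model(list(wav_paths), language="auto", use_itn=True)

    # Strip the rich-transcription tags from each raw result.
    return [rich_transcription_postprocess(result) for result in raw_results]
```

A call would look like `transcribe_with_sensevoice(["sample.wav"])`, where `sample.wav` is a placeholder path. On Android the same ONNX file would instead be loaded through ONNX Runtime Mobile or the C++ runtime, with feature extraction and decoding reimplemented on the app side.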
Resources
Happy to help with integration details if interested.